There are worlds where performance matters. In the HPC world, faster software means less waiting time for the scientists. In the embedded world, faster software means we can use cheaper silicon to build our product. In the games world, faster software means our game will run well on slower CPUs, making it accessible to people with low-end hardware. When we are investigating a performance-critical piece of code, we might ask ourselves: is it possible to go faster?
In this post we will talk about exactly that: how to determine whether our algorithm is running at peak performance. To do so, we will introduce the roofline model, a theoretical model that helps us understand how the loops in our program consume the CPU's computational resources and memory bandwidth.
The roofline model
When it comes to peak software performance, there are theoretical limits on the performance that depend on the hardware. Some programs use the provided resources optimally, others don’t. To figure out if we are running at peak performance, let us introduce the roofline model.
The best way to introduce the model is to start with the parts that make it up. For this reason, here are two example codes:
#include <math.h>

float calculate_sum(float a[], int N) {
    float sum = 0;
    for (int i = 0; i < N; i++) {
        sum += a[i];
    }
    return sum;
}

void calculate_distance(float x[], float y[], int N, float distance[]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            float x_dist = x[i] - x[j];
            float y_dist = y[i] - y[j];
            distance[j + i * N] = sqrt(x_dist * x_dist + y_dist * y_dist);
        }
    }
}
The loop in calculate_sum runs through the array a and sums all of its elements. The loop in calculate_distance runs through a list of 2D points and calculates the distance between each pair of them.
In terms of the number of operations, each iteration of the calculate_sum loop loads one float from memory and performs one floating-point addition. Each iteration of the calculate_distance loop loads four floats, stores one float, and performs eleven floating-point operations (two subtractions, one addition, two multiplications, and a square root, which counts as six floating-point operations).
The ratio between the number of executed floating-point operations and the number of bytes transferred between the CPU and the memory is called arithmetic intensity. Assuming the size of a float is 4 bytes, the calculate_sum loop has an arithmetic intensity of 1 / 4 = 0.25, and the calculate_distance loop has an arithmetic intensity of 11 / (5 * 4) = 0.55.

A smaller arithmetic intensity means larger pressure on the memory subsystem; conversely, a larger arithmetic intensity means larger pressure on the CPU's computational resources. There is no exact point where this transition happens; it depends on the machine you are running your code on. In our example, the function calculate_sum puts more pressure on the memory subsystem, and the function calculate_distance puts more pressure on the computational resources of the CPU. But how does that relate to peak performance?
This is where the roofline model steps in. For a loop with a fixed arithmetic intensity there is an upper limit on the number of floating-point operations per second (FLOPS). This is conveniently represented as a two-dimensional graph:

The X-axis represents the arithmetic intensity in FLOP/byte, and the Y-axis represents performance in floating-point operations per second (abbreviated FLOPS)1. The performance of a loop is represented as a dot in the diagram: its X coordinate is the loop's arithmetic intensity (which can be calculated easily, as shown earlier) and its Y coordinate is the loop's measured FLOPS.
There are two rooflines visible in the graph: the memory bandwidth roofline, which comes from the limited memory bandwidth between the memory and the CPU, and the computation roofline, which comes from the limited computational resources of the CPU.
The closer the dot representing the loop is to the roofline, the better the loop is at utilizing the available hardware resources. For loops with a low arithmetic intensity, the limit is the memory bandwidth roofline; for loops with a high arithmetic intensity, the limit is the computation roofline.
Your loop is reaching its peak performance if the dot representing it is close to the roofline, and ideally you want to get as close to this limit as possible. But beware! While tuning your loop you may well find that after some modification the FLOPS metric improves, yet the loop runs longer. This is because FLOPS measures throughput, not runtime: if your modification increased the total amount of work the loop has to do, the loop can be "faster" in FLOPS and still take more time. For example, a binary search will show a worse FLOPS number than a linear search, yet binary search finishes sooner because it executes far fewer instructions.
The rooflines in the roofline model
When we presented the roofline model, we said there are two rooflines in the model, the memory bandwidth roofline and the computation roofline. However, the story is more complicated than that.
Modern CPUs have a complicated memory hierarchy consisting of several levels of data cache (L1, L2, L3) and DRAM. Each of these memories has its own maximum bandwidth: the L1 cache has the highest, followed by the L2 and L3 caches, and finally DRAM. Consequently, each of them has its own memory bandwidth roofline.
The computation roofline is also not fixed. There is one roofline for scalar code, another for scalar code with fused multiply-add, yet another for vectorized code, and another if you are running your algorithm on several CPU cores.
Here is the roofline diagram for the two loops from the previous example. The source code is available in our repository:

There are different rooflines for different levels of the memory hierarchy: DRAM has the lowest roofline, followed by the L3, L2 and L1 caches. Any effort to increase data cache locality will result in a higher FLOPS metric, i.e. the dot will move upward, and you should expect an increase in speed.
There are also different rooflines when it comes to computations: scalar add peak, single-precision vector peak and double-precision vector peak. These correspond to the type of instructions the compiler generated: scalar instructions, vector instructions on doubles or vector instructions on floats. If you are using a multicore system, your roofline model will have additional memory and computation rooflines to reflect the additional hardware resources.
In the above diagram, the yellow dot represents the measurement for the loop in calculate_sum, and the red dot represents the loop in calculate_distance. As you can see, both loops are close to the DRAM bandwidth roofline for their arithmetic intensity; since the input array is 100 million floats, the data necessarily comes from DRAM, and both loops are using the available resources very efficiently.
Plotting the roofline diagram
When it comes to plotting the roofline diagram, there are two approaches: manual and automatic.
The manual approach consists of calculating the arithmetic intensity by hand, running a set of prepared benchmarks to determine the memory bandwidth and computation rooflines, and finally measuring the FLOPS for the loop of interest and plotting it on the diagram. The measurements can be made with the LIKWID Performance Tools; you can find information about manual roofline plotting in LIKWID's documentation.
For automatic measuring and plotting there is Intel's Advisor, a tool with a graphical user interface that is simple to use. It performs all the steps: arithmetic intensity calculation, measurements and limit calculation, and it produces a very nice diagram that contains a dot for each significant loop in your program. Loops that take a lot of time are drawn bigger and in brighter colors. But, alas, it only works on Intel CPUs!
Intel's Advisor takes some time to master; if you are interested in a tutorial on how to use it to plot a roofline diagram, check it out here.
Performance improvement tips
If your loop is not reaching its full potential, the question is what kind of source code optimizations you can do to speed it up. The answer depends on the arithmetic intensity.

Memory roofline
If the arithmetic intensity of the loop is low, the loop is memory bound. Typically, techniques that improve the memory access pattern and data locality will result in better usage of the available memory bandwidth and in speedups. Some of these are:
- Loop interchange: an optimization where we interchange two nested loops to improve the memory access pattern. A very nice demonstration of this technique is presented in our post about the improvements we made to the Canny edge detector algorithm.
- Loop tiling: a technique that increases data locality, even though it may increase the amount of computation that needs to be done.
- Class/struct splitting: if your class or struct has members you are not using in your performance-critical loop, moving them to another class/struct will increase data locality and performance.
- Moving from array-of-structs to struct-of-arrays: this is an extreme form of the class/struct splitting recommendation above. We get rid of the array of structs altogether; each of the struct's members is stored in a separate array. This guarantees the best data cache usage and allows the compiler to vectorize our code efficiently.
- Using cache-friendly data structures: for instance, instead of AVL trees, consider using red-black trees; instead of regular hash maps, consider using open-addressing hash maps.
This is not an exhaustive list; there are many other ways to increase data locality, some of them really clever. For example, if you have two loops running over the same data, where the first one is arithmetically intensive and the other one is not, fusing the loops might give you a better overall runtime because of the more balanced use of resources.
Computation roofline
As the arithmetic intensity grows, fetching data from memory becomes less and less of a bottleneck; instead, the computational resources become the limit. Common techniques to get closer to, or raise, the computation roofline include:
- Breaking instruction dependency chains: if the current iteration of the loop depends on the previous one, this can severely limit the available instruction-level parallelism 2 (abbreviated ILP). Every effort to break the dependency chain will lead to better usage of ILP and allow other techniques to increase the performance of your loop.
- Balancing the use of the CPU's resources: doing three load instructions, followed by three add instructions, followed by three sqrt instructions uses the CPU's resources in an unbalanced way. Any approach that moves toward a more balanced usage of the CPU's different resources will help the FLOPS metric. For example, compilers often use a technique called software pipelining, where they unroll a loop several times and shuffle the instructions in order to use the CPU's resources in a more balanced way.
- Enabling auto-vectorization: the CPU contains special registers and circuitry to manipulate vectors of data. Compilers will use these instructions when possible, but on many occasions the compiler will fail to vectorize a loop. In that case it is worth looking at the compiler's vectorization report to see which loops were not vectorized and trying to fix the problems. Intel has a very good vectorization manual; information on auto-vectorization with Clang can also be found in Denis Bakhvalov's Performance Analysis book 3.
- Explicit vectorization: sometimes a particular loop cannot be expressed in a way that makes auto-vectorization possible, yet it can still profit from vectorization. In that case, developers can vectorize explicitly, either by using SIMD intrinsics, vectorization libraries (Agner's VCL library, EVE or std::experimental::simd), compiler pragmas (OpenMP SIMD pragmas or compiler-specific pragmas) or languages with explicit vectorization, such as ISPC. Our tool Codee allows developers to automatically add vectorization pragmas to their source code.
- Distributing the loop to multiple CPU cores: nowadays most CPUs, including embedded ones, have more than one core. Distributing the loop will lead to higher FLOPS and greater speed, albeit with increased power consumption. The standard choice for this is OpenMP, a set of compiler pragmas that allows developers to easily distribute a loop to multiple CPU cores. Our Codee can automate this task by detecting parallelization opportunities and emitting OpenMP pragmas. Alternatively, if more control is needed, developers can use the POSIX threads C library, Intel's TBB or std::thread in C++.
Our loop is optimal, is that all?
If your loop is close to its full potential, you might be asking yourself: is that all? The roofline model shows whether you are close to the hardware limits, but it doesn't account for unnecessarily spent cycles, so it can be misleading. Take, for example, linear search versus binary search. Linear search can use the hardware resources very efficiently, but binary search will be much faster because the number of executed instructions is much lower.
Another example is strength reduction, a technique where we replace slow instructions with faster ones, for example replacing division by a power of two with a shift, or using faster but less precise instructions. The roofline model will not suggest such optimizations.
Sometimes the loop does unnecessary computations. The roofline model will not tell you that you are computing needlessly; it will simply report optimal resource usage. It is up to the developer to pay attention to such details.
Conclusion
The roofline model is very good when you need to determine whether your loop is running at its full potential. It gives you a quick overview of the loops with the most optimization potential. The model is one of the tools in the performance developer's toolbox that, along with others, will help them write fast and efficient code.
In the upcoming post we will go into the details of the different versions of the Canny algorithm we optimized in part 1 and part 2 and see how the changes we made are reflected in the roofline model.

Resources
- Case Study: How we made the Canny edge detector algorithm run faster? (part 1), where we optimized the memory access pattern of critical loops in order to get better speed.
- Case Study: How we made the Canny edge detector algorithm run faster? (part 2), where we used OpenMP compiler directives to distribute the processing to several CPU cores.
- LIKWID Performance Tools, a set of tools used for manually plotting the roofline model
- Tutorial: Empirical Roofline Model using LIKWID
- Intel® Advisor Tutorial for Using the Automated Roofline Chart to Make Optimization Decisions
- Denis Bakhvalov, Performance Analysis and Tuning on Modern CPUs, chapter 8.2.3.1 “Compiler Autovectorization”
- A Guide to Vectorization with Intel® C++ Compilers
- Agner’s VCL Library, library used for explicit vectorization
- EVE – the Expressive Vector Engine, another C++ library used for explicit vectorization
- std::experimental::simd, part of C++ STL used for explicit vectorization
- Effective Vectorization with OpenMP 4.5
- Intel® Implicit SPMD Program Compiler, a compiler for ISPC language which supports explicit vectorization natively
1 In this article we deal with processing floating-point numbers. If your program processes integers (which is the case in many applications, e.g. databases), the CPU uses a different set of hardware resources, and the performance of integer operations is measured in INTOPS. Roofline analysis for integer operations is still a work in progress; more information can be found in Intel Advisor's documentation.
2 Modern CPUs can execute instructions out of order. If the operands of the current instruction do not depend on the result of a previous instruction, the CPU can execute it immediately, without waiting for the previous instruction to complete. Code whose instructions do not depend much on one another has higher ILP. Compilers can rearrange instructions to expose more ILP to the CPU.
3 Denis Bakhvalov, Performance Analysis and Tuning on Modern CPUs, chapter 8.2.3.1