There are many approaches to making a program run faster: using more efficient libraries, using the standard library and the language efficiently, or choosing better algorithms. But sometimes, even after we have applied all the usual optimizations, performance is still not satisfactory. That is when we need to investigate our code and make sure we are using all the available hardware resources in the most efficient manner.
In this post we present software development techniques for using the hardware efficiently, and we explore how our Codee tool can help you boost the performance of your code by getting the most out of your hardware.
We will talk about several aspects of efficient hardware utilization:
- using all available CPU cores,
- using accelerators such as GPUs,
- vectorization,
- memory optimizations and
- advanced hardware features.
Distributing workload to multiple CPU cores
CPUs with several cores are now ubiquitous: even the cheapest phones have at least two CPU cores, laptops have anywhere between two and eight, and high-end server CPUs can have dozens. There is a huge potential for parallelism that many programs never exploit.
A very convenient way to make serial code run on multiple cores is OpenMP. OpenMP is an open standard supported by all major compilers (e.g. GCC, Clang, Intel's compiler and Microsoft's compiler) and is available in C, C++ and Fortran. It allows the developer to decorate their code with compiler pragmas that tell the compiler how to distribute the work across multiple CPU cores.
Here is an example of a small loop decorated with OpenMP pragmas. The pragmas were added by Codee automatically:
#pragma omp parallel default(none) shared(a, n, result) private(i)
{
    #pragma omp for reduction(+: result) schedule(auto)
    for (i = 0; i < n; i++) {
        result += sqrt(a[i]) / n;
    }
} // end parallel
When execution reaches the section of code marked with the pragmas, that section gets executed by several CPU cores. The compiler and the OpenMP runtime take care of everything needed to run the code: thread creation, data synchronization and thread synchronization.
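For reference, here is a minimal self-contained program built around the snippet above (a sketch for illustration only; the problem size and input data are made up). With GCC it can be compiled with gcc -fopenmp example.c -lm:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 10000000;                        // illustrative problem size
    double *a = malloc(n * sizeof(double));
    for (int j = 0; j < n; j++)
        a[j] = j;                            // made-up input data

    double result = 0.0;
    int i;
    #pragma omp parallel default(none) shared(a, n, result) private(i)
    {
        #pragma omp for reduction(+: result) schedule(auto)
        for (i = 0; i < n; i++) {
            result += sqrt(a[i]) / n;
        }
    } // end parallel

    printf("result = %f\n", result);
    free(a);
    return 0;
}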
We have written extensively about distributing critical loops across several CPU cores using OpenMP. In the example of the Canny image processing algorithm, the OpenMP version of the program ran 3.4 times faster than the serial version on a system with eight threads. In another example, the CG benchmark from the NAS Parallel Benchmarks, the program ran 2.7 times faster than the original on a system with four threads.

Codee guides you through the process of optimizing your code by providing human-readable, actionable hints. For instance, it detects computations in your code that can be distributed across multiple CPU cores, and if you want to do so using OpenMP, it can speed up those computations by inserting the proper pragmas automatically.
Distributing workload to accelerators
Many modern computer systems also contain accelerator devices that can be used to speed up computation. For example, modern desktops and laptops have GPUs, programmable graphics accelerators used for 2D and 3D rendering. High-performance computers often feature accelerator devices to speed up scientific calculations.
Accelerators are massively parallel architectures, and they are really efficient at solving many types of problems. There are several ways of programming them: low-level APIs such as CUDA and OpenCL, or high-level, directive-based approaches such as OpenMP and OpenACC. Our product, Codee, helps developers automatically port their codebases to accelerators using OpenACC and OpenMP.
OpenACC is a really simple way to make use of hardware accelerators. Similarly to OpenMP from the previous section, it relies on compiler pragmas and compiler support to compile code that will run on an accelerator.
Here is an example of a loop with OpenACC directives, automatically generated by Codee, which offloads the execution to a GPU:
double out_result;
double sum = 0.0;
#pragma acc data copyin(N) copy(sum)
{
    #pragma acc parallel
    {
        #pragma acc loop reduction(+ : sum)
        for (int i = 0; i < N; i++) {
            double x = (i + 0.5) / N;
            sum += sqrt(1 - x * x);
        }
    } // end parallel
} // end data
out_result = 4.0 / N * sum;
The above code snippet approximates the value of π on the GPU: the loop computes a Riemann sum of √(1 − x²) over [0, 1], which equals π/4, and the final line scales the result accordingly. The directives take care of all the steps necessary to run on an accelerator: moving the input data to the accelerator, performing the computation, and moving the results back to main memory.
With the variable N set to two billion, this loop took 0.9 s to finish on the GPU compared to 2.4 s on the CPU, roughly a 2.7× speedup. A good way of using accelerators!
You can use Codee to detect computations that can be offloaded to accelerator devices such as GPUs. Moreover, it offers optimization recommendations focused on GPUs such as PWR009 and PWR015. It can also help by generating both the OpenMP and OpenACC directives required to distribute the workload to the GPU.
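As an illustrative sketch (not Codee's verbatim output), the same loop offloaded with OpenMP's target constructs would look like the following; it requires a compiler with GPU offloading support (e.g. NVIDIA's nvc -mp=gpu or a GCC build with offloading enabled):

// Illustrative OpenMP offloading equivalent of the OpenACC loop above.
#pragma omp target teams distribute parallel for \
        reduction(+ : sum) map(tofrom : sum)
for (int i = 0; i < N; i++) {
    double x = (i + 0.5) / N;
    sum += sqrt(1 - x * x);
}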

Using the vectorization capabilities of your CPU
Modern CPUs contain vector units that work on fixed-length vectors (e.g. a vector of four doubles or a vector of eight integers) and can execute a single operation on a whole vector in one instruction. With vectorization, the CPU can, for example, load four doubles from memory, perform four additions and store four results back to memory in the same time it would take to do those operations on a single double.
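For illustration, here is the kind of simple loop that lends itself to auto-vectorization (a sketch; the restrict qualifiers are our addition and promise the compiler that the arrays do not overlap, which removes a common obstacle to vectorization):

// A sketch of an easily auto-vectorizable loop.
void add_arrays(int n, const double *restrict a,
                const double *restrict b, double *restrict c) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];  // e.g. four doubles per AVX2 instruction
    }
}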
Compilers will typically create vectorized code automatically, but there are many obstacles to this, which we have already written about. Take, for example, a naive matrix multiplication algorithm:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        c[i][j] = 0;
        for (int k = 0; k < n; k++) {
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
    }
}
The compiler will not automatically vectorize the above loop because the access to b[k][j] does not go through memory sequentially: the CPU accesses the elements of matrix b with a stride of n. A technique called loop interchange can help the performance. Here is the modified source code:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        c[i][j] = 0;
    }
    for (int k = 0; k < n; k++) {
        for (int j = 0; j < n; j++) {
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
    }
}
In this example, we first performed loop fission on the loop over j, splitting it into two loops: the first initializes c[i][j] and the second performs the calculations. Next, we interchanged the loop over j and the loop over k. With this transformation, the innermost loop contains only constant memory accesses (always accessing the same memory location, a[i][k]) and sequential accesses (accessing neighboring memory locations, c[i][j] and b[k][j]).
The original code took 14.4 s to execute on a 2400×2400 integer matrix. The modified version took 3.8 s, almost four times faster, all thanks to loop interchange and vectorization.
Just like with multicore CPUs and offloading to GPUs, Codee can help with vectorization. For instance, it offers hints on how to refactor your code to enable auto-vectorization (e.g. PWR020) or how to increase the vectorization performance (e.g. PWR019). Additionally, it can help you generate vectorization directives, either OpenMP ones or compiler-specific ones (e.g. for GCC, Clang or ICC).
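As a sketch, an explicit OpenMP SIMD directive applied to a dot-product loop looks like this (compiled with -fopenmp or -fopenmp-simd in GCC and Clang; compiler-specific alternatives such as GCC's #pragma GCC ivdep exist as well):

// Illustrative explicit vectorization request via an OpenMP SIMD directive.
double dot(int n, const double *a, const double *b) {
    double sum = 0.0;
    #pragma omp simd reduction(+ : sum)
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}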
Optimizing for the memory subsystem
Modern CPUs are very fast, but the memories they load data from and store data to are much slower, so CPUs often have to wait for data to be fetched from memory. This is especially visible with non-sequential memory access patterns¹: the strided memory access pattern and the random memory access pattern. Every effort to decrease the amount of data fetched from memory, to move to a sequential memory access pattern, or to increase locality (by accessing data already present in the cache) will yield performance improvements, which can sometimes be significant.
Consider the following example of matrix transposition:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        b[i][j] = a[j][i];
    }
}
From a performance perspective, the access to a[j][i] is inefficient because the elements of the array are accessed with a stride of n. A technique called loop tiling, or loop blocking, can help increase the performance. On modern CPUs, every time you access an element of an array, a few neighboring elements are loaded into the data cache, making subsequent accesses to them very cheap. Loop tiling exploits this fact by operating on small blocks of data.
The difference in access patterns can be seen in the following table:
| Original loop | Loop tiling with tile size = 2 |
| --- | --- |
| b[0][0] = a[0][0]; b[0][1] = a[1][0]; b[0][2] = a[2][0]; b[0][3] = a[3][0] | b[0][0] = a[0][0]; b[0][1] = a[1][0]; b[1][0] = a[0][1]; b[1][1] = a[1][1] |
| b[0][4] = a[4][0]; b[0][5] = a[5][0]; b[0][6] = a[6][0]; b[0][7] = a[7][0] | b[0][2] = a[2][0]; b[0][3] = a[3][0]; b[1][2] = a[2][1]; b[1][3] = a[3][1] |
Because of memory caches, an access to a[0][0] also makes a subsequent access to a[0][1] very fast, since both elements end up in the same cache line. The tiled version takes advantage of this by accessing a[0][1] very soon after it has accessed a[0][0].
The matrix transpose algorithm with loop tiling (tile size = 4) looks like this:
for (int ii = 0; ii < n; ii += 4) {
    for (int jj = 0; jj < n; jj += 4) {
        for (int i = ii; i < (ii + 4); i++) {
            for (int j = jj; j < (jj + 4); j++) {
                b[i][j] = a[j][i];
            }
        }
    }
}
On a matrix of 10,000 × 10,000 elements, the original algorithm took 860 ms to execute, while the loop tiling version took 288 ms, roughly three times faster.
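Note that the version above assumes n is divisible by the tile size. A slightly more general sketch (our illustration, using a hypothetical MIN helper and pointer-to-pointer matrices) clamps the tile bounds and makes the tile size a tunable constant; good values depend on the cache sizes of the target CPU:

#define TILE 64  // illustrative; tune for the target CPU's caches
#define MIN(x, y) ((x) < (y) ? (x) : (y))

// Transpose a into b for arbitrary n, processing TILE x TILE blocks.
void transpose_tiled(int n, double **a, double **b) {
    for (int ii = 0; ii < n; ii += TILE) {
        for (int jj = 0; jj < n; jj += TILE) {
            for (int i = ii; i < MIN(ii + TILE, n); i++) {
                for (int j = jj; j < MIN(jj + TILE, n); j++) {
                    b[i][j] = a[j][i];
                }
            }
        }
    }
}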
There are numerous other ways to optimize for the memory subsystem; here are just a few examples:
- Use loop interchange to move to a better memory access pattern.
- Decrease the size of your struct or class, e.g. by moving rarely used members to separate structs.
- Move from Array-of-Structs to Struct-of-Arrays (see the sketch after this list). This transformation often results in the loop in question getting vectorized.
- Decrease the size of your random access data structure (e.g. use smaller data types or smaller pointers).
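To make the Struct-of-Arrays idea concrete, here is a minimal sketch of the two layouts, using a hypothetical particle type:

// Array-of-Structs: the fields of each particle are adjacent in memory,
// so a loop that reads only "x" strides over the unused y, z and mass.
struct ParticleAoS { double x, y, z, mass; };

// Struct-of-Arrays: all "x" values are contiguous, so a loop over "x"
// accesses memory sequentially and is easy for the compiler to vectorize.
struct ParticlesSoA {
    double *x, *y, *z, *mass;
};

double sum_x(int n, const struct ParticlesSoA *p) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += p->x[i];  // sequential memory accesses
    return sum;
}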
Codee provides hints on how to apply these and other techniques to optimize your code for the memory subsystem, such as the PWR010 and PWR016 recommendations. It also offers several reports that give insight into how memory is used in your code, including memory access patterns.
Optimizing for the CPU’s branch prediction unit
Modern CPUs are optimized to maintain a high throughput of instructions. They follow a pipelined design in which, ideally, the stream of instructions to process is never interrupted. Conditional code and branching pose a problem: the instructions to execute after a conditional branch depend on the condition, so they cannot be fed into the pipeline until the condition has been fully evaluated.
To help with this, CPUs have advanced branch prediction units that attempt to predict whether a branch will be taken. Based on the prediction, the CPU starts executing the instructions at the predicted destination of the branch so that the computation never stops. If the prediction turns out to be right, the CPU has done useful work instead of sitting idle. If it is wrong, the CPU needs to discard the results of the speculatively executed instructions and start over, wasting time.
Most of the time the branch predictor works great! It can predict simple branches, like always true or always false, but it can also predict branches with a more complex history, e.g. alternating true/false, or true a hundred times and then false once (as happens in loops). The branch predictor works poorly when the branch is unpredictable, for instance when it depends on random data and the condition is true about 50% of the time. In that case you can expect a successful prediction rate of only about 50%, and this kind of code leaves a lot of room for improvement.
Take, for example, the following code (motivated by the Canny image processing example):
void calculate_hysteresis(int cond[], int in[], int n, int out[]) {
    for (int i = 0; i < n; i++) {
        if (cond[i] > 0) {
            out[in[i]]++;
        }
    }
}
The above loop cannot be vectorized due to a possible data dependency between loop iterations, since in[i] and in[i-1] can refer to the same element of the out array. If the condition cond[i] > 0 is unpredictable, with a 50% chance of being true, the CPU will waste many cycles doing useless work.
In these cases we can try making the above code branchless. Here is an example of the same code but without the `if` statement:
void calculate_hysteresis_branchless(int cond[], int in[], int n, int out[]) {
    for (int i = 0; i < n; i++) {
        out[in[i]] += (cond[i] > 0);
    }
}
For an unpredictable condition and an array of 100 million numbers, the original version took 516 ms to execute, while the branchless version took 123 ms: the branchless implementation is 4.2 times faster.
Codee currently does not emit recommendations related to optimizations for the branch prediction, but we plan to address this in the future as a way of speeding up loops that cannot be optimized using vectorization.
Summary
In this post we presented techniques that allow you to speed up your code by optimally using the available hardware. We started with taking advantage of additional CPU cores and accelerator hardware, moved on to using the vectorization capabilities of the CPU, explored efficient use of the memory subsystem, and finished with optimizations for the branch predictor. We gave examples demonstrating the possibilities of each of these techniques.
Codee can help you in many ways! It guides you through the process of optimizing your code by providing human-readable, actionable hints. It can detect inefficient code that the compiler cannot vectorize automatically and propose fixes, as well as spot inefficient memory access patterns that slow down your code. Moreover, it can rewrite your code automatically to take advantage of additional CPU cores and accelerator devices. It is our vision to make it the best tool on the market for writing fast and efficient code!
Additional Information
All the source code used in the examples is either available in our repository or on the pages where the experiments were originally executed. All the tests for this post were executed on an AMD Ryzen 7 4800H CPU (8 cores, 16 threads) with 16 GB of RAM and a GeForce GTX 1650 Ti GPU, running Ubuntu 20.04. We disabled processor frequency scaling in order to decrease runtime variance.

¹ We wrote extensively about memory access patterns in a post about the Canny image detection algorithm.