The ultimate goal of the Codee software suite is to help users achieve peak performance of their software. One way to do that is with a touch of parallelism. This post looks at the NPB CG benchmark, a popular benchmark for comparing supercomputers developed by NASA, and shows how Codee can help users distribute their program’s workload across several CPU cores.
The workflow
The steps are as follows:
- We assume you have a Linux system with the GCC or Clang toolchain.
- Download the source code of the benchmark.
- Compile it.
- Profile it to see where the hot spots are.
- Use Codee to automatically distribute the workload of the hot loops across several CPU cores.
- Finally, measure the speed improvement.
Prerequisites
First, we will need to download the source code of the NPB CG benchmark. When you have the code, unpack it at a convenient location. You will also need Codee, available on our website, and the perf profiler, available in your Linux distribution’s repositories.
The benchmark we are interested in is located in the SNU_NPB-1.0.3/NPB3.3-SER-C/CG directory. Go to that directory and execute the following commands:
$ mkdir ../bin
$ make CLASS=C CC=clang CFLAGS="-O3 -g -fopenmp" CLINKFLAGS="-lm -fopenmp"
$ ../bin/cg.C.x
The make command will compile the benchmark for class C (class C means a large input size; you can look in the source code for the other input sizes). The last command runs the benchmark binary.
Finding the hot loops
To make our program run faster, first we need to find the code that takes the most time. We call this code hot code, and we use tools called profilers to find it. On Linux, one of the most popular profilers is perf, which we will use here. To collect the profiling data and display the profile result, execute:
$ perf record --call-graph dwarf ../bin/cg.C.x
$ perf report
The perf report command opens perf’s text user interface, where you can look around and see which functions take the most time. IBM has a full tutorial on using perf (linked under Related resources below); we assume you are familiar with perf, but if not, check it out. We are interested in functions that have a high percentage in the Self column. In our case, the function that clearly dominates the profile is conj_grad.
To see which code inside the function takes the most time, press the A key to annotate it. In perf’s terminology, annotation means marking each instruction in the function with a percentage: how much time the program spent executing that particular instruction.

If we look at perf’s output, we see that our program spends a lot of time running instructions belonging to the loop for (k = rowstr[j]; k < rowstr[j+1]; k++). If we add up the percentages of the instructions highlighted in red or green, it turns out that 94.48% of the time is spent there. Here is the full source code of the loop together with its surrounding loops:
for (cgit = 1; cgit <= cgitmax; cgit++) {
    ...
    for (j = 0; j < lastrow - firstrow + 1; j++) {
        sum = 0.0;
        for (k = rowstr[j]; k < rowstr[j+1]; k++) {
            sum = sum + a[k]*p[colidx[k]];
        }
        q[j] = sum;
    }
    ...
}
Before making any parallelization effort, you should try to understand why your serial code is slow. Speeding up the serial code will almost always also speed up the parallel code. In this case, the slowness comes from data cache misses. The access to p[colidx[k]] will almost always result in a data cache miss, and since this access is inside a triply nested loop, we can expect many cache misses. You can confirm this hypothesis by recording data cache misses with perf record -e cache-misses --call-graph dwarf ../bin/cg.C.x. The command will point you to the code where most cache misses occur.

Unfortunately, in this case no serial optimization will do the trick: there is no way to refactor the code to avoid the access to p[colidx[k]]. We might try to cache the values, e.g. p_tmp[k] = p[colidx[k]], since we are in a nested loop, but the inner index k runs over a different range for every value of j, so there is nothing to reuse, and we have to look for performance elsewhere.
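To make that concrete, here is a hedged sketch of what such a caching attempt could look like (p_tmp and nnz are hypothetical names, not part of the benchmark). The gather loop that fills p_tmp performs exactly the same random accesses to p as the original code, plus extra writes, and since p changes on every cgit iteration the gather would have to be repeated each time, so nothing is gained:

/* Hypothetical caching attempt; this does NOT help. */
for (k = 0; k < nnz; k++) {
    p_tmp[k] = p[colidx[k]];        /* same random, cache-missing reads of p as before,
                                       plus extra writes to p_tmp */
}
for (j = 0; j < lastrow - firstrow + 1; j++) {
    sum = 0.0;
    for (k = rowstr[j]; k < rowstr[j+1]; k++) {
        sum = sum + a[k]*p_tmp[k];  /* now a sequential read, but the misses already happened above */
    }
    q[j] = sum;
}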
The magic of parallelism
Now that we know where our program spends most of its time, we can focus on speeding it up. We use Codee to do this automatically: it can detect loops with parallelization potential and annotate them with OpenMP pragmas (see note 1 at the end of the post). The runtime will then be able to distribute the loop across multiple CPU cores.
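As a minimal illustration of what such an annotation looks like (a generic scaling loop written for this post, not the CG code), a single OpenMP pragma in front of a loop tells the compiler and the runtime that the iterations may be split across the available threads:

/* Illustrative only; compile with -fopenmp. Each iteration writes a
   different element of x, so the iterations can safely run in parallel. */
void scale(double *x, double s, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        x[i] = x[i] * s;
    }
}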
The steps to use Codee are as follows:
- Use pwreport --evaluation to produce the entry-level performance optimization report for the input source code.
- Use pwreport --actions to display the details about the performance issues in the code and how to fix them.
- Use pwdirectives to automatically generate optimized code using OpenMP directives for multithreading.
- Measure the performance difference.
Start using Codee by invoking the pwreport tool with the flag --evaluation. To include the performance optimizations related to multithreading, use the flag --include-tags multi. The command looks like this:
$ pwreport --evaluation --include-tags multi cg.c -- -I ../common/
We are instructing pwreport --evaluation to perform the analysis on the file cg.c. On the command line, after --, we need to provide the same flags we are passing to the compiler. In our case we are providing -I ../common/, which tells the tool to look for headers in the ../common directory. The output produced by Codee is as follows:
Compiler flags: -I ../common/
Target Lines of code Analyzed lines Analysis time # actions Effort Cost Profiling
------ ------------- -------------- ------------- --------- ------ ------- ---------
cg.c 498 231 79 ms 12 144 h 4712€ n/a
ACTIONS PER STAGE OF THE PERFORMANCE OPTIMIZATION ROADMAP
Target Serial scalar Serial control Serial memory Vectorization Multithreading Offloading
------ ------------- -------------- ------------- ------------- -------------- ----------
cg.c 0 0 0 0 12 n/a
Target : analyzed directory or source code file
Lines of code : total lines of code found in the target (computed the same way as the sloccount tool)
Analyzed lines : relevant lines of code successfully analyzed
Analysis time : time required to analyze the target
# actions : total actionable items (opportunities, recommendations, defects and remarks) detected
Effort : estimated number of hours it would take to carry out all actions (serial scalar, serial control, serial memory, vectorization, multithreading and offloading with 1, 2, 4, 8, 12 and 16 hours respectively)
Cost : estimated cost in euros to carry out all the actions, paying the average salary of 56,286€/year for a professional C/C++ developer working 1720 hours per year
Profiling : estimation of overall execution time required by this target
SUGGESTIONS
You can specify multiple inputs which will be displayed as multiple rows (ie. targets) in the table, eg:
pwreport --evaluation some/other/dir --include-tags multi cg.c -- -I ../common/
Use --actions to find out details about the detected actions:
pwreport --actions --include-tags multi cg.c -- -I ../common/
You can focus on a specific optimization type by filtering by its tag (serial-scalar, serial-control, serial-memory, vectorization, multithreading, offloading), eg.:
pwreport --actions --include-tags serial-scalar cg.c -- -I ../common/
1 file successfully analyzed and 0 failures in 79 ms
This is Codee’s entry-level performance optimization report, produced by the command pwreport --evaluation. It reports twelve (12) actions related to multithreading, and it also suggests a follow-up invocation of pwreport --actions to find out the details about the detected actions. We are interested in the hot loop at line 458 of the cg.c:conj_grad function. The corresponding command-line invocation is as follows:
$ pwreport --actions --level 2 --include-tags multi cg.c:conj_grad -- -I ../common/
Compiler flags: -I ../common/
ACTIONS REPORT
FUNCTION BEGIN at cg.c:conj_grad:406:1
406: static void conj_grad(int colidx[],
...
LOOP BEGIN at cg.c:conj_grad:458:5
458: for (j = 0; j < lastrow - firstrow + 1; j++) {
459: sum = 0.0;
460: for (k = rowstr[j]; k < rowstr[j+1]; k++) {
461: sum = sum + a[k]*p[colidx[k]];
462: }
463: q[j] = sum;
464: }
[OPP001] cg.c:458:5 is a multi-threading opportunity
Compute patterns:
- 'forall' over the variable 'q'
SUGGESTION: use pwloops to get more details or pwdirectives to generate directives:
pwloops cg.c:conj_grad:458:5 -- -I ../common/
pwdirectives --omp multi cg.c:conj_grad:458:5 --in-place -- -I ../common/
More information on: https://www.appentra.com/knowledge/opportunities
LOOP END
...
FUNCTION END
CODE COVERAGE
Analyzable files: 1 / 1 (100.00 %)
Analyzable functions: 1 / 1 (100.00 %)
Analyzable loops: 12 / 12 (100.00 %)
Parallelized SLOCs: 0 / 55 ( 0.00 %)
METRICS SUMMARY
Total recommendations: 0
Total opportunities: 8
Total defects: 0
Total remarks: 0
SUGGESTIONS
8 opportunities for parallelization were found in your code, get more information with pwloops:
pwloops cg.c:conj_grad -- -I ../common/
More details on the defects, recommendations and more in the Knowledge Base:
https://www.appentra.com/knowledge/
1 file successfully analyzed and 0 failures in 56 ms
For this loop, we obtain a forall multithreading opportunity for the variable q. A forall pattern is a loop pattern in which each iteration modifies a single element of the output array and does its work independently of the other iterations. Multi is short for multithreading, and it means that the loop can be parallelized by distributing its iterations across multiple CPU cores.
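As a small, hypothetical sketch of the difference (the arrays in and out are made up for illustration, not benchmark code):

/* forall: iteration j writes only out[j] and reads values no other
   iteration writes, so the iterations can run in any order or in parallel. */
for (j = 0; j < n; j++) {
    out[j] = 2.0 * in[j] + 1.0;
}

/* NOT a forall: each iteration depends on the result of the previous one
   (a recurrence), so the iterations cannot simply be handed to different threads. */
for (j = 1; j < n; j++) {
    out[j] = out[j-1] + in[j];
}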
The suggestion attached to the multithreading opportunity shows the command to auto-parallelize the loop using pwdirectives. Let’s do it the simple way, by copying and pasting the suggested command (after backing up the original file):
$ cp cg.c cg-original.c
$ pwdirectives --omp multi cg.c:conj_grad:458:5 --in-place -- -I ../common/
Compiler flags: -I ../common/
Results for file 'cg.c':
Successfully parallelized loop at 'cg.c:conj_grad:458:5' [using multi-threading]:
[INFO] cg.c:458:5 Parallel forall: variable 'q'
[INFO] cg.c:458:5 Loop parallelized with multithreading using OpenMP directive 'for'
[INFO] cg.c:458:5 Parallel region defined by OpenMP directive 'parallel'
Successfully updated cg.c
Minimum software stack requirements: OpenMP version 2.0 with multithreading capabilities
The replacement is done in place thanks to the command-line switch --in-place. To show the changes made by Codee, below are the output of the diff tool and the resulting OpenMP-enabled loop:
$ diff cg-original.c cg.c
457a458,460
> #pragma omp parallel default(none) shared(a, colidx, firstrow, lastrow, p, q, rowstr) private(j, k, sum)
> {
> #pragma omp for private(k, sum) schedule(auto)
464a468
> } // end parallel
$ less cg.c
...
#pragma omp parallel default(none) shared(a, colidx, firstrow, lastrow, p, q, rowstr) private(j, k, sum)
{
#pragma omp for private(k, sum) schedule(auto)
for (j = 0; j < lastrow - firstrow + 1; j++) {
sum = 0.0;
for (k = rowstr[j]; k < rowstr[j+1]; k++) {
sum = sum + a[k]*p[colidx[k]];
}
q[j] = sum;
}
} // end parallel
...
Codee added two OpenMP pragmas that allow the loop to be distributed across multiple CPU cores. It also marked each variable as private or shared, depending on how the variable is used when the loop runs with multiple threads.
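To see why these data-sharing clauses matter, here is a condensed equivalent of the same loop using a single combined parallel for directive (a sketch for illustration, not Codee’s exact output): q can safely be shared because every iteration of j writes a different element, while sum and k must be private so that each thread works on its own copy; without private(sum), all threads would race on a single shared accumulator and produce wrong results.

/* Simplified equivalent of the generated code, for illustration only. */
#pragma omp parallel for private(k, sum) shared(a, colidx, firstrow, lastrow, p, q, rowstr)
for (j = 0; j < lastrow - firstrow + 1; j++) {
    sum = 0.0;                                  /* per-thread accumulator */
    for (k = rowstr[j]; k < rowstr[j+1]; k++) {
        sum = sum + a[k]*p[colidx[k]];
    }
    q[j] = sum;                                 /* each j writes a distinct element of q */
}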
The results
The pwdirectives command has created a parallelized version of our loop, so now we need to recompile it. But before doing that, let’s measure how much time the original program takes. You can use the time command:
$ time ../bin/cg.C.x
Now, let’s recompile it with multithreading and measure the difference. To recompile and run the multithreaded version:
$ make CLASS=C CC=clang CFLAGS="-O3 -g -fopenmp" CLINKFLAGS="-lm -fopenmp"
$ time ../bin/cg.C.x
We measured the speed difference on a general-purpose AMD Ryzen 7 4800H laptop with 8 cores and 16 hardware threads, 16 GB of memory, the Ubuntu 20.04 operating system and the Clang 10 compiler. Here are the results:

Version              Execution time
------------------   --------------
Serial (original)    141.4 s
OpenMP, 4 threads    52.1 s
OpenMP, 16 threads   45.7 s

The multithreaded version is roughly 3 times faster than the original version. A nice improvement for very little effort!
A few concluding remarks
In this example, we showed how focusing on the hot loops and distributing them across multiple cores can help speed up a program. Codee did all the dirty work for us, so we got our program running at full speed very quickly.

Although Codee detected many parallelization opportunities, we decided to parallelize only one hot spot. Why? Parallelization by distributing the workload across several cores is not a magic solution to every performance issue. It takes time to spawn threads and divide the work among them, so this kind of optimization pays off only for significant workloads. Parallelizing every possible loop regardless of its trip count would cause severe performance degradation.
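As a hedged sketch of how this trade-off can be expressed in OpenMP (the 10000 threshold and the x, y, a, n variables are made up for illustration), the if clause keeps the region serial whenever the condition is false, so small trip counts do not pay the thread-management overhead:

/* Sketch only: the threshold is arbitrary, not a measured value. */
#pragma omp parallel for if(n > 10000)
for (int i = 0; i < n; i++) {
    y[i] = a * x[i] + y[i];   /* cheap per-iteration work: threads pay off only for large n */
}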
A question you might be asking yourself is why the speedup is not proportional to the number of threads. If the serial version took 141.4 seconds, we could expect the version with four threads to take roughly a quarter of that and the version with sixteen threads a sixteenth. This can actually happen in some applications, but in this particular case the limiting factor was not a lack of computational power but a lack of memory bandwidth: the inner loop performs only two floating-point operations for every roughly 20 bytes it reads, so the cores spend most of their time waiting for data. All the cores share access to the same memory, and its bandwidth is limited. That is why the speedup is not proportional to the number of threads, and also why the version with 4 threads (52.1 seconds) is only slightly slower than the version with 16 threads (45.7 seconds). (The thread count of an OpenMP program can be controlled, for example, with the OMP_NUM_THREADS environment variable.)
Further reading
If you want to know more about Codee and how it can help you speed up your program, check out the ‘How to get started with Codee’ blog post, which uses the Canny edge detection image processing algorithm. That guide explains in more detail how to use Codee to get your program running faster with very little effort!
Related resources
- Knowledge Base of the Codee tool
- NASA – NAS Parallel Benchmarks (NPB)
- Seoul National University – SNU NPB Suite
- IBM Developer – Analyzing performance with perf annotate

1 Note: OpenMP is the de facto industry standard for distributing workloads across several CPU cores, and all the major compilers support it. It uses compiler pragmas to instruct the compiler and the runtime on how the work should be distributed.