Flexibility and Performance: turbocharging loop unswitching

In the previous post we talked about loop unswitching: a technique that the compilers use to make loops faster. Loop unswitching consists of detecting conditional statements whose condition never changes during the execution of the loop (loop invariant conditions) and then moving the condition outside of the loop.

Loop unswitching opens the door for other compiler optimizations, notability vectorization. However, sometimes the compilers opt-out of loop unswitching for the reasons we talked about in the previous post. If this is the case with your hot code, you can take the matters into your own hands and manually unswitch your hot loop.

In this post we talk about manual loop unswitching, and later we investigate the performance of loop unswitching on an example inspired by the code of one of our customers.

Manual loop unswitching

As you may have seen from the previous examples, loop unswitching is a fragile compiler optimization that often may not take place. However, there are ways to force it on the loops on the hot path.

Here is an example we used in the previous post to demonstrate loop unswitching:

int calculate_sum(int* a, int n, bool only_positives) {
   int sum = 0;
    for (int i = 0; i < n; i++) {
        if (only_positives) {
            if (a[i] > 0) {
                sum += a[i];
            }
        } else {
            sum += a[i];
        }
    }
 
    return sum;
}

The variable only_positives is a function parameter that never changes its value during the execution of the loop at line 4.

The compiler might or might not perform the loop unswitching automatically. However, you can manually perform the loop unswitching for high performance.

The simplest way to do it is to extract the loop invariant condition outside of the loop and create two loops, one per branch:

int calculate_sum(int* a, int n, bool only_positives) {
   int sum = 0;
   if (only_positives) {
        for (int i = 0; i < n; i++) {
            if (a[i] > 0) {
                sum += a[i];
            }
        }
    } else {
        for (int i = 0; i < n; i++) {
            sum += a[i];
        }
    }

    return sum;
}

In the above code, we created two versions of the loop, one where the only_positives is true and the other where it is false. We removed the unreachable code from the inside of the loop. The loop body is simpler, smaller and generally easier for the compiler to optimize.

Subscribe to our newsletter

and receive in-depth technical articles, white papers, videos, webinars, product announcements, and more.

However, this kind of manual loop unswitching results in duplicate loops and code that is more difficult to maintain. Luckly, there are ways to do it without code duplication.

If you are using C++, it is really simple to do it manually using templates. The original example rewritten with templates looks like this:

template <bool only_positives>
int calculate_sum(int* a, int n) {
   int sum = 0;
    for (int i = 0; i < n; i++) {
        if (only_positives) {
            if (a[i] > 0) {
                sum += a[i];
            }
        } else {
            sum += a[i];
        }
    }

    return sum;
}

The only difference from the original code is that only_positives is now a template parameter. The compiler basically creates two copies of the same function, one in with a constant value of true for only_positives, the other with false. In order to invoke the function, the template parameter must be specified (eg. calculate_sum<true>(a, n)) thus matching the proper copy of the function. The values of template parameters are known at compile time; if the value of only_positives is true, the compiler will perform dead code elimination and remove the else branch (lines 9-11). Otherwise, it will perform dead code elimination on the if branch (lines 5-8).

If your project is using C, the same behavior can be made using macros. You can create a function calculate_sum_only_positives_true for the case where the parameter is true and calculate_sum_only_postives_false for the case when the parameter is false. Here is an example on how to do it:

#define CALCULATE_SUM(only_positives)                                  \
    int calculate_sum_only_positives_##only_positives(int* a, int n) { \
        int sum = 0;                                                   \
        for (int i = 0; i < n; i++) {                                  \
            if (only_positives) {                                      \
                if (a[i] > 0) {                                        \
                    sum += a[i];                                       \
                }                                                      \
            } else {                                                   \
                sum += a[i];                                           \
            }                                                          \
        }                                                              \
                                                                       \
        return sum;                                                    \
    }

CALCULATE_SUM(true)
CALCULATE_SUM(false)

The above code snippet does exactly the same as C++ version. It creates two functions with names calculate_sum_only_positives_true and calculate_sum_only_postives_false. Dead code elimination makes sure that unused branches are cut off.

Alternatively, you can do loop unswitching inside the function:

int calculate_sum(int* a, int n, bool only_positives) {
    int sum = 0;
#define BODY(only_positives_m)    \
    for (int i = 0; i < n; i++) { \
        if (only_positives_m) {   \
            if (a[i] > 0) {       \
                sum += a[i];      \
            }                     \
        } else {                  \
            sum += a[i];          \
        }                         \
    }

    if (only_positives) {
        BODY(true);
    } else {
        BODY(false);
    }

#undef BODY
    return sum;
}

We define a macro named BODY that accepts argument only_positives_m. On line 14 we check the value of function parameter only_positives. If it is true, we instantiate the macro by providing the value true for only_positives_m, otherwise, we instantiate the macro by providing false for only_positives_m. The compiler will automatically eliminate branches with dead code.

Automatic loop unswitching done by the compilers closely resembles the loop unswitching we did here.

How much does loop unswitching improve performance?

We wanted to test the effect of loop unswitching on performance of our code. For this, we created an example inspired by the codebase provided to us by one of our customers. The code consists of two functions array_calculation and single_calculation. Function array_calculation calls single_calculation in every iteration of the loop. Here is the source code of single_operation:

enum single_operation_e {
    ADD_SUB,
    MUL_DIV,
    SIN_COS,
    SQRT_ABS,
};

enum single_operation_e single_operation;

void single_calculation(int in1, int in2, float* out1, float* out2) {
    switch (single_operation) {
        case ADD_SUB:
            *out1 = in1 + in2;
            if (out2) {
                *out2 = in1 - in2;
            }
            break;
        case MUL_DIV:
            *out1 = in1 * in2;
            if (out2) {
                *out2 = in1 / (in2 + (in2 == 0));
            }
            break;
        case SIN_COS:
            *out1 = sinf(in1);
            if (out2) {
                *out2 = cosf(in1);
            }
            break;
        case SQRT_ABS:
            *out1 = abs(in1);
            if (out2) {
                *out2 = sqrt(abs(in2));
            }
            break;
    }
}

The function single_calculation modifies its behavior based on two variables: the single_operation (line 8) global variable, which has an enum type that determines what kind of operations we want to do, and the out2 (line 10) function parameter, to which the function will write an additional computed when its value is not NULL.

The source code of the function array_calculation looks like this:

enum array_operation_e {
    ADD,
    SUBTRACT,
    REPLACE,
};

enum array_operation_e array_operation;

void array_calculation(float* a, float* b, int n, bool set_b) {
    float r1, r2;
    for (int i = 0; i < n; i++) {
        single_calculation(i, i + 1, &r1, &r2);
        if (array_operation == ADD) {
            a[i] += r1;
            if (set_b) {
                b[i] += r2;
            }
        } else if (array_operation == SUBTRACT) {
            a[i] -= r1;
            if (set_b) {
                b[i] -= r2;
            }
        } else {
            a[i] = r1;
            if (set_b) {
                b[i] = r2;
            }
        }
    }
}

This function also behaves differently based on two variables: array_operation which is an enum variable defined in the global memory (line 7) and set_b which is a boolean function parameter. Note that if set_b is false, we are not using the value r2 returned by single_calculation (lines 15, 20 and 25).

It is possible to manually do loop unswitching by creating a different function for each possible combination of parametrization values. For example, we can create one function single_calculation_MUL_DIV_true which has a value MUL_DIV for single_operation and a variable out2 is not null, and another function single_calculation_ADD_SUB_false where single_operation is ADD_SUB and variable out2 is null. We remove all the dead code manually as well as all invariant conditions.

Sometimes, this can be the way to go. But often, this approach leads to unmaintainable code. In C, we can use macros to create a generalized function body that we later specialize with macro parameters. In the case of the single_calculation function, two parametrization variables single_operation and out2 would yield eight different versions of the function single_calculation. Here is how to do it with C macros:

#define SINGLE_CALCULATION(operation, set_out2)       \
    void single_calculation_##operation##_##set_out2( \
        int in1, int in2, float* out1, float* out2) { \
        switch (operation) {                          \
            case ADD_SUB:                             \
                *out1 = in1 + in2;                    \
                if (set_out2) {                       \
                    *out2 = in1 - in2;                \
                }                                     \
                break;                                \
            case MUL_DIV:                             \
                *out1 = in1 * in2;                    \
                if (set_out2) {                       \
                    *out2 = in1 / (in2 + (in2 == 0)); \
                }                                     \
                break;                                \
            case SIN_COS:                             \
                *out1 = sinf(in1);                    \
                if (set_out2) {                       \
                    *out2 = cosf(in1);                \
                }                                     \
                break;                                \
            case SQRT_ABS:                            \
                *out1 = abs(in1);                     \
                if (set_out2) {                       \
                    *out2 = sqrt(abs(in2));           \
                }                                     \
                break;                                \
        }                                             \
    }

SINGLE_CALCULATION(ADD_SUB, true)
SINGLE_CALCULATION(ADD_SUB, false)
SINGLE_CALCULATION(MUL_DIV, true)
SINGLE_CALCULATION(MUL_DIV, false)
SINGLE_CALCULATION(SIN_COS, true)
SINGLE_CALCULATION(SIN_COS, false)
SINGLE_CALCULATION(SQRT_ABS, true)
SINGLE_CALCULATION(SQRT_ABS, false)

The macro SINGLE_CALCULATION (lines 1-30) specifies a common body for all the later generated functions. At lines 32-39 we generate eight versions of the single_calculation function, each version having a different name. For example, the function with the name simple_calculation_ADD_SUB_true means that single_operation has a value ADD_SUB and set_out2 has a value true. Dead code elimination makes sure that unused switch branches (lines 11-28) and set_out2 condition check (line 7) are eliminated.

Function array_calculation passes parameters to single_calculation and therefore, we need to also make it parametrized in compile time. This is achieved similarly as with single_calculation:

#define ARRAY_CALCULATION(array_operation, single_operation, set_b)          \
    void array_calculation_##array_operation##_##single_operation##_##set_b( \
        float* a, float* b, int n) {                                         \
        float r1, r2;                                                        \
        for (int i = 0; i < n; i++) {                                        \
            single_calculation_##single_operation##_##set_b(i, i + 1, &r1,   \
                                                            &r2);            \
            if (array_operation == ADD) {                                    \
                a[i] += r1;                                                  \
                if (set_b) {                                                 \
                    b[i] += r2;                                              \
                }                                                            \
            } else if (array_operation == SUBTRACT) {                        \
                a[i] -= r1;                                                  \
                if (set_b) {                                                 \
                    b[i] -= r2;                                              \
                }                                                            \
            } else {                                                         \
                a[i] = r1;                                                   \
                if (set_b) {                                                 \
                    b[i] = r2;                                               \
                }                                                            \
            }                                                                \
        }                                                                    \
    }

ARRAY_CALCULATION(ADD, ADD_SUB, true)
ARRAY_CALCULATION(ADD, ADD_SUB, false)
ARRAY_CALCULATION(ADD, MUL_DIV, true)
ARRAY_CALCULATION(ADD, MUL_DIV, false)
ARRAY_CALCULATION(ADD, SIN_COS, true)
ARRAY_CALCULATION(ADD, SIN_COS, false)
ARRAY_CALCULATION(ADD, SQRT_ABS, true)
ARRAY_CALCULATION(ADD, SQRT_ABS, false)
ARRAY_CALCULATION(SUBTRACT, ADD_SUB, true)
ARRAY_CALCULATION(SUBTRACT, ADD_SUB, false)
ARRAY_CALCULATION(SUBTRACT, MUL_DIV, true)
ARRAY_CALCULATION(SUBTRACT, MUL_DIV, false)
ARRAY_CALCULATION(SUBTRACT, SIN_COS, true)
ARRAY_CALCULATION(SUBTRACT, SIN_COS, false)
ARRAY_CALCULATION(SUBTRACT, SQRT_ABS, true)
ARRAY_CALCULATION(SUBTRACT, SQRT_ABS, false)
ARRAY_CALCULATION(REPLACE, ADD_SUB, true)
ARRAY_CALCULATION(REPLACE, ADD_SUB, false)
ARRAY_CALCULATION(REPLACE, MUL_DIV, true)
ARRAY_CALCULATION(REPLACE, MUL_DIV, false)
ARRAY_CALCULATION(REPLACE, SIN_COS, true)
ARRAY_CALCULATION(REPLACE, SIN_COS, false)
ARRAY_CALCULATION(REPLACE, SQRT_ABS, true)
ARRAY_CALCULATION(REPLACE, SQRT_ABS, false)

The common body of all array_calculation functions is defined in macro ARRAY_CALCULATION (lines 1-25). The function behavior is modified using three variables: array_calculation, single_operation and set_b. Two of the variables are passed later to single_calculation: single_operation and set_b.

The compiler needs to generate twenty four versions of array_calculation (lines 27-50). Dead code elimination makes sure unused branches are removed. Note on line 6, that, for example, function array_calculation_REPLACE_MUL_DIV_true calls the function single_calculation_MUL_DIV_true. The value for variables single_operation which is MUL_DIV and for the set_out2 which is true are not passed to the function as function arguments; it is the name of the function (single_calculation_MUL_DIV_true) that marks what are the values of the two parameters inside.

Experiment results

Before talking about experiment results, a note: in our experiment, all the code was inside one compilation unit (a compilation unit is a single source file with all the headers inlined). This made it easier for the compiler to do the optimizations, especially when it comes to figuring out if a condition is loop invariant or not. In a real-world code base, the compiler will struggle to detect loop invariants involving several compilation units unless you enable Link-Time Optimizations (LTO), as explained in our previous post.

The performance of loop unswitching depends, as you will see later, on the compiler. Some compilers do this optimization more aggressively than others. We tested with three compilers: GCC 9.3.0, CLANG 10 and ICC 19.1. The experiments were executed on AMD Ryzen 7 4800H CPU with 16 cores and 16 GB of RAM memory on Ubuntu 20.04. We disabled processor frequency scaling in order to decrease runtime variance

The function array_calculation receives two arrays as arguments, array a and array b. The function iterates over both arrays simultaneously, and for each pair a[i] and b[i] it calls the function simple_calculation. Function simple_calculation returns two values. Depending on the value of parameter array_operation we either add to, subtract from or replace the original values a[i] and b[i]. If the parameter set_b is true, we set both values for a and b, otherwise we skip changing the value for b.

The function simple_calculation returns two outputs using pointers out1 and out2. If out2 is NULL, we skip the calculation needed for out2. The parameter simple_operation defines what kind of operation is going to be performed on the input. We have four values for this parameter: ADD_SUB (add and subtract), MUL_DIV (multiply and divide), SIN_COS (sine and cosine) and SQRT_ABS (square root and abs).

As far as computational intensity is concerned, we expect that the runtime will be the longest when simple_operation is SIN_COS, followed by MUL_DIV, SQRT_ABS and ADD_SUB. If parameter set_b is true, we expect that the runtime is longer than when its value is false. As far as array_calculation is concerned, we don’t expect a significant difference in runtimes since the computation is very light in all three cases (addition, subtraction and move).

We compared the runtimes of manual unswitching and automatic compiler unswitching. The full results for each possible parameter configuration are available in this link. Here we give a summary of the results.

The fastest speedup was 21.2 times, the slowest speedup was close to 1 (no speedup). There was no speedup in 8.3 % of the measured configurations and the average speedup was 4.8. We didn’t see any case where there was a performance regression (ie. slowdown) related manual loop unswitching. Here is the speedups by percentile for all three compilers together:

Percentile	Speedup
0.1	1.09x
0.25	1.80x
0.5	3.51x
0.75	6.9x
0.9	10.35x

Out of all measured samples, 90% had a speed up greater than 1.09, 75% had a speed up greater than 1.8, 50% had a speed up greater than 3.51, 25% had a speed up greater than 6.9 and 10% had a speed up greater than 10.35.

Loop unswitching alone does provide some performance increase. However, the most significant speedups achieved here are a consequence of the reduced complexity of the loops thanks to the unswitching. This opens the door to many other compiler optimizations that the compiler was unable to perform for the original, more complex, loop. One notable such optimization is vectorization, as it is known that vectorized code can be several times faster than its non-vectorized counterpart.

Efficacy of the loop unswitching per compiler

Percentiles per compiler give a picture of the compiler efficacy in doing loop unswitching automatically:

Percentile	GCC speedup	CLANG speedup	INTEL speedup
0.1	1.0x	1.1x	3.14x
0.25	1.15x	1.72x	3.90x
0.5	1.89x	2.47x	6.67x
0.75	7.25x	4.37x	8.54x
0.9	10.16x	4.73x	13.01x

GCC seems to be the best compiler when it comes to doing automatic loop unswitching. In 50% of the measured configurations the speed up was less than 1.89. Intel’s compiler seems to be the worst, in 90% of the cases the speed up was greater than 3.14. CLANG seems to generate the most efficient code with the default settings. In our measurements, only in 10% of the measured configurations the speedup was larger than 4.73.

Conclusion

Loop unswitching works by moving out loop invariant conditionals outside of the loop and generating a version of the loop for each possible condition value. We showed techniques to manually perform loop unswitching to make sure the compiler doesn’t skip them and we measured the performance benefits of manual loop unswitching, which experimentation proved to be substantial.

Even in a simple testing environment such as ours, it is clear that the compilers use loop unswitching very sparingly. The reason is understandable: the compilers need to balance between speed and binary size. Most of the time the compiler doesn’t know which loops are hot. If it would perform loop unswitching on all the loops, the binary size would grow too much. Therefore, most compilers will tend to avoid this optimization. However, this optimization is too important to miss out on your hot path and it is most likely worth performing manually.

Codee can help you with this, for instance through its PWR022: Move invariant conditional out of the loop to facilitate vectorization recommendation. This will hint you which loops have invariant conditions which can be extracted out of the loop. By carefully inspecting your hotspot for these you can get significant performance gains!

Build correct, secure, modern and fast Fortran, C and C++ scientific software

See our plans

Book a demo

Cookie	Type	Duration	Description
	0
__asc	0	30 minutes
__auc	0	1 year
__bs_id	0	1 year
__cfduid	1	1 month	The cookie is set by CloudFare. The cookie is used to identify individual clients behind a shared IP address d apply security settings on a per-client basis. It doesnot correspond to any user ID in the web application and doesn't store any personally identifiable information.
__gads	0	1 year	This cookie is set by Google and stored under the name dounleclick.com. This cookie is used to track how many times users see a particular advert which helps in measuring the success of the campaign and calculate the revenue generated by the campaign. These cookies can only be read from the domain that it is set on so it will not track any data while browsing through another sites.
__lxGr__ses	0	15 minutes
__lxGr__var_654116	0	15 minutes
__lxGr__var_654122	0	15 minutes
__lxGr__var_654124	0	15 minutes
__lxGr__var_654130	0	15 minutes
__lxGr__var_654134	0	15 minutes
__lxGr__var_654146	0	15 minutes
__lxGr__var_654157	0	15 minutes
__lxGr__var_654161	0	15 minutes
__lxGr__var_654163	0	15 minutes
__lxGr__var_654165	0	15 minutes
__lxGr__var_654333	0	15 minutes
__stid	0	1 year	The cookie is set by ShareThis. The cookie is used for site analytics to determine the pages visited, the amount of time spent, etc.
__stidv	0	1 year
__utma	0	2 years	This cookie is set by Google Analytics and is used to distinguish users and sessions. The cookie is created when the JavaScript library executes and there are no existing __utma cookies. The cookie is updated every time data is sent to Google Analytics.
__utmb	0	30 minutes	The cookie is set by Google Analytics. The cookie is used to determine new sessions/visits. The cookie is created when the JavaScript library executes and there are no existing __utma cookies. The cookie is updated every time data is sent to Google Analytics.
__utmc	0		The cookie is set by Google Analytics and is deleted when the user closes the browser. The cookie is not used by ga.js. The cookie is used to enable interoperability with urchin.js which is an older version of Google analytics and used in conjunction with the __utmb cookie to determine new sessions/visits.
__utmt	0	10 minutes	The cookie is set by Google Analytics and is used to throttle the request rate.
__utmt_onm	0	10 minutes
__utmz	0	6 months	This cookie is set by Google analytics and is used to store the traffic source or campaign through which the visitor reached your site.
_abck	0	1 year
_cb	0	1 year
_cb_ls	0	1 year
_cb_svref	0	30 minutes
_chartbeat2	0	1 year
_fbp	0	2 months	This cookie is set by Facebook to deliver advertisement when they are on Facebook or a digital platform powered by Facebook advertising after visiting this website.
_ga	0	2 years	This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assigns a randoly generated number to identify unique visitors.
_ga_Y5Q8GRTQY9	0	2 years
_gat	0	1 minute	This cookies is installed by Google Universal Analytics to throttle the request rate to limit the collection of data on high traffic sites.
_gat_bgs	0	1 minute
_gat_gtag_UA_25587466_8	0	1 minute	Google uses this cookie to distinguish users.
_gat_gtag_UA_84471197_16	0	1 minute	Google uses this cookie to distinguish users.
_gat_hearst	0	1 minute
_gat_tDelegDominio	0	1 minute
_gat_tDominio	0	1 minute
_gat_tRollupComscore	0	1 minute
_gat_tRollupDelegacion	0	1 minute
_gat_tRollupGlobal	0	1 minute
_gat_tRollupLvgTotal	0	1 minute
_gat_tRollupNivel1	0	1 minute
_gat_UA-5144860-2	0	1 minute	This is a pattern type cookie set by Google Analytics, where the pattern element on the name contains the unique identity number of the account or website it relates to. It appears to be a variation of the _gat cookie which is used to limit the amount of data recorded by Google on high traffic volume websites.
_gid	0	1 day	This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected including the number visitors, the source where they have come from, and the pages visited in an anonymous form.
_kuid_	0	5 months	The cookie is set by Krux Digital under the domain krxd.net. The cookie stores a unique ID to identify a returning user for the purpose of targeted advertising.
_li_ss	0	1 month

Cookie	Type	Duration	Description
__cfduid	1	1 month	The cookie is set by CloudFare. The cookie is used to identify individual clients behind a shared IP address d apply security settings on a per-client basis. It doesnot correspond to any user ID in the web application and doesn't store any personally identifiable information.
cookielawinfo-checkbox-necessary	0	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-necessary	0	1 hour	This cookie is set by GDPR Cookie Consent plugin. The purpose of this cookie is to check whether or not the user has given the consent to the usage of cookies under the category 'Necessary'.
cookielawinfo-checkbox-non-necessary	0	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Non Necessary".
cookielawinfo-checkbox-non-necessary	0	1 hour	This cookie is set by GDPR Cookie Consent plugin. The purpose of this cookie is to check whether or not the user has given their consent to the usage of cookies under the category 'Non-Necessary'.
DSID	1	1 hour	To note specific user identity. Contains hashed/encrypted unique ID.
JSESSIONID	1		Used by sites written in JSP. General purpose platform session cookies that are used to maintain users' state across page requests.
PHPSESSID	0		This cookie is native to PHP applications. The cookie is used to store and identify a users' unique session ID for the purpose of managing user session on the website. The cookie is a session cookies and is deleted when all the browser windows are closed.
pmpro_visit	0		The cookie is set by the Paid Membership Pro plugin. The cookie is used to manage user memberships.
viewed_cookie_policy	0	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
viewed_cookie_policy	0	1 hour	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Type	Duration	Description
__utma	0	2 years	This cookie is set by Google Analytics and is used to distinguish users and sessions. The cookie is created when the JavaScript library executes and there are no existing __utma cookies. The cookie is updated every time data is sent to Google Analytics.
__utmb	0	30 minutes	The cookie is set by Google Analytics. The cookie is used to determine new sessions/visits. The cookie is created when the JavaScript library executes and there are no existing __utma cookies. The cookie is updated every time data is sent to Google Analytics.
__utmc	0		The cookie is set by Google Analytics and is deleted when the user closes the browser. The cookie is not used by ga.js. The cookie is used to enable interoperability with urchin.js which is an older version of Google analytics and used in conjunction with the __utmb cookie to determine new sessions/visits.
__utmt	0	10 minutes	The cookie is set by Google Analytics and is used to throttle the request rate.
__utmz	0	6 months	This cookie is set by Google analytics and is used to store the traffic source or campaign through which the visitor reached your site.
_gat	0	1 minute	This cookies is installed by Google Universal Analytics to throttle the request rate to limit the collection of data on high traffic sites.
_gat_UA-5144860-2	0	1 minute	This is a pattern type cookie set by Google Analytics, where the pattern element on the name contains the unique identity number of the account or website it relates to. It appears to be a variation of the _gat cookie which is used to limit the amount of data recorded by Google on high traffic volume websites.
AMP_TOKEN	0	1 hour	This cookie is set by Google Analytics - This cookie contains a token that can be used to retrieve a Client ID from AMP Client ID service. Other possible values indicate opt-out, inflight request or an error retrieving a Client ID from AMP Client ID service.
audit	0	1 year	This cookie is set by Rubicon Project and is used for recording cookie consent data.
bcookie	0	2 years	This cookie is set by linkedIn. The purpose of the cookie is to enable LinkedIn functionalities on the page.
lang	0		This cookie is used to store the language preferences of a user to serve up content in that stored language the next time user visit the website.
lidc	0	1 day	This cookie is set by LinkedIn and used for routing.
mailchimp_landing_site	0	4 weeks	The cookie is set by the email marketing service MailChimp.
na_id	0	1 year	This cookie is set by Addthis.com to enable sharing of links on social media platforms like Facebook and Twitter
ouid	0	1 year	The cookie is set by Addthis which enables the content of the website to be shared across different networking and social sharing websites.
pid	0	1 year	Helps users identify the users and lets the users use twitter related features from the webpage they are visiting.
PugT	0	1 month	This cookie is set by pubmatic.com. The purpose of the cookie is to check when the cookies were last updated on the browser in order to limit the number of calls to the server-side cookie store.
sid	0		This cookie is very common and is used for session state management.
test_cookie	0	11 months	This cookie is set by doubleclick.net. The purpose of the cookie is to determine if the users' browser supports cookies.
YSC	1		This cookies is set by Youtube and is used to track the views of embedded videos.

Cookie	Type	Duration	Description
__gads	0	1 year	This cookie is set by Google and stored under the name dounleclick.com. This cookie is used to track how many times users see a particular advert which helps in measuring the success of the campaign and calculate the revenue generated by the campaign. These cookies can only be read from the domain that it is set on so it will not track any data while browsing through another sites.
__stid	0	1 year	The cookie is set by ShareThis. The cookie is used for site analytics to determine the pages visited, the amount of time spent, etc.
_ga	0	2 years	This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assigns a randoly generated number to identify unique visitors.
_gat_gtag_UA_25587466_8	0	1 minute	Google uses this cookie to distinguish users.
_gat_gtag_UA_84471197_16	0	1 minute	Google uses this cookie to distinguish users.
_gid	0	1 day	This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected including the number visitors, the source where they have come from, and the pages visited in an anonymous form.
ad-id	1	7 months	Provided by amazon-adsystem.com for tracking user actions on other websites to provide targeted content
demdex	0	5 months	This cookie is set under the domain demdex.net and is used by Adobe Audience Manager to help identify a unique visitor across domains.
GPS	0	30 minutes	This cookie is set by Youtube and registers a unique ID for tracking users based on their geographical location
pardot	0		The cookie is set when the visitor is logged in as a Pardot user.
tk_lr	0	1 year	This cookie is set by JetPack plugin on sites using WooCommerce. This is a referral cookie used for analyzing referrer behavior for Jetpack
tk_or	0	5 years	This cookie is set by JetPack plugin on sites using WooCommerce. This is a referral cookie used for analyzing referrer behavior for Jetpack
tk_r3d	0	3 days	The cookie is installed by JetPack. Used for the internal metrics fo user activities to improve user experience

Cookie	Type	Duration	Description
_fbp	0	2 months	This cookie is set by Facebook to deliver advertisement when they are on Facebook or a digital platform powered by Facebook advertising after visiting this website.
_kuid_	0	5 months	The cookie is set by Krux Digital under the domain krxd.net. The cookie stores a unique ID to identify a returning user for the purpose of targeted advertising.
ad-privacy	1	5 years	Provided by amazon-adsystem.com for tracking user actions on other websites to provide targeted content to the users.
ATN	1	2 years	The cookie is set by atdmt.com. The cookies stores data about the user behavior on multiple websites. The data is then used to serve relevant advertisements to the users on the website.
dpm	0	5 months	The cookie is set by demdex.net. This cookie assigns a unique ID to each visiting user that allows third-party advertisers target that users with relevant ads.
everest_g_v2	0	1 year	The cookie is set under eversttech.net domain. The purpose of the cookie is to map clicks to other events on the client's website.
fr	1	2 months	The cookie is set by Facebook to show relevant advertisements to the users and measure and improve the advertisements. The cookie also tracks the behavior of the user across the web on sites that have Facebook pixel or Facebook social plugin.
IDE	1	2 years	Used by Google DoubleClick and stores information about how the user uses the website and any other advertisement before visiting the website. This is used to present users with ads that are relevant to them according to the user profile.
khaos	0	1 year	This cookie is set by rubiconproject.com. The cookie is used to store user data in an anonymous form such as the IP address, geographical location, websites visited, and the ads clicked. The purpose of the cookie is to tailor the ads displayed to the users based on the users movement on other site in the same ad network.
ljt_reader	0	1 year	This is a Lijit Advertising Platform cookie. The cookie is used for recognizing the browser or device when users return to their site or one of their partner's site.
mako_uid	0	1 year	This cookie is set under the domain ps.eyeota.net. The cookies is used to collect data about the users' visit to the website such as the pages visited. The data is used to create a users' profile in terms of their interest and demographic. This data is used for targeted advertising and marketing.
mc	0	1 year	This cookie is associated with Quantserve to track anonymously how a user interact with the website.
NID	1	6 months	This cookie is used to a profile based on user's interest and display personalized ads to the users.
p2	0	1 week	The cookies is set by ownerIQ for the purpose of providing relevant advertisement.
personalization_id	0	2 years	This cookie is set by twitter.com. It is used integrate the sharing features of this social media. It also stores information about how the user uses the website for tracking and targeting.
PUBMDCID	0	2 months	This cookie is set by pubmatic.com. The cookie stores an ID that is used to display ads on the users' browser.
si	0	5 years	The cookies is set by ownerIQ for the purpose of providing relevant advertisement.
TDCPM	0	1 year	The cookie is set by CloudFare service to store a unique ID to identify a returning users device which then is used for targeted advertising.
TDID	0	1 year	The cookie is set by CloudFare service to store a unique ID to identify a returning users device which then is used for targeted advertising.
test_cookie	0	15 minutes	This cookie is set by doubleclick.net. The purpose of the cookie is to determine if the users' browser supports cookies.
tuuid	0	1 year	This cookie is set by .bidswitch.net. The cookies stores a unique ID for the purpose of the determining what adverts the users have seen if you have visited any of the advertisers website. The information is used for determining when and how often users will see a certain banner.
tuuid_lu	0	1 year	This cookie is set by .bidswitch.net. The cookies stores a unique ID for the purpose of the determining what adverts the users have seen if you have visited any of the advertisers website. The information is used for determining when and how often users will see a certain banner.
uid	0	1 month	This cookie is used to measure the number and behavior of the visitors to the website anonymously. The data includes the number of visits, average duration of the visit on the website, pages visited, etc. for the purpose of better understanding user preferences for targeted advertisements.
uuid	0	1 month	To optimize ad relevance by collecting visitor data from multiple websites such as what pages have been loaded.
VISITOR_INFO1_LIVE	1	5 months	This cookie is set by Youtube. Used to track the information of the embedded YouTube videos on a website.

Flexibility and Performance: turbocharging loop unswitching

Manual loop unswitching

How much does loop unswitching improve performance?

Experiment results

Efficacy of the loop unswitching per compiler

Conclusion

PRODUCT

COMPANY

SEARCH

NEWSLETTER

Manual loop unswitching

How much does loop unswitching improve performance?

Experiment results

Efficacy of the loop unswitching per compiler

Conclusion

Reader Interactions

Leave a Reply Cancel reply

Footer

PRODUCT

COMPANY

SEARCH

NEWSLETTER