Flexibility is very important in software development, and a great deal of research and education is invested into creating flexible software. However, flexibility and performance often don't go hand in hand, as most developers have witnessed at some point in their careers.
One way to achieve flexibility is to parametrize the behavior of a function using a parametrization variable. For instance, a boolean value can be passed to a function so that the function changes its behavior depending on whether the boolean is true or false. Once set, the parametrization variable is later used inside a condition to steer the execution of the program in the desired direction. Here is an example:
int calculate_sum(int* a, int n, bool only_positives) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (only_positives) {
            if (a[i] > 0) {
                sum += a[i];
            }
        } else {
            sum += a[i];
        }
    }
    return sum;
}
In the above code snippet, setting the variable only_positives to true tells the function to sum only the positive values of the array a. Otherwise, it sums all values of the array a. The problem with this code is that the variable only_positives is evaluated over and over inside the loop even though its value never changes. Variables and expressions whose values never change inside a loop are called loop invariant. Needlessly evaluating loop-invariant conditions in the hot loop of your program can hurt its performance.
In this post we talk about parametrization and performance, i.e. how parametrization variables in hot code can sometimes ruin the performance of your program. More specifically, we present loop unswitching as a technique to speed up parametrized code, and we show how and when compilers do it for us. In the upcoming post, we will show you how to force loop unswitching on the compiler. There, we will also run an experiment with a more complicated example, inspired by the codebase of one of our customers, to see how loop unswitching affects performance.

Loop unswitching
If you have spent a minute or two analyzing the previous snippet, an idea has surely crossed your mind: create two versions of the above loop, one for the case where only_positives is true, and the other for the case where it is false. The transformation looks something like this:
int calculate_sum(int* a, int n, bool only_positives) {
    int sum = 0;
    if (only_positives) {
        for (int i = 0; i < n; i++) {
            if (a[i] > 0) {
                sum += a[i];
            }
        }
    } else {
        for (int i = 0; i < n; i++) {
            sum += a[i];
        }
    }
    return sum;
}
We moved the conditional check on the variable only_positives outside of the loop and created two copies of the original loop: one for the case where only_positives is true and the other for the case where it is false. If you look at the example attentively, you will see that the new loops are smaller and contain only the code necessary for the calculation.
The transformation that moves loop-invariant conditions outside of the loop is called loop unswitching, and a compiler at a high optimization level will surely perform it on this simple example.
The good thing about loop unswitching is that it opens the door to other compiler optimizations, especially vectorization. Vectorized code is in principle several times faster than its scalar counterpart, but vectorization often doesn’t pay off if there are conditional statements in the loop body.
Even though loop unswitching is great from the performance point of view, there is no guarantee that the compiler will perform it for your hot loop. Let’s investigate a few reasons why the compiler might decide not to do loop unswitching.
Binary size grows too much
As you have seen in the previous section, since the only_positives parameter has two possible values (true and false), the compiler has to create two copies of the loop. Our loop was small, so the compiler will probably do it (unless we compile with a flag that optimizes for binary size, such as -Os in GCC, Clang and ICC, which suppresses optimizations that increase the binary size).
However, if the body of the loop were large, the compiler might decide, based on its own cost model, that the binary size would grow too much and opt not to perform the optimization. Imagine code that contains hundreds of loops that depend on a boolean condition: if the compiler unswitched all of them, the binary size could grow by a factor of two, yet only the hot loops would benefit substantially. Since the compiler doesn't know which loops are more important than others, it might decide to skip the optimization.
The number of loop variants grows exponentially with the number of parametrization variables. For instance, if a loop had three parametrization variables (two booleans and an enum with three possible values), perfect loop unswitching would produce 2 × 2 × 3 = 12 cases. Instead of one loop, we have 12 loops!
The compiler cannot guarantee loop invariance of the condition
What seems obvious to the developer is not necessarily obvious to the compiler. Sometimes, compilers cannot do loop unswitching automatically because they cannot guarantee that the parametrization variable inside the loop is actually loop invariant (i.e. it will evaluate to the same value for all the iterations of the loop).
When the parametrization variable is passed to the function by value (not as a pointer or by reference), a copy of the parametrization variable is created inside the function that is inaccessible from outside the function. This makes it easy for the compiler to guarantee that the parametrization variable will not change its value during the execution of the loop, so it can perform loop unswitching effectively.
If the parametrization variable is a global variable, a function parameter passed by reference, or a regular or static data member of a class, then it becomes more difficult for the compiler to determine whether the value is loop invariant. Our Codee tool promotes software development best practices that help in this situation, such as recommendation PWR001, which advises avoiding the use of global variables.

Let's look in more detail at a few reasons why the compiler might not be able to determine the invariance of the loop condition.
The problem of pointer aliasing
One reason why automatic loop unswitching can be more complicated is pointer aliasing. Consider the following source code:
#define ONLY_POSITIVES 21
bool settings[MAX_SETTINGS_COUNT];
…
void increment_array(int* a, int n) {
    for (int i = 0; i < n; i++) {
        if (settings[ONLY_POSITIVES]) {
            if (a[i] > 0) {
                a[i]++;
            }
        } else {
            a[i]++;
        }
    }
}
In this example, the function is called increment_array and it increments the elements of an array. If settings[ONLY_POSITIVES] is true, it increments only the positive elements of the array; otherwise it increments all of them.
Now imagine that, for some bizarre reason, somebody calls our function increment_array like this:
increment_array(settings, MAX_SETTINGS_COUNT);
When called like this, the pointers a and settings point to the same array. Note that pre-C99 codebases often define bool as just another name for int (the C99 header stdbool.h actually defines it as _Bool); with such a definition, this call compiles. Since we are modifying the array a, we are also modifying the array settings: the pointers a and settings alias each other. For this particular invocation of increment_array, the condition if (settings[ONLY_POSITIVES]) is not loop invariant, and the compiler cannot unswitch the loop.
You might say that this is not how one should call increment_array, but the compiler sees things differently. The compiler must make sure that the function increment_array works properly even when used in this bizarre way, so it will not perform loop unswitching (or it might perform it, but then it would have to create two copies of the loop: one for the aliased case and another for the non-aliased case).
A good approach is to never access variables that are not local to the current function (see recommendation PWR001). This can be achieved very simply, by creating a local on-stack copy of the variable inside the function. In our example, instead of accessing settings[ONLY_POSITIVES] on every iteration, we would keep a copy in a local variable: bool only_positives = settings[ONLY_POSITIVES]; This makes it easier for the compiler to perform the automatic loop unswitching.
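Applied to increment_array, the fix looks like this. As a self-contained sketch, MAX_SETTINGS_COUNT is given a concrete placeholder value here, which the original code elides:

```c
#include <stdbool.h>

#define MAX_SETTINGS_COUNT 32   /* placeholder value for illustration only */
#define ONLY_POSITIVES 21

bool settings[MAX_SETTINGS_COUNT];

void increment_array(int* a, int n) {
    /* Local on-stack copy: the compiler now knows the condition cannot
       change, no matter what the loop writes through the pointer a. */
    bool only_positives = settings[ONLY_POSITIVES];
    for (int i = 0; i < n; i++) {
        if (only_positives) {
            if (a[i] > 0) {
                a[i]++;
            }
        } else {
            a[i]++;
        }
    }
}
```

With the copy in place, the load from settings happens exactly once before the loop, so even a call that aliases a with settings can no longer change the condition mid-loop, and the compiler is free to unswitch on only_positives.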
Function calls and global memory
Let’s look at the code snippet from the beginning but with slight modifications:
bool only_positives;
...
int calculate_sum(int* a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (only_positives) {
            if (a[i] > 0) {
                sum += calculate(a[i]);
            }
        } else {
            sum += calculate(a[i]);
        }
    }
    return sum;
}
Let's say that, for flexibility's sake, we move the actual calculation to a function calculate. So, instead of sum += a[i] we have sum += calculate(a[i]). We could define the function calculate in several ways, e.g.:
int calculate(int a) { return a; }
int calculate(int a) { return sin(a); }
In both cases the function calculate doesn't modify the parametrization variable only_positives. If the compiler manages to inline the call to calculate into the loop body, it can do an in-place analysis, confirm this, and perform the unswitching. But what happens if the compiler cannot inline the function calculate? Imagine that calculate does the following:
int calculate(int a) {
    if (a < 0) {
        only_positives = true;
    }
    return a;
}
This version of the function calculate modifies the value of the parametrization variable only_positives (since only_positives lives in global memory). The condition on only_positives is not loop invariant anymore, and the loop cannot be unswitched.
When the compiler cannot inline the function, it must assume that the function can modify any part of global memory, and therefore it generates the version without loop unswitching.
One solution to this problem is to make the body of the function calculate available to the compiler during the compilation of calculate_sum. You can move the definition of calculate into the same compilation unit as calculate_sum, or make it available by defining it in a header.
Another approach is to turn on Link Time Optimization (LTO) with the appropriate compiler switches. This enables inlining between different compilation units, which in turn enables loop unswitching when possible.
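As a sketch, with GCC or Clang an LTO build would look something like the following (the file names are hypothetical, assuming calculate and calculate_sum live in separate source files):

```shell
# Compile each translation unit with LTO enabled, then link with LTO,
# so the optimizer can see across file boundaries and inline calculate:
gcc -O2 -flto -c calculate.c
gcc -O2 -flto -c calculate_sum.c
gcc -O2 -flto calculate.o calculate_sum.o -o program
```

The same -flto flag works with Clang; at link time the optimizer sees the bodies of both functions and can prove that calculate leaves only_positives untouched.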
A third approach is to mark the function calculate either as const, which means that its output depends only on its inputs and nothing else, or as pure, which means that its output depends on its inputs and the memory state, but the function doesn't modify the contents of memory. Pure and const functions cannot modify the state of global memory; therefore, they guarantee that the parametrization variable won't change during the execution of the loop.
There is no portable way of marking functions with these attributes; each compiler has its own. On GCC and Clang you would mark your function with __attribute__((const)) or __attribute__((pure)).
Conclusion
In this post we introduced loop unswitching, a technique compilers use to speed up loops. We also talked about the obstacles to loop unswitching and ways to help the compiler do the unswitching automatically. Codee can help you avoid some of the pitfalls that prevent loop unswitching.
In the next post we will talk about how the developer can force loop unswitching on the compiler. We will demonstrate the performance gains from loop unswitching on a more elaborate example, and show how Codee can detect loops with invariant conditions in your code and propose ways to rewrite them to profit from the automatic loop unswitching that compilers provide.
