Flexibility is very important in software development, and a great deal of research and education is invested into creating flexible software. However, flexibility and performance often don't go hand in hand, as most developers have witnessed at some point in their careers.
One way to achieve flexibility is to parametrize the behavior of a function using a parametrization variable. For instance, a boolean value can be passed to a function so that the function changes its behavior depending on whether the boolean is true or false. Once set, the parametrization variable is later used inside a condition to steer the execution of the program in the desired direction. Here is an example:
int calculate_sum(int* a, int n, bool only_positives) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (only_positives) {
            if (a[i] > 0) {
                sum += a[i];
            }
        } else {
            sum += a[i];
        }
    }
    return sum;
}
In the above code snippet, setting the variable only_positives to true tells the function to sum only the positive values of the array a. Otherwise, it sums all values of the array a. The problem with this code is that the variable only_positives is evaluated over and over inside the loop even though its value never changes. Variables and expressions whose values never change inside a loop are called loop invariant. Needlessly evaluating loop-invariant conditions in the hot loop of your program can hurt its performance.
In this post we talk about parametrization and performance, i.e. how parametrization variables in hot code can sometimes ruin the performance of your program. More specifically, we present loop unswitching as a technique to speed up parametrized code, and we show how and when compilers do it for us. In the upcoming post, we will show you how to force loop unswitching on the compiler. There, we will also run an experiment with a more complicated example, inspired by the codebase of one of our customers, to see how loop unswitching affects performance.

Loop unswitching
If you have spent a minute or two analyzing the previous snippet, an idea has surely crossed your mind: create two versions of the above loop, one for the case where only_positives is true, and the other for the case where it is false. The transformation looks something like this:
int calculate_sum(int* a, int n, bool only_positives) {
    int sum = 0;
    if (only_positives) {
        for (int i = 0; i < n; i++) {
            if (a[i] > 0) {
                sum += a[i];
            }
        }
    } else {
        for (int i = 0; i < n; i++) {
            sum += a[i];
        }
    }
    return sum;
}
We moved the conditional check on the variable only_positives outside of the loop and created two copies of the original loop: one for the case where only_positives is true and the other for the case where it is false. If you look at the example attentively, you will see that the new loops are smaller and contain only the code necessary for the calculation.
The transformation that moves loop-invariant conditions outside of the loop is called loop unswitching, and a compiler at a high optimization level will surely perform it on this simple example.
The good thing about loop unswitching is that it opens the door to other compiler optimizations, especially vectorization. Vectorized code is in principle several times faster than its scalar counterpart, but vectorization often doesn’t pay off if there are conditional statements in the loop body.
Even though loop unswitching is great from the performance point of view, there is no guarantee that the compiler will perform it for your hot loop. Let’s investigate a few reasons why the compiler might decide not to do loop unswitching.
Binary size grows too much
As you have seen in the previous section, since the only_positives parameter has two possible values (true and false), the compiler has to create two copies of the loop. Our loop was small, so the compiler will probably do it (unless we compile with a flag that optimizes for binary size, such as -Os in GCC, Clang and ICC, which suppresses optimizations that increase the binary size).
However, if the body of the loop were large, the compiler might decide, based on its own cost model, that the binary size would grow too much and opt not to perform the optimization. Imagine code that contains hundreds of loops that depend on a boolean condition: if the compiler unswitched all of them, the binary size could grow by a factor of two, yet only the hot loops would benefit substantially. Since the compiler doesn't know which loops are more important than others, it might decide to skip the optimization.
The number of loop variants grows exponentially with the number of parametrization variables. For instance, if a loop had three parametrization variables (two booleans and an enum with three possible values), perfect loop unswitching would produce 2 × 2 × 3 = 12 cases. Instead of one loop, we have 12 loops!
The compiler cannot guarantee loop invariance of the condition
What seems obvious to the developer is not necessarily obvious to the compiler. Sometimes, compilers cannot do loop unswitching automatically because they cannot guarantee that the parametrization variable inside the loop is actually loop invariant (i.e. it will evaluate to the same value for all the iterations of the loop).
When the parametrization variable is passed to the function by value (not as a pointer or by reference), a copy of the parametrization variable is created inside the function that is inaccessible from outside the function. This makes it easy for the compiler to guarantee that the parametrization variable will not change its value during the execution of the loop, so it can perform loop unswitching effectively.
If the parametrization variable is a global variable, a function parameter passed by reference, or a regular or static data member of a class, then it becomes more difficult for the compiler to determine whether the value is loop invariant. Our Codee tool promotes software development best practices that help in this situation, such as recommendation PWR001, which advises avoiding the use of global variables.

Let's look in more detail at a few reasons why the compiler might not be able to determine the invariance of the loop condition.
The problem of pointer aliasing
One reason why automatic loop unswitching can be more complicated is pointer aliasing. Consider the following source code:
#define ONLY_POSITIVES 21
bool settings[MAX_SETTINGS_COUNT];
…
void increment_array(int* a, int n) {
    for (int i = 0; i < n; i++) {
        if (settings[ONLY_POSITIVES]) {
            if (a[i] > 0) {
                a[i]++;
            }
        } else {
            a[i]++;
        }
    }
}
In this example, the function is called increment_array and it increments the elements of an array. If settings[ONLY_POSITIVES] is true, it increments only the positive elements of the array; otherwise it increments all of them.
Now imagine that, for some bizarre reason, somebody calls our function increment_array like this:
increment_array(settings, MAX_SETTINGS_COUNT);
When called like this, the pointers a and settings point to the same array. Note that pre-C99 codebases often define bool as just another name for int (the C99 header stdbool.h actually defines it as _Bool); with such a definition, this call compiles. Since we are modifying the array a, we are also modifying the array settings: the pointers a and settings alias each other. For this particular invocation of increment_array, the condition if (settings[ONLY_POSITIVES]) is not loop invariant, and the compiler cannot unswitch the loop.
You might say that this is not how one should call increment_array, but the compiler sees things differently. The compiler must make sure that the function increment_array works properly even when used in this bizarre way, so it will not perform loop unswitching (or it might perform it, but then it would have to create two copies of the loop: one for the aliased case and another for the non-aliased case).
A good approach is to never access variables that are not local to the current function (see recommendation PWR001). This can be achieved very simply, by creating a local on-stack copy of the variable inside the function. In our example, instead of accessing settings[ONLY_POSITIVES] on every iteration, we would keep a copy in a local variable: bool only_positives = settings[ONLY_POSITIVES]; This makes it easier for the compiler to perform the automatic loop unswitching.
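Applied to increment_array, the fix looks like this. As a self-contained sketch, MAX_SETTINGS_COUNT is given a concrete placeholder value here, which the original code elides:

```c
#include <stdbool.h>

#define MAX_SETTINGS_COUNT 32   /* placeholder value for illustration only */
#define ONLY_POSITIVES 21

bool settings[MAX_SETTINGS_COUNT];

void increment_array(int* a, int n) {
    /* Local on-stack copy: the compiler now knows the condition cannot
       change, no matter what the loop writes through the pointer a. */
    bool only_positives = settings[ONLY_POSITIVES];
    for (int i = 0; i < n; i++) {
        if (only_positives) {
            if (a[i] > 0) {
                a[i]++;
            }
        } else {
            a[i]++;
        }
    }
}
```

With the copy in place, the load from settings happens exactly once before the loop, so even a call that aliases a with settings can no longer change the condition mid-loop, and the compiler is free to unswitch on only_positives.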
Function calls and global memory
Let’s look at the code snippet from the beginning but with slight modifications:
bool only_positives;
...
int calculate_sum(int* a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (only_positives) {
            if (a[i] > 0) {
                sum += calculate(a[i]);
            }
        } else {
            sum += calculate(a[i]);
        }
    }
    return sum;
}
Let's say that, for flexibility's sake, we move the actual calculation to a function calculate. So, instead of sum += a[i] we have sum += calculate(a[i]). We could define the function calculate in several ways, e.g.:
int calculate(int a) { return a; }
int calculate(int a) { return sin(a); }
In both cases the function calculate doesn't modify the parametrization variable only_positives. If the compiler manages to inline the call to calculate into the loop body, it can do an in-place analysis, confirm this, and perform the unswitching. But what happens if the compiler cannot inline the function calculate? Imagine that calculate does the following:
int calculate(int a) {
    if (a < 0) {
        only_positives = true;
    }
    return a;
}
This version of the function calculate modifies the value of the parametrization variable only_positives (since only_positives lives in global memory). The condition on only_positives is not loop invariant anymore, and the loop cannot be unswitched.
When the compiler cannot inline the function, it must assume that the function can modify any part of global memory, and therefore it generates the version without loop unswitching.
One solution to this problem is to make the body of the function calculate available to the compiler during the compilation of calculate_sum. You can move the definition of calculate into the same compilation unit as calculate_sum, or make it available by defining it in a header.
Another approach is to turn on Link Time Optimization (LTO) with the appropriate compiler switches. This enables inlining between different compilation units, which in turn enables loop unswitching when possible.
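As a sketch, with GCC or Clang an LTO build would look something like the following (the file names are hypothetical, assuming calculate and calculate_sum live in separate source files):

```shell
# Compile each translation unit with LTO enabled, then link with LTO,
# so the optimizer can see across file boundaries and inline calculate:
gcc -O2 -flto -c calculate.c
gcc -O2 -flto -c calculate_sum.c
gcc -O2 -flto calculate.o calculate_sum.o -o program
```

The same -flto flag works with Clang; at link time the optimizer sees the bodies of both functions and can prove that calculate leaves only_positives untouched.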
A third approach is to mark the function calculate either as const, which means that its output depends only on its inputs and nothing else, or as pure, which means that its output depends on its inputs and the memory state, but the function doesn't modify the contents of memory. Pure and const functions cannot modify the state of global memory; therefore, they guarantee that the parametrization variable won't change during the execution of the loop.
There is no portable way of marking functions with these attributes; each compiler has its own. On GCC and Clang you would mark your function with __attribute__((const)) or __attribute__((pure)).
Conclusion
In this post we introduced loop unswitching, a technique compilers use to speed up loops. We also talked about the obstacles to loop unswitching and ways to help the compiler do the unswitching automatically. Codee can help you avoid some of the pitfalls that prevent loop unswitching.
In the next post we will talk about how the developer can force loop unswitching on the compiler. We will demonstrate the performance gains from loop unswitching on a more elaborate example, and show how Codee can detect loops with invariant conditions in your code and propose ways to rewrite them to profit from the automatic loop unswitching that compilers provide.
