OPEN CATALOG

Best practices for performance

Perfect-loop nesting

In the context of program optimizations, perfect loop nesting happens when all the statements in a loop nest of two or more loops are in the innermost loop.

Consider following example:

for (int i = 0; i < n; i++) {
    for (int j = 0; i < n; i++) {
        c[i] += a[j] + b[j][i];
    }
}

In the above example of two nested loops, all the statements are in the innermost loop. We say that the loops are perfectly nested. Contrast this with the following example:

for (int i = 0; i < n; i++) {
    c[i] = 0.0;
    for (int j = 0; i < n; i++) {
        c[i] += a[j] + b[j][i];
    }
}

In this example, the loops are not perfectly nested because of the statement c[i] = 0.0 on line 2. 
Perfect-loop nesting is very important for loop optimizations, since it enables other optimization techniques like loop interchange or loop tiling.

Converting imperfectly-nested loops to perfectly nested #

As already mentioned, perfect loop nesting is a prerequisite for loop interchange. Luckily, many times imperfectly nested loops can be converted to perfectly nested. Here are three examples:

Example 1: Isolating statements that prevent perfect-loop nesting into a separate loop #

for (int i = 0; i < n; i++) {
    c[i] = 0.0;
    for (int j = 0; i < n; i++) {
        c[i] += a[j] + b[j][i];
    }
}

This loop can be made perfectly nested by extracting the statement c[i] = 0.0 on line 2 into a separate loop, like this:

for (int i = 0; i < n; i++) {
    c[i] = 0.0;
}

for (int i = 0; i < n; i++) {
    for (int j = 0; i < n; i++) {
        c[i] += a[j] + b[j][i];
    }
}

The loop nest on lines 5-10 is perfectly nested and can profit from other optimizations.

Example 2: Promoting temporary scalars to vectors to enable perfect loop nesting #

A second example of imperfectly nested loops:

for (int i = 0; i < n; i++) {
    double sum = 0.0;
    double dot = 0.0;
    for (int j = 0; i < n; i++) {
        sum += a[j] + b[j][i];
        dot += a[j] * b[i][j];
    }
    c[i] = dot / sum;
}

In this example, three statements on lines 2, 3 and 8 prevent perfect loop nesting. Conversion to perfect loop nesting is possible by promoting scalar variables sum and dot to arrays and splitting the loop into three smaller loops:

double* sum_arr = malloc(sizeof(double) * n);
double* dot_arr = malloc(sizeof(double) * n);

for (int i = 0; i < n; i++) {
    sum_arr[i] = 0.0;
    dot_arr[i] = 0.0;
}

for (int i = 0; i < n; i++) {
    for (int j = 0; i < n; i++) {
        sum_arr[i] += a[j] + b[j][i];
        dot_arr[i] += a[j] * b[i][j];
    }
}

for (int i = 0; i < n; i++) {
    c[i] = dot_arr[i] / sum_arr[i];
}
free(sum_arr);
free(dot_arr);

The loop nest on lines 9-14 is now perfectly nested and can be optimized further.

Example 3: #

The last example looks like this:

for (int i = 0; i < n; i++) {
   double* a_ptr = a + n * i;
   double* b_ptr = b + i;
   
    for (int j = 0; j < n; j++) {
        *a_ptr = *b_ptr;
         a_ptr++;
         b_ptr+=n;
    }
}

This example uses pointer-based notation to access the elements of the array. By converting the index-based notation, we can eliminate the statements on lines 2 and 3 preventing perfect loop nesting:

for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        a[j + i * n] = b[i + j * n]; 
    }
}

By moving to the index based notations, we made the loops perfectly nested with the potential to enable other optimizations.

Resources

Loop interchange
Loop tiling