
This recommendation is part of the open catalog of best practice rules for performance that is automatically detected and reported by Codee.
Issue
Inefficient matrix access pattern detected that can be improved through loop interchange.
Actions
Interchange inner and outer loops in the loop nest.
Relevance
Inefficient memory access pattern and low locality of reference are among the main reasons for low performance on modern computer systems. Matrices are stored in a row-major order in C and column-major order in Fortran. Iterating over them column-wise (in C) and row-wise (in Fortran) is inefficient, because it uses the memory subsystem suboptimally.
Nested loops that iterate over matrices inefficiently can be optimized by applying loop interchange. Using loop interchange, the inefficient matrix access pattern is replaced with a more efficient one. Often, loop interchange enables vectorization of the innermost loop which additionally improves performance.
Note
Loop interchange can be performed only on perfectly nested loops, i.e. on loops where all the statements are in the body of the innermost loop. If the loops are not perfectly nested, it is often possible to make them perfectly nested through refactoring.
Code examples
The following code shows two nested loops:
void zero(double **A, int n) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
A[j][i] = 0.0;
}
}
}
The matrix A
is accessed column-wise, which is inefficient. To fix it, we perform the loop interchange of loops over i
and j
. After the interchange, the loop over j
becomes the outer loop and loop over i
becomes the inner loop.
After this modification, the access to matrix A
is no longer column-wise, but row-wise, which is much faster and more efficient. Additionally, the compiler can vectorize the inner loop.
void zero(double **A, int n) {
for (int j = 0; j < n; j++) {
for (int i = 0; i < n; i++) {
A[j][i] = 0.0;
}
}
Related resources
Related resources

Building performance into the code from day one with Codee