Best practices for performance


Parallelization is the process of solving a computational problem by dividing it into independent parts, solving the parts separately, and then merging the partial results into a final solution.

Consider the following example:

double sum = 0.0;
for (int i = 0; i < 1000; i++) {
    sum += a[i];
}

This code iterates over indices 0 to 999 and computes the sum of the elements of array a. The problem can be solved in parallel. For example, we can split it into four subproblems, each working on a 250-element part of a. The computer can compute the sum for each part independently and, once all four are done, combine the partial sums into the final sum.
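The splitting described above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function name chunked_sum and its parameters are hypothetical, and the chunks are still processed one after another here. The point is only that each partial sum touches its own part of the array, so the chunks could be handed to separate workers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums n elements of a by splitting the index range
// into num_parts independent chunks (num_parts must be greater than zero).
double chunked_sum(const double* a, std::size_t n, std::size_t num_parts) {
    std::vector<double> partial(num_parts, 0.0);

    // Each chunk reads only its own slice of a and writes only partial[p],
    // so no chunk depends on the result of another chunk.
    for (std::size_t p = 0; p < num_parts; p++) {
        std::size_t begin = p * n / num_parts;
        std::size_t end = (p + 1) * n / num_parts;
        for (std::size_t i = begin; i < end; i++) {
            partial[p] += a[i];
        }
    }

    // Merge step: combine the independently computed partial sums.
    double sum = 0.0;
    for (std::size_t p = 0; p < num_parts; p++) {
        sum += partial[p];
    }
    return sum;
}
```

With n = 1000 and num_parts = 4, each chunk covers 250 elements, as in the text. Because floating-point addition is not associative, the result can differ from the single-loop version in the last bits for general input.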

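Not all problems can be parallelized easily. The biggest obstacle to parallelization is loop-carried dependencies: cases where an iteration of a loop needs a value computed by an earlier iteration, so the iterations cannot run independently.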
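To make the difference concrete, here is a small sketch contrasting a loop whose iterations are independent with one that has a loop-carried dependency. The function names scale and running_sum are made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Independent iterations: b[i] depends only on a[i], so the iterations
// can run in any order. b must have at least a.size() elements.
void scale(const std::vector<double>& a, std::vector<double>& b) {
    for (std::size_t i = 0; i < a.size(); i++) {
        b[i] = 2.0 * a[i];
    }
}

// Loop-carried dependency: iteration i reads a[i - 1], which was written
// by iteration i - 1, so the iterations must run in order as written.
void running_sum(std::vector<double>& a) {
    for (std::size_t i = 1; i < a.size(); i++) {
        a[i] += a[i - 1];
    }
}
```

The summation loop shown earlier also carries sum from one iteration to the next, but because additions can be regrouped, that particular dependency can be broken by computing partial sums.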

Several kinds of hardware can be used to speed up computation through parallelization, each with a corresponding technique. The most common are:

  • Vectorization: using a special unit of the CPU core that can process more than one piece of data at a time.
  • Multithreading: distributing the parts of the problem to several CPU cores for individual processing.
  • Offloading: using specialized, massively parallel hardware to solve the problem, the best-known example being GPUs.
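As one possible illustration of the three approaches, the sketch below applies each to the summation loop using OpenMP directives. OpenMP is an assumption made here for the example; the text does not prescribe any particular tool, and the function names are hypothetical. The directives take effect only when the compiler is run with OpenMP enabled (for example -fopenmp with GCC or Clang); otherwise they are ignored and each loop runs serially with the same result.

```cpp
// Vectorization: ask the compiler to process several elements per
// instruction using the SIMD unit of a single core.
double sum_simd(const double* a, int n) {
    double sum = 0.0;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

// Multithreading: distribute iterations across CPU cores; each thread
// keeps a private partial sum that is combined at the end.
double sum_threads(const double* a, int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

// Offloading: run the loop on an attached device such as a GPU. The map
// clauses copy the array to the device and the result back. If no device
// is available, OpenMP falls back to running on the host.
double sum_offload(const double* a, int n) {
    double sum = 0.0;
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n]) map(tofrom: sum) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

In all three versions the reduction clause tells the compiler that the additions may be regrouped, which is what allows the dependency on sum to be broken. Whether any of them is actually faster depends on the array size and the hardware; for 1000 elements the cost of starting threads or copying data to a device may well outweigh the gain.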