In modern computer systems, the memory subsystem is often the performance bottleneck. Understanding how the memory subsystem works and adapting code to it can yield significant performance improvements.
The memory subsystem includes several levels of data cache, which sit between the CPU and the main memory and speed up access to frequently used data. A data cache is much smaller than the main memory, so it can hold only a part of the data, but access to it is much faster than access to the main memory.
The data caches are built on two assumptions about how programs behave:
- Temporal locality: If the CPU has accessed a certain memory location, there is a high probability that it will access it again in the near future. Using the same values in different loop iterations is an example of temporal locality.
- Spatial locality: If the CPU has accessed a certain memory location, there is a high probability that it will access its neighboring locations in the near future. Iterating through an array is an example of spatial locality.
The common name for both assumptions is locality of reference.
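The effect of spatial locality can be illustrated with a minimal sketch in C. The function names `sum_row_major` and `sum_column_major` are made up for this example, and the matrix is assumed to be stored row by row in one contiguous array. Both functions compute the same sum; only the order in which memory is visited differs.

```c
#include <stddef.h>

/* Row-major traversal: the inner loop visits consecutive addresses,
 * so neighboring elements are used one after another (good spatial
 * locality). */
long sum_row_major(const int *m, size_t rows, size_t cols) {
    long sum = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += m[i * cols + j];
    return sum;
}

/* Column-major traversal of the same row-major data: consecutive
 * accesses are `cols` elements apart, so neighboring elements are
 * not used together (poor spatial locality). */
long sum_column_major(const int *m, size_t rows, size_t cols) {
    long sum = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += m[i * cols + j];
    return sum;
}
```

For matrices much larger than the data cache, the row-major version is typically the faster of the two, although the actual difference depends on the hardware, the matrix size, and the compiler optimizations in use.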
Data that the CPU has not accessed for some time is evicted from the cache to make room for new data. For optimum performance, code should therefore reuse data while it is still in the data cache, before it gets evicted.
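One way to reuse data before it is evicted is to merge two passes over the same array into one. The following is a sketch only: the function names `scale_then_sum` and `scale_and_sum_fused` are hypothetical, and the array is assumed to be larger than the data cache, which is the case where the difference matters.

```c
#include <stddef.h>

/* Two separate passes: when the array does not fit in the cache, the
 * elements touched at the start of the first loop may already be
 * evicted by the time the second loop reads them again. */
long scale_then_sum(int *a, size_t n, int factor) {
    for (size_t i = 0; i < n; i++)
        a[i] *= factor;

    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* One fused pass: each element is scaled and added to the sum while
 * it is still in the cache, so it is brought in only once. */
long scale_and_sum_fused(int *a, size_t n, int factor) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        a[i] *= factor;
        sum += a[i];
    }
    return sum;
}
```

Both functions produce the same result. An optimizing compiler may perform this kind of loop fusion on its own, so any gain should be confirmed by measurement rather than assumed.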