The lecture slides as a PDF provide a good overview of everything.
The book also does a very good job explaining memory structures. Emphasis here should be on cache and TLB concepts, because these are the things that are truly hardware... you'll see more about demand-paged virtual memory if you take an OS course. Here's a quick outline for what you should have a basic understanding of:
So, how much does memory access order matter? Consider the following program:
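
    #define N 4096              /* array dimension used for the timings below */

    volatile double a[N][N];    /* volatile keeps the compiler from removing the stores */

    int main(void)
    {
        register int i, j;

    a:  for (i = 0; i < N; ++i) {
    b:      for (j = 0; j < N; ++j) {
                a[i][j] = 0;
            }
        }
        return 0;
    }
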
Let's compile four versions of this for N=4096. The first version, x0, is as shown above. The second version, x1, swaps lines a: and b:, so that the j loop becomes the outer loop. The third and fourth versions, x2 and x3, are like the first two, except that the loops run backwards, using for (i=N-1; i>=0; --i) { and for (j=N-1; j>=0; --j) {. On a rather old AMD Athlon 64 3200+ processor, a somewhat newer Intel i7-8700 3.2GHz processor, and a modern AMD Ryzen 9 6900HX (a lower-power mobile processor, whereas the other two are desktops), the user times are:
Program | Athlon 64 3200+ User Time (seconds) | i7-8700 @ 3.2GHz User Time (seconds) | Ryzen 9 6900HX User Time (seconds) |
---|---|---|---|
x0 | 0.136 | 0.024 | 0.003 |
x1 | 1.652 | 0.126 | 0.135 |
x2 | 0.148 | 0.024 | 0.003 |
x3 | 1.632 | 0.138 | 0.122 |
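
For anyone who wants to repeat the experiment, here is a minimal sketch that folds the four loop orders into one function. It is not the code used to produce the table: the name `time_fill` and the `variant` numbering are hypothetical, and it measures processor time with the standard `clock()` call rather than the user time reported for separately compiled programs. Because all variants share one process, the first call also pays for the operating system mapping the array's pages, so its time may be inflated.

```c
#include <assert.h>
#include <time.h>

#define N 4096              /* array dimension used in the text */

volatile double a[N][N];    /* 128 MiB in static storage, not on the stack */

/* Hypothetical helper: zero the array using loop order x0..x3
   (variant 0..3) and return the processor time used, in seconds. */
double time_fill(int variant)
{
    int i, j;
    clock_t start, stop;

    assert(variant >= 0 && variant <= 3);
    start = clock();
    switch (variant) {
    case 0:  /* x0: i outer, j inner, counting up */
        for (i = 0; i < N; ++i)
            for (j = 0; j < N; ++j)
                a[i][j] = 0;
        break;
    case 1:  /* x1: j outer, i inner, counting up */
        for (j = 0; j < N; ++j)
            for (i = 0; i < N; ++i)
                a[i][j] = 0;
        break;
    case 2:  /* x2: same nesting as x0, counting down */
        for (i = N - 1; i >= 0; --i)
            for (j = N - 1; j >= 0; --j)
                a[i][j] = 0;
        break;
    case 3:  /* x3: same nesting as x1, counting down */
        for (j = N - 1; j >= 0; --j)
            for (i = N - 1; i >= 0; --i)
                a[i][j] = 0;
        break;
    }
    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `time_fill(0)` through `time_fill(3)` from a `main` and printing the results should show the same pattern as the table, although the absolute numbers will depend on the machine and the compiler options. Since C stores `a[i][j]` and `a[i][j+1]` next to each other, x0 and x2 walk through memory 8 bytes at a time, while x1 and x3 jump N*8 = 32768 bytes between consecutive stores.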
In summary, the best memory access order was 12X faster than the worst one on the older processor. It didn't make as much difference on the i7-8700 processor, but was still 5.75X faster. On the newest processor, the Ryzen 9, the best memory pattern ran a very scary 45X faster! There is a lot going on even in this simple test, but a couple of points here are:
However, there's another way to interpret these performance numbers. The AMD Athlon 64 3200+ was released in 2003. The Intel i7-8700 3.2GHz was released in 2017 and the Ryzen 9 6900HX in 2022. Not surprisingly, the newer processors are much faster than the old one using the same version of the code. Very surprisingly, the best access order on the 19-year-older processor was STILL COMPETITIVE with the worst access order on the newer processors!
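
The ratios quoted above can be checked directly from the table. The sketch below just copies the measured user times into an array and divides the slowest variant by the fastest for each processor; the names `user_time` and `worst_over_best` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

/* User times in seconds, copied from the table.
   Rows: x0..x3. Columns: Athlon 64 3200+, i7-8700, Ryzen 9 6900HX. */
static const double user_time[4][3] = {
    {0.136, 0.024, 0.003},
    {1.652, 0.126, 0.135},
    {0.148, 0.024, 0.003},
    {1.632, 0.138, 0.122},
};

/* Ratio of the slowest variant to the fastest one on processor cpu (0..2). */
double worst_over_best(int cpu)
{
    double best, worst;
    int v;

    assert(cpu >= 0 && cpu < 3);
    best = worst = user_time[0][cpu];
    for (v = 1; v < 4; ++v) {
        double t = user_time[v][cpu];
        if (t < best)
            best = t;
        if (t > worst)
            worst = t;
    }
    return worst / best;
}
```

For the three columns this gives roughly 12.1, 5.75, and 45. The cross-generation comparison is simply `user_time[0][0]` (0.136 s, the Athlon's best) against `user_time[1][2]` (0.135 s, the Ryzen's worst).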