References: CPE380 Memories

The lecture slides as a PDF provide a good overview of everything.

The book also does a very good job explaining memory structures. Emphasis here should be on cache and TLB concepts, because these are the things that are truly hardware... you'll see more about demand-paged virtual memory if you take an OS course. Here's a quick outline for what you should have a basic understanding of:

So, how much does memory access order matter? Consider the following program:

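#ifndef N
#define N 4096          /* size used in the text; override with -DN=... */
#endif

volatile double a[N][N];        /* volatile keeps the compiler from deleting the stores */

int main(void)
{
        register int i, j;

a:      for (i=0; i<N; ++i) {
b:              for (j=0; j<N; ++j) {
                        a[i][j] = 0;
                }
        }
        return 0;
}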

Let's compile four versions of this for N=4096. The first version, x0, is as shown above. The second version, x1, swaps lines a: and b:, so that j becomes the outer loop and i the inner loop. The third and fourth versions, x2 and x3, are like x0 and x1 respectively, except that the loops run backwards, using for (i=N-1; i>=0; --i) { and for (j=N-1; j>=0; --j) {. On a rather old AMD Athlon 64 3200+ processor, a somewhat newer Intel i7-8700 3.2GHz processor, and a modern AMD Ryzen 9 6900HX (a lower-power mobile processor, whereas the other two are desktops), the user times are:
Program   Athlon 64 3200+   i7-8700 @ 3.2GHz   Ryzen 9 6900HX
          user time (s)     user time (s)      user time (s)
x0        0.136             0.024              0.003
x1        1.652             0.126              0.135
x2        0.148             0.024              0.003
x3        1.632             0.138              0.122

In summary, the best memory access order was 12X faster than the worst one on the older processor. It didn't make as much difference on the i7-8700 processor, but was still 5.75X faster. On the newest processor, the Ryzen 9, the best memory pattern ran a very scary 45X faster! There is a lot going on even in this simple test, but a couple of points here are:

However, there's another way to interpret these performance numbers. The AMD Athlon 64 3200+ was released in 2003, the Intel i7-8700 3.2GHz in 2017, and the Ryzen 9 6900HX in 2022. Not surprisingly, the newer processors are much faster than the old one using the same version of the code. Very surprisingly, the best access order on the oldest processor was STILL COMPETITIVE with the worst access order on either newer processor, one of them 19 years newer: 0.136 seconds for x0 on the Athlon 64, versus 0.122 to 0.138 seconds for x1 and x3 on the i7-8700 and the Ryzen 9.


CPE380 Computer Organization and Design.