References: EE380 Memories

The book does a very good job of explaining memory structures; read it. The emphasis here is on cache and TLB concepts, because these are the parts that are truly hardware... you'll see more about demand-paged virtual memory if you take an OS course. Here's a quick outline for what you should have a basic understanding of:

So, how much does memory access order matter? Consider the following program:

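#define N 4096                  /* array dimension used for the timings */

volatile double a[N][N];        /* volatile keeps the compiler from removing the stores */

int main(void)
{
        register int i, j;

a:      for (i=0; i<N; ++i) {
b:              for (j=0; j<N; ++j) {
                        a[i][j] = 0;
                }
        }
        return 0;
}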

Let's compile four versions of this for N=4096. The first version, x0, is as shown above. The second version, x1, swaps lines a: and b:, so the j loop becomes the outer loop. The third and fourth versions, x2 and x3, are like the first two versions, except that the loops run backwards, using for (i=N-1; i>=0; --i) { and for (j=N-1; j>=0; --j) {. On an Athlon 64 3200+ processor, the user times are:
Program    User Time (seconds)
x0         0.136
x1         1.652
x2         0.148
x3         1.632

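In summary, the best memory access order (x0) was about 12X faster than the worst one (x1)! C stores a[i][j] in row-major order, so x0 and x2 touch consecutive addresses and use every double in each cache line they fetch, while x1 and x3 jump N*sizeof(double) = 32768 bytes between consecutive accesses.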


EE380 Computer Organization and Design.