Spring 2021 EE380 Assignment 5 Solution

For this question, check all that apply. Which of the following things would you expect to find within the processor chip for a multi-core laptop processor?
TLB
L1 Cache
L2 Cache
L3 Cache
Main memory
Not in a laptop, but in some SoCs (systems on a chip)
For this question, check all that apply. Which of the following statements about the memory hierarchy are true?
Larger capacity caches tend to be slower
Larger cache line sizes take better advantage of Spatial Locality
Modern processors often have separate caches for instructions and data
Temporal Locality refers to an object being likely to be referenced again soon after being referenced once
For comparable cache size, a direct mapped cache is easier to build (simpler logic) than set associative cache
For this question, check all that apply. Remember this diagram of the AMD Athlon? According to the diagram, which of the following techniques is used in this design?

Fully associative cache
Nothing here suggests that
History-based branch prediction
Separate L1 caches for code and data
Superscalar execution of integer arithmetic
Instruction scheduling with register renaming
As discussed in class, the renaming is obvious given the huge number of FP registers
Suppose that a simple system has a single cache with an access time of 1 clock cycle. Cache misses are satisfied with an average memory latency of 200 clock cycles. Assuming a cache hit ratio of about 0.99 (99%), roughly how long does the average reference take? Show the formula that would give the answer.
Average time = (hit ratio × hit time) + (miss ratio × miss latency) = (0.99 × 1) + (0.01 × 200) = 2.99 cycles. Equivalently, 100 references take (99 × 1) + (1 × 200) = 299 cycles; thus, 2.99 (i.e., about 3) cycles/reference
Given how modern memory systems work, and assuming N is a big number, which of the following would you expect to run faster, or would they run at about the same speed? Be sure to explain your reasoning. Choice A:
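The weighted average above can be checked with a few lines of C. This is only a minimal sketch: the function name `average_access_cycles` is made up for illustration, and it follows the weighting used in the answer, where a miss costs 200 cycles in total.

```c
/* Average cycles per reference for a single-level cache.
 * Assumption (matches the answer above): a hit costs hit_cycles and a
 * miss costs miss_cycles in total. */
double average_access_cycles(double hit_ratio, double hit_cycles,
                             double miss_cycles)
{
    double miss_ratio = 1.0 - hit_ratio;
    return hit_ratio * hit_cycles + miss_ratio * miss_cycles;
}
```

For example, `average_access_cycles(0.99, 1.0, 200.0)` evaluates to about 2.99, while dropping the hit ratio to 0.95 raises the result to about 10.95, which shows how sensitive the average is to the miss rate.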
```
struct { int a, b, c; } abc[N];
for (int i=0; i<N; ++i) { abc[i].a = 0; }
```
Or Choice B:
```
int a[N]; int b[N]; int c[N];
for (int i=0; i<N; ++i) { a[i] = 0; }
```
Both loops write the same number of ints, but in B they are contiguous whereas in A only every third int is written (a 12-byte stride, assuming 4-byte ints). Thus, A touches about 3X as many cache lines and will run slower. It might even need to use more TLB entries.
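The 3X figure can be illustrated by counting the cache lines each loop touches. The sketch below is not a measurement; it assumes 4-byte ints, a 12-byte struct with no padding, 64-byte cache lines, and an array that starts on a line boundary. The helper `lines_touched` is a hypothetical name.

```c
#include <stddef.h>

/* Count the distinct cache lines touched when writing `count` elements
 * spaced `stride` bytes apart, starting at a line-aligned address.
 * Assumes each element fits entirely inside one line (true for aligned
 * 4-byte ints), so only the element's first byte is considered. */
size_t lines_touched(size_t count, size_t stride, size_t line_size)
{
    size_t lines = 0;
    size_t prev_line = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t line = (i * stride) / line_size;
        if (i == 0 || line != prev_line) {  /* addresses only increase */
            ++lines;
            prev_line = line;
        }
    }
    return lines;
}
```

With N = 1600, Choice B (stride 4) gives `lines_touched(1600, 4, 64) == 100`, while Choice A (stride 12) gives `lines_touched(1600, 12, 64) == 300`.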
Which one of the following three I/O mechanisms would be most appropriate for a desktop PC to use in reading keystrokes from a keyboard?
Polling
Wastes too much processor time
Interrupts
DMA
Not much data to move, so no need for DMA
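As a rough illustration of why interrupts fit keyboard input, the sketch below simulates an interrupt handler that drops each scancode into a small ring buffer, so the processor does work only when a key actually arrives. All names here (`kbd_interrupt_handler`, `kbd_read`, `KBD_BUF_SIZE`) are hypothetical; a real handler would read the scancode from the keyboard controller rather than receive it as an argument.

```c
#include <stdbool.h>
#include <stdint.h>

#define KBD_BUF_SIZE 16u  /* power of two, so wrap-around is a simple mask */

static volatile uint8_t kbd_buf[KBD_BUF_SIZE];
static volatile unsigned kbd_head;  /* next slot the handler writes */
static volatile unsigned kbd_tail;  /* next slot the reader consumes */

/* Simulated interrupt handler: runs once per keystroke. */
void kbd_interrupt_handler(uint8_t scancode)
{
    unsigned next = (kbd_head + 1u) & (KBD_BUF_SIZE - 1u);
    if (next != kbd_tail) {          /* buffer full: drop the keystroke */
        kbd_buf[kbd_head] = scancode;
        kbd_head = next;
    }
}

/* Called by the program when it wants input.
 * Returns false immediately if no keystroke is waiting. */
bool kbd_read(uint8_t *out)
{
    if (kbd_tail == kbd_head) {
        return false;
    }
    *out = kbd_buf[kbd_tail];
    kbd_tail = (kbd_tail + 1u) & (KBD_BUF_SIZE - 1u);
    return true;
}
```

Between keystrokes no code in this sketch runs at all, in contrast to a polling loop that would keep checking the device.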
For this question, mark all answers that apply.
Which of the following statements about the memory hierarchy are true?
The address used to search the L2 cache is usually a physical memory address
L1 and the levels below it (L2, L3) are usually tagged with physical addresses
It is possible to suffer a TLB miss for a reference to a datum that is already in cache
The TLB has far fewer entries than the cache has lines, so the translation for a page can be evicted from the TLB while data from that page is still in cache
If a program repeatedly accesses the same few variables, it has high temporal locality
Pretty much the definition...
LRU is a common replacement policy; it replaces the line that hasn't been accessed for the longest time
Often an LRU approximation, but yes
The content of a dirty line is potentially different from that of the same address in lower levels of the memory hierarchy
Again, pretty much the definition
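The LRU policy mentioned above can be sketched for a single 4-way set using per-way timestamps. This is an illustrative model only; the names `lru_set` and `lru_access` are made up, and real hardware usually approximates LRU rather than keeping full timestamps.

```c
#include <stddef.h>

#define WAYS 4

struct lru_set {
    unsigned long tag[WAYS];
    unsigned long last_used[WAYS];  /* time of last access; 0 = empty way */
    unsigned long clock;            /* incremented on every access */
};

/* Access `tag` in the set. Returns 1 on a hit, 0 on a miss.
 * On a miss the tag is installed in an empty way if one exists,
 * otherwise in the way that was accessed least recently. */
int lru_access(struct lru_set *s, unsigned long tag)
{
    size_t victim = 0;
    ++s->clock;
    for (size_t w = 0; w < WAYS; ++w) {
        if (s->last_used[w] != 0 && s->tag[w] == tag) {
            s->last_used[w] = s->clock;   /* hit: refresh recency */
            return 1;
        }
        if (s->last_used[w] < s->last_used[victim]) {
            victim = w;                   /* oldest (or empty) way so far */
        }
    }
    s->tag[victim] = tag;                 /* miss: replace the LRU way */
    s->last_used[victim] = s->clock;
    return 0;
}
```

Starting from a zero-initialized set, accessing tags 1, 2, 3, 4 gives four misses; accessing 1 again hits; accessing 5 then evicts tag 2, the least recently used line, so a later access to 2 misses.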
For this question, mark all answers that apply according to the following MIPS pipeline diagram:

Consider executing the following MIPS code sequence:
```
A:	andi	$t1, $t0, 47
B:	and	$t3, $t2, $t1
C:	andi	$t4, $t0, 1541
D:	sw	$t4, 3276($t5)
E:	xor	$t0, $t5, $t2
F:	lw	$t1, 6356($t5)
```
This code is to be executed on a pipelined MIPS implementation like that shown in the reference diagram. Unless stated otherwise, assume value forwarding is not implemented. Which of the following statements are true?
There is a true dependence (RAW) between instructions A and B
On $t1
There is an output dependence (WAW) between instructions D and F
No: D writes memory and F writes register $t1, so they never write the same location
Adding value forwarding to the pipeline would result in no pipeline bubbles for this code
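The only hazard forwarding doesn't fix is a lw followed immediately by a use of the loaded value; here the lw (F) is last, so nothing waits on it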
Without value forwarding, the code would execute in less time if instruction C were moved to between A and B
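Remember that dependence between A and B? C is independent of both, so it can fill one of the stall slots between them
As written, instruction E couldn't move to before C, but it could if we renamed register $t0 to $t6 in instruction E
Classic fix for WAR: C reads $t0, which E overwrites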
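The register dependences discussed above can be checked mechanically. The sketch below encodes instructions A through F by register number ($t0 = 8 through $t6 = 14, following the usual MIPS numbering) and classifies the dependence between an earlier and a later instruction. The names `instr`, `seq`, and `reg_dependences` are hypothetical, and memory dependences (such as between a sw and a lw to the same address) are deliberately ignored.

```c
/* Register usage of one instruction; -1 marks an unused slot. */
struct instr {
    int dest;    /* register written, or -1 (sw writes memory only) */
    int src[2];  /* registers read */
};

enum { DEP_RAW = 1, DEP_WAR = 2, DEP_WAW = 4 };

/* The six instructions from the question, in program order A..F. */
static const struct instr seq[6] = {
    {  9, {  8, -1 } },  /* A: andi $t1, $t0, 47     */
    { 11, { 10,  9 } },  /* B: and  $t3, $t2, $t1    */
    { 12, {  8, -1 } },  /* C: andi $t4, $t0, 1541   */
    { -1, { 12, 13 } },  /* D: sw   $t4, 3276($t5)   */
    {  8, { 13, 10 } },  /* E: xor  $t0, $t5, $t2    */
    {  9, { 13, -1 } },  /* F: lw   $t1, 6356($t5)   */
};

static int reads(const struct instr *in, int reg)
{
    return reg >= 0 && (in->src[0] == reg || in->src[1] == reg);
}

/* Bit mask of register dependences from `earlier` to `later`. */
int reg_dependences(const struct instr *earlier, const struct instr *later)
{
    int deps = 0;
    if (reads(later, earlier->dest)) deps |= DEP_RAW;   /* true dependence */
    if (reads(earlier, later->dest)) deps |= DEP_WAR;   /* anti dependence */
    if (earlier->dest >= 0 && earlier->dest == later->dest)
        deps |= DEP_WAW;                                /* output dependence */
    return deps;
}
```

Under this encoding, `reg_dependences(&seq[0], &seq[1])` reports `DEP_RAW` (A to B on $t1), `reg_dependences(&seq[3], &seq[5])` reports no register dependence (D to F), and `reg_dependences(&seq[2], &seq[4])` reports `DEP_WAR` (C to E on $t0), which disappears if E's destination is changed to 14 ($t6).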

Computer Organization and Design.