Spring 2019 EE380 Assignment 2 Sample Solution

A particular program expressed in a particular ISA executes 50 ALU instructions, 8 Loads, 3 Stores, and 2 Branches. A simple, non-pipelined, implementation of that ISA takes 10 CPI for each ALU instruction, 50 CPI for each Load, 20 CPI for each Store, and 20 CPI for each Branch. The original clock period is 5ns. How many clock cycles would the program take to execute? How many microseconds would the program take to execute?
# CPI ALU 50 10 Load 8 50 Store 3 20 Branch 2 20 (50*10)+(8*50)+(3*20)+(2*20) = 500+400+60+40 = 1000 cycles 1000 cycles @ 5ns = 5000ns = 5us
For this question, check all that apply. Given the circumstances described in question 1 above, which of the following changes by itself would yield at least 2X speedup?
Adding a cache memory reduces both Load and Store to 10 CPI
(50*10)+(8*10)+(3*10)+(2*20) = 650 cycles
An improved design reduces the CPI for ALU instructions from 10 to 5
Can't get 2X overall improving only part by 2X...
A new compiler reduces the number of ALU instructions from 50 to 10
ALU goes from (50*10) to (10*10), saves 400 cycles
The algorithm is redesigned, eliminating 30 ALU instructions and 6 Load instructions
(20*10)+(2*50)+(3*20)+(2*20) = 400 cycles
New VLSI fabrication technology changes the clock frequency to 500MHz, but Loads now take 80 CPI each
500MHz is 2ns period; Loads add (8*(80-50)) = 240 cycles; 1240 cycles @ 2ns = 2480ns
For this question, check all that apply. Which of the following statements about performance is/are true?
In computing, more FLOPS is a good thing
Shortest-job-first is a scheduling method to improve throughput
Many processors now have a performance register that counts clock ticks
The time the processor spends executing your program's instructions is called the user time
For any two processors, even executing different ISAs, the one with the faster clock rate will complete a given program in less time
Somtimes, a synthetic benchmark will have significantly different performance from the program it models. Which do you think is more common: the synthetic benchmark performs better or worse than the real program? Why?
Usually, synthetic benchmark outperforms the real program. Reason is that more compiler optimizations can be applied. It's hard to write synthetic code that can't be optimized.
You have written a program that currently takes 1 hour (60 minutes) to run on a high-end desktop computer: reading the data from the input file takes 4 minutes, the computation takes 54 minutes, and writing the output file takes 2 minutes. On a large enough parallel supercomputer, you can speed-up the computation a lot... but not the file I/O. According to Amdahl's law, what is the maximum speedup you could hope to get by using a parallel supercomputer?
60 minutes: 4 input, 54 compute, 2 output Input and output not affected, but compute approaches 0. Maximum possible speedup is (4+54+2)/(4+2) = 10X
With all those self-driving taxis scooting around, it would be nice if you could hail one just by giving the usual hand signal as it approaches -- rather than having to use a cell phone to make the pick-up request. You would like to build a system using a camera and quite a lot of image processing computation to recognize when a human is hailing. The currently best processor you can buy isn't fast enough to do this computation before the taxi has driven past the person who wanted a ride. Fortunately, your business plan says you don't need to ship your first product for two years. How would you go about figuring-out if fast enough versions of that processor will be available in two years when your product needs to ship? (Note: there are lots of valid answers to this question.)
You're basically guessing at what computer tech will be viable two years from now. From Top500.Org, we know high-end machines get about 2X faster every year... but that doesn't necessarily mean your code will run 4X faster in 2 years. A more detailed estimate could be obtained by benchmarking on some existing machines and looking at the product roadmaps to see what the manufacturers expect the new versions of those machines will look like in two years.

Computer Organization and Design.