Spring 2019 EE380 Assignment 2 Sample Solution

  1. A particular program expressed in a particular ISA executes 50 ALU instructions, 8 Loads, 3 Stores, and 2 Branches. A simple, non-pipelined, implementation of that ISA takes 10 CPI for each ALU instruction, 50 CPI for each Load, 20 CPI for each Store, and 20 CPI for each Branch. The original clock period is 5ns. How many clock cycles would the program take to execute? How many microseconds would the program take to execute?
  2. For this question, check all that apply. Given the circumstances described in question 1 above, which of the following changes by itself would yield at least 2X speedup?
    Adding a cache memory reduces both Load and Store to 10 CPI
    (50*10)+(8*10)+(3*10)+(2*20) = 650 cycles
    An improved design reduces the CPI for ALU instructions from 10 to 5
    Can't get 2X overall improving only part by 2X...
    A new compiler reduces the number of ALU instructions from 50 to 10
    ALU goes from (50*10) to (10*10), saves 400 cycles
    The algorithm is redesigned, eliminating 30 ALU instructions and 6 Load instructions
    (20*10)+(2*50)+(3*20)+(2*20) = 400 cycles
    New VLSI fabrication technology changes the clock frequency to 500MHz, but Loads now take 80 CPI each
    500MHz is 2ns period; Loads add (8*(80-50)) = 240 cycles; 1240 cycles @ 2ns = 2480ns
  3. For this question, check all that apply. Which of the following statements about performance is/are true?
    In computing, more FLOPS is a good thing
    Shortest-job-first is a scheduling method to improve throughput
    Many processors now have a performance register that counts clock ticks
    The time the processor spends executing your program's instructions is called the user time
    For any two processors, even executing different ISAs, the one with the faster clock rate will complete a given program in less time
  4. Somtimes, a synthetic benchmark will have significantly different performance from the program it models. Which do you think is more common: the synthetic benchmark performs better or worse than the real program? Why?
  5. You have written a program that currently takes 1 hour (60 minutes) to run on a high-end desktop computer: reading the data from the input file takes 4 minutes, the computation takes 54 minutes, and writing the output file takes 2 minutes. On a large enough parallel supercomputer, you can speed-up the computation a lot... but not the file I/O. According to Amdahl's law, what is the maximum speedup you could hope to get by using a parallel supercomputer?
  6. With all those self-driving taxis scooting around, it would be nice if you could hail one just by giving the usual hand signal as it approaches -- rather than having to use a cell phone to make the pick-up request. You would like to build a system using a camera and quite a lot of image processing computation to recognize when a human is hailing. The currently best processor you can buy isn't fast enough to do this computation before the taxi has driven past the person who wanted a ride. Fortunately, your business plan says you don't need to ship your first product for two years. How would you go about figuring-out if fast enough versions of that processor will be available in two years when your product needs to ship? (Note: there are lots of valid answers to this question.)


EE380 Computer Organization and Design.