EE380 Assignment 2 Solution


  1. A particular program expressed in a particular ISA executes 200 ALU instructions, 10 Loads, 16 Stores, and 4 Branches. A simple, non-pipelined implementation of that ISA takes 8 CPI for each ALU instruction, 20 CPI for each Load, 10 CPI for each Store, and 10 CPI for each Branch. The original clock frequency is 2 GHz. How many clock cycles would the program take to execute? How many microseconds would the program take to execute?
  2. For this question, check all that apply. Given the circumstances described in question 1 above, which of the following changes by itself would yield at least a 2X speedup?
    A clever compiler is able to eliminate all the Branch instructions
    An improved ALU design reduces ALU instruction CPI from 8 to 2
    This yields 200*2 + 10*20 + 16*10 + 4*10 = 800 cycles, versus 2000 cycles originally: a 2.5X speedup
    Rewriting the program reduces the number of ALU instructions to 100
    Adding a cache reduces Load CPI from 20 to 5 and Store CPI from 10 to 5
    New VLSI fabrication technology halves the clock period but doesn't change memory speed, so each Load takes 40 CPI
  3. For this question, check all that apply. Which of the following statements about performance is/are true?
    In computing, more FLOPS is a good thing
    Shortest-job-first is a scheduling method to improve throughput
    For any given application, the SPEC benchmarks are the best predictor of performance
    The time the processor spends executing your program's instructions is called the user time
    For two processors implementing the same ISA, the one with the faster clock rate will complete a given program in less time
  4. Sometimes, a synthetic benchmark will have significantly different performance from the program it models. Which do you think is more common: the synthetic benchmark performing better or worse than the real program? Why?
  5. You have written a program that currently takes 1 hour (60 minutes) to run on a high-end desktop computer: reading the data from the input file takes just one minute, the computation takes 57 minutes, and writing the output file takes two minutes. On a large enough parallel supercomputer, you can speed up the computation a lot, but not the file I/O. According to Amdahl's law, what is the maximum speedup you could hope to get by using a parallel supercomputer?
  6. With all those self-driving taxis scooting around, it would be nice if you could hail one just by giving the usual hand signal as it approaches, rather than having to use a cell phone to make the pick-up request. You would like to build a system that uses a camera and a lot of image-processing computation to recognize when a human is hailing. The best processor currently available isn't fast enough to finish this computation before the taxi has driven past the person who wanted a ride. Fortunately, your business plan says you don't need to ship your first product for two years. How would you go about figuring out whether fast enough versions of that processor will be available in two years, when your product needs to ship? (Note: there are lots of valid answers to this question.)
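The arithmetic behind questions 1 and 2 can be checked with a small script. This is a minimal sketch, assuming the simple model from question 1: total cycles are the sum of (instruction count x CPI) over each instruction class, and execution time is cycles divided by clock frequency. The names COUNTS, CPI, total_cycles, exec_time_us and the option labels are illustrative only. Speedups are compared on execution time rather than cycles, because the last option changes the clock.

```python
# Instruction mix and per-class CPI from question 1.
COUNTS = {"ALU": 200, "Load": 10, "Store": 16, "Branch": 4}
CPI = {"ALU": 8, "Load": 20, "Store": 10, "Branch": 10}
CLOCK_HZ = 2e9  # original 2 GHz clock


def total_cycles(counts, cpi):
    """Sum of (instruction count x CPI) over every instruction class."""
    return sum(counts[k] * cpi[k] for k in counts)


def exec_time_us(cycles, clock_hz):
    """Execution time in microseconds: cycles / clock frequency."""
    return cycles * 1e6 / clock_hz


base_cycles = total_cycles(COUNTS, CPI)
base_time = exec_time_us(base_cycles, CLOCK_HZ)

# Each question 2 option as (counts, cpi, clock). For the last option we
# assume, as the option states, that only Load CPI changes (to 40).
options = {
    "eliminate branches": ({**COUNTS, "Branch": 0}, CPI, CLOCK_HZ),
    "ALU CPI 8 -> 2": (COUNTS, {**CPI, "ALU": 2}, CLOCK_HZ),
    "100 ALU instructions": ({**COUNTS, "ALU": 100}, CPI, CLOCK_HZ),
    "cache (Load/Store CPI 5)": (COUNTS, {**CPI, "Load": 5, "Store": 5}, CLOCK_HZ),
    "half clock period, Load CPI 40": (COUNTS, {**CPI, "Load": 40}, 2 * CLOCK_HZ),
}

# Speedup = original execution time / new execution time.
speedups = {}
for name, (counts, cpi, clock) in options.items():
    speedups[name] = base_time / exec_time_us(total_cycles(counts, cpi), clock)

if __name__ == "__main__":
    print(f"original: {base_cycles} cycles, {base_time:.2f} us")
    for name, s in speedups.items():
        print(f"{name}: {s:.2f}X, at least 2X: {s >= 2}")
```

With these inputs the original program takes 2000 cycles, or 1 µs at 2 GHz. Of the five options, only the ALU CPI reduction reaches 2X (2.5X); the faster clock falls short at about 1.82X because the slower Loads raise the count to 2200 cycles.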

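Amdahl's law for question 5 can be sketched the same way. This assumes the standard form, overall speedup = 1 / (s + (1 - s) / p), where s is the fraction of time that cannot be accelerated (here the file I/O) and p is the speedup of the remaining computation; amdahl_speedup and the constant names are illustrative.

```python
def amdahl_speedup(serial_fraction, parallel_speedup):
    """Overall speedup when only the non-serial fraction is accelerated."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_speedup)


# Question 5 timings, in minutes.
READ_MIN, COMPUTE_MIN, WRITE_MIN = 1, 57, 2
TOTAL_MIN = READ_MIN + COMPUTE_MIN + WRITE_MIN  # 60 minutes
serial = (READ_MIN + WRITE_MIN) / TOTAL_MIN     # file I/O cannot be sped up

# Upper bound as p grows without limit: only the 3 minutes of I/O remain.
max_speedup = TOTAL_MIN / (READ_MIN + WRITE_MIN)

if __name__ == "__main__":
    for p in (10, 100, 1000):
        print(f"computation {p}X faster -> {amdahl_speedup(serial, p):.2f}X overall")
    print(f"limit: {max_speedup:.1f}X")
```

Here the serial fraction is 3/60 = 5%, so even an infinitely fast computation leaves 3 minutes of I/O, and the overall speedup approaches, but never reaches, 60 / 3 = 20X.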

EE380 Computer Organization and Design.