EE380 Assignment 1 Solution

  1. For this question, check all that apply. Which of the following statements is/are true?
    Allowing a few simultaneous writers to a bus is easy
    DRAM usually takes less power per bit held than SRAM
    A multiplexor can be built using a decoder and tri-state drivers
    The MFC signal indicates no space is left in main memory (memory full capacity)
    (No: MFC signals that a pending memory fetch has completed.)
    Zin only makes sense in a state where the ALU is doing something (e.g., ALUadd)
  2. Given this processor hardware design, add control states to the following to implement an XOR-with-immediate instruction (as decoded by the when below), such that xori $rt,$rs,immed yields rt=(rs^immed). This is actually a MIPS instruction, as we'll discuss later. Hint: it's a lot like the addi given to you, isn't it? You should add initial values and test your design using the simulator before submitting it here.

    You can test your code with:

    MEM[0]={xori}+rs(9)+rt(10)+immed(13)
    MEM[4]=0
    $9=39
    

    Register $10 should end up holding the value 0x0000002a (42 in decimal).
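
    The expected value can be checked by hand or outside the simulator. The following is a minimal Python sketch, assuming only that xori XORs the register value with the immediate as stated above; the helper name xori and the 32-bit mask are illustrative and are not part of the simulator.

```python
def xori(rs_value, immed):
    """Model rt = rs ^ immed on 32-bit values (illustrative helper, not simulator code)."""
    return (rs_value ^ immed) & 0xFFFFFFFF

# Values from the test case: $9 = 39, immed = 13
result = xori(39, 13)
print(hex(result), result)  # 0x2a 42
```

    In binary, 39 is 100111 and 13 is 001101, so the XOR is 101010, which is 42.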

  3. Given this processor hardware design, add control states to the following to implement a "one" instruction (as decoded by the when below), such that one immed($rs) places the value 1 in memory, i.e., mem[rs+immed]=1;. You should add initial values and test your design using the simulator before submitting it here.

    You can test your code with:

    MEM[0]=op(2)+rs(1)+immed(40)
    MEM[4]=0
    MEM[44]=42
    $1=4
    

    Memory location MEM[44] should end up holding the value 0x00000001.

  4. What high-level languages call goto is usually called a jump instruction in assembly language -- it simply places a constant address value into the PC. The catch is that a 6-bit opcode and a 32-bit (immediate) memory address cannot both fit in one 32-bit instruction. A typical way to handle this is to implement a branch instruction, which encodes the offset from the current PC value to the target address. Since our PC has already been incremented to point at the next instruction by the time the branch executes, the offset is computed from the updated PC value, not from the address of the branch itself. There's one more tweak: since every instruction is 4 bytes long, targets are always multiples of 4, so the bottom two bits are always 0 and don't need to be stored. Thus, a br instruction at memory location 0x00000000 would jump to 0x00000010 (16) by encoding an offset of (16-(0+4))/4 = 3, in other words, the 16-bit value 0x0003. There is an IRoffsetout operation that extracts that value and puts it, multiplied by 4, on the bus. All you need to do is add that offset to the PC. Add states to the following to implement this br instruction.

    You can test your code with:

    MEM[0]=0x10000000+immed(3)
    MEM[4]=0
    MEM[16]=0
    

    The simulator should stop after failing to decode the instruction fetched from MEM[16], at which time the PC should hold 20, which is 0x00000014.
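
    The offset arithmetic for br can be sketched in Python as follows. This is only an illustration of the encoding described in the question, not the control-state answer; the function names are hypothetical, and treating the 16-bit field as a signed (two's complement) value for backward branches is an assumption that the question does not state.

```python
def encode_branch_offset(branch_addr, target_addr):
    """Return the 16-bit offset field for a br at branch_addr that goes to target_addr."""
    next_pc = branch_addr + 4            # PC was already incremented during fetch
    delta = target_addr - next_pc
    assert delta % 4 == 0, "instruction addresses are multiples of 4"
    return (delta // 4) & 0xFFFF         # drop the two always-zero low bits, keep 16 bits

def branch_target(branch_addr, offset_field):
    """Recover the target: field times 4, added to the updated PC."""
    if offset_field & 0x8000:            # assumption: field is sign-extended
        offset_field -= 0x10000
    return (branch_addr + 4 + 4 * offset_field) & 0xFFFFFFFF

field = encode_branch_offset(0x00000000, 0x00000010)
print(hex(field))                        # 0x3
print(hex(branch_target(0, field)))      # 0x10
```

    With the test case above, the branch at address 0 lands on 16; the fetch of MEM[16] then increments the PC to 20 before decoding fails.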

  5. Given this processor hardware design and the control sequence below, describe in words (or C-like pseudo code) the function of the instruction xyzzy $rt,$rs.
    when op() op(1) Xyzzy
    
    Start:
     PCout, MARin, MEMread, Yin
     CONST(4), ALUadd, Zin, UNTILmfc
     MDRout, IRin
     Zout, PCin, JUMPonop
     HALT /* Should end here on undecoded op */
    
    Xyzzy:
     SELrs, REGout, Yin
     Yout, ALUadd, Zin
     Zout, ALUadd, Zin /* Yes, this is legal! */
     Zout, SELrt, REGin, JUMP(Start)
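
    One way to work out what Xyzzy does is to trace the register transfers in software. The Python sketch below assumes that ALUadd adds the Y register to whatever value is currently on the bus and that Zin latches the result at the end of the state (which is why Zout and Zin can share a state); these are assumptions about the hardware design, and the variable names are illustrative.

```python
MASK = 0xFFFFFFFF  # 32-bit registers

def xyzzy(rs_value):
    """Trace the Xyzzy states; returns the value written to $rt."""
    y = rs_value & MASK       # SELrs, REGout, Yin : Y = rs
    bus = y                   # Yout               : bus = Y
    z = (y + bus) & MASK      # ALUadd, Zin        : Z = Y + bus = 2*rs
    bus = z                   # Zout               : bus = old Z
    z = (y + bus) & MASK      # ALUadd, Zin        : Z = Y + bus = 3*rs
    return z                  # Zout, SELrt, REGin : rt = Z

print(xyzzy(14))  # 42
```

    Under those assumptions the instruction behaves like rt = 3*rs, computed modulo 2^32.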
    

  6. Given the xyzzy $rt,$rs instruction as defined above, and assuming that a memory load request takes 8 clock cycles to complete (after MEMread has been issued), how many clock cycles would it take to execute each xyzzy instruction? You may use the simulator to get or check your answer. In any case, give and briefly explain your answer here:
  7. Given this processor hardware design, suppose that the following control state is the limiting factor in determining the maximum clock speed. Given that the propagation delay associated with Zin is 1ns, MARin is 2ns, REGout is 4ns, SELrt is 8ns, and ALUslt is 16ns, what is the period (in nanoseconds) of the fastest allowable clock? You may use the simulator to get or check your answer. In any case, give and briefly explain your answer here:
    ALUslt, Zin, SELrt, REGout, MARin
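
    The clock period is set by the slowest chain of dependent signals, not by the sum of all delays. The Python sketch below assumes a single-bus dataflow in which SELrt must settle before REGout can drive the bus, and the bus value then feeds both the ALU (whose result goes to Z) and MAR in parallel; that ordering is an assumption about the hardware design and should be checked against the actual diagram.

```python
# Propagation delays from the question, in nanoseconds
delay_ns = {"Zin": 1, "MARin": 2, "REGout": 4, "SELrt": 8, "ALUslt": 16}

# Assumed dependency chains through the single-bus datapath
paths = [
    ["SELrt", "REGout", "ALUslt", "Zin"],  # register -> bus -> ALU -> Z
    ["SELrt", "REGout", "MARin"],          # register -> bus -> MAR
]

def critical_path_ns(paths, delay_ns):
    """Return the delay of the slowest chain of dependent signals."""
    return max(sum(delay_ns[signal] for signal in path) for path in paths)

print(critical_path_ns(paths, delay_ns))  # 29
```

    Under this assumed dataflow the MARin chain (14ns) finishes well inside the ALU chain, so it does not affect the period.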
    

  8. Given this processor hardware design, add control states to the following to implement an average instruction (as decoded by the when below), such that avg $rd,$rs,$rt gives rd the average of the other two values. You're probably used to averaging by adding and then dividing by 2, but that doesn't always work -- the add can go out of range. Instead, use the identity rd=((rs&rt)+((rs^rt)>>1)) (which is actually a trick from the MAGIC algorithms page).

    You can test your code with:

    MEM[0]=op(1)+rd(8)+rs(9)+rt(10)
    MEM[4]=0
    $8=42
    $9=64
    $10=32
    

    Register $8 should end up holding the value 48, which is 0x00000030.
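
    The averaging trick works because rs+rt equals 2*(rs&rt) + (rs^rt): the AND picks out the bit positions that carry, and the XOR gives the sum without carries. Halving that expression never needs more than 32 bits. The Python sketch below compares it with the naive add-then-shift; it treats register values as unsigned 32-bit numbers, which is an assumption, and the function names are illustrative.

```python
MASK = 0xFFFFFFFF  # model 32-bit registers

def avg_naive(a, b):
    """Add then halve; the sum wraps at 32 bits before the shift."""
    return ((a + b) & MASK) >> 1

def avg_safe(a, b):
    """(a & b) + ((a ^ b) >> 1): no intermediate exceeds 32 bits."""
    return ((a & b) + ((a ^ b) >> 1)) & MASK

print(avg_safe(64, 32))                          # 48
print(hex(avg_naive(0xFFFFFFFE, 0xFFFFFFFC)))    # 0x7ffffffd (sum overflowed)
print(hex(avg_safe(0xFFFFFFFE, 0xFFFFFFFC)))     # 0xfffffffd
```

    For the test case, 64&32 is 0 and 64^32 is 96, so the result is 0 + (96>>1) = 48.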

  9. Given this processor hardware design, add control states to the following to implement an exchange-with-memory instruction (as decoded by the when below), such that xchg $rt,@$rs swaps the values in register rt and memory[rs]. Hint: swaps are usually done using a temporary register -- Y is a good choice. You should add initial values and test your design using the simulator before submitting it here.

    You should be able to make your own test case.... ;-)



EE380 Computer Organization and Design.