Assignment 2: Multi-Cycle KySMet

In this project, your team is going to build a multi-cycle implementation of KySMet, the little instruction set design you built an assembler for in Assignment 1. Although this is designed to be a SIMD computer with multiple processing elements, in this project you'll only be implementing one processing element... so it looks like a fairly normal machine, but with a couple of structures normal machines don't normally have.

In this project, you'll be determining how to encode the KySMet instruction set, building an AIK assembler that embodies that coding (wait a second... you all just did that!), creating a multi-cycle implementation of the processor and memory, and testing it with some attention paid to test coverage. That's a lot, so you're not doing it alone, but in teams of 3-4 students. Let's take it one step at a time... which is also how you should do it.

Some Things That Aren't Obvious About KySMet

Although KySMet is pretty straightforward, there are a few things you need to be aware of in addition to the KySMet specification.

First and foremost, KySMet is a Harvard architecture with separate data and instruction memories. Why? Because in a SIMD-parallel implementation, there will be a data memory for each processing element, but only one instruction memory.
There are 16 general-purpose registers, but you'll still need some more. For example, there should be a register called the pc (program counter) which is 16 bits long and initialized to 0. Despite some of the 16 general-purpose registers having special names, like sp, there is actually nothing special about any of them except the fact that the "read only" registers must be initialized correctly. Thus, $zero should be 0, $NPROC should be 1, and $IPROC should be 0.
Although it might seem strange, KySMet has dedicated hardware structures for the enable stack and the call stack. You will almost certainly want to implement each one as a shift register, as suggested in the KySMet specification.
Although I told you all in class that multipliers are not really combinatorial circuits for 32-bit integers, we'll allow you to pretend the Verilog multiply operator is a valid combinatorial implementation for 16-bit numbers in KySMet. Same goes for add.
What should trap do? It should cause the Verilog simulation of the processor to halt. That gets a bit trickier in a pipelined implementation, but it's straightforward in the multicycle version you're building now.
Because there is only one processing element in this implementation, there obviously isn't any need to coordinate across processing elements. Given that the only processing element is its own neighbor, left and right essentially become register copy instructions. Similarly, the none_active check in jumpf is trivially implemented by examining only the current enable status of the one processing element.
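To make the shift register suggestion a bit more concrete, here is a minimal sketch of a stack built as one wide shift register. The module name shiftstack, the WIDTH and DEPTH parameters, and the push/pop port names are all my own assumptions for illustration; check the KySMet specification for the actual depths and for what each stack instruction must do.

```verilog
// Hypothetical shift-register stack. The name, parameters, and ports are
// illustrative assumptions, not taken from the KySMet specification.
module shiftstack #(parameter WIDTH = 16, parameter DEPTH = 8)
                  (output [WIDTH-1:0] top,
                   input [WIDTH-1:0] din,
                   input push, pop, reset, clk);
reg [WIDTH*DEPTH-1:0] s = 0;   // the top entry lives in the low WIDTH bits

assign top = s[WIDTH-1:0];

always @(posedge clk) begin
  if (reset)
    s <= 0;
  else if (push)                // shift everything up; oldest entry falls off
    s <= {s[WIDTH*(DEPTH-1)-1:0], din};
  else if (pop)                 // shift everything down; zero fills the far end
    s <= {{WIDTH{1'b0}}, s[WIDTH*DEPTH-1:WIDTH]};
end
endmodule
```

Under these assumptions, the same module could plausibly serve both structures: something like shiftstack #(16, 8) for a call stack holding 16-bit pc values, and shiftstack #(1, 8) for an enable stack if you treat the enable status as a single bit. Note that DEPTH must be at least 2 for the part selects to be legal.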

This is really a very simple general-register "RISC" ISA, but with the twist that it is intended for SIMD parallel implementation... which you will not be doing until the last project.

Top Down

I said it in class, but let me repeat it here: you're going to be building a fairly complex collection of stuff. You'll never get it all working unless you're pretty methodical about the development process... which I'm strongly recommending should be mostly top down.

Before doing anything, look at the instruction set. Think about what kind of hardware structures you're going to need to implement each type of instruction. Remember those high-level processor architecture diagrams in EE380? Well, you want to think a bit about what one of those would look like for your KySMet processor. In fact, your multi-cycle design will probably look a lot like the Simple Processor Architecture from EE380, although there will be various simplifications (e.g., you don't need an MFC line because you can assume your memory completes an access in one cycle). Remember how we built up that design in EE380 by going through the instruction set and incrementally adding whatever was needed to implement each instruction? Think about this project the same way.

Am I saying you need to draw one of those diagrams right at the start of the project? Not at all. What I'm saying is that you should always have in the back of your head roughly what the big picture is expected to look like. As you think about each instruction, think about what hardware will be involved in executing it and what types of control signals and datapaths will be needed. What things seem hard to do (the fancy title for this is "identify technological risk factors")? Make little notes to yourself. Discuss these things in your team. Make the big or confusing decisions as a team -- and document the non-obvious things in your Implementor's Notes.

Instruction Encoding And The Assembler

The KySMet ISA contains 25 instructions, described here. For the most part, they're pretty straightforward general-register instructions. Each instruction is to be one 16-bit word long, with three exceptions that are each two 16-bit words long: call, jump, and jumpf. You need to define the mapping of the ISA into instruction bit patterns -- you know, like you just did in the previous project. The catch is that now you might want to rethink the encoding to try to simplify your Verilog implementation. You should discuss that within your team.

For example, think about the MIPS instruction set encoding discussed in EE380; remember how the lw and sw instructions used the same instruction field for specifying the register that holds the base memory address? That's the kind of logic you want to apply here. Group things together by how they work, and try to make the encoding reflect those groupings. You can use whatever instruction encoding is most convenient for you to implement in this project, which might or might not be the same field arrangements or values any of your team members used in the previous assignment (or that I used in the assembler sample solution). It's all up to you.
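As a sketch of what "make the encoding reflect the groupings" can buy you, suppose every instruction that names a destination register keeps it in the same bits, and likewise for the source register. The field positions below are purely hypothetical (including the Ext field and the fields module itself); the only thing taken from the project is that 16 registers need 4-bit register numbers. With a layout like this, the register numbers can be pulled straight out of the instruction word without looking at the opcode first.

```verilog
// Hypothetical field layout, for illustration only; your team's encoding
// may use different positions, widths, and names.
`define WORD   [15:0]
`define Opcode [15:12]   // major opcode
`define Dest   [11:8]    // destination register (4 bits for 16 registers)
`define Src    [7:4]     // source register
`define Ext    [3:0]     // extended opcode for instructions sharing an opcode

module fields(output [3:0] opcode, dest, src, ext, input `WORD ir);
assign opcode = ir `Opcode;
assign dest   = ir `Dest;   // same bits no matter which instruction this is
assign src    = ir `Src;
assign ext    = ir `Ext;
endmodule
```

If instead the destination moved around from one instruction to the next, you would need a multiplexor (and the control logic to drive it) in front of the register file just to figure out which register is being named.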

Now you're probably getting nervous about the encoding choices. Don't be. Unlike the real world, in this class you can always change your mind if you later discover your instruction encoding was awkward. It should also be understood that many different encodings are comparably good, so don't be nervous if you hear that somebody else did things differently... you really can both be equally right. Still nervous? Explaining any nervousness-inducing decisions you made in your Implementor's Notes should help you feel better. ;-)

The Verilog Hardware Design

I bet a lot of you are scared of this. You should be; it could be a huge mess. The trick is to never let it become a huge mess by sticking to that top down structured design discipline.

This design problem is not entirely new for you, but the design work you did in EE380 skipped a lot of implementation details that you cannot skip here. Still, think about things as you were told to in EE380. Step through what each instruction needs to do and logically build up that big picture of the implementation architecture. Think about what function units, data paths, and control signals you will need. Do this before writing Verilog definitions of any piece. In fact, write it up in your Implementor's Notes before you write Verilog code.

When you think you're nearly ready to start writing Verilog code, recall that in lecture I showed you a sample solution for the Spring 2016 semester instruction set (IDIOT, as described in this Spring 2016 project handout):

// basic sizes of things
`define WORD	[15:0]
`define Opcode	[15:12]
`define Dest	[11:6]
`define Src	[5:0]
`define STATE	[4:0]
`define REGSIZE [63:0]
`define MEMSIZE [65535:0]

// opcode values, also state numbers
`define OPadd	4'b0000
`define OPinvf	4'b0001
`define OPaddf	4'b0010
`define OPmulf	4'b0011
`define OPand	4'b0100
`define OPor	4'b0101
`define OPxor	4'b0110
`define OPany	4'b0111
`define OPdup	4'b1000
`define OPshr	4'b1001
`define OPf2i	4'b1010
`define OPi2f	4'b1011
`define OPld	4'b1100
`define OPst	4'b1101
`define OPjzsz	4'b1110
`define OPli	4'b1111

// state numbers only
`define OPjz	`OPjzsz
`define OPsys	5'b10000
`define OPsz	5'b10001
`define Start	5'b11111
`define Start1	5'b11110

// source field values for sys and sz
`define SRCsys	6'b000000
`define SRCsz	6'b000001

module processor(halt, reset, clk);
output reg halt;
input reset, clk;

reg `WORD regfile `REGSIZE;
reg `WORD mainmem `MEMSIZE;
reg `WORD pc = 0;
reg `WORD ir;
reg `STATE s = `Start;
integer a;

always @(reset) begin
  halt = 0;
  pc = 0;
  s = `Start;
  $readmemh0(regfile);
  $readmemh1(mainmem);
end

always @(posedge clk) begin
  case (s)
    `Start: begin ir <= mainmem[pc]; s <= `Start1; end
    `Start1: begin
      pc <= pc + 1;                     // bump pc
      case (ir `Opcode)
        `OPjzsz:
          case (ir `Src)                // use Src as extended opcode
            `SRCsys: s <= `OPsys;       // sys call
            `SRCsz:  s <= `OPsz;        // sz
            default: s <= `OPjz;        // jz
          endcase
        default: s <= ir `Opcode;       // most instructions, state # is opcode
      endcase
    end

    `OPadd: begin regfile[ir `Dest] <= regfile[ir `Dest] + regfile[ir `Src]; s <= `Start; end
    `OPand: begin regfile[ir `Dest] <= regfile[ir `Dest] & regfile[ir `Src]; s <= `Start; end
    `OPany: begin regfile[ir `Dest] <= |regfile[ir `Src]; s <= `Start; end
    `OPdup: begin regfile[ir `Dest] <= regfile[ir `Src]; s <= `Start; end
    `OPjz: begin if (regfile[ir `Dest] == 0) pc <= regfile[ir `Src]; s <= `Start; end
    `OPld: begin regfile[ir `Dest] <= mainmem[regfile[ir `Src]]; s <= `Start; end
    `OPli: begin regfile[ir `Dest] <= mainmem[pc]; pc <= pc + 1; s <= `Start; end
    `OPor: begin regfile[ir `Dest] <= regfile[ir `Dest] | regfile[ir `Src]; s <= `Start; end
    `OPsz: begin if (regfile[ir `Dest] == 0) pc <= pc + 1; s <= `Start; end
    `OPshr: begin regfile[ir `Dest] <= regfile[ir `Src] >> 1; s <= `Start; end
    `OPst: begin mainmem[regfile[ir `Src]] <= regfile[ir `Dest]; s <= `Start; end
    `OPxor: begin regfile[ir `Dest] <= regfile[ir `Dest] ^ regfile[ir `Src]; s <= `Start; end

    default: halt <= 1;
  endcase
end
endmodule

Don't try to copy and edit that Verilog code; KySMet is (very deliberately) too different. However, nothing you're doing requires a solution that is much more complex than the above. If you think your solution needs to be significantly more complex, you're not yet ready to start writing Verilog code: design first, code second.

Structuring Your Verilog Code

As I did in the sample above and suggested in class, I strongly suggest that you think in terms of writing definitions of control signals and dummy top-level modules (with their output and input specifications). I very much like the idea of having an abstracted list of control signal definitions using `define. By consistently using things like `WORD instead of [15:0], the Verilog hardware description becomes just a little more abstract; you no longer have to ask yourself if something that says [15:0] is a 16-bit word or if it is a collection of other things that just happens to also be 16 bits. The same benefit happens by using `OPadd instead of 4'b0000, but you also get three more benefits:

As I did above, directly deriving the control signals and state numbers from the instruction opcode can greatly simplify things. Of course, you can't just use the 4-bit opcode field for KySMet because there are 25 different types of instructions, but you can create a virtual opcode by combining the instruction opcode field and some other field data -- as I did in the IDIOT example above, where I made a virtual 5-bit opcode value to distinguish OPjz, OPsys, and OPsz.
If you decide to make the ALU a separate module (which I didn't do above, but might be a good thing to do to simplify testing), you know that the module implementing the ALU will understand the same control signal the same way as any module that instantiates an ALU.
Knowing the complete set of ALU operations and their encoding becomes a fairly detailed specification of what your ALU must implement. This little header of `defines is really both a design specification and a part of the design implementation.
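Here is a minimal sketch of the last two points: a shared header of `defines that doubles as the ALU's specification, followed by an ALU module that uses it. The `ALU... names, their values, and the particular set of operations are assumptions I made up for illustration; your list should come from walking through the KySMet instruction set.

```verilog
// Hypothetical ALU control encoding; names and values are illustrative
// assumptions, not part of the KySMet specification.
`define WORD   [15:0]
`define ALUOP  [2:0]
`define ALUadd 3'b000
`define ALUand 3'b001
`define ALUor  3'b010
`define ALUxor 3'b011
`define ALUmul 3'b100

module alu(output reg `WORD result, input `ALUOP op, input `WORD a, b);
always @(*) begin
  case (op)
    `ALUadd: result = a + b;
    `ALUand: result = a & b;
    `ALUor:  result = a | b;
    `ALUxor: result = a ^ b;
    `ALUmul: result = a * b;   // treated as combinatorial for 16-bit values
    default: result = a;       // unlisted codes just pass a through
  endcase
end
endmodule
```

Any module that instantiates alu would drive op using the same `ALU... names, so the two sides cannot quietly disagree about what 3'b100 means. A separate module like this is also easy to test by itself before the rest of the processor exists.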

In summary, in lectures you got a fairly detailed overview of how to go about designing hardware for a complete computer system. The bottom line is that you should start by defining the set of function units, data paths, and control signals you will need. Define the interfaces and signals. Then build the modules themselves. Note also that for this project, you are allowed to use things like the Verilog + operator to build an adder: you need synthesizable Verilog, but you don't have to specify things at any particular level.

How Many Modules Should There Be?

Well, it isn't too difficult to build the entire processor as a single module -- as I did above. However, that makes the Verilog code harder to test and debug. You don't want to wait until everything is written to start testing and debugging the pieces. It also makes it much harder to reuse pieces of it in the next project, which will be a pipelined implementation. Worse still, if we were rendering the design to an FPGA or ASIC, it is quite possible that a single-module version of the Verilog code will generate unnecessarily complex hardware. This can happen by the compiler failing to factor function units (e.g., creating multiple ALUs when one would suffice) or, even more often, by implementing memories at the gate level because the Verilog compiler failed to recognize that your memory could be implemented using a standard memory structure. Still, how many modules you make is entirely up to you.
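For example, pulling the data memory out into its own small module might look something like the sketch below. The module name datamem, its ports, and the ABITS parameter are assumptions for illustration. The idea is simply that a plain array with one synchronous write port is a pattern synthesis tools often recognize as a memory, although whether a given tool actually maps it onto a standard memory structure depends on the tool and the target.

```verilog
// Hypothetical data memory module; the name, ports, and sizes are
// illustrative assumptions.
module datamem #(parameter ABITS = 16)
               (output [15:0] rdata,
                input [ABITS-1:0] addr,
                input [15:0] wdata,
                input write, clk);
reg [15:0] m [0:(1<<ABITS)-1];

assign rdata = m[addr];          // read completes within the cycle

always @(posedge clk) begin
  if (write) m[addr] <= wdata;   // single synchronous write port
end
endmodule
```

Because KySMet is a Harvard architecture, you could imagine a second instance (or a read-only variant) serving as the instruction memory, and a module this small is trivial to test on its own.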

Test Plan

As we discussed in class, testing a complex piece of hardware is a lot more difficult than simply enumerating all input values and comparing circuit outputs to those of an oracle (correct reference) computation. Your project needs to include a test plan (best described in your Implementor's Notes) as well as a testbench implementing the planned test procedure.

In class, we distinguished testing correctness of the design from testing correct operation of an implementation of the design. For this project, you do not need to worry about implementation test issues: i.e., your test plan does not need to target identification of faults caused by faulty manufacture, timing issues, etc. Neither do you need to "design for testability" in this project -- for example, you don't need to insert scan access paths for internal state that would otherwise be unobservable in the circuit implementation. What you need to do is develop a test plan that will give good certainty that your design itself is logically, functionally, correct.

In class, we discussed the covered test coverage tool, the metrics it collects, and what should be considered acceptable coverage values. Fundamentally, the most important type of coverage for this project is that every circuit path (every Verilog statement) should be used in some test case. You need not use the covered tool, nor its version embedded in this course's Verilog WWW form interface, to perform the coverage analysis, but you should provide some explanation in your Implementor's Notes of how your suite of test cases covers approximately 100% of all statements (lines of Verilog). You may (should) assume that built-in Verilog structures and operators, such as +, are operating correctly without exhaustively testing them... but implementations of things like co probably require some test cases.

The testbench you create to implement your test plan should look a lot like the testbench you wrote for Assignment 0, except:

You may find it impractical to have a separate oracle computation module, instead simply writing Verilog code that directly embeds the manually-determined correct values to compare with. There is nothing wrong with explicitly checking in your testbench that an Add instruction that was supposed to add 1 + 3 really does produce the constant 4. In fact, you can even use the little trick discussed in lecture of writing an assembly language test program that simply falls into a sy instruction and halts early if any test fails. The nice thing is that, using such a test program, you only need to know the final PC value to know which test failed -- you don't need to examine internal state such as registers and memory.
Ideally, you should have a single module testbench that tries all the test cases as a single large test sequence -- e.g., a single KySMet test program. However, it may be simpler for you to write a separate module to perform each test case, and that is acceptable for this project. If you take the multiple-module test approach, you may submit each test module in a separate file such that simply catenating a test module file on the end of your design file will produce the complete Verilog code to test that case. Alternatively, you could have a single module testbench that invokes your individual test modules in sequence, but beware that writing your testbench in that style easily can result in multiple instances of your design being created by the Verilog interpreter, which can be very slow. Be sure to document how your test plan should be executed in your Implementor's Notes.
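As a sketch of the "embed the manually-determined correct values" style, here is a tiny self-checking testbench. The add16 module is just a made-up stand-in for whatever piece of your design is under test, and the check task and failures counter are likewise my own illustrative names; the point is only the shape of comparing against constants you worked out by hand.

```verilog
// Stand-in design under test (hypothetical); in practice this would be
// your ALU, your processor, or some other piece of your design.
module add16(output [15:0] sum, input [15:0] a, b);
assign sum = a + b;
endmodule

module add16_test;
reg [15:0] a, b;
wire [15:0] sum;
integer failures = 0;
add16 dut(sum, a, b);

// apply one test vector and compare with a hand-computed constant
task check;
  input [15:0] x, y, want;
  begin
    a = x; b = y;
    #1 if (sum !== want) begin
      $display("FAIL: %d + %d gave %d, wanted %d", x, y, sum, want);
      failures = failures + 1;
    end
  end
endtask

initial begin
  check(1, 3, 4);            // the 1 + 3 = 4 case
  check(16'hffff, 1, 0);     // sum wraps around in 16 bits
  if (failures == 0) $display("all tests passed");
  $finish;
end
endmodule
```

Using !== rather than != in the comparison means an x or z on the output counts as a failure instead of slipping through as "unknown".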

If you think about it, that basically means the Verilog portion of your testbench can be something very simple, like:

module testbench;
reg reset = 0;
reg clk = 0;
wire halted;
processor PE(halted, reset, clk);
initial begin
  $dumpfile;
  $dumpvars(0, PE);
  #10 reset = 1;
  #10 reset = 0;
  while (!halted) begin
    #10 clk = 1;
    #10 clk = 0;
  end
  $finish;
end
endmodule

This just enables trace generation, initializes everything with a reset, and then keeps toggling the clk until the processor says it has reached a halted state.

Note that my online Verilog WWW interface allows use of $readmem directives, so it is much simpler to use that mechanism to initialize memory for your test cases. Include any such files in your submission as files with names ending in .vmem (to indicate that they are Verilog memory initialization files).
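For reference, here is a minimal sketch of the standard Verilog $readmemh system task loading a memory from a file of hex words. The file name demo.vmem and its contents are made up, and the sketch writes the file itself only so that it is self-contained when run under a local simulator; normally the .vmem file would be prepared separately. I am also assuming the standard two-argument form here, which may differ from what the WWW interface expects (the sample processor shown earlier uses $readmemh0 and $readmemh1).

```verilog
// Sketch of standard $readmemh use; demo.vmem and its contents are
// hypothetical, created here only to keep the example self-contained.
module vmem_demo;
reg [15:0] mem [0:3];
integer fd, i;
initial begin
  // write a tiny memory image, one 16-bit hex word per line
  fd = $fopen("demo.vmem", "w");
  $fdisplay(fd, "1234");
  $fdisplay(fd, "abcd");
  $fdisplay(fd, "0000");
  $fdisplay(fd, "ffff");
  $fclose(fd);

  // load the words into mem, starting at address 0
  $readmemh("demo.vmem", mem);
  for (i = 0; i < 4; i = i + 1)
    $display("mem[%0d] = %h", i, mem[i]);
end
endmodule
```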

Due Dates

The recommended due date is Friday, March 2, 2018. By that time, you should definitely have at least submitted something that includes the assembler specification (kysmet.aik), and Implementor's Notes including an overview of the structure of your intended design. That overview could be in the form of a diagram, or it could be a list of top-level modules, but it is important in that it ensures you are on the right track. Final submissions will be accepted up to just before class on Monday, March 5, 2018.

Note that you can ensure that you get at least half credit for this project by simply submitting a tar of an "implementor's notes" document explaining that your project doesn't work because you have not done it yet. Given that, perhaps you should start by immediately making and submitting your implementor's notes document? (I would!)

Submission Procedure

For each project, you will be submitting a tarball (i.e., a file with the name ending in .tar or .tgz) that contains all things relevant to your work on the project. Minimally, each project tarball includes the source code for the project and a semi-formal "implementor's notes" document as a PDF named notes.pdf. It also may include test cases, sample output, a Makefile, etc., but should not include any files that are built by your Makefile (e.g., no binary executables). Be sure to make it obvious which files are which; for example, if the Verilog source file isn't kysmet.v or the AIK file isn't kysmet.aik, you should be saying where these things are in your implementor's notes.

Submit your tarball below. The file can be either an ordinary .tar file created using tar cvf file.tar yourprojectfiles or a compressed .tgz file created using tar zcvf file.tgz yourprojectfiles. Be careful about using * as a shorthand in listing yourprojectfiles on the command line, because if the output tar file is listed in the expansion, the result can be an infinite file (which is not ok).

Advanced Computer Architecture.