Assignment 3: Quicker Logick

In Assignment 2, you defined the instruction encoding, built an assembler, wrote Verilog code for a multi-cycle implementation of the Logick instruction set architecture, created a test coverage plan, and tested your design. The good news is that most of that stuff can be reused for this project. The bad news is that you still have a pile of Verilog to write... because you'll be implementing a nice, fast, pipelined version.

A Single-Cycle Starting Point

Remember EE380? Not really? That's ok... just play along anyway. Back in EE380, we followed a rather neat plan in the textbook that basically recommended that a pipelined design could best be created by initially designing a slow single-cycle implementation. The function units, data paths, and control signals defined for the single-cycle implementation could then be used (with only minor modifications) in the pipelined version. It was mostly just a matter of carving the single-cycle design into appropriate pipeline stages. Well, now is the time we see if that approach really works....

You really need to design your own single-cycle implementation from scratch for this project. Why? Because there are too many options. For example, there are a number of very different, yet comparably-good, ways that you can handle condition codes, immediate values, etc. Fortunately, it's a pretty simple instruction set in any case.

So, I don't really claim you should take the following diagram too literally.... However, here's a rough diagram showing one way to start thinking about a single-cycle implementation that you can then pipeline:

We have been discussing most of the issues in class. I'm not going to go through everything again here, but some key ideas were:

Well, that wasn't so bad now, was it? Well, of course not... because that's not what you have to do. You have to make a pipelined design.

Setting The Stage(s)

One of the first steps in making a pipelined implementation is figuring-out how many stages there should be and what belongs in each.

It is fairly obvious that the memories (including the register file) will take a little while to access, and we all know ALUs are notoriously slow. Thus, we'd expect a stage for each of those things along any circuit path. The most natural result here is a pipeline with 3 or 4 stages (you really shouldn't be ending-up with fewer than 3 stages).

As we have been discussing in class, one of the most useful concepts in creating hardware (or parallel software) is owner computes: the idea that each register/memory should be written into by only one entity, its owner, and that entity should also compute the value that will be written. Thus, a pipeline doesn't really look at all sequential. Instead, a pipeline is a set of independent, parallel-executing, entities that communicate by the owner of each register updating the register value which is read by one or more other entities. For example, the buffer at the end of the instruction fetch stage (let's call this stage 0) will certainly include a register that holds the destination register number ($d), and this "d0" register is owned and written by stage 0. Because register 0 is read only, let's imagine that you set d0 to 0 if the instruction doesn't write a value -- for example, a st instruction doesn't write into a register (just into data memory). It sounds like the value written into d0 depends only on the instruction fetched, but in fact it might depend on what's happening in other stages because an instruction that was speculatively fetched after a br might turn into a null operation, meaning it doesn't write anywhere. Owner computes says stage 0 should be the thing that determines what to write, which will require it to also examine register owned by other stages. Similarly, stage 1 obviously will be reading the value in d0 -- if for no other reason than to copy it into register d1 (which stage 1 owns).

Keep in mind that computing complex formulas also can be isolated into little always, or even combinatorial assign, blocks that exist for the sole purpose of owning that computation's result. For example, you might find it easier to have a separate block that owns and computes the interlock condition that would prevent the instruction fetch and register read stages (stages 0 and 1) from advancing when there is a dependence on an instruction further in the pipeline. For example, in the (very different from your project) sample solution for the SIK stack machine, you'll not only see an always clause per pipe stage, but also assign blocks that own things like the halt signal (which is derived from two separate halt signals because SIK actually uses a superscalar pipeline that isn't truly halted until both halves of the pipe are halted). Even simpler examples of non-stages that might appropriately become separate blocks include the ALU and the main memory.

Not too bad, right? Of course not... but your task is a little harder. The ALU instructions and load and store are all easy, and sy is still just halt, but that leaves some instructions with issues. How can you deal with them?

Let's be completely clear about what I expect: your submission should be a viable three-or-more-stage pipelined Verilog implementation of Logick's instruction set. The significant design decisions made should also be discussed in your Implementor's Notes.

Stuff You Can Reuse

Assignment 2 was scary, but that was because you had never done something like this before -- now you have. For this project, you are allowed to reuse any pieces from the Assignment 2 solutions that you, or any of your Assignment 3 teammates. helped create. You also may use any materials I give you here; for example, here are the guts of my sample solution for the Verilog code of the multi-cycle Logick project. Perhaps most importantly, you also are free to not use any of those things; in other words, you can combine any of those materials and make changes as your Assignment 3 team sees fit. For example, if you don't like the way instruction fields are encoded, feel free to re-write the AIK assembler (but make sure your Implementor's Notes documents how instructions are encoded and why).

In general, you are not allowed to use anything from another Assignment 3 team nor from an Assignment 2 team that none of your Assignment 3 team members were on. You can use things done by any of your Assignment 3 team members, including things their teams did on Assignment 2, and things provided as part of this assignment.

As discussed in class, Verilog code that specifies memory accesses somewhat carelessly is very likely to result in a bigger circuit than if we carefully factored things into modules and created single instances of those modules. For example, a Verilog compiler might fail to map Data Memory into a dedicated memory block within an FPGA, instead constructing a memory using thousands of logic cells. Using an instance of a memory module designed to comply with the FPGA-maker's guidelines (e.g., this dual-port RAM with a Single Clock from ALTERA) ensures that the vendor's Verilog toolchain will correctly infer use of the intended hardware modules inside the FPGA. Of course, in this class we are not rendering designs into physical circuits, so these issues of complexity (and timing analysis) are neither obvious nor critical... but you should always be aware of the potential hardware complexity you risk introducing by using a specification style that doesn't explicitly factor-out the desired modules.

Testing

Again, the test coverage plan and testbench from Assignment 2 are probably very close to what you want. However, you do need to seriously think about coverage again. Why? You are not testing the same Verilog code, so there may be some paths that didn't exist before -- and they might not be covered with a testbench that covered your old version.

Due Dates

The recommended due date is Friday, November 3. Final submissions will be accepted up to when class begins on Wednesday, November 8.

Submission Procedure

Each team will submit a project tarball (i.e., a file with the name ending in .tar or .tgz) that contains all things relevant to your work on the project. Minimally, each project tarball includes the source code for the project and a semi-formal "implementors notes" document as a PDF named notes.pdf. (Fairly obviously, the Implementor's Notes should also say who the implementors are -- list all team members as authors.) It also may include test cases, sample output, a make file, etc., but should not include any files that are built by your Makefile (e.g., no binary executables). For this particular project, name the Verilog source file pipe.v.

Submit your tarball below. The file can be either an ordinary .tar file created using tar cvf file.tar yourprojectfiles or a compressed .tgz file file created using tar zcvf file.tgz yourprojectfiles. Be careful about using * as a shorthand in listing yourprojectfiles on the command line, because if the output tar file is listed in the expansion, the result can be an infinite file (which is not ok).

Use the submission form below to submit your project as a single submission for your team -- you do not submit as individuals. The last submission before the final deadline is the one that will be graded.

Your team name is .
Your team password is


EE480 Advanced Computer Architecture.