Assignment 3: Pipelined Tangled

In this project, your team is going to build a pipelined implementation of Tangled, the instruction set design you built an assembler for in Assignment 1 and a multi-cycle implementation of in Assignment 2. This implementation will still be missing the Qat instructions... but you'll actually be implementing a little better floating-point arithmetic than you did in the multicycle design.

Floating NaNs

In this project, you still can use my 16-bit floating-point library, but not without making some improvements to it. This time I expect you to add support for NAN. A Verilog definition of the bit pattern that represents not-a-number is:

`define NAN    16'hffc0

The rules for handling NAN are very simple:

Adding the above NAN handling is pretty straightforward, but it will force you to read and understand the floating-point implementation a bit better... and that's why I'm having you do it. Note that your NAN handling must be implemented completely combinatorially using assign, not always.
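As a minimal sketch of the all-assign style required here, the fragment below overrides the result of an existing operation with a NaN check. The module name nanfix, its ports, and the rule it applies (any NaN operand forces a NaN result, with NaN recognized only by an exact match against the `NAN pattern) are assumptions made for illustration; substitute the actual rules for this assignment.

```verilog
`define NAN    16'hffc0

// Hypothetical wrapper: the name and ports are assumptions, not part of the library.
// raw is the result produced by the existing, NaN-unaware operation on a and b.
module nanfix(r, a, b, raw);
output [15:0] r;
input [15:0] a, b, raw;
wire anynan;

// Assumed rule: an operand counts as NaN only if it matches `NAN exactly
assign anynan = (a == `NAN) || (b == `NAN);

// Purely combinatorial select; no always block is involved
assign r = (anynan ? `NAN : raw);
endmodule
```

The same pattern could instead be folded directly into each floating-point module as one extra assign on its output, rather than kept as a separate wrapper.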

That said, this project isn't really about floating-point, but pipelining....

A Single-Cycle Starting Point

Remember CPE380? Not really? That's ok... just play along anyway. Back in CPE380, we followed a rather neat plan in the textbook that basically recommended that a pipelined design could best be created by initially designing a slow single-cycle implementation. The function units, data paths, and control signals defined for the single-cycle implementation could then be used (with only minor modifications) in the pipelined version. It was mostly just a matter of carving the single-cycle design into appropriate pipeline stages... probably about 4 of them. Well, now is the time we see if that approach really works....

I don't suggest that you should take the following diagram too literally.... However, here's a rough diagram showing one way to start thinking about a single-cycle implementation that you can then pipeline:

In class, we will be discussing pipelining issues for a while. Thus, I'm not going to go through everything here, but key ideas specific to Tangled include:

Well, that wasn't so bad now, was it? Of course, I only gave you an approximate single-cycle design, while you must create a complete pipelined version.

Setting The Stage(s)

One of the first steps in making a pipelined implementation is figuring out how many stages there should be and what belongs in each.

Although we are not forcing you to do any timing analysis, you should make reasonable assumptions about how much can be done in one clock cycle. Your pipelined design most naturally seems to consist of four stages: instruction fetch, register read, data memory access or ALU operation, and register write. These are approximately color-coded in the above diagram. One could argue that register write doesn't really need its own stage, but your ALU is slow enough by itself. If it makes more sense to you to build more than four stages, feel free to do that... but I would argue that four stages is the easiest to justify for the current project.

As we are discussing in class, one of the most useful concepts in creating hardware (or parallel software) is owner computes: the idea that each register/memory should be written into by only one entity, its owner, and that entity should also compute the value that will be written. Thus, a pipeline doesn't really look at all sequential. Instead, a pipeline is a set of independent, parallel-executing entities that communicate by the owner of each register updating the register value, which is read by one or more other entities. For example, the buffer at the end of the instruction fetch stage (let's call this stage 0) will certainly include a register that holds the destination register number (d), and this register, "d0", is owned and written by stage 0. Of course, the destination register number is potentially needed until the very last stage (where the register write is done), but it doesn't stay in "d0": each stage will have its own register for that. For example, stage 1 might own "d1" and normally will set it to the value read from "d0". Keep in mind that some registers, such as the PC, have many potential sources for their next value -- but the Verilog always block that logically owns the PC is the only thing that should write a new value into it.
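The ownership idea can be sketched as one always block per owner, each writing only its own register. This is only an illustration: the module name dpipe, the 4-bit register-number width, and the field position ir[11:8] are assumptions, since the actual instruction encoding is your team's choice.

```verilog
// Hypothetical fragment: names, widths, and the field position are assumptions.
module dpipe(clk, ir, d0, d1);
input clk;
input [15:0] ir;        // instruction word fetched this cycle
output reg [3:0] d0;    // owned (written) only by stage 0
output reg [3:0] d1;    // owned (written) only by stage 1

// Stage 0 is the only writer of d0
always @(posedge clk) begin
  d0 <= ir[11:8];       // assumed location of the destination register field
end

// Stage 1 is the only writer of d1; it merely reads d0
always @(posedge clk) begin
  d1 <= d0;
end
endmodule
```

Because both blocks use <=, stage 1 sees the value d0 held before the clock edge, so the destination register number moves down the pipeline one stage per clock.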

Keep in mind that computing complex formulas can be isolated into little always blocks or combinatorial assign statements (perhaps even using a Verilog function) that aren't pipeline stages per se. They are parallel-executing hardware units that exist for the sole purpose of owning that computation's result. For example, you might find it easier to have a separate block that owns and computes the interlock condition that would prevent the instruction fetch and register read stages (stages 0 and 1) from advancing when there is a dependence on an instruction further in the pipeline. Also remember the basic rule that = assignment is used for "temporary" values that are used within the clock cycle in which the value is assigned, whereas <= should be used for anything being communicated from one clock cycle to the next or expected to have a stable value for other things to read during the clock cycle.
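Here is a rough sketch of both points: a separate combinatorial unit that owns the interlock condition, and a clocked block that uses = for a temporary and <= for a value handed to the next clock cycle. All module names, signal names, and widths are hypothetical, and the dependence test is deliberately simplified. A real design must consider exactly which operands each instruction reads and writes.

```verilog
// Hypothetical interlock unit: it owns and computes stall, nothing else.
module interlock(stall, d0, s0, d1, wr1, d2, wr2);
output stall;
input [3:0] d0, s0;   // registers the instruction entering register read uses
input [3:0] d1, d2;   // destinations of instructions further down the pipeline
input wr1, wr2;       // 1 if that later instruction will write its destination

// Stall if either later instruction will write a register needed now
assign stall = (wr1 && ((d1 == d0) || (d1 == s0))) ||
               (wr2 && ((d2 == d0) || (d2 == s0)));
endmodule

// Hypothetical stage fragment showing = versus <=
module bufdemo(clk, stall, x, y, sum1);
input clk, stall;
input [15:0] x, y;
output reg [15:0] sum1;   // read by the next stage in a later clock cycle
reg [15:0] t;             // temporary, used only within this clock cycle

always @(posedge clk) begin
  if (!stall) begin
    t = x + y;            // = : value is consumed immediately below
    sum1 <= t;            // <= : value is communicated to the next cycle
  end
end
endmodule
```

When stall is 1, bufdemo simply does not update sum1, which is one simple way for a stage to hold its position.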

Not too bad, right? Well, here are a few more things to think about:

Let's be completely clear about what I expect: your submission should be a viable four-or-more-stage pipelined Verilog implementation of the Tangled instruction set implementing both integer and float instructions, with float NaN support, but treating Qat instructions as sys. The significant design decisions made should also be discussed in your Implementor's Notes.

Stuff You Can Reuse

The Tangled ISA should be familiar by now. Assignment 2 was scary, but that was mostly because you had never done something like this before -- now you have. Reusing knowledge and even some code can make this project easier.

You can use my sample multi-cycle Tangled as reference material, but shouldn't directly copy code from it. You are not allowed to use anything from another Assignment 3 team nor from an Assignment 2 team that none of your Assignment 3 team members were on. You can use things done by any of your Assignment 3 team members, including things their teams did on Assignment 2, and things provided as part of this assignment. If you find other materials, for example solutions posted from previous semesters, useful, you may borrow ideas from them, but should generally not literally copy code and you must cite the sources you borrowed ideas from in your Implementor's Notes. Although you might be able to do many things exactly as you did in the previous project, you should carefully consider everything from the instruction encoding on, because rethinking often can produce a fully working implementation faster and easier than reusing....

As discussed in class, Verilog code that specifies memory accesses somewhat carelessly is very likely to result in a bigger circuit than if we carefully factored things into modules and created single instances of those modules. For example, a Verilog compiler might fail to map Data Memory into a dedicated memory block within an FPGA, instead constructing a memory using thousands of logic cells. Using an instance of a memory module designed to comply with the FPGA-maker's guidelines (e.g., this dual-port RAM with a Single Clock from ALTERA, which is this Verilog code) ensures that the vendor's Verilog toolchain will correctly infer use of the intended hardware modules inside the FPGA. Incidentally, the ALTERA RAM block seems to be able to handle simultaneous read/write in one clock using the two ports, although the declaration given makes the memory byte-wide rather than storing 16-bit objects. Of course, in this class we are not rendering designs into physical circuits, so these issues of complexity (and timing analysis) are neither obvious nor critical... but you should always be aware of the potential hardware complexity you risk introducing by using a specification style that doesn't explicitly factor-out the desired modules. Try not to do anything exceptionally wasteful -- or at least explain why you did it in your Implementor's Notes.
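For illustration, a single-clock memory with separate read and write ports might be factored into its own module along the following lines. This is a sketch written in the general style of vendor RAM templates, not the vendor's code. The module name dmem, the port names, and the 16-bit default sizes are assumptions, and whether a given toolchain maps it onto a dedicated memory block depends on that vendor's guidelines.

```verilog
// Hypothetical memory module: name, ports, and default sizes are assumptions.
module dmem
#(parameter DATA_WIDTH = 16, parameter ADDR_WIDTH = 16)
(
  input clk,
  input we,                           // write enable
  input [ADDR_WIDTH-1:0] waddr,       // write port address
  input [DATA_WIDTH-1:0] wdata,       // write port data
  input [ADDR_WIDTH-1:0] raddr,       // read port address
  output reg [DATA_WIDTH-1:0] rdata   // read port data
);
reg [DATA_WIDTH-1:0] ram [0:(1<<ADDR_WIDTH)-1];

always @(posedge clk) begin
  if (we) ram[waddr] <= wdata;
  rdata <= ram[raddr];                // registered read: data appears one clock later
end
endmodule
```

Note that the registered read delivers data one clock after the address is presented, so a pipeline built around a module like this has to account for that delay in its stage boundaries.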

Testing

Again, the test coverage plan and testbench from Assignment 2 are probably very close to what you want. However, you do need to seriously think about coverage again. Why? You are not testing the same Verilog code, so there may be some paths that didn't exist before -- and they might not be covered with a testbench that covered your old version. Also, as discussed in class, don't wait until "everything" is done to start testing! Do incremental testing as you add support for each type of instruction to your pipeline....

In particular, you'll be adding NaN support to the floating-point implementation. That wasn't in the previous version. Thus, you should be testing that all the floating-point instructions work as they should for NaN operands. Feel free to test them either within the processor or as separate modules.
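A separate-module NaN test might be organized roughly as follows. The dut module is only a stand-in so that the sketch is self-contained. Replace it with an instance of your own floating-point module, and replace the expected values with whatever the assignment's NaN rules require. The rule assumed here (any NaN operand gives a NaN result) and the operand 16'h3f80 (an arbitrary non-NaN bit pattern) are assumptions made for illustration.

```verilog
`define NAN    16'hffc0

// Stand-in for the unit under test (hypothetical; substitute your float module)
module dut(r, a, b);
output [15:0] r;
input [15:0] a, b;
assign r = (((a == `NAN) || (b == `NAN)) ? `NAN : 16'h0000);
endmodule

module testbench;
reg [15:0] a, b;
wire [15:0] r;
integer errors;

dut uut(r, a, b);

// Apply one operand pair, wait for the logic to settle, compare the result
task check;
input [15:0] ta, tb, want;
begin
  a = ta; b = tb;
  #1;
  if (r !== want) begin
    errors = errors + 1;
    $display("FAIL: a=%h b=%h r=%h wanted %h", ta, tb, r, want);
  end
end
endtask

initial begin
  errors = 0;
  check(`NAN, 16'h3f80, `NAN);   // NaN as first operand
  check(16'h3f80, `NAN, `NAN);   // NaN as second operand
  check(`NAN, `NAN, `NAN);       // NaN as both operands
  $display("%0d errors", errors);
  $finish;
end
endmodule
```

Using !== rather than != means that an x or z on the result is also reported as a failure.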

Just to be clear, I do not expect you to incorporate any design for testability features in your Verilog design.

Due Dates

I know that with COVID-19 disrupting everyone's life, it is difficult to stick to a schedule. However, it's important that we leave enough time for the last project. Thus, the due date is before class, Monday, November 9, 2020.

First priority should be for you to get your assembler specification (tangled.aik) and Implementor's Notes together, including an overview of the structure of your intended design. That overview could be in the form of a diagram, or it could be a list of top-level modules and/or "owner" groupings, but it is important in that it should demonstrate that you are on the right track and thus allow assigning significant partial credit even if you don't get much farther. This project isn't really much harder than the multi-cycle one, and you should be able to get things working... but we are aware that the awkwardness of not being physically together with your teammates might slow down debugging, and you should be aware of that too.

Submission Procedure

For each project, your team (NOT each person individually) will be submitting a tarball (i.e., a file with the name ending in .tar or .tgz) that contains all things relevant to your work on the project. Minimally, each project tarball includes the source code for the project and a semi-formal "Implementor's Notes" document as a PDF named notes.pdf. It also may include test cases, sample output, a Makefile, etc., but should not include any files that are built by your Makefile (e.g., no binary executables). Be sure to make it obvious which files are which; for example, if the Verilog source file isn't tangled.v or the AIK file isn't tangled.aik, you should say where these things are in your Implementor's Notes. Don't forget to include any VMEM files and test assembly code.

Submit your tarball below. The file can be either an ordinary .tar file created using tar cvf file.tar yourprojectfiles or a compressed .tgz file created using tar zcvf file.tgz yourprojectfiles. Be careful about using * as a shorthand in listing yourprojectfiles on the command line, because if the output tar file is listed in the expansion, the result can be an infinite file (which is not ok). Also note that zip is not compatible with tar.



EE480 Advanced Computer Architecture.