Assignment 4: Floating

In Assignment 3, you defined the instruction encoding, built an assembler, wrote Verilog code for a pipelined implementation of the IDIOT instruction set architecture, created a test coverage plan, and tested your design. Well, this project builds on that... but without actually requiring you to build on that. You'll simply be implementing the floating-point instructions for IDIOT... which really just means implementing the corresponding ALU operations. Yes, I do still want you to build that ALU into a pipelined IDIOT, but build the ALU first!

Our Mutant Floating Point Format

We've been talking about this for a while now, but the basic reality is that the IEEE 754 floating-point standard is pretty complex. The good news is that the latest (2008) version is only 70 pages long. On the UK campus, you can get IEEE Std 754-2008 for free from this IEEE Xplore site. Conforming to the standard is not easy, and you will not demonstrate conformance (which would be yet another form of test planning!), nor is it our intent to conform. What will be non-conforming? Well, to begin with, we're using a 16-bit binary layout that is essentially what the standard calls binary32 format, but missing the 16 least significant bits of the mantissa. That is not the same as the binary16 format, although it is very similar to what some GPUs have implemented. We're also not going to be very careful about the arithmetic, producing (disturbingly) approximate answers. The only subnormal value you'll deal with is positive 0, and we'll also ignore +/- infinity, NaN (both the quiet and signaling types), and rounding modes.

There is a lovely little formula on page 9 of the standard that says the floating point value is:

(-1)^S * 2^{E-bias} * (1 + 2^{1-p} * T)

For our format described in IEEE 754-2008 terms:

S: sign, 1 bit
E: encoded exponent, 8 bits
T: trailing part of significand, 7 bits

That's slightly confusing. Think of it this way: treating what Verilog would call {1'b1, T} as an unsigned integer gives the value:

(-1)^S * 2^{E-bias-7} * {1'b1, T}

For example, the decimal value 3 is (-1)^0 * 2^{128-127-7} * 8'b11000000, which is 1 * 2^{-6} * 192, which is 192/64. Thus, the encoding of 3.0 is 0x4040.

The really cool thing about using this odd format is that you can play with it using ordinary binary32 floating-point math, as implemented by languages like C, and simply ignore the last 16 bits of the binary32 result. The good news is that Verilog also knows about binary32 values as real, and provides built-in functions $bitstoreal() and $realtobits(). The bad news is that Icarus Verilog doesn't implement either one of those functions. Oh well. They aren't supposed to be synthesizable anyway.
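Here is a minimal C sketch of that trick, assuming your C compiler's float is binary32 (true on essentially every current platform, but still an assumption). The function names float_to_mutant and mutant_to_float are made up for illustration; they are not part of any materials for this assignment.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names; assumes float is IEEE binary32. */

/* Keep only the top 16 bits of a binary32 value (truncation, no rounding). */
uint16_t float_to_mutant(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the bits safely */
    return (uint16_t)(bits >> 16);
}

/* Pad a 16-bit mutant float with 16 zero bits to get a binary32 value. */
float mutant_to_float(uint16_t m)
{
    uint32_t bits = (uint32_t)m << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With these, float_to_mutant(3.0f) gives 0x4040, matching the worked example above.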

Now there's more good news! As of April 27, 2016, there is a little CGI form to play with math and conversions to/from our mutant 16-bit float format. Enjoy. Incidentally, this format really brings home the imprecision of using floating point. For example, adding 10 and 0.1 results in 10 + 0.0996094 = 10.0625. Really? Yup. By the way, 100 + 1 = 101. For the repeating fraction generated by 0.1, truncation is not your friend. ;-)
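If you'd rather check those numbers yourself, here is a rough C reference model, again assuming float is binary32. It adds in binary32 and then throws away the low 16 bits. The names (ref_addf and the two conversion helpers) are hypothetical, and because binary32 rounds before we truncate, this model is not guaranteed to match your hardware bit-for-bit in every case.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers; assumes float is IEEE binary32. */
static float mutant_to_float(uint16_t m)
{
    uint32_t bits = (uint32_t)m << 16;   /* low 16 mantissa bits are 0 */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static uint16_t float_to_mutant(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);       /* truncate, do not round */
}

/* Reference-model add: do the math in binary32, then drop the low 16 bits. */
uint16_t ref_addf(uint16_t a, uint16_t b)
{
    return float_to_mutant(mutant_to_float(a) + mutant_to_float(b));
}
```

Here 10.0 encodes as 0x4120 and 0.1 truncates to 0x3DCC (0.0996094), so ref_addf(0x4120, 0x3DCC) returns 0x4121, which is 10.0625.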

The Operations To Be Implemented

Well, we've talked quite a bit about them in class... here they are:

Instruction     Description                 Functionality
addf $d, $s     ADD Floats                  $d += $s
f2i $d, $s      Float-to-Integer            $d = ((int) $s)
i2f $d, $s      Integer-to-Float            $d = ((float) $s)
invf $d, $s     approximate INVerse Float   $d = 1.0/$s
mulf $d, $s     MULtiply Floats             $d *= $s

You'll probably want to refer back to my slides on floating point for some insights on how to do each of these operations. Perhaps surprisingly, addf is the hardest one to implement. However, I'd strongly suggest starting with f2i and i2f. Why? Because they will make testing a lot less painful.
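To give a feel for the two conversions, here is an integer-only C sketch. It is not the algorithm from the slides, just one plausible way to do it. It assumes integers are 16-bit two's complement, that an exponent field of 0 means zero, and that out-of-range floats saturate; none of those choices are dictated by this assignment. The names i2f_sketch and f2i_sketch are made up.

```c
#include <stdint.h>

/* Integer-to-float. Assumes a 16-bit two's-complement integer operand. */
uint16_t i2f_sketch(int16_t i)
{
    if (i == 0) return 0;                          /* positive zero */
    uint16_t sign = (i < 0) ? 0x8000 : 0;
    uint32_t mag = (i < 0) ? (uint32_t)(-(int32_t)i) : (uint32_t)i;
    int exp = 127 + 15;                            /* if bit 15 is the leading 1 */
    while (!(mag & 0x8000)) { mag <<= 1; exp--; }  /* normalize */
    /* Leading 1 is now bit 15; bits 14..8 become T, the rest is truncated. */
    return (uint16_t)(sign | (uint16_t)(exp << 7) | ((mag >> 8) & 0x7F));
}

/* Float-to-integer, truncating toward zero. */
int16_t f2i_sketch(uint16_t f)
{
    int exp = (f >> 7) & 0xFF;
    int32_t sig = 0x80 | (f & 0x7F);     /* {1'b1, T} as an unsigned integer */
    int shift = exp - 127 - 7;           /* value = sig * 2^shift */
    int32_t mag;

    if (exp == 0 || shift < -7) mag = 0;               /* zero, or magnitude < 1 */
    else if (shift > 8)         mag = (int32_t)1 << 16; /* too big; clamped below */
    else if (shift >= 0)        mag = sig << shift;
    else                        mag = sig >> -shift;

    if (f & 0x8000) mag = -mag;
    if (mag > INT16_MAX) mag = INT16_MAX;  /* saturation is an assumption */
    if (mag < INT16_MIN) mag = INT16_MIN;
    return (int16_t)mag;
}
```

For example, i2f_sketch(3) gives 0x4040 and f2i_sketch(0x4040) gives 3 back, which is exactly the kind of round trip that makes testing the other operations less painful.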

Originally, I suggested saving the invf operation for last because, although it's quite easy, I was working on a better algorithm. Well, as of April 27, 2016, I have one. You can compute the reciprocal of the mantissa using a simple table lookup; recip.vmem is a 128-entry 7-bit lookup table for replacing the bottom 7 bits of the mantissa. The exponent field simply needs to be negated, which (given the offset encoding) can be done by 254 - exp. However, there is a minor catch: if the bottom 7 bits of the mantissa were not all 0, then inverting it using the table implies a shift of the mantissa one to the left, because 1/f where 1 < f < 2 gives a fraction greater than 1/2 and less than 1. Thus, you need to subtract 1 more from the exponent if the original 7 low mantissa bits were not 0; in other words: 253 - exp. This new reciprocal algorithm is much better than just doing the integer subtract, and it's still an easy combinatorial circuit.
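The table-plus-exponent recipe above can be sketched in C as follows. Big caveat: the contents of recip.vmem are not reproduced here, so recip_entry() is only a stand-in that computes a truncated reciprocal on the fly; the real table may differ in its low bits. Both function names are hypothetical, and zero and very large inputs get no special handling.

```c
#include <stdint.h>

/* Stand-in for one recip.vmem entry. The real table contents are not shown
   in this document, so this truncating formula is an assumption. */
static uint16_t recip_entry(uint16_t t)
{
    if (t == 0) return 0;      /* 1/1.0: mantissa bits stay 0 */
    /* 2/(1 + t/128), scaled by 128, is 32768/(128 + t), in 128..254 */
    return (uint16_t)(32768u / (128u + t) - 128u);
}

/* Approximate reciprocal; does not special-case 0 or huge inputs. */
uint16_t invf_sketch(uint16_t f)
{
    uint16_t sign = f & 0x8000;
    uint16_t exp  = (f >> 7) & 0xFF;
    uint16_t t    = f & 0x7F;
    /* 254 - exp when the low 7 bits are 0, otherwise 253 - exp */
    uint16_t new_exp = (uint16_t)(((t == 0) ? 254 : 253) - exp) & 0xFF;
    return (uint16_t)(sign | (uint16_t)(new_exp << 7) | recip_entry(t));
}
```

As a sanity check, invf_sketch(0x4000) (that's 2.0) gives 0x3F00 (0.5), and invf_sketch(0x4040) (3.0) gives 0x3EAA, which is about 0.332.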

You can use the algorithms presented in class. Now that we've told you how to build a barrel shifter, you can also implement one by just using a Verilog shift operator with a variable on the right side.

I strongly suggest you test the floating point ALU operations before you try to integrate them into the pipelined IDIOT processor.

Integration

I'll be going through a sample solution for Assignment 3 in class, and you can use that as a basis for adding the floating-point operations. However, for this project, you are allowed to reuse any pieces from the Assignment 3 solutions that you, or any of your Assignment 3 teammates, helped create. You also may use any materials I give you here. Perhaps most importantly, you also are free to not use any of those things; in other words, you can combine any of those materials and make changes as your Assignment 4 team sees fit. For example, if you don't like the way instruction fields are encoded, feel free to re-write the AIK assembler (but make sure your Implementor's Notes document how instructions are encoded and why). You also may add pipeline stages if you feel that it is not appropriate to implement even this crude floating point arithmetic in a single-cycle ALU.

We have now reviewed my sample solution for Assignment 3. Here is Verilog code and minimal VMEM0 and VMEM1 to initialize the register file and main memory.

Philosophy

Along with the general overview of your approach in the implementor's notes, I want you to give a brief statement (a paragraph or two in the implementor's notes) about how reasonable it was to implement the floating-point operations we selected for this processor as opposed to other floating-point operations. Think broadly in terms of three questions:

Note that there is not one correct answer for this, but a wide range of valid responses. I want you to think about this in a broader context. Think about what floating-point operations are used for and how this particular 16-bit architecture might actually be used. What kinds of computation is this processor well-suited for?

Testing

The test coverage plan and testbench from Assignment 3 are probably very close to what you want, but they probably don't test any of the floating-point operations. You'll need to add that. The rest of the test plan should be fine as it was. Note that you should still test other operations because it is quite possible to break things while adding support for new functions.

Due Dates

Final submissions will be accepted up to when the final exam begins at 3:30PM on Wednesday, May 4.

Submission Procedure

Each team will submit a project tarball (i.e., a file with the name ending in .tar or .tgz) that contains all things relevant to your work on the project. Minimally, each project tarball includes the source code for the project and a semi-formal "Implementor's Notes" document as a PDF named notes.pdf. (Fairly obviously, the Implementor's Notes should also say who the implementors are -- list all team members as authors.) It also may include test cases, sample output, a Makefile, etc., but should not include any files that are built by your Makefile (e.g., no binary executables). For this particular project, name the Verilog source file floatpipe.v.

Submit your tarball below. The file can be either an ordinary .tar file created using tar cvf file.tar yourprojectfiles or a compressed .tgz file created using tar zcvf file.tgz yourprojectfiles. Be careful about using * as a shorthand in listing yourprojectfiles on the command line, because if the output tar file is listed in the expansion, the result can be an infinite file (which is not ok).

Use the submission form below to submit your project as a single submission for your team -- you do not submit as individuals. The last submission before the final deadline is the one that will be graded.



EE480 Advanced Computer Architecture.