Assignment 4: E Pluribus Unum?
Out of many, one. This project is the SIMD parallel pipelined Verilog
implementation of the KySMet instruction set
architecture. You are going to make it parametric such that any
reasonable number of PEs (processing elements) can be built.
Your Verilog code should have one `define parameter,
`NPROC, which sets the total number of PEs.
Where To Start?
Well, naturally you'll start with the previous project's solution.
Sort-of. Actually, you should start with something like this:
You already know how to pipeline the processor. Now begin by
thinking about the overall structure for parallelizing the CU
(control unit) and PEs (processing elements).
-
Your "processor" module here should really be the entire thing,
which means the CU and instantiations (using generate)
of a group of PEs. Thus, I'd still expect a module
processor(halt, reset, clk) to be defined and work
superficially a lot like in the previous two projects.
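
To make that structure concrete, here is a minimal sketch of a top-level module that instantiates `NPROC PEs using generate. It is only an outline under assumptions: the pe module, its port list, and the IPROC parameter are hypothetical names chosen for illustration, and the CU logic is left as comments.

```verilog
`define NPROC 4   // total number of PEs; change this one value to resize the machine

// Hypothetical PE stub: the ports and the IPROC parameter are assumptions.
module pe(clk, reset);
parameter IPROC = 0;   // this PE's index, e.g. for seeding register $2
input clk, reset;
// ... enable stack, data memory, ALU, later pipeline stages ...
endmodule

module processor(halt, reset, clk);
output reg halt;
input reset, clk;

// CU state (instruction memory, pc, call stack) would be declared here.

genvar i;
generate
  for (i = 0; i < `NPROC; i = i + 1) begin : pes
    // One PE per iteration; each instance learns its own index.
    pe #(.IPROC(i)) p(.clk(clk), .reset(reset));
  end
endgenerate

always @(posedge clk) begin
  if (reset) halt <= 0;
  // ... fetch/decode here; set halt when the halting instruction is seen ...
end
endmodule
```

Passing the index as a parameter rather than as a port is just one option; it lets each PE treat its IPROC value as a constant.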
-
In a very simple view, the CU roughly corresponds to the first
stage of your pipeline: instruction fetch and decode. The CU
obviously owns the instruction memory, call stack, and program
counter (pc). Of course, the CU can initialize the
instruction memory from a VMEM file. Certainly, there is no need
to have multiple copies of any instruction decode logic, so I'd
suggest the CU should do whatever it takes to map an instruction
bit pattern into simple, unambiguous, internal "opcodes" for
each instruction.
-
As we discussed in class, there is a little ambiguity as to
whether the CU should also own the register file. Why? Because
you really only need one decoder -- all PEs will be fetching
from (and storing to) the same register numbers at the same
times, so a single register file with each cell being
16*NPROC bits wide would suffice. It's probably easier to have each
PE own its register file, but that will cost a little extra
hardware. It's a decision to make. In any case, the owner will
need to ensure that register $0 holds 0, register
$1 holds NPROC, and register $2 as
seen by each PE holds the appropriate IPROC value.
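
If you go with the per-PE option, the special register values might be set up roughly as follows. This is a sketch under assumptions: the module name, the port names, the 16-register size, and the choice to load the values on reset are all illustrative and not taken from the KySMet specification.

```verilog
`define NPROC 4

// Hypothetical per-PE register file with one read port and one write port.
module pe_regfile(clk, reset, we, waddr, wdata, raddr, rdata);
parameter IPROC = 0;          // index of the PE that owns this file
input clk, reset, we;
input [3:0] waddr, raddr;     // assumed 16 registers; adjust to the real ISA
input [15:0] wdata;
output [15:0] rdata;

reg [15:0] r [0:15];

assign rdata = r[raddr];      // asynchronous read

always @(posedge clk) begin
  if (reset) begin
    r[0] <= 0;                // $0 holds 0
    r[1] <= `NPROC;           // $1 holds NPROC
    r[2] <= IPROC;            // $2 holds this PE's IPROC
  end else if (we) begin
    r[waddr] <= wdata;
  end
end
endmodule
```

Whether writes to these three registers should also be blocked is a separate question that this sketch does not settle.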
-
Each PE module, of which there are NPROC
instantiations, owns an enable stack and a data memory. You
could harmlessly assume that the data memories are either
uninitialized or are all initialized identically from a single
VMEM file. Of course, each PE also contains an ALU, and
essentially all of the later stages of the pipeline... sort-of.
The catch is that much of the interlock logic does not really
need to be done per PE. For example, consider an instruction
getting stuck in the register read stage waiting for an earlier
instruction to write a result into a register. Every PE is
accessing the same register numbers, so all PEs have
identical pipeline interlock constraints. Perhaps the CU should
be tracking interlocks/forwarding? Perhaps separate
combinational assign or always blocks should
own the interlock flags? Again, these are decisions to make
early.
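
As one illustration of computing an interlock once for all PEs, the following sketch derives a single stall signal from register numbers alone. Everything here is assumed for the example: the module name, the signal names, the 4-bit register numbers, and the idea that two later stages may hold pending register writes. Your pipeline's stages will differ.

```verilog
// Hypothetical shared interlock check: one copy serves every PE, because
// all PEs read and write the same register numbers at the same times.
module interlock(src_reg, src_used,
                 dst_reg_a, dst_valid_a,
                 dst_reg_b, dst_valid_b,
                 stall);
input [3:0] src_reg;      // register the instruction in register read wants
input src_used;           // 1 if that instruction actually reads src_reg
input [3:0] dst_reg_a;    // destination of the instruction one stage ahead
input dst_valid_a;        // 1 if that instruction will write a register
input [3:0] dst_reg_b;    // destination of the instruction two stages ahead
input dst_valid_b;
output stall;

// Stall while any in-flight instruction still owes a write to src_reg.
assign stall = src_used &&
               ((dst_valid_a && (dst_reg_a == src_reg)) ||
                (dst_valid_b && (dst_reg_b == src_reg)));
endmodule
```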
-
Unlike the single-PE versions you built earlier, the enable
stack here isn't trivial. Each PE needs to track its own enable
status, but the current PE enable status for all PEs must be
able to be checked by the CU. Recall that jumpf is
taken only if all PEs are disabled, and that is
something the CU needs to know. Of course, jumpf also
has the effect of potentially changing the enable status of any
PE, and that aspect of the instruction almost certainly should
be handled by the individual PEs. More significantly, a
disabled PE should not be doing loads or stores, or changing
register values. In other words, an add
$u0,$u1,$u2 instruction should not change the value of
register $u0 if that particular PE is currently
disabled. You might want to think about factoring-out the
acceptance of changes from computation of values; alternatively,
you might simply turn disabled operations into null operations.
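
The two ideas above (gating writes by enable status, and letting the CU see every PE's enable status) can be sketched as follows. This is a simplification under assumptions: the enable stack is reduced to a single en bit standing in for the top of the stack, and all module and signal names are hypothetical.

```verilog
`define NPROC 4

// Hypothetical PE fragment: only the enable bit and the write gating.
module pe_en(clk, reset, op_writes, en, reg_we);
input clk, reset;
input op_writes;     // decoded: this instruction wants to write a register
output reg en;       // current enable status (top of the enable stack)
output reg_we;       // the write enable actually applied to the register file

assign reg_we = op_writes & en;   // a disabled PE accepts no changes

always @(posedge clk) begin
  if (reset) en <= 1;             // assumed: all PEs start out enabled
  // ... push/pop of the real enable stack would update en here ...
end
endmodule

module en_demo(clk, reset, op_writes, all_disabled);
input clk, reset, op_writes;
output all_disabled;

wire [`NPROC-1:0] en;             // one enable bit per PE, visible to the CU

genvar i;
generate
  for (i = 0; i < `NPROC; i = i + 1) begin : pes
    pe_en p(.clk(clk), .reset(reset), .op_writes(op_writes),
            .en(en[i]), .reg_we());
  end
endgenerate

// The condition under which jumpf is taken: every PE is disabled.
assign all_disabled = ~|en;
endmodule
```

This shows the "factor out acceptance" option: the value is still computed, but reg_we decides whether it is kept.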
-
The left, right, and gor instructions
require communication between the PE register files (or is it
the CU's multi-wide register file?). Thus, you'll need a
parametric way to expose communication paths between PEs --
parametric because you want to be able to specify NPROC
and have the number of PEs instantiated match your
specification. This can be done using in and out busses in the
PE modules. The left and right communications
are simple formulas connecting different PEs. However,
gor is a bit fancier in that it requires bitwise ORing
of data from all PEs, so you need a parametric way to build an
OR across all PEs.
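
One parametric way to wire this up is to flatten every PE's outgoing value into a single wide bus and slice it per PE. The sketch below assumes that approach; the module and bus names are hypothetical, and the neighbor formulas (including the wraparound at the ends and which direction counts as "left") are assumptions to check against the KySMet specification.

```verilog
`define NPROC 4

// Hypothetical interconnect: PE i drives out_bus[16*i +: 16] and receives
// left_bus[16*i +: 16] and right_bus[16*i +: 16].
module comm(out_bus, left_bus, right_bus, gor);
input  [16*`NPROC-1:0] out_bus;
output [16*`NPROC-1:0] left_bus;
output [16*`NPROC-1:0] right_bus;
output reg [15:0] gor;

genvar i;
generate
  for (i = 0; i < `NPROC; i = i + 1) begin : links
    // Assumed: PE i's left neighbor is PE i-1 and its right neighbor is
    // PE i+1, both wrapping around modulo NPROC.
    assign left_bus[16*i +: 16]  = out_bus[16*((i + `NPROC - 1) % `NPROC) +: 16];
    assign right_bus[16*i +: 16] = out_bus[16*((i + 1) % `NPROC) +: 16];
  end
endgenerate

// Bitwise OR across all PEs, built with a loop so it scales with NPROC.
integer k;
always @(*) begin
  gor = 0;
  for (k = 0; k < `NPROC; k = k + 1)
    gor = gor | out_bus[16*k +: 16];
end
endmodule
```

The loop describes a chain of ORs; a synthesis tool is free to restructure it, so a hand-built tree is not required for correctness.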
As you start to partition the pipelined design across the CU
and PEs, think carefully about which aspects of each instruction
are done where. For example, a jump instruction really
doesn't do anything in the PEs.
Due Dates
The recommended due date is Monday, April 23, 2018 (the start of
dead week). By that time, you should definitely have at least
submitted something that includes the assembler specification
(kysmet.aik), and Implementor's Notes including an
overview of the structure of your intended design. That
overview could be in the form of a diagram, or it could be a
list of top-level modules, but it is important in that it
ensures you are on the right track. Final
submissions will be accepted up to just before the final exam on
Tuesday, May 1, 2018 (at 3:30PM, if you really want to cut it
close).
Submission Procedure
For each project, you will be submitting a tarball (i.e., a file
with the name ending in .tar or .tgz) that
contains all things relevant to your work on the project.
Minimally, each project tarball includes the source code for the
project and a semi-formal "implementor's
notes" document as a PDF named
notes.pdf. It also may include test
cases, sample output, a make file, etc., but should not include
any files that are built by your Makefile (e.g., no binary
executables). Be sure to make it obvious which files are which;
for example, if the Verilog source file isn't
kysmet.v or the AIK file isn't
kysmet.aik, you should be saying where
these things are in your implementor's notes.
Submit your tarball below. The file can be
either an ordinary .tar file created using tar cvf
file.tar yourprojectfiles or a compressed
.tgz file created using tar zcvf
file.tgz yourprojectfiles. Be careful
about using * as a shorthand in listing
yourprojectfiles on the command line, because
if the output tar file is listed in the expansion, the result
can be an infinite file (which is not ok).
Advanced Computer Architecture.