Home Page of Fall 2022 EE685
This is the official home page for EE685. There is only one
section, meeting TR 3:30-4:45PM in room 265 FPAT. Hopefully, we
will continue to meet in person throughout this semester -- with
appropriate precautions against spread of the pandemic.
Although many course materials will be posted here, the course
Canvas site will be the primary place for course announcements.
You are expected to use Canvas and materials linked here. We
expect at least to use Zoom as an optional alternative to
physical office meetings with Professor Dietz. The
best way to schedule office meetings, physical or virtual, is
via email with "EE685" in the Subject line.
The course content of EE685 has actually varied quite a bit over
the years. However, in general, it has tried to be a step
beyond the material covered in the CPE380-CPE480 sequence, and
so it will be this semester. What that means is a little
different from past offerings because CPE380 and CPE480 have
both seen major upgrades and restructurings over the past two
years. The current plan is for EE685 to be somewhat Verilog
intensive, but to quickly move to more advanced (mostly
parallel) architectures and architectural concepts (e.g.,
The list of reference materials below will grow dramatically as
the course progresses...
The Syllabus, which has
been updated for Fall 2022 and may still be subject to
change due to the special circumstances of the COVID-19 pandemic.
The introductory slides
An overview of Verilog
A really fast review of CPE380/480 up to the point of a simple pipelined Verilog implementation of MIPS
Slides introducing the simple MIPS
SIMD project instruction set
Papers describing basic (pronounced "old") SIMD
(Single Instruction stream, Multiple Data
stream) stuff. Notice that traditional SIMD is often
bit-serial and extremely simple per processing element.
Architecture of a massively parallel processor (PDF).
This paper describes Ken Batcher's SIMD MPP design at Goodyear Aerospace.
DAP -- a distributed array processor (PDF).
This paper describes the ICL DAP, another early SIMD machine.
Thinking Machines CM-2 (PDF). A (relatively late) version of the "Connection
Machine Model CM-2 Technical Summary, Version 6.0, November
1990." This includes a description of the (CM-200) floating-point
hardware added to the design.
KY Architecture Nanocontrollers. This was really introduced in Much Ado about
Almost Nothing: Compilation for Nanocontrollers (slides, full paper). This is
essentially answering the question: how simple can a SIMD PE be?
A Quantum-Inspired Model For Bit-Serial SIMD-Parallel Computation (slides, full paper). There
is also the Tangled processor design from CPE480 Fall
2020. For now, we'll ignore the quantum stuff; it's a bit-serial SIMD
with a few interesting twists.
Activity Counter Implementation Of Enable Logic
This paper describes a clever method for
tracking of nested SIMD enable/disable without use of a bit
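As a rough illustration of counter-based enable tracking, here is a minimal Python simulation. It is a sketch of one common formulation and not necessarily the exact scheme in the paper; the class name EnableCounters and its methods are hypothetical names made up for this example. Each processing element (PE) keeps a small counter instead of a stack of enable bits: zero means enabled, and a nonzero value records how many nesting levels must be exited before the PE is enabled again.

```python
class EnableCounters:
    """Per-PE activity counters (hypothetical sketch): 0 means the PE is enabled."""

    def __init__(self, n_pes):
        self.count = [0] * n_pes

    def enabled(self):
        # A PE executes the broadcast instruction only when its counter is 0.
        return [c == 0 for c in self.count]

    def push(self, cond):
        """Enter a nested conditional region; cond[i] is PE i's test result."""
        for i, c in enumerate(self.count):
            if c > 0:
                # Already disabled: just remember one more nesting level.
                self.count[i] = c + 1
            elif not cond[i]:
                # Enabled, but failed the test: disabled at this level.
                self.count[i] = 1

    def pop(self):
        """Leave the innermost conditional region."""
        for i, c in enumerate(self.count):
            if c > 0:
                self.count[i] = c - 1


pes = EnableCounters(4)
pes.push([True, True, False, False])   # outer test
pes.push([True, False, True, False])   # inner test
print(pes.enabled())                   # only PE 0 passed both tests
pes.pop()
pes.pop()
print(pes.enabled())                   # every PE is enabled again
```

Note that the sketch handles only region entry and exit; an "else" clause would need one more operation that swaps the PEs whose counter is 0 with those whose counter is 1.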
Basic SWAR Architecture & Concepts
One of the first talks on the concepts of SWAR was
Multimedia Extensions For Microprocessors:
SIMD Within A Register
which I originally presented in February 1997 at Purdue University.
The HTML is a little ugly, but this is the original HTML,
and the server it was on supported different server-side
One of the best generic descriptions of the concepts of SWAR was
Compiling for SIMD within a Register
which is linked from Springer-Verlag using UK's EZProxy access
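To give a flavor of what SWAR means in practice, here is a minimal Python sketch of a partitioned add: four 8-bit fields packed in one 32-bit word are added with ordinary integer operations, and masking keeps carries from crossing field boundaries. The function name swar_add8 and the constants are hypothetical names chosen for this example, and Python integers are masked to 32 bits to stand in for a machine register.

```python
MASK32 = 0xFFFFFFFF
HIGH = 0x80808080   # top bit of each 8-bit field
LOW = 0x7F7F7F7F    # low 7 bits of each 8-bit field


def swar_add8(x, y):
    """Add four packed 8-bit fields modulo 256, with no carry between fields."""
    x &= MASK32
    y &= MASK32
    # Add the low 7 bits of every field; any carry lands in that field's
    # own top bit and cannot spill into the neighboring field.
    partial = (x & LOW) + (y & LOW)
    # Each field's top bit is x_top XOR y_top XOR carry-in; the carry-in
    # is already sitting in partial's top bit, so XOR in the other two.
    return (partial ^ ((x ^ y) & HIGH)) & MASK32


# Fields, high to low: 0x01+0x01, 0xFF+0x01, 0x7F+0x01, 0x80+0x80
print(hex(swar_add8(0x01FF7F80, 0x01010180)))  # prints 0x2008000
```

The same masking idea extends to other field widths by changing the two constants.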
An Introduction to Modern GPU Architecture,
a very nice, if old, set of overview slides from NVIDIA
Mark Harris's 2007 slides on reduction optimization
It is useful to note that even better efficiency is now possible using
warp shuffle, and lots of optimized functions are now available
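The data movement behind a warp-shuffle reduction can be sketched without a GPU. The Python below is only a simulation under stated assumptions: a list of 32 values stands in for one register across the lanes of a warp, and the hypothetical helper shfl_down mimics a shuffle-down in which a lane whose source would fall outside the warp simply gets its own value back. It is not CUDA code.

```python
WARP_SIZE = 32


def shfl_down(vals, offset):
    """Simulated shuffle-down: lane i reads the value held by lane i+offset.

    Lanes whose source would be outside the warp read their own value.
    """
    return [vals[i + offset] if i + offset < WARP_SIZE else vals[i]
            for i in range(WARP_SIZE)]


def warp_reduce_sum(vals):
    """Tree reduction over one simulated warp; the total ends up in lane 0."""
    assert len(vals) == WARP_SIZE
    vals = list(vals)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        # All lanes add in lockstep, as they would in SIMD hardware.
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


print(warp_reduce_sum(range(32)))  # prints 496
```

The reduction takes five steps for 32 lanes and uses no shared memory, which is where the efficiency gain comes from; only lane 0 is guaranteed to hold the full sum at the end.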
The atomic primitives are described in this section of the CUDA-C programming guide.
Here are slides from NVIDIA overviewing their use.
A bit about AIK (Assembler Interpreter
from Kentucky). This assembler-interpreter was created
explicitly for EE480 by H. Dietz. It was built using the
PCCTS/Antlr tools (which were created by Dietz's group back in
the early 1990s). Given a very concise specification of an
assembly language, it interpretively implements a relatively
Papers describing MIMD (Multiple Instruction stream, Multiple Data stream) stuff.
We just completed SIMD, so here's how to make MIMD code execute on SIMD (GPU)
hardware. I've done a lot of this; see here. Slides overviewing
how this works are also available.
Henry G. Dietz and Thomas Schwederski, Extending Static
Synchronization Beyond SIMD and VLIW, Purdue University School of
Electrical Engineering, Technical Report TR-EE 88-25, June 1988.
Repetition Filter Memory in CHoPP:
A. Klappholz. (1981). IMPROVED DESIGN FOR A STOCHASTICALLY CONFLICT-FREE
MEMORY/INTERCONNECTION SYSTEM. 443-448. Paper presented at Conf Rec Asilomar
Conf Circuits Syst Comput 14th, Pacific Grove, CA, USA.
(still looking for a copy of this or another relevant article...)
Fetch-&-Add in the NYU Ultracomputer:
L. Rudolph, and
M. Snir, "The NYU Ultracomputer -- Designing an MIMD Shared Memory Parallel
Computer" in IEEE Transactions on Computers, vol. 32, no. 02, pp. 175-189,
1983. doi: 10.1109/TC.1983.1676201 (URL,
"An Overview of the NYU Ultracomputer Project (1986)"
(PDF) is a better, but more obscure, reference
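For readers new to the primitive, here is a minimal Python sketch of fetch-and-add semantics: the operation returns the old value of a memory cell and adds an increment to it as one indivisible step. The class FetchAndAddCell and the function claim_slots are hypothetical names for this example, and a lock stands in for the atomicity that hardware would provide; nothing here models the combining network of the Ultracomputer itself.

```python
import threading


class FetchAndAddCell:
    """One shared memory cell with a simulated atomic fetch-and-add."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # assumption: stands in for hardware atomicity

    def fetch_and_add(self, increment):
        # Return the old value and update the cell as a single atomic step.
        with self._lock:
            old = self._value
            self._value = old + increment
            return old


def claim_slots(n_threads=4, per_thread=100):
    """Each thread claims queue slots; fetch-and-add makes every slot unique."""
    counter = FetchAndAddCell()
    claimed = [[] for _ in range(n_threads)]

    def worker(tid):
        for _ in range(per_thread):
            claimed[tid].append(counter.fetch_and_add(1))

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(slot for slots in claimed for slot in slots)


slots = claim_slots()
print(len(slots), len(set(slots)))  # both counts match: no slot was claimed twice
```

Because each caller gets a distinct old value, many processors can claim array or queue slots concurrently without any further coordination.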
Explanation of the "Hot Spot" problem for RP3:
G. F. Pfister and V. A. Norton, "'Hot spot' contention
and combining in multistage interconnection networks," in IEEE Transactions on
Computers, vol. C-34, no. 10, pp. 943-948, Oct. 1985.
(URL, local copy)
Memory consistency models:
"Shared Memory Consistency Models: A Tutorial"
(PDF) -- Sarita Adve has done quite a few versions of this
sort of description
Modern atomic memory access instructions:
AMD64 atomic instructions
Transactional Memory has been a hot idea for quite a while.
Intel's Haswell processors incorporate a hardware implementation
described in chapter 8 of this
PDF (locally, PDF), but there were (and still are) problems.
Replicated/Distributed Shared Memory: A very odd one is implemented in AFAPI as Replicated Shared Memory
Classical DSM: The best known is Treadmarks, out of Rice University.
One of the latest is DEX: Scaling Applications Beyond Machine Boundaries, which is part of
Memory system overview slides
Better pipeline management slides
Professor Hank Dietz would normally be in the
Davis Marksbury Building; see his home page for
complete contact info. Regular Zoom office hour times will soon
be listed there. He has an "open-door" policy that whenever his
door is open and he's not busy with someone else, he's available
-- and yup, there really is a slow-update live camera in
his office so you can check. However, during the pandemic
things are far less certain, and you should wear a mask if meeting
in his office. The best method to contact him is
to email email@example.com using "EE685" in the
subject line for anything related to this course. If
appropriate, individual Zoom meetings also can be scheduled via email.
Computer Organization and Design.