Home Page of Fall 2024 EE685
This is the home page for EE685. There is only one
section, meeting TR 3:30-4:45PM in room 267 FPAT.
Although many course materials will be posted here, the course
Canvas site will be the primary place for course announcements.
You are expected to use Canvas and materials linked here. The
best way to schedule office meetings, physical or virtual, is
via email with "EE685" in the Subject line.
The precise course content of EE685 has varied quite a bit over
the years. However, in general, it has tried to be a step
beyond the material covered in the CPE380-CPE480 sequence, and
so it will be this semester. What that means is a little
different from past offerings because CPE380 and CPE480 have
both seen major upgrades and internal restructurings over the
past two years. In particular, CPE380 is now fairly Verilog
intensive and students who took CPE380 in previous years, or who
took a similar course elsewhere, might not have that Verilog
exposure. Thus, some of the material in this EE685 overlaps the
latest (Fall 2024) CPE380 coverage to ensure that students are
comfortable with Verilog and the basic CPE380 concepts. Even in
the review-like portions, this course introduces some more
advanced concepts, and it quickly moves to broader and deeper
material.
The list of reference materials below will grow
dramatically as the course progresses. All links below the
horizontal rule have yet to be updated for Fall 2024 and are
likely to change significantly.
-
The syllabus, which has
been updated for Fall 2024.
-
The introductory slides
-
An overview of Verilog
-
Icarus
Verilog (iverilog and vvp) is the primary
tool we'll be using for compiling and simulating Verilog code.
It is actually part of gEDA. Note that you can install it on
Ubuntu Linux systems by simply selecting it in the Software
Center or Synaptic -- it's a standard part of the Ubuntu
distribution, as well as having been ported to Windows, etc.
-
Icarus Verilog Simulator CGI Interface created by
Professor Dietz and used for Verilog code throughout this course
-
EDA playground is an alternative WWW
interface for running Icarus Verilog
and various other tools, including some commercial simulators.
It requires logging in, but registration is free.
-
ASIC World has a multitude
of really nicely prepared materials showing how to use Verilog
-
A really fast review of CPE380/480 up to the point of a simple pipelined Verilog implementation of MIPS
-
Slides introducing the simple MIPS
SIMD project instruction set
-
Papers describing basic (pronounced "old") SIMD
(Single Instruction stream, Multiple Data
stream) stuff. Notice that traditional SIMD is often
bit-serial and extremely simple per processing element.
-
Architecture of a massively parallel processor (PDF).
This paper describes Ken Batcher's SIMD MPP design at Goodyear Aerospace.
-
DAP -- a distributed array processor (PDF).
This paper describes the ICL DAP, another early SIMD machine.
-
Thinking Machines CM-2 (PDF). A (relatively late) version of the "Connection
Machine Model CM-2 Technical Summary, Version 6.0, November
1990." This includes a description of the (CM-200) floating-point
hardware added to the design.
-
KY Architecture Nanocontrollers. This was really introduced in Much Ado about
Almost Nothing: Compilation for Nanocontrollers (slides, full paper). This is
essentially answering the question: how simple can a SIMD PE be?
-
A Quantum-Inspired Model For Bit-Serial SIMD-Parallel Computation (slides, full paper). There
is also the Tangled processor design from CPE480 Fall
2020. For now, we'll ignore the quantum stuff; it's a bit-serial SIMD
with a few interesting twists.
-
Activity Counter Implementation Of Enable Logic
(PDF).
This paper describes a clever method for tracking nested SIMD
enable/disable state without using a bit stack.
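To make the activity-counter idea concrete, here is a minimal Python sketch of one common formulation. It is an illustration under stated assumptions, not necessarily the exact design in the linked paper: each PE keeps a small counter, a PE is enabled exactly when its counter is 0, and the class and method names (`ActivityCounters`, `begin_if`, `begin_else`, `end_if`) are hypothetical.

```python
class ActivityCounters:
    """Per-PE activity counters; a PE is enabled exactly when its counter is 0.

    Assumed formulation for illustration only; names are hypothetical.
    """

    def __init__(self, num_pes):
        self.count = [0] * num_pes

    def enabled(self):
        # Enable mask derived from the counters (no bit stack needed).
        return [c == 0 for c in self.count]

    def begin_if(self, cond):
        # Enabled PEs that fail the test become disabled at depth 1;
        # PEs that were already disabled simply go one level deeper.
        for pe, c in enumerate(self.count):
            if c > 0:
                self.count[pe] = c + 1
            elif not cond[pe]:
                self.count[pe] = 1

    def begin_else(self):
        # Only PEs whose state was decided by the innermost if swap roles:
        # enabled (0) becomes disabled (1), and disabled-at-depth-1 becomes
        # enabled. PEs disabled by an outer construct (count > 1) are untouched.
        for pe, c in enumerate(self.count):
            if c == 0:
                self.count[pe] = 1
            elif c == 1:
                self.count[pe] = 0

    def end_if(self):
        # Leaving a construct: every disabled PE moves one level closer
        # to being enabled again.
        for pe, c in enumerate(self.count):
            if c > 0:
                self.count[pe] = c - 1


if __name__ == "__main__":
    x = [0, 1, 2, 3]                   # one value per PE
    ac = ActivityCounters(len(x))
    ac.begin_if([v >= 1 for v in x])   # outer if
    ac.begin_if([v >= 2 for v in x])   # nested if
    print(ac.enabled())
    ac.begin_else()
    print(ac.enabled())
    ac.end_if()
    ac.end_if()
    print(ac.enabled())
```

In this sketch each PE stores only a counter wide enough for the maximum nesting depth, rather than one enable bit per nesting level.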
-
Basic SWAR Architecture & Concepts
-
One of the first talks on the concepts of SWAR was
Multimedia Extensions For Microprocessors:
SIMD Within A Register
(HTML,
PDF),
which I originally presented in February 1997 at Purdue University.
The HTML is a little ugly, but this is the original HTML,
and the server it was on supported different server-side
processing....
-
One of the best generic descriptions of the concepts of SWAR was
Compiling for SIMD within a Register
(PDF),
which is linked from Springer-Verlag using UK's EZProxy access
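As a small illustration of the SWAR concept, the following Python sketch adds four 8-bit fields packed into one 32-bit word using ordinary integer operations, without letting carries cross field boundaries. It uses a well-known masking trick and is not taken from the papers above; the helper names (`swar_add8`, `pack8`, `unpack8`) are hypothetical.

```python
HIGH = 0x80808080  # most significant bit of each 8-bit field
LOW = 0x7F7F7F7F   # the remaining 7 bits of each field


def swar_add8(a, b):
    """Add four packed 8-bit fields, each modulo 256, in one 32-bit word."""
    # Adding only the low 7 bits of each field cannot carry into the
    # neighboring field (0x7F + 0x7F = 0xFE still fits in 8 bits).
    low_sum = (a & LOW) + (b & LOW)
    # The top bit of each field is its own XOR sum plus the carry that
    # low_sum already left in that bit position.
    return (low_sum ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF


def pack8(fields):
    """Pack four values (field 0 in the least significant byte)."""
    word = 0
    for i, f in enumerate(fields):
        word |= (f & 0xFF) << (8 * i)
    return word


def unpack8(word):
    """Unpack a 32-bit word into its four 8-bit fields."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


if __name__ == "__main__":
    a = pack8([1, 200, 255, 16])
    b = pack8([2, 100, 1, 240])
    print(unpack8(swar_add8(a, b)))  # each field is the sum modulo 256
```

Hardware multimedia extensions provide such partitioned operations directly; the masking version shows how the same effect can be approximated on an unpartitioned datapath.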
-
An Introduction to Modern GPU Architecture,
a very nice, if old, set of overview slides from NVIDIA
-
Mark Harris' 2007 slides on reduction optimization.
It is useful to note that even better efficiency is now possible
using
warp shuffle, and lots of optimized functions are now available
using CUB
-
The atomic primitives are described in this section of the CUDA-C programming guide.
Here are slides from NVIDIA overviewing their use.
-
A bit about AIK (Assembler Interpreter
from Kentucky). This assembler-interpreter was created
explicitly for EE480 by H. Dietz. It was built using the
PCCTS/Antlr tools (which were created by Dietz's group back in
the early 1990s). Given a very concise specification of an
assembly language, it interpretively implements a relatively
full-featured assembler.
-
Papers describing MIMD (Multiple Instruction stream, Multiple Data stream) stuff.
-
We just completed SIMD, so here's how to make MIMD code execute on SIMD (GPU)
hardware. I've done a lot of this; see here. Slides overviewing
how this works are also available.
-
Henry G. Dietz and Thomas Schwederski, Extending Static
Synchronization Beyond SIMD and VLIW, Purdue University School of
Electrical Engineering, Technical Report TR-EE 88-25, June 1988.
(local PDF,
PDF).
-
Repetition Filter Memory in CHoPP:
A. Klappholz. (1981). IMPROVED DESIGN FOR A STOCHASTICALLY CONFLICT-FREE
MEMORY/INTERCONNECTION SYSTEM. 443-448. Paper presented at Conf Rec Asilomar
Conf Circuits Syst Comput 14th, Pacific Grove, CA, USA.
(still looking for a copy of this or another relevant article...)
-
Fetch-&-Add in the NYU Ultracomputer:
A. Gottlieb,
R. Grishman,
C.P. Kruskal,
K.P. McAuliffe,
L. Rudolph, and
M. Snir, "The NYU Ultracomputer -- Designing an MIMD Shared Memory Parallel
Computer" in IEEE Transactions on Computers, vol. 32, no. 02, pp. 175-189,
1983. doi: 10.1109/TC.1983.1676201 (URL,
local copy)
-
"An Overview of the NYU Ultracomputer Project (1986)"
(PDF) is a better, but more obscure, reference
-
Explanation of the "Hot Spot" problem for RP3:
G. F. Pfister and V. A. Norton, "'Hot spot' contention
and combining in multistage interconnection networks," in IEEE Transactions on
Computers, vol. C-34, no. 10, pp. 943-948, Oct. 1985.
(URL, local copy)
-
Memory consistency models:
"Shared Memory Consistency Models: A Tutorial"
(PDF) -- Sarita Adve has done quite a few versions of this
sort of description
-
Modern atomic memory access instructions:
AMD64 atomic instructions
-
Transactional Memory has been a hot idea for quite a while.
Intel's Haswell processors incorporate a hardware implementation
described in chapter 8 of this
PDF (locally, PDF); but there were (still are) problems.
-
Replicated/Distributed Shared Memory: A very odd one is implemented in AFAPI as Replicated Shared Memory
-
Classical DSM: The best known is Treadmarks, out of Rice University.
One of the latest is DEX: Scaling Applications Beyond Machine Boundaries, which is part of
Popcorn Linux
-
Memory system overview slides
-
Better pipeline management slides
Course Staff
Professor Hank Dietz would normally be in the
Davis Marksbury Building; see his home page for
complete contact info. Regular Zoom office hour times will soon
be listed there. He has an "open-door" policy that whenever his
door is open and he's not busy with someone else, he's available
-- and yup, there really is a slow-update live camera in
his office so you can check. However, during the pandemic
things are far less certain, and you should wear a mask if meeting
in his office. The best method to contact him is
to email hankd@engr.uky.edu using "EE685" in the
subject line for anything related to this course. If
appropriate, individual Zoom meetings also can be scheduled via
email.