References: EE380 Pipelined Design
We cover a lot of material here... and I'll be adding quite a
bit to what's posted here. The book also does a nice job on
this... read it. I don't expect you to be able to design a
state-of-the-art pipelined architecture from our quick and
somewhat superficial coverage, but you should have a basic
understanding of:
- Basic pipelining
  - Dividing a single-cycle design into equal-delay stages,
    adding buffers between stages
  - Control by moving single-cycle control signals
    through the pipe stages along with data
- Pipeline performance and the concepts associated with pipeline
  bubbles (NOP insertion, hardware interlocks)
- The MIPS pipeline discussed in class and in the text
  (There's also a more interactive way to get used to this design:
  DLXview, an interactive simulator for the DLX pipeline
  structure, which is based on the MIPS pipeline in the text. It
  does a lot more than you need for this class, but it is very
  cool, using colors to indicate which instruction is at which
  stage of the pipe.)
- The concept of "superscalar" pipelining --
  feeding multiple pipelines simultaneously
- Structural hazards & how to fix them (e.g., add hardware)
- Data dependence issues:
  - Read-after-write dependences and value forwarding
  - Write-after-write and write-after-read dependences
    and register renaming
  - Compile-time code scheduling and
    hardware scheduling (out-of-order execution)
- Control dependence issues:
  - Computation of branch target addresses (from offsets),
    delayed branches, and BTBs (Branch Target Buffers)
  - Branch prediction: always not-taken, always taken, always taken
    AND not taken, always taken if backward, compiler-marked
    instructions for branch-usually-taken or
    branch-usually-not-taken, and history schemes (e.g., using a
    Branch History Buffer -- BHB) such as the two-bit (four state)
    predictor discussed in class
  - The issues involving side-effects for instructions that
    were incorrectly executed (incorrectly predicted)
- How to make sense of pipeline structures in processor
  architecture diagrams like these.
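To make the two-bit (four state) history predictor from the list above a bit more concrete, here is a minimal Python sketch. It assumes the common saturating-counter variant (states 0 and 1 predict not-taken, states 2 and 3 predict taken); the version drawn in class or in the text may use slightly different state transitions. The class name, function name, and the loop-branch example are all made up for illustration.

```python
class TwoBitPredictor:
    """Two-bit saturating counter (an assumed variant, not necessarily
    the exact state machine from class).
    States 0,1 predict not-taken; states 2,3 predict taken."""

    def __init__(self, state=0):
        self.state = state  # start in "strongly not-taken"

    def predict(self):
        # True means "predict taken"
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor):
    """Run the predictor over a sequence of actual branch outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical loop-closing branch: taken 9 times, then falls through,
# and the whole loop is executed twice.
loop_outcomes = ([True] * 9 + [False]) * 2
mispredictions = count_mispredictions(loop_outcomes, TwoBitPredictor())
print(mispredictions)
```

Note that after warming up, the predictor misses only once per loop exit: a single not-taken outcome moves it from state 3 to state 2, which still predicts taken the next time the loop is entered.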
At the end of this chapter, we spend a bit of time on an overview
of the Compaq Alpha; Intel Pentium III, 4, M, Core, and Itanium; and
AMD Athlon, Opteron, and Athlon 64. Here are a few pointers to
good reference materials on them:
- This article has a very detailed walkthrough of Intel's and
  AMD's latest processors at the time it was written.
- This set of slides details how the then-upcoming AMD Hammer
  processors (AKA, Athlon64 and Opteron) work. Although
  there are many improvements, the inclusion of
  memory/interprocessor access in the processor pipeline is
  perhaps the biggest innovation. BTW, Fred Weber, the author of
  these slides, is also the guy who told me that ISAs don't matter
  anymore... these slides make it pretty obvious how he can say
  that.
- This article reviews the basic structures of the Athlon,
  Alpha 21264, and Pentium III.
- Tom's Hardware is often an excellent place to discover obscure
  details about PC stuff. They have two articles on the Pentium
  4: this preliminary one goes through everything and
  essentially says "good for games." They then issued this one,
  saying, well, not even so good for most games.
  Actually, the Pentium 4 turns out to be mostly good for
  LinPack and other floating-point programs that do LOTS of
  SSE2 floating-point multiply-accumulate instructions inside a
  loop small enough to fit within the trace cache... on that
  code, the high clock rate makes Pentium 4 processors virtually
  unbeatable.
IMHO (not necessarily that of UK), Intel has been taking
advantage of the fact that people who haven't had a course like
380 generally don't know that faster clock speed alone doesn't
mean higher performance. Of course, Intel also has been burned
by the opposite situation WRT IA64: a superior architecture
with a slower implementation doesn't do so well either. It also
is worth noting that the Pentium M (AKA, the processor in the
"Centrino technology" package) actually outperforms the Pentium
4 on many codes by taking the same kind of
more-work-per-slower-clock-cycle approach that AMD has taken; at
this writing, most of Intel's IA32 future is based on using
processor cores heavily influenced by the Pentium M.
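The clock-speed point above follows from the usual relation: execution time = instruction count x cycles per instruction (CPI) / clock rate. Here is a small Python sketch of that arithmetic. The instruction count, CPI values, and clock rates are invented for illustration and are not measurements of any real Pentium 4, Pentium M, or AMD part.

```python
def execution_time_seconds(instruction_count, cycles_per_instruction, clock_hz):
    """Execution time = instructions x CPI / clock rate."""
    return instruction_count * cycles_per_instruction / clock_hz


# Hypothetical program of one billion instructions
instructions = 1_000_000_000

# Hypothetical "fast clock" design: 3 GHz, but averages 2 cycles/instruction
fast_clock = execution_time_seconds(instructions, 2.0, 3.0e9)

# Hypothetical "more work per cycle" design: 2 GHz, 1 cycle/instruction
slow_clock = execution_time_seconds(instructions, 1.0, 2.0e9)

print(f"3 GHz, CPI 2.0: {fast_clock:.3f} s")
print(f"2 GHz, CPI 1.0: {slow_clock:.3f} s")
```

With these made-up numbers, the 2 GHz design finishes first even though its clock is slower, which is the more-work-per-slower-clock-cycle effect described above.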
Computer Organization and Design.