References: CPE380 Performance Analysis
The lecture slides on this material are
here as a PDF.
This material is covered largely at the end of the first chapter of
the textbook.
Make sure you are comfortable with the tabular breakdown of
expected instruction execution counts, CPIs, and clock period.
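As a quick check on that tabular breakdown, here is a minimal sketch of the arithmetic: total cycles are the sum over instruction classes of (execution count × CPI), and execution time is total cycles × clock period. The instruction classes, counts, CPIs, and the 0.5 ns clock period below are made-up illustrative values, not numbers from the slides or the textbook.

```python
# Hypothetical instruction mix; every number here is an illustrative assumption.
# Each row of the "table" is (class name, expected execution count, CPI).
instruction_classes = [
    ("ALU",        50_000_000, 1),
    ("load/store", 30_000_000, 2),
    ("branch",     20_000_000, 3),
]

CLOCK_PERIOD_NS = 0.5  # assumed clock period (i.e., a 2 GHz clock)


def total_instructions(classes):
    """Sum of the expected execution counts over all classes."""
    return sum(count for _, count, _ in classes)


def total_cycles(classes):
    """Sum over classes of (execution count * CPI for that class)."""
    return sum(count * cpi for _, count, cpi in classes)


def average_cpi(classes):
    """Overall CPI: total cycles divided by total instructions."""
    return total_cycles(classes) / total_instructions(classes)


def cpu_time_seconds(classes, clock_period_ns):
    """Execution time = total cycles * clock period."""
    return total_cycles(classes) * clock_period_ns * 1e-9


if __name__ == "__main__":
    print("instructions:", total_instructions(instruction_classes))
    print("cycles:      ", total_cycles(instruction_classes))
    print("average CPI: ", average_cpi(instruction_classes))
    print("CPU time (s):", cpu_time_seconds(instruction_classes, CLOCK_PERIOD_NS))
```

Changing one row (say, halving the branch CPI) and rerunning is an easy way to see how much each class actually contributes to the total time.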
Other stuff:
-
The most prominent benchmark is HPL (High
Performance Linpack), which solves systems of linear
equations mostly by doing lots of matrix multiplies. This is a
particularly "supercomputer friendly" benchmark because the
performance of communication between PEs (processing elements)
becomes less important as the problem is scaled up, and the
benchmark allows scaling the problem as big as can fit in the
machine rather than timing the same-size problem on all machines.
The results are reported as the FLOPS obtained in running the
benchmark. The Top 500 list, which ranks the supercomputers of the
world by this metric, is one everyone watches closely... which
has been good for UK in that machines operated by CCS historically placed well
on it, which is true of fewer than ten US universities. That
said, UK is not on the most recent list.
-
The machines on the Top 500 list have recently made a turn toward
the huge, with lots of machines now having more than a million
cores! Although many-core
chips (mostly GPUs) are now common in these machines, we're
still on a plateau where scaling up is largely a matter of money
rather than new technology, and it seems there's always more
money for matters of national pride (i.e., countries fighting
for positions at the top of the list). Additionally, when the
price/performance improvements due to new technologies slow,
budget for big machines in general seems to go up to compensate.
Machine cost is not listed, but has definitely gone up sharply
over the last decade. You should notice that the US has a lot
of machines on the list, but right now we don't have the
fastest... and we often don't.
-
A nice reference for standard benchmarks is SPEC, the Standard Performance
Evaluation Corporation. The text has always been fond of SPEC,
but it's good to understand that there are many
benchmark suites out there, and how much they really matter to
you depends on how much your application(s) look like them. For
example, the HPL benchmark makes very heavy use of
double-precision floating-point multiply-add, but doesn't even
count integer operations. Many applications are dominated by the
performance of integer, or even character, processing.
-
Here's another interesting tidbit: the US government has a
variety of metrics that they use for determining if a computer
can be exported to a particular country. For example, President
Obama set the export limit at 3.0 TFLOPS on March 16, 2012. It's
getting hard to control the spread of computing technology given
the dominance of cluster supercomputers built using largely
commodity parts. A document describing how to measure
performance for export control is A PRACTITIONER'S GUIDE TO ADJUSTED PEAK PERFORMANCE from
the U.S. Department of Commerce Bureau of Industry and Security.
The CAPITALIZATION of that title is theirs... I guess this is
important enough to shout about? ;-)
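Going back to the HPL item above, here is a rough sketch of why scaling the problem up makes the benchmark "supercomputer friendly." It assumes the usual operation count for solving an n×n dense system, with a leading term of (2/3)n^3; the 1.5·n^2 lower-order term and the idea that data moved grows roughly with the n^2 matrix elements are simplifying assumptions of this sketch, not something taken from the slides. The function names are invented for illustration.

```python
def hpl_flop_count(n):
    """Approximate floating-point operations to solve an n x n dense system.

    The (2/3) n^3 leading term is the standard count; the 1.5 n^2
    lower-order term is an assumption of this sketch.
    """
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_flops(n, seconds):
    """Reported rate: operation count divided by measured run time."""
    return hpl_flop_count(n) / seconds


def flops_per_matrix_element(n):
    """Work per matrix element; grows roughly linearly with n.

    If the data that must be communicated scales with the n^2 matrix
    elements (an assumption), this ratio shows computation outpacing
    communication as the problem gets bigger.
    """
    return hpl_flop_count(n) / (n * n)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, flops_per_matrix_element(n))
    # Hypothetical run: n = 100,000 finishing in 500 seconds.
    print("FLOPS:", hpl_flops(100_000, 500.0))
```

Since the work per element grows with n, a machine with relatively slow communication looks better and better as the problem is made larger, which is exactly what letting each machine pick its own problem size allows.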
Computer Organization and Design.