References: CPE380 Parallel Processing

To begin, remember that supercomputing is fundamentally about extreme scale use of parallel processing, but parallel processing is really the key to high performance in any modern computer system. The amount of parallel processing used inside a typical cell phone exceeds that used in most supercomputers less than three decades ago. Looking at supercomputers gives a glimpse of what's coming to more mundane systems sooner than you'd expect.

The lecture slides as a PDF provide a good overview of everything you need to know, and a fair bit more. For example, a short virtual tour of Prof. Dietz's lab and supercomputing facilities in the Marksbury building is at the end of those slides, but is not material that would be on an exam. There are also these few slides showing some processor architecture diagrams for various commercial processors; we reviewed these from the point of view of being able to identify what features are implemented in each.

The textbook places emphasis on shared memory multiprocessors (SMP stuff) and cache coherence issues. We covered a bit of that here, and coherence basics in memory systems, but it's a small piece of the whole pie because these systems really don't scale very large. That said, AMD is pushing 256 cores.

More generally, you should be aware of SIMD (including GPUs and the not-so-scalable SWAR/vector models) and MIMD, and also the terms Cluster, Farm, Warehouse Scale Computer, Grid, and Cloud. In the discussion of interconnection networks, you should be aware of Latency, Bandwidth, and Bisection Bandwidth, and have some understanding of network topologies including Direct Connections (the book calls these "fully connected"), Toroidal Hyper-Meshes (e.g., Rings, Hypercubes), Trees, Fat Trees, and Flat Neighborhood Networks (FNNs), as well as the hardware used to build them: Hubs, Switches, and Routers. The concept of quantum computing as a form of parallel processing without using parallel hardware was also very briefly introduced.
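To make the network terms a bit more concrete, here is a minimal Python sketch (not from the lecture slides) that computes link count, diameter in hops, and bisection width for three of the topologies named above, plus the usual first-order model of message time as latency plus size divided by bandwidth. The function names and dictionary keys are hypothetical, chosen only for this illustration, and the formulas assume bidirectional links of equal bandwidth with one link per connected pair of nodes.

```python
def fully_connected(n):
    """Direct connections: every node has its own link to every other node."""
    return {
        "links": n * (n - 1) // 2,
        "diameter": 1,
        # Every node in one half links to every node in the other half.
        "bisection_links": (n // 2) * (n - n // 2),
    }


def ring(n):
    """Ring (1D torus): each node links to two neighbors. Assumes n >= 3."""
    return {
        "links": n,
        "diameter": n // 2,
        # Cutting a ring into two halves always severs exactly two links.
        "bisection_links": 2,
    }


def hypercube(n):
    """Hypercube: n must be a power of two; each node has log2(n) links."""
    d = n.bit_length() - 1
    if n < 2 or (1 << d) != n:
        raise ValueError("hypercube size must be a power of two >= 2")
    return {
        "links": d * n // 2,
        "diameter": d,
        # Splitting along one dimension cuts one link per node pair.
        "bisection_links": n // 2,
    }


def bisection_bandwidth(topology, link_bandwidth):
    """Bisection bandwidth = links crossing the cut * per-link bandwidth."""
    return topology["bisection_links"] * link_bandwidth


def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order model: fixed latency plus size divided by bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    n = 16
    for name, build in [("fully connected", fully_connected),
                        ("ring", ring),
                        ("hypercube", hypercube)]:
        print(name, build(n))
    # Hypothetical numbers: 1 MB message, 1 microsecond latency, 1 GB/s link.
    print(transfer_time(1_000_000, 1e-6, 1e9))
```

For 16 nodes, this shows the tradeoff directly: the fully connected network has the best diameter and bisection width but needs 120 links, the ring needs only 16 links but has a bisection width of 2, and the hypercube sits in between. Note also that the transfer-time model is dominated by latency for small messages and by bandwidth for large ones.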

You will find a lot of information about high-end parallel processing at aggregate.org. Professor Dietz and the University of Kentucky have long been leaders in this field, so Dietz has written quite a few documents that explain all aspects of this technology. One good, but very old, overview is the Linux Documentation Project's Parallel Processing HOWTO; a particularly good overview of network topologies appears in this paper describing FNNs.

A quick summary of what things look like in Spring 2022:

One last note: Tesla's Full Self Driving Chip is a great example of supercomputing moving into mass-market devices.


CPE380 Computer Organization and Design.