References: EE380 Parallel Processing

The book places emphasis on shared memory multiprocessors (SMP stuff) and cache coherence issues; read the book for that.

However, you also should be aware of SIMD (SWAR) and MIMD; Cluster, Farm, Warehouse Scale Computer, Grid, and Cloud; Latency, Bandwidth, and Bisection Bandwidth; network topologies including Direct Connections (the book calls these "fully connected"), Toroidal Hyper-Meshes (e.g., Rings, Hypercubes), Trees, Fat Trees, and Flat Neighborhood Networks (FNNs); Hubs, Switches, and Routers. The Spring 2020 slides I used for the material on high-end "supercomputer" architecture are posted as a PDF. These slides give a nice overview of cluster supercomputing (including the terminology) and also very briefly discuss GPUs. The talk presenting these slides is here, and includes a short virtual tour of Prof. Dietz's lab and supercomputing facilities in the Marksbury building.
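To make the SWAR (SIMD Within A Register) term above concrete, here is a minimal sketch that treats one 32-bit word as four independent 8-bit lanes and adds all four lanes with a handful of ordinary integer operations. The helper names (pack, unpack, swar_add8) and the 4 x 8-bit layout are assumptions chosen for illustration; they are not taken from the book or the slides.

```python
# Illustrative SWAR sketch: four 8-bit lanes packed in one 32-bit word.
# All names here are hypothetical helpers, not from the course materials.

HIGH_BITS = 0x80808080  # top bit of each 8-bit lane
LOW_BITS = 0x7F7F7F7F   # low 7 bits of each 8-bit lane


def pack(lanes):
    """Pack four values (lane 0 in the low byte) into one 32-bit word."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def swar_add8(x, y):
    """Add four 8-bit lanes at once, wrapping modulo 256 within each lane."""
    # Add only the low 7 bits of every lane; the largest per-lane sum is
    # 0x7F + 0x7F = 0xFE, so a carry can reach bit 7 of its own lane but
    # can never spill into the neighboring lane.
    low_sum = (x & LOW_BITS) + (y & LOW_BITS)
    # Fold in the top bit of each lane with XOR, which adds without carrying.
    return (low_sum ^ ((x ^ y) & HIGH_BITS)) & 0xFFFFFFFF
```

For example, unpack(swar_add8(pack([1, 255, 200, 16]), pack([2, 1, 100, 32]))) yields the four lane sums, each wrapped to 8 bits. Hardware SWAR instruction sets do the same thing with wider registers and dedicated instructions rather than masking tricks.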

You will find a lot of information about high-end parallel processing at aggregate.org. Professor Dietz and the University of Kentucky are leaders in this field, so Dietz has written quite a few documents that explain all aspects of this technology. One good, but very old, overview is the Linux Documentation Project's Parallel Processing HOWTO; a particularly good overview of network topologies appears in this paper describing FNNs.
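As a rough way to compare some of the topologies named earlier, the sketch below tabulates link count, diameter (worst-case hops), and bisection width (links cut when the machine is split into two equal halves) for a ring, a hypercube, and a direct (fully connected) network. The function names are hypothetical, the formulas assume an even node count of at least 4 (a power of two for the hypercube), and the per-link bandwidth in bisection_bandwidth is an assumed parameter, not a measured figure.

```python
# Illustrative topology metrics; function names and the per-link
# bandwidth parameter are assumptions made for this sketch.


def ring(n):
    """Ring of n nodes (n even, n >= 4): each node links to two neighbors."""
    return {"links": n, "diameter": n // 2, "bisection_width": 2}


def hypercube(n):
    """Hypercube of n = 2**d nodes: each node has d links."""
    d = n.bit_length() - 1
    if n < 2 or (1 << d) != n:
        raise ValueError("hypercube size must be a power of two")
    return {"links": d * n // 2, "diameter": d, "bisection_width": n // 2}


def fully_connected(n):
    """Direct connections: every pair of the n nodes has its own link."""
    half = n // 2
    return {
        "links": n * (n - 1) // 2,
        "diameter": 1,
        # Every node in one half has a link to every node in the other half.
        "bisection_width": half * (n - half),
    }


def bisection_bandwidth(metrics, link_bandwidth):
    """Bisection bandwidth = bisection width times per-link bandwidth."""
    return metrics["bisection_width"] * link_bandwidth


if __name__ == "__main__":
    for name, build in [("ring", ring), ("hypercube", hypercube),
                        ("fully connected", fully_connected)]:
        print(name, build(16))
```

With 16 nodes, the ring needs the fewest links but has the smallest bisection width, while direct connections give the best diameter and bisection width at the cost of a link count that grows quadratically; the hypercube sits between the two.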

A quick summary of what things look like in Spring 2019:

Nearly all desktop/laptop processors are pipelined, superscalar, SWAR implementations with 2-32 cores on each processor chip; Intel's Xeon Phi processors, with up to about 60 cores per chip and 512-bit SWAR, have been discontinued, but AMD is back in the game with a 32-core chip that looks very strong
Nearly all supercomputers are clusters and, since Fall 2017, virtually all 500 of the Top500 supercomputers are Linux clusters; also note that Asia/China now dominate the list, with 55.2% of the systems in November 2018
GPUs are appearing everywhere (although the HW/SW technology for them is still evolving); NVidia GPUs have come to dominate the high-performance computing market
The slow transition to integrating GPUs on the processor chip continues, as does the transition from IA32/AMD64 to ARM64; there are now ARM64 machines on the latest Top500 list
Clouds are a very popular way to handle applications that need lots of memory/storage, but not so much processing resource; there is a particularly strong push for software as a service with cloud subscriptions rather than software purchases
IoT (Internet of Things), the idea that everything should be connected, continues to develop, with various societal issues ranging from simple privacy and ownership rights issues to potentially life-threatening things like "car hacking"
Quantum computing has become a very intense research focus, but it still isn't clear it will ever be practical

One last note: Tesla's Full Self Driving Chip is a great example of supercomputing moving into mass-market devices.

Computer Organization and Design.