References: EE380 Parallel Processing (Chapter 9)

The book places emphasis on shared memory multiprocessors (SMP stuff) and cache coherence issues; read the book for that.

However, you also should be aware of SIMD (SWAR) and MIMD; Cluster, Farm, Grid; Latency, Bandwidth, and Bisection Bandwidth; network topologies including Direct Connections (the book calls these "fully connected"), Toroidal Hyper-Meshes (e.g., Rings, Hypercubes), Trees, Fat Trees, and Flat Neighborhood Networks (FNNs); Hubs, Switches, and Routers. The slides I used for the material on high-end architecture are posted as a PDF. These slides give a nice overview of cluster supercomputing (including the terminology).

You will find a lot of information about high-end parallel processing at aggregate.org. Professor Dietz and the University of Kentucky are leaders in this field, and Dietz has written quite a few documents that explain all aspects of this technology. One good overview is the Linux Documentation Project's Parallel Processing HOWTO; a particularly good overview of network topologies appears in this paper describing FNNs.

Over the past few years, the desktop market has been slowly moving to 64-bit processors. Here's the status as of Spring 2007. The early favorite was the Intel Itanium Architecture (IA64), which has not disappeared but certainly is not the leading player. AMD's 64-bit instruction set, AMD64 (x86-64), has essentially won the battle, partly because it is fully IA32 compatible and partly because the Athlon64 and Opteron implementations are very good. In fact, they were too good for Intel to ignore, and in early 2004, Intel quietly began shipping what they carefully avoided calling an AMD64-compatible P4 design; the Intel name is IA32 with EM64T. Early discussions of EM64T called it 64-bit emulation technology, essentially giving it second-class status relative to IA64; however, the latest Intel pages talk about "Intel Extended Memory 64 Technology" now having the official name "Intel 64" architecture. As of Spring 2007, Intel's new name hasn't caught on -- the instruction set is still most often called AMD64, x86-64, or EM64T. Only time will tell whether Intel is able to rewrite history and get Intel 64 accepted as the name instead of AMD64. In the meantime, the Apple/IBM G5, which had once been the third serious contender, has faded as Apple has shifted to using Intel EM64T processors, largely due to delays in dual-core G5 chips for laptops.

The movement to multi-core processor chips is now going full force; it soon will be impossible to find single-core desktops. As of Fall 2006, dual-core chips have become the norm, with the expectation of more cores per chip over the next few years -- pretty much what everybody had predicted. An early article describing the then-soon-to-be-released AMD dual-core processors and contrasting them with those Intel had already released is here. Basically, AMD had an architectural edge in making modest numbers of cores on a chip work well, but as of Spring 2007, Intel has at least caught up. In fact, the latest Core2 chips (an architectural diagram is here) are faster than AMD's offerings for many applications. Using a hack similar to the one that let Intel beat AMD to shipping dual-core in a single socket, Intel also has become the first to ship quad-core. Power consumption also has gone down, partly by dropping clock rates and partly by adopting laptop versions of processors for desktop use. Intel is shipping quad-core desktop parts that use only 50W -- they are really two 25W dual-core laptop processors in a single socket.

The biggest recent surprise is that AMD bought ATI in Fall 2006 and announced that ATI's GPUs will be fully integrated on-chip with AMD's processors by 2008 (and very likely sooner). Perhaps in preparation for that, ATI opened up the instruction set (CTM -- Close To the Metal) of their latest GPUs (Graphics Processing Units) to facilitate use of the GPUs for non-graphical computation. CTM is a big step because it describes the architectural structures without relying on GPU terminology and ideas; what it exposes is really general-purpose virtualized SIMD parallelism. NVIDIA also has taken some baby steps in this direction, keeping the instruction set proprietary but announcing a new C-based language and compiler (CUDA -- Compute Unified Device Architecture) also aimed at "GPGPU" (General Purpose GPU) computing. You can expect a lot of activity along these lines over the next few years: adding specialized parallel hardware to multi-core chips.


EE380 Computer Organization and Design.