This is the home page for our 15th major research exhibit at the IEEE/ACM Supercomputing conference. The exhibit is again titled Aggregate.Org / University of Kentucky; Aggregate.Org is the informal research consortium led by our KAOS (Compilers, Hardware Architectures, and Operating Systems) group here at the University of Kentucky's Department of Electrical and Computer Engineering.
The big thing in our exhibit this year is a technical demonstration consisting of a large wooden maze with four balls in it. Each of the colored balls has a different path to take (MIMD), yet it is perfectly feasible to get all the balls to their respective destinations by a series of tilts of the table (SIMD). Yes, you really can execute MIMD code on SIMD hardware with good efficiency... and that's what our latest software does for GPUs (Graphics Processing Units). Specifically, it can take shared-memory MIMD code written in C and efficiently execute it on an NVIDIA CUDA GPU.
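To make the maze analogy concrete, here is a minimal sketch in plain C of running different programs in lockstep. It is not the MOG software: the opcodes, the struct pe layout, and the names execute, run_lockstep, and demo are all hypothetical, chosen only for illustration. Each simulated processing element (a ball) has its own program and program counter, and each broadcast opcode (a tilt of the table) advances only the elements that are waiting for that opcode.

```c
#include <assert.h>
#include <stddef.h>

#define NPE 4  /* simulated processing elements: the four balls */

/* Hypothetical toy opcodes for illustration; not the MOG instruction set. */
enum op { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

struct pe {
    const enum op *code; /* this PE's own program (the MIMD part) */
    size_t pc;           /* private program counter */
    int acc;             /* private accumulator */
};

/* Apply one broadcast opcode to one enabled PE and advance its pc. */
static void execute(struct pe *p, enum op o)
{
    switch (o) {
    case OP_INC:    p->acc += 1; break;
    case OP_DEC:    p->acc -= 1; break;
    case OP_DOUBLE: p->acc *= 2; break;
    case OP_HALT:   break;
    }
    p->pc++;
}

/* Run every PE until all reach OP_HALT; return the number of broadcasts. */
static int run_lockstep(struct pe pes[NPE])
{
    int broadcasts = 0;
    for (;;) {
        enum op cur[NPE];
        int running = 0;
        /* Fetch: each PE looks at its own next instruction. */
        for (int i = 0; i < NPE; i++) {
            cur[i] = pes[i].code[pes[i].pc];
            if (cur[i] != OP_HALT) running++;
        }
        if (running == 0) return broadcasts;
        /* Broadcast each opcode that at least one PE is waiting for. */
        for (int o = OP_INC; o < OP_HALT; o++) {
            int enabled = 0;
            for (int i = 0; i < NPE; i++) {
                if (cur[i] == (enum op)o) {
                    execute(&pes[i], (enum op)o);
                    enabled++;
                }
            }
            if (enabled > 0) broadcasts++;
        }
    }
}

/* Four different programs, one per PE; returns the broadcast count. */
static int demo(void)
{
    static const enum op p0[] = { OP_INC, OP_INC, OP_HALT };
    static const enum op p1[] = { OP_INC, OP_DOUBLE, OP_HALT };
    static const enum op p2[] = { OP_DEC, OP_HALT };
    static const enum op p3[] = { OP_INC, OP_DOUBLE, OP_DOUBLE, OP_HALT };
    struct pe pes[NPE] = { {p0, 0, 0}, {p1, 0, 0}, {p2, 0, 0}, {p3, 0, 0} };
    int broadcasts = run_lockstep(pes);
    assert(pes[0].acc == 2 && pes[1].acc == 2);
    assert(pes[2].acc == -1 && pes[3].acc == 4);
    return broadcasts;
}
```

In this toy case the four programs contain eight non-halt instructions in total, yet demo() needs only five broadcasts, because elements waiting for the same opcode move together.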
Why do this? GPUs thus far have not had stable, portable programming support for general-purpose use, so there is virtually no code base for supercomputing applications. Our technology allows codes written for popular cluster and SMP target models to be used directly. We plan to support both C and Fortran with both shared memory and MPI message passing. The current shared memory model uses "parallel subscripting" in which a[||b] means a in processor b's memory; we initially assumed OpenMP would be the preferred model, but have had requests for POSIX Threads. It is surprisingly easy to support dynamic thread creation, although there are performance issues involving memory bank conflicts in sharing the complete memory map.
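As a rough illustration of what a[||b] means, the sketch below emulates parallel subscripting in standard C on a single host. It assumes nothing about the MOG dialect beyond the one-line definition above; the struct local_mem layout, the on() helper, and ring_sum() are hypothetical names, and the processors are simulated by ordinary loops.

```c
#include <assert.h>

#define NPROC 4  /* simulated processors */

/* Each processor has its own private copy of every variable. */
struct local_mem { int a; int sum; };
static struct local_mem mem[NPROC];

/* on(b)->a plays the role of a[||b]: "a in processor b's memory". */
static struct local_mem *on(int b)
{
    assert(b >= 0 && b < NPROC);
    return &mem[b];
}

/* Each processor publishes a value, then adds its right neighbour's. */
static void ring_sum(void)
{
    /* Phase 1: every processor writes its own copy of a. */
    for (int me = 0; me < NPROC; me++)
        on(me)->a = 10 * me;
    /* A real MIMD run would need a barrier here; the serial loops imply it. */
    /* Phase 2: every processor reads a from the next processor's memory. */
    for (int me = 0; me < NPROC; me++)
        on(me)->sum = on(me)->a + on((me + 1) % NPROC)->a;
}
```

After ring_sum() returns, mem[me].sum holds processor me's own a plus the a belonging to processor (me + 1) % NPROC.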
How well does MIMD code perform? It is too early to give a definitive answer, but there are two different ways to run, and they have very different performance. The MIMD On GPU Simulator, mogsim, gets the same order of magnitude performance as the host running optimized native code, with macho GPUs around 8X the host and wimpy ones about 2X slower than the host. The MIMD On GPU Meta-State Converter, mogmsc, generates pure native code for the target GPU -- with no interpreter overhead -- and is as much as 100X faster than the simulator.
The one-page technical overview PDF is A Maze Of Twisty Little Passages. We have prepared a MOG homepage which contains more technical details. The following are the two key publications from when we invented the basic MIMD-on-SIMD technology more than a decade and a half ago -- the MOG environment is heavily based on this work.
Significant work has also been done (a completed MS project) on dynamic underclocking of GPUs to maintain a desired energy use and temperature profile, but it was not incorporated into any of the handouts.
The camera is mounted on top of the sign in the front right corner of our exhibit, facing toward the Indiana University exhibit. Later, we'll post a time-lapse movie.