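Thus far, there have been six public demonstrations (research exhibits) of PAPERS systems at conferences/workshops: ICPP 1994, Supercomputing 1994, LCPC 1995, ICPP 1995, Supercomputing 1995, and Supercomputing 1996.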
This document provides a brief overview of our exhibits.
Our first public demonstration of PAPERS was at the International Conference on Parallel Processing in August 1994, but things were very informally arranged. Originally, we had hoped to demo a cluster as part of our presentation of the paper on the new barrier mechanism, but instead we were given permission to demo our PAPERS cluster off to one side during the wine and cheese party.
That cluster, shown above, consisted of four IBM ValuePoint 486DX 33MHz machines running Linux, connected by both the second TTL_PAPERS prototype and PAPERS1. We simply placed everything on a standard AV cart and wheeled it into the wine and cheese party. The party was noisy, so demonstrations were difficult, but this was the first full public demonstration of PAPERS.
Our first formal public demonstration of PAPERS was at the IEEE/ACM Supercomputing conference in November 1994.
Although none of us had previously arranged a formal research exhibit at a conference, we decided to apply for research exhibit space at Supercomputing to demonstrate several PAPERS clusters. Much to our surprise, we were awarded Research Booth R4, a full 20' by 20' space located next to the sign that introduced the research area of the exhibit floor. Prime real estate and a lot of it. After much perseveration (and several manufacturers breaking their promises of loaner machines and/or donations), we finally settled on showing the following four clusters.
Four DEC Alphas running OSF.
These machines were two 166MHz and two 233MHz units, all provided to
us by DEC as loaners for the show. These machines were truly state of
the art; the 233MHz machines were only announced a week before
Supercomputing. Unfortunately, we only had them for two weeks before
the show, and it wasn't easy to get OSF to let us directly access the
parallel port. Just in time, we were able to get direct port access
by compiling some of our code into the OSF kernel. TTL_PAPERS was
used to connect the machines for the demonstrations, which were
benchmarks of the PAPERS C library support routines (interactively
selected from a TCL/TK menu). The 233MHz boxes are standing upright
with the 166MHz ones horizontally on top of them; the TTL_PAPERS box
is barely noticeable sitting to the right of the monitor.
Eight Intel 386 machines running Linux.
Unlike the DEC machines, these 386DX 33MHz systems were not exactly
state of the art; in fact, they were essentially discarded when Intel
upgraded one of our undergraduate labs to 486DX2 66MHz machines.
However, each of the 386 boxes has floating point hardware which we
have measured at about 4MFLOPS, so it isn't quite as bad as it first
sounds. The same is true of the $30 homemade wooden "rack mount" --
which received almost as much praise as our research work did. This
cluster was connected using an 8-processor version of PAPERS that is
similar to the November 1994 TTL_PAPERS, but essentially doubles the
communication bandwidth. The monitor on the right was displaying
continuous benchmark results for the complete PAPERS C library,
showing both the time and variation in time for each operation.
Two four-machine IBM clusters.
The last two clusters demonstrated in our booth consisted of four
machines each: a cluster of four IBM PowerPC 601 80MHz systems running
AIX to the left, and a cluster of four IBM ValuePoint
486DX 33MHz systems running Linux to the right (the plant-stand on
the near right held a display of the evolution of PAPERS through the
first six prototypes). The PowerPC machines didn't perform very well
and the parallel port access was very slow because we had to use an OS
call, but these machines were pre-release prototypes given to us as
loaners.... We connected them using one of the TTL_PAPERS units, and
ran a very simple demonstration showing barrier synchronization time
and variation in time. On the other hand, the ValuePoint machines are
very "normal" 486 systems, and we took advantage of this by actually
having two different versions of PAPERS simultaneously connecting the
cluster, with a wide range of demonstrations available for each unit
(partly because this was the cluster we demonstrated at ICPP).
Despite this, most of the time we simply had the ValuePoint cluster
run an SPMD program that plays multivoice music by assigning each new
note to be played by a randomly selected machine -- using PAPERS to
ensure that timing of the notes is preserved.
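The music demo's idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the program we ran: Python threads stand in for the machines, `threading.Barrier` stands in for the PAPERS barrier, and the names `play_score`, `machine`, and the shared seed are hypothetical. The sketch assumes every machine seeds an identical random number generator, so all of them agree on which machine plays each note without exchanging any messages.

```python
import random
import threading


def play_score(notes, n_machines=4, seed=1994):
    """Simulate an SPMD music player.

    Returns [(note, machine), ...] in the order the notes were "played".
    """
    barrier = threading.Barrier(n_machines)  # stand-in for the PAPERS barrier
    played = []

    def machine(me):
        # Every machine uses the same seed, so every machine computes the
        # same "random" owner for each note (an assumption of this sketch).
        rng = random.Random(seed)
        for note in notes:
            owner = rng.randrange(n_machines)
            barrier.wait()  # no machine starts this note before the others arrive
            if owner == me:
                played.append((note, me))  # stand-in for sounding the note

    threads = [threading.Thread(target=machine, args=(i,))
               for i in range(n_machines)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return played


if __name__ == "__main__":
    for note, who in play_score(["C4", "E4", "G4", "C5"]):
        print(f"machine {who} plays {note}")
```

Because each machine must pass the barrier for a note before anyone can sound it, and the owner records its note before arriving at the next barrier, the notes come out in score order even though different machines play them.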
Since we had a paper on the PAPERS library at the 8th International Workshop on Languages and Compilers for Parallel Computing, at Ohio State University, we decided it might be a nice idea to bring a little hardware. So, just after the last presentation ended on August 12, 1995, we set up our hardware and gave a brief demonstration....
We didn't want to make a big deal out of this, so we did everything using a pair of 486DX2/66 laptops (running Linux) and the November 1994 version of the four-processor TTL_PAPERS. No, that library wasn't designed to work with just two machines connected to a four-processor unit, but we made the appropriate change to the check-in procedure so that we could run a simple barrier speed benchmark. It is strange to see the unit working with two cables dangling.
We also demonstrated a few things with the TTL_LIB_950614 library's four-processor VAPERS simulator on one of the laptops... while the TTL_PAPERS hardware demo continued to run across the laptops.
Not many people saw our little demo; anyway, it's quality, not quantity, that counts. ;-)
For various reasons (including the ever-shrinking travel budget and the fact that I got back from LCPC 1995 just a day before), I was not expecting to attend ICPP this year... but I believe I've only missed one year since 1984, and 1995 wasn't it. The inspiration for me to drive up this time was a last-minute request that I fill in for a missing panelist on "SPMD: on a collision course with portability?"
Of course, I couldn't resist the temptation to bring along the same two laptops and TTL_PAPERS unit that I had just demoed at LCPC 1995. As in 1994, I was given permission to demo the cluster off to the side during one of the evening parties; this time, it was the cocktail party. Not a very impressive demo; still, after two years at ICPP, people are getting used to the idea that PAPERS isn't a joke. Well, I did hear a few chuckles when people saw that the VAPERS simulator display not only duplicated the PAPERS unit's flashing lights, but also the wood grain....
Supercomputing 1994 had been a very positive experience for us and a lot of good exposure for Purdue and our work, so we wanted to do much the same for 1995. However, San Diego isn't a road trip like Washington D.C. was, so we had to cut back on the quantity of stuff being brought to the show. Thus, instead of a 20' by 20' booth with four clusters, we simply had a 10' by 20' booth (R22) with two clusters on one end and software displays on the other end.
The hardware side.
The hardware side of the booth displayed two clusters. One cluster
was the rather familiar group of four IBM ValuePoint 486DX33 machines
running Linux. The other four-machine cluster was built specifically
to be easy to carry around for demonstrations. Named the "TTL_PAPERS
Microcluster," it consists of a group of four Compaq Aero subnotebook
computers and a miniature oak rack mount.
TTL_PAPERS Microcluster.
Since they only have 486SX25 processors (without floating point
hardware), the microcluster machines are not fast. However, the
entire cluster only weighs about 30 pounds and fits within a 1' cube.
The miniature oak rack mount houses a four-machine TTL_PAPERS unit,
power supplies, and space to pack both an extension cord and the
cables for the TTL_PAPERS unit. For travel, there is an oak top plate
that secures the laptops and provides a shoulder strap, while a plate
with three wheels attaches to the bottom with velcro... in summary,
it can go anywhere and can even be run using battery power.
Heterogeneous cluster.
Because Supercomputing '95 also marked the release of our scalable
eight-machine TTL_PAPERS 951201 design, all day December 6, 1995, we
demonstrated an eight-machine cluster using the obviously
heterogeneous combination of the four ValuePoints and the four Aeros.
A variation of the multi-voice music demo clearly demonstrated the
tight coupling of machines within this cluster.
PAPERS history.
In addition to the TTL_PAPERS units that were operating in our booth,
there was a TTL_PAPERS 950801 unit and a wooden plant rack holding
various earlier PAPERS prototypes. While we are on the topic, the
quality of the woodwork for the PAPERS cabinets was yet again heading
the list of comments from visitors to our booth... maybe there is a
message there for commercial computer vendors...?
The software side.
On the other side of our booth, just past the circle of chairs gathered
around the ValuePoint cluster, were two tables for the software
demonstrations for our booth. The KIWI project took
one table, the TTL_PAPERS (and TTL_VAPERS simulator) library took the
other. We also had a handout on the new giveioperm()
system call for secure direct port access under Linux.
PAPERS booth people.
Well, after the description of what we did in our booth, it's kinda
nice to have a photo of the folks who made the booth happen. From
left to right, the PAPERS booth people are R. Hoare, R. Fisher, T.
Mattox, S. Kim, and H. Dietz.
Incidentally, a lot of things are available on-line from the Supercomputing 1995 conference. The complete proceedings, abstracts for the exhibits, etc., are available from http://sc95.sdsc.edu/SC95.
Supercomputing 1994 and 1995 both were good experiences for us, and we had much more to show for November 18-21, 1996. Because the 1996 conference was in Pittsburgh, which is well within "road trip" range, we decided to bring lots of equipment. We were given a 20' x 20' display area (booth R24), and filled it with as much as we could easily carry in our 15-foot rental truck... but we move our clusters in their racks, so quite a lot of stuff fit.... Our research booth held 37 separate computers and 27 monitors; this was apparently more separate machines and video displays than any other research or commercial exhibitor.
Exhibit Overview.
Because our booth faced a corner of the exhibit hall, we really didn't
have a "front side" to our booth. Thus, we set up our area so that
people could move from display to display in a circle within
the booth. From the left edge of the above photo, we had the 16 PC
VGA video wall, SMP demos, 4 PC VGA video wall, history display,
Pentium cluster, and CASLE demos.
16 PC VGA Video Wall.
The prime demonstration within our booth was the large video wall
constructed as a 4 x 4 array using the VGA displays of 16 PCs. Each
machine was connected only to its own VGA display, and the new
field-scalable 960801 PAPERS units were the only connection between
the machines of the cluster. Our demos ranged from modified "screen
savers" that treat the wall as a single display (a Qix-like one is
shown above), synchronously drawing each point or line, to an
interactive video game in which up to four players battle, each using
a mouse to control the leader of their swarm. We also had a variety
of pieces of multi-voice music playing across the PC speakers, with
each new note given to a processor selected at random. Further, a
new, somewhat crude, mini-OS running on top of Linux was used to
control the execution of the cluster.
4 PC VGA Video Wall.
The bad news about the 16-machine cluster is that it was built using
386-based machines... including eight 386DX25 PS2 systems that were
incredibly slow for floating point (no hardware) and had very slow MCA
parallel ports (4us per register access). The four 486DX33 machines
of the 4 PC VGA video wall are no speed demons, but they were able to
run things like our combined N-body/thermal decay trace simulation in
addition to the things we built for the 16 display wall. In the photo,
they are running our four-player swarm video game.
Mandelbrot Demos.
Another interesting thing we did for the first time at Supercomputing
1996 was a side-by-side comparison of the new AFAPI (Aggregate Function
Application Program Interface) running on both a PAPERS cluster and an
SMP Linux box, as shown above. The application we used was a fully
dynamically load-balanced shared-memory version of Mandelbrot fractal
computation, using AFAPI Replicated Shared Memory to implement a
shared display map and an array used to asynchronously claim each scan
line. Although AFAPI works well on both systems, the four Pentium 90s
in the 960801 cluster were much faster than the two Pentium 100s
within the SMP... even about 20% faster per processor. Why? Well,
the SMP has significant memory system interference between
processors....
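The scan-line claiming scheme described above can be sketched roughly as follows. This is a minimal illustration, not the AFAPI code from the demo: Python threads stand in for the processors, ordinary lists stand in for the Replicated Shared Memory display map and claim array, and a lock is assumed as the way to make each claim atomic. The names `mandel_row`, `render`, and `claimed` are hypothetical.

```python
import threading


def mandel_row(y, width, height, max_iter=50):
    """Compute iteration counts for one scan line of the Mandelbrot set."""
    row = []
    for x in range(width):
        c = complex(-2.0 + 3.0 * x / width, -1.0 + 2.0 * y / height)
        z = 0j
        n = 0
        while n < max_iter and abs(z) <= 2.0:
            z = z * z + c
            n += 1
        row.append(n)
    return row


def render(width=48, height=24, n_workers=4):
    image = [None] * height   # shared display map, one entry per scan line
    claimed = [-1] * height   # claimed[y] = worker that took line y, or -1
    lock = threading.Lock()   # assumed stand-in for an atomic claim

    def worker(me):
        for y in range(height):
            with lock:
                if claimed[y] != -1:
                    continue          # someone else already has this line
                claimed[y] = me
            # The expensive work happens outside the lock, so a worker that
            # gets cheap lines simply goes on to claim more of them.
            image[y] = mandel_row(y, width, height)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image, claimed


if __name__ == "__main__":
    image, claimed = render()
    print("lines per worker:", [claimed.count(w) for w in range(4)])
```

The point of claiming lines one at a time is that no static partition is needed: lines inside the set cost far more than lines outside it, and workers that finish early just pick up whatever is still unclaimed.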
History Display.
Supercomputing '96 had the subtheme of marking the 50th anniversary of
the field, so history displays were strongly encouraged. Thus, we
expanded the PAPERS history to include not only all previous PAPERS
models, but also a write-up summarizing some of Purdue ECE's major
contributions to the field of parallel processing. This write-up is
available on-line both as HTML and as
separate PostScript files for the front and
back sides of the one-page handout.
The CASLE Project.
We also demonstrated CASLE on both Linux PCs and DEC Alphas. CASLE is
the Compiler/Architecture Simulation for Learning and Experimenting, a
teaching tool that allows undergraduate students to develop an
understanding of the total system impact of interactions between
compiler optimizations and (modestly parallel) architectural features.
More information about CASLE, including a live system that you can use
with any WWW browser, is available at http://purcell.ecn.purdue.edu/~casle/.
Booth people.
Well, after the description of what we did in our booth, it's kinda
nice to have a photo of the folks who made the booth happen. From
left to right, the PAPERS booth people are T. Mattox, R. Hoare, R.
Fisher, and S. Kim, with the two faculty, H. Dietz and G. Adams,
sitting in front. Of course, in addition to us, we would like to thank the
over 500 people who visited our research exhibit....
The Polaris Project.
This year, Purdue's presence at Supercomputing was not limited to our
exhibit. Prof. Rudy Eigenmann also organized a second 20'x20'
research exhibit, for the Polaris source-to-source parallelizing
Fortran compiler. More information about the Polaris project is
available at http://dynamo.ecn.purdue.edu/~eigenman/polaris/.
Incidentally, a lot of things are available on-line from the Supercomputing 1996 conference. The complete proceedings, abstracts for the exhibits, etc., are available from http://scxy.tc.cornell.edu/sc96/.
The next public demonstration of PAPERS at a conference has not happened yet... but when it does, it will be listed here.