Aggregate Function Network Hardware

You've probably heard that a disturbingly large number of parallel supercomputer manufacturers are in financial trouble, and some people even say that parallel processing is dead. It isn't. What is true is that you cannot spend a few years building complex custom hardware for scientific applications and still be competitive with commodity computer technology.

In contrast, clusters or NOWs (Networks Of Workstations) seem to be a very cost-effective way to get lots of parallel compute power. However, a NOW using a conventional network really only gives you CPU power and block-transfer bandwidth. Traditional supercomputers have vastly superior ability for processors to closely coordinate their operation, perhaps even offering SIMD and VLIW execution modes.

As discussed in the Aggregate Function Network: Architecture & Theory page, our primary interest was in further improving the global coordination capability of parallel supercomputers. The problem was that, even with Purdue's tradition of building custom parallel hardware, we realized that we couldn't build a worthwhile full custom machine. And so it was that PAPERS came to look like a cluster or NOW... but it really isn't.

There are a variety of other projects working to reduce latency by building custom networks for clusters/NOWs supporting traditional message-passing, but that isn't what PAPERS is. We have no problem with using conventional network hardware for block transmission; it was, after all, designed for that. Thus, all that PAPERS adds to a cluster/NOW is support for tightly-coupled parallel processing... which a conventional-network cluster or NOW otherwise lacks even more desperately than most parallel supercomputers do.

In terms of programming, a PAPERS cluster behaves like a tightly-coupled parallel supercomputer. It just happens to be built by taking a conventional cluster or NOW and adding a PAPERS unit.

General Hardware Information

PAPERS Hardware Lineage Image Map (.html): This clickable in-line image map diagrams the lineage of the various different PAPERS designs and provides links to concise descriptions and photos of each of the units. It is intended to replace the slow-to-load photo-laden PAPERS Museum (see link below) as the best overview of how the PAPERS project hardware has evolved.
PAPERS Museum (.html): Since February 1994, we have built quite a few different versions of PAPERS. This hypertext document gives photographs and descriptions of each prototype, pictorially tracing the evolution of PAPERS.
Evolution Of PAPERS (up to PAPERS0) (.html, .pdf, .ps, .ps.Z, .ps.gz): T. Muhammad, Hardware Barrier Synchronization For A Cluster Of Personal Computers, Purdue University School of Electrical Engineering, MS Thesis, May 1995 (defended February 2, 1995; the slides he used for his defense are also available).
Bitwise Aggregate Networks (.pdf, .ps): R. Hoare, H. Dietz, T. Mattox, and S. Kim. "Bitwise Aggregate Networks," to appear in Proceedings of SPDP, 1996. This paper describes some of the marvelous things that can be done using only a simple bitwise logic function to implement all aggregate functions... which is what all the TTL_PAPERS units do.
TTL_PAPERS 951201 Overview (.pdf, .ps): H. G. Dietz, R. Hoare, and T. Mattox, "A Fine-Grain Parallel Architecture Based On Barrier Synchronization," Proceedings of the International Conference on Parallel Processing, pp. 247-250, August 1996. This paper gives a good overview of both the generic TTL_PAPERS logic and how these systems can be scaled.
Parallel Port Information: Thus far, all PAPERS units are connected to machines via "standard" parallel port (SPP) connections, so you might want some general information on the SPP. Parallel Port Central has pointers to quite a few things. There is also an on-line description of the new IEEE 1284 parallel port standard, and an overview of the old SPP stuff in the IBM Parallel Port FAQ/Tutorial. We didn't write any of these documents and we did not necessarily follow their recommendations; we have kept PAPERS very simple and relied on empirical testing to confirm that what we did works with a wide range of parallel port hardware.
List Of PAPERS Vendors (.html): PAPERS is a public domain university research project, but PAPERS is also available as COTS (Commercial Off The Shelf) units. This hypertext document is intended to make it easier for potential users to locate vendors of PAPERS-related products. Note that the fact that a product is listed here in no way implies that either the authors of PAPERS or Purdue University endorse that product.

To Build A Several-Processor WAPERS

WAPERS, the Wired-AND Adapter for Parallel Execution and Rapid Synchronization, is a moderately-scalable fully passive network. In other words, this network hardware uses no active components, and consists entirely of a wiring pattern that takes advantage of SPP open-collector outputs to implement wired-AND logic. Two complete alternative designs for WAPERS are described in detail (.pdf, .ps, .ps.gz, .html).
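As a rough illustration of the wired-AND idea described above, the following C sketch models the shared lines in software. It is not the WAPERS wiring and not the AFAPI; the names wired_and, barrier_done, wired_or, and LINE_MASK, and the choice of four shared lines, are assumptions made only for this example. The model relies on one property of open-collector outputs: a shared line reads high only when no attached node pulls it low, so the value every node reads back is the bitwise AND of what all nodes drive.

```c
#include <stddef.h>

/* Hypothetical: four shared open-collector lines, one bit per line. */
#define LINE_MASK 0x0Fu

/* Value read back from the shared lines. A line floats high only when
   no node pulls it low, so the result is the bitwise AND of all outputs. */
unsigned wired_and(const unsigned out[], size_t n)
{
    unsigned bus = LINE_MASK;
    for (size_t i = 0; i < n; ++i)
        bus &= out[i];
    return bus;
}

/* Barrier test on line 0 (an assumed convention): each node releases the
   line (drives 1) on arrival, so it reads 1 only after all have arrived. */
int barrier_done(const unsigned out[], size_t n)
{
    return (wired_and(out, n) & 1u) != 0;
}

/* Global OR via De Morgan: each node drives the complement of its value,
   and every reader complements the wired-AND result. */
unsigned wired_or(const unsigned val[], size_t n)
{
    unsigned bus = LINE_MASK;
    for (size_t i = 0; i < n; ++i)
        bus &= ~val[i];
    return ~bus & LINE_MASK;
}
```

In this model, three nodes driving 0xF, 0xD, and 0x7 all read back 0x5, and a single node still holding line 0 low keeps barrier_done false for everyone. Real hardware adds port register details, signal inversions, and timing that this sketch deliberately ignores.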

WAPERS supports the full user-level AFAPI, and WAPERS AFAPI is included in the unified AFAPI distribution. The primary disadvantages of WAPERS are that it has lower performance than TTL_PAPERS, does not scale to very large clusters, and can fry your port hardware if it is not configured correctly. However, it is the simplest way to connect a cluster of up to about 8 machines.

To Build A Two-Processor CAPERS

The cable shown in the photo is what we call CAPERS, the Cable Adapter for Parallel Execution and Rapid Synchronization. Although CAPERS differs from a standard "LapLink" in that CAPERS makes use of several additional ground wires, a "LapLink" cable can be used instead.

CAPERS is designed to passively connect the parallel ports of two PCs or workstations. Using just this cable, it is not possible to implement the OS interrupt handling support found in TTL_PAPERS; however, it is possible to efficiently implement the complete user-level AFAPI for the special case of just two machines. Because WAPERS is no more complex to build and allows larger clusters, CAPERS is now essentially obsolete.

To Build A Four-Processor TTL_PAPERS

The unit shown in the photo isn't the most sophisticated PAPERS version, but it is by far the most popular. Four processors are enough for reasonable experiments, and yet the unit is simple enough to be built in a day or two (well, at least we can do it that fast ;-). Although the unit uses just 8 TTL chips, it provides the complete TTL_PAPERS functionality.

The Full Plans (.pdf, .ps, .ps.Z, .ps.gz): H. G. Dietz, T. Muhammad, and T. I. Mattox, TTL Implementation of Purdue's Adapter for Parallel Execution and Rapid Synchronization, Purdue University School of Electrical Engineering, Technical Report, December 1994.
Overview (.html, .pdf, .ps, .ps.Z, .ps.gz): H. G. Dietz, T. M. Chung, T. I. Mattox, and T. Muhammad, Purdue's Adapter for Parallel Execution and Rapid Synchronization: The TTL_PAPERS Design, Purdue University School of Electrical Engineering, Technical Report, January 1995.

To Build A Scalable TTL_PAPERS

Although we have built two other types of modularly-scalable eight-processor TTL_PAPERS units, and those designs are fully operational, we no longer recommend them for new construction. The TTL_PAPERS 960801 four-processor module, shown in its test mounting, is functionally equivalent, but is somewhat easier to build, field scalable, and a lot easier to debug.

There were a variety of delays in getting the full documentation written, formatted, and posted, but the 960801 design has been solid since late 1996, and we use it extensively at Purdue. The 960801 hardware documentation is available only as HTML with links to figures rather than in-line figures, so be sure to print copies of the figures you need as well as the body of the HTML document.
