slide 2 of 35
To achieve speedup through parallel execution, a program must be decomposable into sections of code that can execute simultaneously, each on its own processing element. However, simultaneous execution does not imply independent execution; all sections of the code are working together to produce the program's unified, coherent, final state. The odd thing is that most parallel computing hardware and software models provide no direct means of accessing this global, aggregate information.
PAPERS (Purdue's Adapter for Parallel Execution and Rapid Synchronization) is fast, simple hardware that provides such access for a cluster of PCs or workstations. This talk begins by outlining the PAPERS hardware and the synchronized aggregate communication library developed for it. This very strong execution model can be used to efficiently implement a variety of weaker (but more familiar) models, including SIMD and VLIW control flow, coherent shared memory, and even asynchronous message passing. Both the implementations and performance measurements will be presented.