This paper describes a clever method for tracking nested SIMD enable/disable without use of a bit stack.
@inproceedings{ keryell93activity, author = "Ronan Keryell and Nicolas Paris", title = "Activity Counter: New Optimization for the Dynamic Scheduling of {SIMD} Control Flow", booktitle = "Proceedings of the 1993 International Conference on Parallel Processing", volume = "II - Software", publisher = "CRC Press", address = "Boca Raton, FL", pages = "II--184--II--187", year = "1993", url = "citeseer.ist.psu.edu/keryell93activity.html" }
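To give a feel for the idea, here is a minimal sketch of per-PE activity counters, assuming the usual formulation in which a counter of 0 means "enabled" and a nonzero counter records how many nesting levels ago the PE was disabled. The class and method names are made up for illustration and are not taken from the paper; the paper's hardware details may differ.

```python
class ActivityCounters:
    """Hypothetical model: one small counter per processing element (PE)."""

    def __init__(self, n):
        self.count = [0] * n  # 0 means the PE is currently enabled

    def active(self):
        return [c == 0 for c in self.count]

    def enter_if(self, cond):
        for i, c in enumerate(self.count):
            if c > 0:
                self.count[i] = c + 1   # already disabled: one level deeper
            elif not cond[i]:
                self.count[i] = 1       # disabled at this nesting level

    def enter_else(self):
        for i, c in enumerate(self.count):
            if c == 0:
                self.count[i] = 1       # ran the "then" part, now disabled
            elif c == 1:
                self.count[i] = 0       # disabled by this "if", now enabled
            # c > 1: disabled by an outer construct, left unchanged

    def end_if(self):
        for i, c in enumerate(self.count):
            if c > 0:
                self.count[i] = c - 1   # leave one nesting level


pes = ActivityCounters(4)
pes.enter_if([True, True, False, False])   # outer if
pes.enter_if([True, False, True, False])   # inner if
inner_then = pes.active()
pes.enter_else()
inner_else = pes.active()
pes.end_if()
outer_then = pes.active()
pes.end_if()
```

Note that a counter only needs enough bits to hold the maximum nesting depth, whereas a bit stack needs one bit per nesting level.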
This paper describes the ICL DAP, another early SIMD machine.
@inproceedings{803971, author = {S. F. Reddaway}, title = {{DAP}---a distributed array processor}, booktitle = {ISCA '73: Proceedings of the 1st annual symposium on Computer architecture}, year = {1973}, pages = {61--65}, doi = {http://doi.acm.org/10.1145/800123.803971}, publisher = {ACM Press}, address = {New York, NY, USA}, }
This paper describes Ken Batcher's SIMD MPP design at Goodyear Aerospace.
@inproceedings{285977, author = {Kenneth E. Batcher}, title = {Architecture of a massively parallel processor}, booktitle = {ISCA '98: 25 years of the international symposia on Computer architecture (selected papers)}, year = {1998}, isbn = {1-58113-058-9}, pages = {174--179}, location = {Barcelona, Spain}, doi = {http://doi.acm.org/10.1145/285930.285977}, publisher = {ACM Press}, address = {New York, NY, USA}, }
A (relatively late) version of the "Connection Machine Model CM-2 Technical Summary, Version 6.0, November 1990." This includes a description of the (CM-200) floating-point hardware added to the design.
The SWAR slides I used in class... originally from a talk given in February 1997 at Purdue University.
One of the best generic descriptions of the concepts of SWAR. The above link is direct from Springer-Verlag.
@inproceedings{663771, author = {Randall J. Fisher and Henry G. Dietz}, title = {Compiling for SIMD Within a Register}, booktitle = {LCPC '98: Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing}, year = {1999}, isbn = {3-540-66426-2}, pages = {290--304}, publisher = {Springer-Verlag}, address = {London, UK}, }
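As a small taste of what SWAR means, here is a sketch of the standard partitioned-add trick: four 8-bit fields packed in one 32-bit word are added at once, with carries prevented from crossing field boundaries. This is a generic illustration rather than code from the paper, and the function name and constants are my own choices.

```python
LOW7 = 0x7F7F7F7F  # low 7 bits of each 8-bit field
HIGH = 0x80808080  # top bit of each 8-bit field


def swar_add8(x, y):
    """Add four packed 8-bit fields, each wrapping modulo 256."""
    # Adding only the low 7 bits cannot carry out of a field.
    partial = (x & LOW7) + (y & LOW7)
    # Fold the top bits back in with XOR, which discards the carry out.
    return partial ^ ((x ^ y) & HIGH)


result = swar_add8(0x01FF7F10, 0x01010110)
```

Each field is summed independently: 0xFF + 0x01 wraps to 0x00 without disturbing its neighbor.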
This site contains a variety of news, paper links, etc., about use of GPUs (Graphics Processing Units) for General-Purpose computing -- commonly known as GPGPU. Note that general-purpose is a misnomer; it is really about programming GPUs for tasks that are not entirely graphical.
The first paper on ATI's CTM (Close To the Metal) software interface to GPUs (Graphics Processing Units) for general-purpose computing. Referenced directly from ATI's site, which is now part of AMD's site. There are also slides and a full manual at the ATI/AMD site.
The first paper on nanocontrollers -- bit-serial SIMD-style hardware for use in control of massively parallel arrays of sensors, actuators, and other devices.
@article{359336, author = {Richard M. Russell}, title = {The CRAY-1 computer system}, journal = {Commun. ACM}, volume = {21}, number = {1}, year = {1978}, issn = {0001-0782}, pages = {63--72}, doi = {http://doi.acm.org/10.1145/359327.359336}, publisher = {ACM Press}, address = {New York, NY, USA}, }