Assignment 0: EE686 Advanced Computer Architecture Design

Use this WWW form to submit your assignment. You may submit your assignment as many times as you wish; when the submission deadline is reached, your last recorded submission will be the one counted for grading.




  1. Which one of the following five statements about SIMD enable handling is most generally true?
    A disabled processor can only be enabled by rebooting the machine
    Disabling a PE can be simulated using arithmetic operations
    A SIMD machine runs faster when more PEs are disabled
    The hardware requires that at least one PE is enabled
    A disabled PE consumes no power
  2. Many SIMD architectures have used bit-serial processing elements, but many other SIMD architectures have not. Briefly explain how a bit-serial design might achieve higher performance than a SIMD machine using wider PEs.
  3. Many SIMD architectures have incorporated either a "Global OR" network or a similar network serving the same purpose. What is the primary purpose of such a network?
    Communication of data values between PEs
    Determining how to route data between PEs
    Performing arithmetic operations in the PEs
    Telling the CU if any PE is enabled
    Performing I/O with the outside world
  4. Nested SIMD where (or if) constructs can be implemented using a bit stack to track enable states. However, the bit stack implementation is not very efficient, especially on machines that directly support operations on multi-bit words. The activity counter mechanism can be much more efficient (a rough sketch of the counter bookkeeping appears after this question list). In the most literal sense, what does the integer value of an activity counter really count?
  5. Many SIMD machines connect each PE to its neighbors in a K-dimensional mesh -- most often, with K=2. One would expect a 2D mesh to connect only four neighbors using four wiring paths; however, "X Net" connections reach eight neighbors with only four wiring paths leaving each PE. Briefly explain how this is possible.
  6. Which of the following is not associated with SWAR?
    Operations like addition are implemented by cutting the carry chain in the appropriate places (sketched in C after this question list)
    Comparison operations don't set condition code registers, but set mask values
    The SWAR datapath is usually fairly wide, currently most often 64 or 128 bits
    Each PE can fetch from a memory address it provides
    Parallelism width is a function of data object width
  7. Poly (or plural) data reside in the PE memories. Mono (or scalar) data reside in the CU on some machines, but on some SIMD systems they actually reside in the PE memories as well. Given that both are stored in the PE memories on such systems, what is really different about poly and mono data?
  8. GPUs implement a "virtualized SIMD" model that some folks at Berkeley have been calling a "streaming" model. How does this model differ from more traditional non-virtualized SIMD models? (I.e., what is being virtualized?)
  9. Name one commercially marketed SIMD machine that had bit-serial PEs. Also name one commercially marketed SWAR instruction set.
  10. Give C code using SWAR techniques to sum four 8-bit unsigned values packed into the 32-bit unsigned integer x; the resulting sum should be stored in the 32-bit unsigned integer y.
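
Question 4 refers to the activity counter mechanism without spelling out its bookkeeping. The following is a minimal, hypothetical C sketch of how a single PE's enable state might be tracked with an activity counter; the names where_begin, where_else, where_end, and act are invented for illustration and do not come from any particular machine.

    #include <stdio.h>

    /* Hypothetical per-PE enable bookkeeping using an activity counter.
       act == 0 means this PE is enabled; act > 0 counts nesting levels
       since the PE became (or stayed) disabled. */
    static int act = 0;

    static void where_begin(int cond)
    {
        if (act > 0 || !cond)
            act++;           /* already disabled, or newly disabled at this level */
    }

    static void where_else(void)
    {
        if (act == 0)
            act = 1;         /* ran the "then" arm: disabled for the else arm */
        else if (act == 1)
            act = 0;         /* disabled exactly at this level: enabled for the else arm */
                             /* act > 1: disabled by an outer level, stays disabled */
    }

    static void where_end(void)
    {
        if (act > 0)
            act--;           /* leave one nesting level */
    }

    int main(void)
    {
        where_begin(0);      /* outer condition false: act == 1 */
        where_begin(1);      /* nested where: still disabled, act == 2 */
        where_end();         /* act == 1 */
        where_end();         /* enabled again: act == 0 */
        printf("enabled? %d\n", act == 0);   /* prints 1 */
        return 0;
    }

The else handling is the only fiddly part; entering and leaving a nesting level is just an increment or decrement while the PE is disabled.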
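
Question 6's first listed property, cutting the carry chain at field boundaries, is the core trick behind SWAR arithmetic. Below is a minimal, hypothetical C sketch of an 8-bit partitioned add on a 32-bit word; the function name swar_add8 and the test values are illustrative only and do not correspond to any specific SWAR instruction set.

    #include <stdio.h>
    #include <inttypes.h>

    /* Add four packed 8-bit fields of a and b without letting carries
       cross field boundaries: sum the low 7 bits of each field with the
       field's top bit masked off, then restore the top bit with an XOR. */
    static uint32_t swar_add8(uint32_t a, uint32_t b)
    {
        uint32_t lo7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
        return lo7 ^ ((a ^ b) & 0x80808080u);
    }

    int main(void)
    {
        uint32_t a = 0x01FF7F10u;   /* fields 0x01, 0xFF, 0x7F, 0x10 */
        uint32_t b = 0x02017F01u;   /* fields 0x02, 0x01, 0x7F, 0x01 */
        /* field-wise sums (mod 256): 0x03, 0x00, 0xFE, 0x11 */
        printf("%08" PRIX32 "\n", swar_add8(a, b));   /* prints 0300FE11 */
        return 0;
    }

Masking off the top bit of each field before the add guarantees that no field's sum can carry into its neighbor; the XOR then supplies the correct top bit of every field.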


Although this is not a secure server, users are bound by the UK code of conduct not to abuse the system. Any abuses will be dealt with as serious offenses.

