KySMet Reference: EE480 Advanced Computer Architecture

Instruction set design is hard. Prof. Dietz has designed dozens of instruction sets in the three decades he's been a professor, and it still isn't easy for him to get things right. Thus, rather than giving you complete freedom to design your own instruction set, we're going to walk through the design logic for a reasonably well-crafted one that he built specifically for Spring 2018 EE480. However, this design is not complete -- each student must devise their own encoding of the instructions implement their own assembler.

KySMet Overview

KySMet is a slightly strained acronym for KentuckY Simd Machine with --et added to mean "diminutive" -- a wimpy version of such a machine. SIMD is a nested acronym meaning Single Instruction stream, Multiple Data stream, which means performing the same instruction simultaneously on many data. Of course, KySMet is a misspelling of Kismet. Kismet means destiny or fate, but it comes from the Arabic word "qisma" which means portion, lot, or fragment... which seems appropriate enough for an instruction set in which one processing element is only a fragment of the final machine you are destined to build. Yeah, I spent too much time thinking about the name.... ;-)

The machine has sixteen 16-bit registers, 16-bit datapaths, and 16-bit addresses, and each address in memory holds one 16-bit word. It can operate only on 16-bit integers. The odd thing is the memory structure: there is one instruction memory, but there is a separate data memory for each processing element. Of course, it's not that odd because it still works for having just one processing element.

This instruction set is complete enough that I hope to be giving you a compiler (including full C source code) that translates programs written in a little SIMD dialect of C into KySMet code. It's not going to be a particularly smart compiler (ok, it's really dumb), but it will show you how KySMet can be used for complete programs.

The KySMet Instruction Set

The KySMet instruction set is quite straightforward, a general-register model encoding most instructions as a single 16-bit word:

Instruction Description Functionality
add $d, $s, $t ADD int $d = $s + $t
allen enable ALL processing elements en = (en | 1)
and $d, $s, $t bitwise AND integers $d = $s & $t
call addr CALL addr if (active) { stack(pc+1); pc = addr }
gor $d, $s Global bitwise OR $d = gor($s)
jump addr JUMP addr pc = addr
jumpf $d, addr JUMP False addr if ($d==0) en = (en & ~1); if (none_active) pc = addr
left $d, $s copy from LEFT processing element $d = PE[(IPROC+1)%NPROC].$s
li8 $d, i8 Load Immediate 8-bit integer $d = signed_extend(i8)
lnot $d, $s Logical NOT (zero becomes 1, non-zero becomes 0) $d = ! $s
load $d, $s LOAD $d = memory[$s]
lu8 $d, i8 Load Upper immediate 8-bit integer $d = ($d & 0x00ff) | (i8 << 8))
mul $d, $s, $t MULtiply int $d = $s * $t
neg $d, $s NEGate int $d = - $s
or $d, $s, $t bitwise OR integers $d = $s | $t
popen POP ENable state en = (en >> 1)
pushen PUSH ENable state en = ((en << 1) | (en & 1))
ret RETurn if (active) pc = unstack()
right $d, $s copy from RIGHT processing element $d = PE[(IPROC+NPROC-1)%NPROC].$s
sll $d, $s, $t Shift Left Logical $d = $s << ($t & 15)
slt $d, $s, $t Set Less Than int $d = ($s < $t)
sra $d, $s, $t Shift Right Arithmetic $d = $s >> ($t & 15)
store $d, $s STORE memory[$s] = $d
trap TRAP to operating system (end execution) halt the user program
xor $d, $s, $t bitwise eXclusive OR $d = $s ^ $t

Determining how to encode the above instructions as bit patterns is a key part of your project. However, there are a few rules:

The KySMet Registers

The KySMet processor actually has three different types of registers, but only one of them is random access. The other two are essentially shift registers.

The KySMet General Registers

There are 16 general-purpose registers, some of which have special purposes -- a lot like MIPS. They all have names as well as numbers. Perhaps the best way to give both is the following specification (formatted as an AIK specification):

.const {zero	IPROC	NPROC	sp	fp	rv	u0	u1
	u2	u3	u4	u5	u6	u7	u8	u9 }

Registers $u0 through $ub (aka, registers $4 through $15) are "user" registers to be used in any way the programmer sees fit. However, it is expected that the assembler or compiler would use registers starting at $u9 for "internal" things and starting at $u0 for normal coding. The first six registers have special meanings:

Register Number Register Name Read/Write? Use
$0 $zero Read Only ZERO; constant 0x0000
$1 $NPROC Read Only NPROC; number of processing elements
$2 $IPROC Read Only IPROC; number of this processing element, unique in [0..NPROC-1]
$3 $sp Read/Write the Stack Pointer
$4 $fp Read/Write the Frame Pointer
$5 $rv Read/Write the Return Value

Note that use of $0, $1, or $2 as the destination for a result is illegal -- they are read only registers. Thus, if you wish, you could use those bit patterns for other things. In other words, add $0, ..., ... is illegal, so you could use that bit pattern for something else, such as neg $d, $s. You'll have to be slightly clever to cram all these instruction formats into a structure that can't always reserve more than 4 bits for the opcode, but there are lots of ways to solve this problem. Simpler ways are better. :-)

The KySMet Enable Stack

It is useful for SIMD processing elements to have the ability to temporarily disable themselves for the duration of a construct. Since constructs can nest, there is actually an enable stack which is here called en. The enable stack should be at least 32 bits deep, initialized to all 1s. To be precise, there is one enable stack per processing element.

The current enable state is en[0]; if it is a 0, then most instructions are essentially treated as null operations. The exceptions are allen, call, jump, jumpf, popen, pushen, ret, and trap, which are always executed even if the processor is disabled. When a new construct is entered, pushen basically makes en = {en[30:0], en[0]}, pushing a copy of the current enable state on the stack. At the end of a construct, popen pops the enable stack: en = {en[31], en[31:1]}. The jumpf instruction makes en[0] = 0 if the register tested is false, while allen essentially makes en[0] = 1.

The KySMet Call Stack

One of the little details about SIMD code is that you really can't have processors disagree about what code to execute next. Thus, processing elements are not allowed to jump to a locally-computed address: all target addresses, addr, are constants. The catch is, then how do we do subroutine or function calls? Many SIMD machines (e.g., most GPUs) simply didn't allow it. However, we'll allow it using a hardware call stack, retaddr, containing 64 bits -- enough for four-level nested calls.

When a call instruction is executed, the return address is pushed into retaddr: retaddr = {retaddr[47:0], pc}. Similarly, the ret instruction pops an address: pc = retaddr[15:0]; retaddr = {retaddr[63:48], retaddr[63:16]}. Actually, it doesn't really matter what the top 16 bits of retaddr are left as after a ret because they are no longer valid.

You have noticed that there is no way for multiple processing elements to be executing different instructions simultaneously because there is no way to jump to an address that is computed to be different on different processing elements -- not even ret can cause such a "divergence of control flow." Thus, there could be only one call stack rather than one per processing element; however, there is nothing wrong with having one per processing element.

KySMet Data

The KySMet instruction set deals only with 16-bit int data which is generally treated as being signed.

In a SIMD machine with n processing elements, it is not uncommon to have n+2 memories. There is one instruction memory, n data memories (one local to each processing element), and one data memory for the control unit. Data structures that are "plural" have independent versions in each of the local memories while "singular" data structures exist only in the control unit data memory. Well, we don't have a separate control unit data memory -- instead, we simply replicate the "singular" data in each of the local data memories.

Normally, SIMD machines have a variety of instructions that allow communication between processors. Often this is done by allowing processing elements to directly access registers of adjacent processing elements. Here, we provide three communication mechanisms:

Obviously, in the case of having just a single processing element, gor, left, and right all do exactly the same thing.

Also be sure to force .lowfirst to be 0 so that bits are packed into 16-bit words starting with the MSB working down.


EE480 Advanced Computer Architecture.