No, not that Tangled...

Quantum computing! Ok, everybody excited? Scared? Well, no need to be... because we're not really doing that. However, we are going to be building a processor that takes full advantage of superposition and entanglement. It also allows the type of operations quantum computers implement using phase interference, but does that without modeling phase. The way our system does all this is quite parallel, but does not depend on exotic quantum phenomena, nor does it simulate them. That said, like a "real" quantum computer, the quantum-like computer you'll be building isn't really a stand-alone computer. It's an "attached accelerator" or "coprocessor" that needs a conventional computer to control it. You'll be building both, all intertwined together in one system design. So, we'll call this system Tangled, because it certainly is.

Of course, there's this other thing called Tangled -- an excellent Disney movie with this song that has become a bit of an anthem for the pandemic. Actually, "When will my life begin?" sounds like a great question for quantum computing too. Then again, for our Tangled processor, we know the answer is this semester. ;-)

Tangled Overview

Long ago, a drop of liquid sunlight sprouted a magical healing flower.... Then there was this dude named Schrodinger who was obsessed with ways to murder his cat, or not.... Confussed? Don't be. You don't need to understand the Disney movie nor the bizzare world of quantum phenomena to design and implement our Tangled processor. It's really just a conventional processor with a strange type of massively-parallel SIMD coprocessor... sort-of like having a mutant GPU built-in.

Our Tangled is fundamentally a straight-forward (yes, that was a bad pun) 16-bit processor. Registers are 16-bits wide, there are 16-bit memory addressess, and each instruction is generally encoded within a single 16-bit word. The biggest complication is the coprocessor, which we'll call Qat (pronounced "k-ha-t"), standing for Quantum-like Accelerator for Tangled. (I'll leave it up to you to make your own bad jokes about this being a UK wild Qat or perhaps of a legendary rescue from Schrodinger's box sadly leaving it still live/dead.) Anyway, it's more than a little unusual, and has some instructions that need to be encoded using 32 bits (two words), but the Qat instruction set is fundamentally even simpler than that of the base processor.

Qat Overview

You probably thought nothing about quantum computing was simple, so I think you'll be shocked at how straightforward Qat is. To begin, quantum computers don't have memory-like addressing, so neither does Qat. Instead, Qat just has 256 registers, which are most easily understood as approximating 256 qubits. A Qat register is not a qubit, nor does it model a qubit, but it does provide very similar properties. The main difference is that it does not implement superposition, entanglement, and intereference in the same ways normally assumed for qubits.

In Qat, entangled superposition is represented by literally having 2^entanglement ordinary bit values encode the value of each qubit. For example, an 8-way entangled qubit would be represented by a vector of 2⁸ ordinary bit values. This is very different from the way that quantum computers do this; neither is it very similar to the Bloch sphere models normally used to simulate quantum execution using complex numbers. The wonderful thing about Qat's representation is that it doesn't need to do complex matrix math to perform quantum gate operations: it simply applies the ordinary logic gate to all 2^entanglement bit values. That's a lot faster than the complex matrix math, but it is still exponentially more work than an actual quantum machine would have to do -- the catch is, that work can be perfectly parallelized, so Qat uses exponentially more gates to perform each operation in unit time!

Instead of phase and probability densities, Qat's odd representation introduces the concept of entanglement channels: index positions for selecting corresponding single bit values from each vector register. For example, suppose one register holds {1,0,1,1} and another register holds {0,1,0,0}. This entangled state can be summarized as 75% probability of {1,0} and 25% probability of {0,1}. With a maximum entanglement of k, this does mean probabilities can only be discrete values in parts to the 2^k, but it's not clear that the continuous model normally used offers any benefit other than modeling noise in a quantum computer's implementation.

And that brings us to the last plus for Qat: qubit values quickly decohere due to noise, limiting how long a value can be held and how many operations can be performed... but there isn't any noise in Qat. In fact, decoherence means that real quantum computers cannot implement a persistent state as computer engineers define it, and hence they aren't Turing capable; so all quantum computers are really just "special ALUs" that need a conventional computer to use them -- so the way Qat is intertwined in Tangled is really the kind of thing you should expect from quantum computers.

Beyond the time-to-decoherence problem, the act of measuring a qubit's value causes the qubit's superposition to collapse -- so you can compute on many values, but only can read out one result (selected at random). Qat registers can be freely read without collapsing the superposed state. This is a huge benefit. The really strange things done in quantum computers using interference to indirectly sample state without measuring it are entirely unnecessary for Qat.

Are We Saying Qat's More Powerful Than A Real Quantum Computer?

All the above tricks make Qat fairly practical -- and Qat's got 256 qubit-like registers with up to 16-way entanglement while current quantum computers generally have far fewer qubits and typically allow entanglement only between the nearest neighboring qubits. Does that mean Qat outperforms "real" quantum computers? Well, there's no answer to that which will make everybody happy.

Honeywell is claiming to have built the most powerful quantum computer thus far, with a quantum volume of 64. It gets such a high score despite having just 6 qubits because the error rate is low and connectivity is high. However, Qat's error rate is essentially zero and connectivity is not constrained by layout. In sum, by most metrics, Qat is in fact significantly more powerful than the current crop of quantum computers... but Qat's parallel execution model does not scale up to handle huge entanglements, while quantum computers might. There is a way to improve Qat's model so it might scale, but that's ongoing research that we'll discuss later in class....

So, all of that didn't really make you understand quantum computing nor even how Qat works... but that's OK, because all you need to do right now is to build the assembler. We'll tell you more as your projects need you to understand more. ;-)

The Tangled Instruction Set

The basic Tangled instruction set is really quite clean and simple, and every instruction fits within a single 16-bit word:

Instruction Bits Description Functionality

add $d, $s 16 Add 16-bit integers $d[15:0] += $s[15:0]

addf $d, $s 16 Add 16-bit floats $d[15:0] += $s[15:0]

and $d, $s 16 bitwise AND 16-bit $d[15:0] &= $s[15:0]

br addr pseudo BRanch (unconditional), 8-bit offset PC = addr

brf $c, addr 16 BRanch False (zero), 8-bit offset if ($c[15:0] == 0) PC += (addr-PC)

brt $c, addr 16 BRanch True (non-zero), 8-bit offset if ($c[15:0] != 0) PC += (addr-PC)

copy $d, $s 16 COPY $d = $s

float $d 16 16-bit integer to FLOAT $d[15:0] = (float)$d[15:0]

int $d 16 16-bit float to INTeger $d[15:0] = (int)$d[15:0]

jump addr pseudo JUMP to 16-bit address PC = addr

jumpf $d, addr pseudo JUMP False (zero) to 16-bit address if ($d == 0) PC = addr

jumpr $a 16 JUMP Register PC = $a[15:0]

jumpt $d, addr pseudo JUMP True (non-zero) to 16-bit address if ($d != 0) PC = addr

lex $d, imm8 16 Load sign EXtended 8-bit to 16-bit $d = ((imm8 & 0x80) ? 0xff00 : 0) | (imm8 & 0xff)

lhi $d, imm8 16 Load HIgh 8-bits with 8-bit immediate $d[15:8] = imm8

load $d, $s 16 LOAD $d[15:0] = memory[$s[15:0]]

load $d, imm16 pseudo LOAD immediate 16-bit value $d = imm16

mul $d, $s 16 Multiply 16-bit integers $d[15:0] *= $s[15:0]

mulf $d, $s 16 Multiply 16-bit Float $d[15:0] *= $s[15:0]

neg $d 16 Negate 16-bit integer $d[15:0] = -$d[15:0]

negf $d 16 Negate 16-bit Float $d[15:0] = -$d[15:0]

not $d 16 bitwise NOT 16-bit $d[15:0] = ~$d[15:0]

or $d, $s 16 bitwise OR 16-bit $d[15:0] |= $s[15:0]

recip $d 16 RECIProcal 16-bit float $d[15:0] = 1.0 / $d[15:0]

shift $d, $s 16 shift 16-bit $d[15:0] = (($s[15:0] > 0) ? ($d[15:0] << $s[15:0]) : ($d[15:0] >> -$s[15:0]))

slt $d, $s 16 Set Less Than 16-bit integer $d[15:0] = (($d[15:0] < $s[15:0]) ? 1 : 0)

sltf $d, $s 16 Set Less Than 16-bit Float $d[15:0] = (($d[15:0] < $s[15:0]) ? 1 : 0)

store $d, $s 16 STORE memory[$s[15:0]] = $d[15:0]

sys 16 SYStem call to OS

xor $d, $s 16 bitwise XOR 16-bit $d[15:0] ^= $s[15:0]

Instruction	Bits	Description	Functionality
`add $d, $s`	16	Add 16-bit integers	`$d[15:0] += $s[15:0]`
`addf $d, $s`	16	Add 16-bit floats	`$d[15:0] += $s[15:0]`
`and $d, $s`	16	bitwise AND 16-bit	`$d[15:0] &= $s[15:0]`
`br addr`	pseudo	BRanch (unconditional), 8-bit offset	`PC = addr`
`brf $c, addr`	16	BRanch False (zero), 8-bit offset	`if ($c[15:0] == 0) PC += (addr-PC)`
`brt $c, addr`	16	BRanch True (non-zero), 8-bit offset	`if ($c[15:0] != 0) PC += (addr-PC)`
`copy $d, $s`	16	COPY	`$d = $s`
`float $d`	16	16-bit integer to FLOAT	`$d[15:0] = (float)$d[15:0]`
`int $d`	16	16-bit float to INTeger	`$d[15:0] = (int)$d[15:0]`
`jump addr`	pseudo	JUMP to 16-bit address	`PC = addr`
`jumpf $d, addr`	pseudo	JUMP False (zero) to 16-bit address	`if ($d == 0) PC = addr`
`jumpr $a`	16	JUMP Register	`PC = $a[15:0]`
`jumpt $d, addr`	pseudo	JUMP True (non-zero) to 16-bit address	`if ($d != 0) PC = addr`
`lex $d, imm8`	16	Load sign EXtended 8-bit to 16-bit	`$d = ((imm8 & 0x80) ? 0xff00 : 0) \| (imm8 & 0xff)`
`lhi $d, imm8`	16	Load HIgh 8-bits with 8-bit immediate	`$d[15:8] = imm8`
`load $d, $s`	16	LOAD	`$d[15:0] = memory[$s[15:0]]`
`load $d, imm16`	pseudo	LOAD immediate 16-bit value	`$d = imm16`
`mul $d, $s`	16	Multiply 16-bit integers	`$d[15:0] *= $s[15:0]`
`mulf $d, $s`	16	Multiply 16-bit Float	`$d[15:0] *= $s[15:0]`
`neg $d`	16	Negate 16-bit integer	`$d[15:0] = -$d[15:0]`
`negf $d`	16	Negate 16-bit Float	`$d[15:0] = -$d[15:0]`
`not $d`	16	bitwise NOT 16-bit	`$d[15:0] = ~$d[15:0]`
`or $d, $s`	16	bitwise OR 16-bit	`$d[15:0] \|= $s[15:0]`
`recip $d`	16	RECIProcal 16-bit float	`$d[15:0] = 1.0 / $d[15:0]`
`shift $d, $s`	16	shift 16-bit	`$d[15:0] = (($s[15:0] > 0) ? ($d[15:0] << $s[15:0]) : ($d[15:0] >> -$s[15:0]))`
`slt $d, $s`	16	Set Less Than 16-bit integer	`$d[15:0] = (($d[15:0] < $s[15:0]) ? 1 : 0)`
`sltf $d, $s`	16	Set Less Than 16-bit Float	`$d[15:0] = (($d[15:0] < $s[15:0]) ? 1 : 0)`
`store $d, $s`	16	STORE	`memory[$s[15:0]] = $d[15:0]`
`sys`	16	SYStem call to OS
`xor $d, $s`	16	bitwise XOR 16-bit	`$d[15:0] ^= $s[15:0]`

The Qat coprocessor instruction set is smaller, but has longer instructions. The reason the instructions are longer is simply that the standard quantum logic gates ccnot and cswap both act on three qubits at a time, so there's really no getting around having three operands for them. With the 256 registers, you need 8 bits to name each register. Thus, even without an opcode, ccnot is at least 24 bits long. Here are the Qat instructions, which also must be fetched and decoded by Tangled, although the work of executing them is primarily up to Qat.

Instruction Bits Description Functionality

and @a, @b, @c 32 AND @a = @b & @c

ccnot @a, @b, @c 32 Controlled-controlled NOT (Toffoli) @a ^= (@b & @c)

cnot @a, @b 32 Controlled NOT @a ^= @b

cswap @a, @b, @c 32 Controlled swap (Fredkin) where (@c) swap(@a, @b)

had @a, imm4 16 HADamard initializer for entanglement @a = imm4

meas $d, @a 16 Measure value of entanglement channel $d = @a[$d]

next $d, @a 16 Entanglement channel of next 1 $d = next($d, @a)

not @a 16 NOT (Pauli-X) @a = ~@a

or @a, @b, @c 32 OR @a = @b | @c

one @a 16 ONE @a = 1

swap @a, @b 32 Swap swap(@a, @b)

xor @a, @b, @c 32 eXclusive OR @a = @b ^ @c

zero @a 16 ZERO @a = 0

Instruction	Bits	Description	Functionality
`and @a, @b, @c`	32	AND	`@a = @b & @c`
`ccnot @a, @b, @c`	32	Controlled-controlled NOT (Toffoli)	`@a ^= (@b & @c)`
`cnot @a, @b`	32	Controlled NOT	`@a ^= @b`
`cswap @a, @b, @c`	32	Controlled swap (Fredkin)	`where (@c) swap(@a, @b)`
`had @a, imm4`	16	HADamard initializer for entanglement	`@a = imm4`
`meas $d, @a`	16	Measure value of entanglement channel	`$d = @a[$d]`
`next $d, @a`	16	Entanglement channel of next 1	`$d = next($d, @a)`
`not @a`	16	NOT (Pauli-X)	`@a = ~@a`
`or @a, @b, @c`	32	OR	`@a = @b \| @c`
`one @a`	16	ONE	`@a = 1`
`swap @a, @b`	32	Swap	`swap(@a, @b)`
`xor @a, @b, @c`	32	eXclusive OR	`@a = @b ^ @c`
`zero @a`	16	ZERO	`@a = 0`

A few details about the above:

Each instruction must be encoded in the number of bits specified, but it is your job to decide the field layout and bit patterns to use. Notice that some instructions require you to fit an opcode along with 12 bits of operands in a 16-bit word, but there are more than 16 instruction types, so you'll need to do something a bit more clever.... Instructions marked as pseudo are not really instructions, but should be implemented by an efficient choice of one or more instructions.
Any Tangled register name (which starts with $) must have a value in 0..15 -- hence, a 4-bit field. Register names that start with @ are registers within the Qat coprocessor, and those range from 0..255 -- hence, an 8-bit field.
Notice that some instruction names appear more than once. For example and is used as the name of both the Tangled and Qat bitwise AND instructions. That's not a mistake. The arguments they take are different, so the AIK patterns will match them to the different instructions appropriately. There is similarly no ambiguity between the load instruction and the load pseudo instruction.

Qat `had` Operands

The Hadamard transform is an operation that essentially creates an equiprobable distribution of 0 and 1 over any number of entangled qubits. However, Qat's method for representing entanglement leads to a somewhat different notion of this operation because the set of bit vector positions to be used for the entangled value must be specified. In truth, computers using quantum phenomena also must specify the connections to be used to implement entanglement, but those connections are implied by the set of qubits being operated upon. By making entanglement more explicit, Qat is actually more flexible.

Qat essentially uses an alternating pattern of 2^k 0s intermixed with 2^k 1s to represent the k^th entanglement dimension. In other words, had 0 generates a vector that looks like {0,1,0,1,...0,1}, had 1 generates {0,0,1,1,0,0,1,1,...0,0,1,1}, etc. Thus, to make @42, @4, and @44 represent all 8 possible 3-bit patterns, one might execute had @42,0, had @43,1, and had @44,2. In this way, the had instruction can establish arbitrary entanglements while operating on only one register (bit vector) at a time.

Tangled Registers

Tangled only has 16 registers, but it would be problematic if, for example, code didn't standardize on which one is the stack pointer. Just as in MIPS, each register has a number and a symbolic name reflecting its preferred use. All those names should all be predefined using AIK's .const construct. In reality, none of registers are special in the hardware, but we have names and suggested uses for all 16:

.const r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 at rv ra fp sp

The registers $r0 through $r10 would normally be used by the compiler or assembly-language programmer as general-purpose registers. The $at register is reserved for use as an assembler temporary in implementing pseudo-instructions. The $rv and $ra registers are used for the return value and return address in function calls. As you probably expected, $fp and $sp are intended to serve respectively as the frame pointer and stack pointer.

Qat Registers

Qat has 256 registers, but none of them are special in any way. Thus, we will not bother giving them names -- they'll be reference by values from 0 to 255.

There is one sematic constraint on register use you should be aware of. In conventional quantum computers, it is not legal to use the same qubit as more than one operand within a single instruction. For example, ccnot @1,@2,@2 would not be allowed. Qat does allow that sort of reference, but only in cases where the same register will not be written twice. Thus, or @1,@1,$1 is OK (although rather useless); swap @1,@1 is not OK. This constraint is to ensure that the hardware will never see two simultaneous writers to the same register.

Well, that's about it.

Advanced Computer Architecture.