Assignment 1: Logick Encoding And Assembler

Instruction set design is hard. Prof. Dietz has designed dozens of instruction sets in the three decades he's been a professor, and it still isn't easy for him to get things right. Thus, rather than giving you complete freedom to design your own instruction set, we're going to walk through the design logic for a reasonably well-crafted one that he built specifically for Fall 2017 EE480. However, this design is not complete -- each student must devise their own encoding of the instructions and implement their own assembler.

Logick Overview

"Logick" is the archaic spelling of "logic" in English. However, this isn't really an archaic design... the name is because it supports Logarithmic Number System (LNS) arithmetic -- ick! LNS is a very useful alternative to the floating-point arithmetic you know, using a simpler structure than IEEE floats to deliver higher accuracy, while making multiply and divide simple and fast. The catch is that add and subtract are a bit of a nightmare in LNS. Oh well... there really isn't any free lunch. Anyway, Logick is what our target machine will be for Fall 2017 EE480.

The machine has sixteen 16-bit registers, 16-bit datapaths, and 16-bit addresses, and each address in memory holds one 16-bit word. It can operate on both 16-bit integers and 16-bit LNS values. Thus, data memory (i.e., the .data segment) looks like an array of 16-bit data objects in which adding 1 to an address gets you the next 16-bit data object. Instruction memory (i.e., the .text segment) works the same way: each instruction is 16 bits long and adding 1 to an instruction address gets you the next 16-bit instruction. For simulation purposes, you should assume each of the .data and .text segments can hold 65536 16-bit words.

This instruction set is complete enough that I hope to be giving you a compiler (including full C source code) that translates programs written in a significant subset of C into Logick code. It's not a particularly smart compiler (ok, it's really dumb), but it will show you how Logick can be used for complete programs.

The Logick Instruction Set

Logick's instruction set is quite straightforward: a general-register model encoding each instruction as a single 16-bit word. Although implementing the LNS operations isn't very familiar (ok, it's actually really complex for add and subtract), the bulk of the operations are really quite ordinary -- and so is the assembly language:

| Instruction | Description | Functionality |
| --- | --- | --- |
| `ad $d, $s, $t` | ADd integers | `$d = $s + $t` |
| `al $d, $s, $t` | Add Log numbers | `$d = $s + $t` |
| `an $d, $s, $t` | bitwise ANd integers | `$d = $s & $t` |
| `br c, lab` | BRanch conditionally to label (encode lab as 8-bit lab-(PC+1)) | `if c then PC = lab` |
| `cl $s, $t` | Compare Log numbers | `condition codes set by $s vs. $t` |
| `co $s, $t` | COmpare integers | `condition codes set by $s vs. $t` |
| `dl $d, $s, $t` | Divide Log numbers | `$d = $s / $t` |
| `eo $d, $s, $t` | bitwise Exclusive Or integers | `$d = $s ^ $t` |
| `jr c, $d` | Jump conditionally to Register | `if c then PC = $d` |
| `li $d, i8` | Load Immediate 8-bit integer | `$d = signed_extend(i8)` |
| `lo $d, $s` | LOad integer | `$d = memory[$s]` |
| `ml $d, $s, $t` | Multiply Log numbers | `$d = $s * $t` |
| `mi $d, $s` | MInus | `$d = - $s` |
| `nl $d, $s` | Negate Log number | `$d = - $s` |
| `no $d, $s` | logical NOt (zero becomes 1, non-zero becomes 0) | `$d = ! $s` |
| `or $d, $s, $t` | bitwise OR integers | `$d = $s \| $t` |
| `si $d, i8` | Shift In 8-bit integer | `$d = (($d) << 8) \| (i8 & 0xff)` |
| `sr $d, $s, $t` | Shift Right signed integers | `$d = $s >> $t` |
| `st $d, $s` | STore integer | `memory[$s] = $d` |
| `sy` | SYstem (system call; end execution) | `halt` |
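To make the Functionality column concrete, here is a minimal Python sketch of how a few of the integer instructions could behave on 16-bit words. The function names simply mirror the mnemonics; the helper `to_signed` and the treatment of the `sr` shift count as an unsigned value are assumptions made for illustration, not part of the Logick specification.

```python
MASK16 = 0xFFFF  # every register holds a 16-bit pattern


def to_signed(word):
    """Interpret a 16-bit pattern as a 2's complement integer (helper, assumed)."""
    word &= MASK16
    return word - 0x10000 if word & 0x8000 else word


def ad(s, t):
    """$d = $s + $t, wrapping to 16 bits."""
    return (s + t) & MASK16


def mi(s):
    """$d = - $s, wrapping to 16 bits."""
    return (-s) & MASK16


def no(s):
    """$d = ! $s: zero becomes 1, non-zero becomes 0."""
    return 1 if (s & MASK16) == 0 else 0


def sr(s, t):
    """$d = $s >> $t as a signed (arithmetic) shift.

    Assumption: the count in $t is treated as an unsigned 16-bit value.
    """
    return (to_signed(s) >> (t & MASK16)) & MASK16


def li(i8):
    """$d = sign-extended 8-bit immediate."""
    return (((i8 & 0xFF) ^ 0x80) - 0x80) & MASK16


def si(d, i8):
    """$d = (($d) << 8) | (i8 & 0xff), keeping only 16 bits."""
    return ((d << 8) | (i8 & 0xFF)) & MASK16
```

For instance, `si(li(0x12), 0x34)` builds the constant 0x1234 one byte at a time, which is the two-instruction pattern the rules below describe.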

Determining how to encode the above instructions as bit patterns is a key part of your project. However, there are a few rules:

You do not need to use .alias and other "fancy" features of AIK to build your assembler. Writing a separate pattern for each instruction type is fine.
Each instruction is one 16-bit word long.
As in MIPS, $ signifies a register number. Unlike MIPS, Logick has 16 registers numbered 0 through 15. It takes a 4-bit field to hold a register number.
The values named i8 and lab above must each be encoded with an 8-bit field in the instruction. The i8 values are obvious. The lab value is to be encoded as the lab value minus the address of the br instruction plus 1, i.e., lab-(PC+1). For example, the encoding of a br t, 42 in memory location 30 would have an 8-bit field containing the value 11, which encodes 42-(30+1).
It takes more than one instruction to load an arbitrary 16-bit constant into a register. For example, to place 0x1234 into register $u0, one could use the sequence li $u0, 0x12 followed by si $u0, 0x34. You are to implement a single "macro" instruction, la (Load Address), that will use just an li where possible, but the li, si sequence where needed.
There is actually a similar issue for selecting between a br and a jr depending on whether the target address is in range. You can ignore this issue for now. However, in future projects the assembler will be expected to implement a jb "macro" that becomes the shortest possible sequence: br, li followed by jr, or li followed by si followed by jr.
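The two arithmetic rules above (the br offset and the la expansion) can be sketched in Python as follows. This is only an illustration of the decisions an assembler has to make, not AIK syntax; the function names are hypothetical, and treating the 8-bit br field as a 2's complement value is an assumption (it is what backward branches would need).

```python
def encode_br_offset(lab, br_address):
    """Return the 8-bit field for a br at br_address targeting lab."""
    offset = lab - (br_address + 1)
    # Assumption: the field is a signed 8-bit value, so -128..127 is reachable.
    if not -128 <= offset <= 127:
        raise ValueError("branch target out of range for br")
    return offset & 0xFF


def expand_la(value):
    """Return the (mnemonic, 8-bit field) sequence that la would become."""
    value &= 0xFFFF
    low = value & 0xFF
    high = value >> 8
    # li sign-extends its immediate; it suffices when that reproduces value.
    if (((low ^ 0x80) - 0x80) & 0xFFFF) == value:
        return [("li", low)]
    return [("li", high), ("si", low)]
```

With these definitions, `encode_br_offset(42, 30)` gives 11 as in the example above, `expand_la(0x1234)` gives the li 0x12, si 0x34 pair, and `expand_la(-2)` needs only an li because sign extension of 0xfe already yields 0xfffe.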

The Logick Registers

The Logick processor has two different types of registers: general-purpose registers and condition code registers.

The Logick General Registers

There are 16 general-purpose registers, some of which have special purposes -- a lot like MIPS. They all have names as well as numbers. Perhaps the best way to give both is the following specification (formatted as an AIK specification):

.const {zero	sp	fp	ra	rv	u10	u9	u8
	u7	u6	u5	u4	u3	u2	u1	u0 }

Registers $u10 through $u0 (aka, registers $5 through $15) are "user" registers to be used in any way the programmer sees fit. However, it is expected that the assembler or compiler would use registers starting at $u10 for "internal" things and starting at $u0 for normal coding. The first five registers have special meanings:

| Register Number | Register Name | Read/Write? | Use |
| --- | --- | --- | --- |
| `$0` | `$zero` | Read Only | ZERO; constant 0x0000 |
| `$1` | `$sp` | Read/Write | the Stack Pointer |
| `$2` | `$fp` | Read/Write | the Frame Pointer |
| `$3` | `$ra` | Read/Write | the Return Address |
| `$4` | `$rv` | Read/Write | the Return Value |
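As a cross-check on the numbering, the name-to-number mapping implied by the .const list can be written out in Python. This is only a sketch for checking your understanding; in the actual assembler the AIK .const line does this work, and the helper name `register_number` is hypothetical.

```python
# Names in the same order as the .const list, so the index is the number.
REGISTER_NAMES = [
    "zero", "sp", "fp", "ra", "rv", "u10", "u9", "u8",
    "u7", "u6", "u5", "u4", "u3", "u2", "u1", "u0",
]
REGISTER_NUMBER = {name: number for number, name in enumerate(REGISTER_NAMES)}


def register_number(operand):
    """Map an operand such as '$u0' or '$15' to its 4-bit register number."""
    text = operand[1:] if operand.startswith("$") else operand
    number = int(text) if text.isdigit() else REGISTER_NUMBER[text]
    if not 0 <= number <= 15:
        raise ValueError("register number does not fit in 4 bits")
    return number
```

Note that the user registers count downward: `register_number("$u10")` is 5 while `register_number("$u0")` is 15.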

Note that use of $0 (or $zero) as the destination for a result is illegal. Thus, if you wish, you could use those bit patterns for other things. In other words, ad $0, ..., ... is illegal, so you could use that bit pattern for something else, such as sy or even mi $d, $s. You'll have to be slightly clever to cram all these instruction formats into a structure that can't always reserve more than 4 bits for the opcode, but there are lots of ways to solve this problem. Simpler ways are better. :-)
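To illustrate the idea of reusing an illegal bit pattern, here is a small Python sketch of one purely hypothetical layout: a 4-bit opcode followed by three 4-bit register fields, packed starting from the MSB. The opcode value, the field order, and the function names are all invented for this example; your own encoding is free to differ.

```python
OP_AD = 0x0  # hypothetical opcode number, chosen only for illustration


def pack_fields(op, d, s, t):
    """Pack four 4-bit fields into one 16-bit word, MSB first."""
    for field in (op, d, s, t):
        if not 0 <= field <= 15:
            raise ValueError("field does not fit in 4 bits")
    return (op << 12) | (d << 8) | (s << 4) | t


def encode_ad(d, s, t):
    """Encode ad $d, $s, $t; $0 is not a legal destination."""
    if d == 0:
        raise ValueError("$0 cannot be the destination of ad")
    return pack_fields(OP_AD, d, s, t)


def encode_sy():
    """Reuse the otherwise illegal 'ad $0, ...' pattern for sy (one possible choice)."""
    return pack_fields(OP_AD, 0, 0, 0)
```

Under these assumptions `encode_ad(1, 2, 3)` is 0x0123 and `encode_sy()` is 0x0000, so a decoder could tell the two apart by checking whether the destination field is zero.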

The Logick Condition Code Registers

There appear to be eight 1-bit condition registers that can only be written by executing a compare instruction (co for integers or cl for log numbers). They are:

f	lt	le	eq	ne	ge	gt	t

These names (or the values they imply) are to be used in the br and jr instructions to indicate the desired condition. For example, to branch to place only if the condition codes indicate "Greater than or Equal to" is true, you would write the instruction as br ge, place.

However, you have the free choice of what values should encode the choice of which of these eight conditions should be applied. You could encode each as a 3-bit value from 0 to 7 and then simply use the condition value to index the appropriate one of the eight condition code registers. Then again, there don't need to be eight condition code registers at all. All the conditions listed above can be derived from just two one-bit actual condition code registers: one set for "Less Than" and one set for "Greater Than." For example, the eq condition would then be checking that both "Less Than" and "Greater Than" are 0.
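The two-bit approach can be sketched in Python as follows: a compare produces only "Less Than" and "Greater Than" bits, and each of the eight named conditions is a simple function of those two bits. The names `compare`, `condition_holds`, and `to_signed` are hypothetical, and the compare shown is the integer one (co) on 2's complement values.

```python
def to_signed(word):
    """Interpret a 16-bit pattern as a 2's complement integer (helper, assumed)."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def compare(s, t):
    """Return the (lt, gt) bits that an integer compare of $s vs. $t would set."""
    a, b = to_signed(s), to_signed(t)
    return (a < b, a > b)


# Each named condition derived from just the two stored bits.
CONDITIONS = {
    "f":  lambda lt, gt: False,
    "lt": lambda lt, gt: lt,
    "le": lambda lt, gt: not gt,
    "eq": lambda lt, gt: not lt and not gt,
    "ne": lambda lt, gt: lt or gt,
    "ge": lambda lt, gt: not lt,
    "gt": lambda lt, gt: gt,
    "t":  lambda lt, gt: True,
}


def condition_holds(name, lt, gt):
    """Decide whether a br or jr with condition `name` would be taken."""
    return bool(CONDITIONS[name](lt, gt))
```

For example, after `lt, gt = compare(5, 5)` both bits are 0, so `condition_holds("eq", lt, gt)`, `condition_holds("le", lt, gt)`, and `condition_holds("ge", lt, gt)` are all true.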

Logick Data

The plan is for the Logick C-subset compiler to understand four different base data types: char, short, int, and lognum. All of those data types are treated as being signed. It should come as no surprise that char, short, and int are really fully equivalent data types: each is a 16-bit value encoded in 2's complement. A lognum is also encoded in a 16-bit word, although it has a somewhat different internal structure that allows it to behave a lot like a signed floating-point value. The thing you need to know is that the Logick assembler does not need to understand how a lognum is encoded. In other words, a lognum constant in Logick assembly language is written as the integer value that produces the desired 16-bit bit pattern.

In your hardware implementations, you'll have a free choice to have two separate or one shared memory space for code (.text) and data (.data), but your assembler should provide for both segments with a word size of 16 bits, 0x10000 addresses each with a default start address of 0, and generating output machine code in Verilog memory image format. Also be sure to force .lowfirst to be 0 so that bits are packed into 16-bit words starting with the MSB working down.
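For orientation, a memory image that Verilog's $readmemh can load is just text: hexadecimal words separated by whitespace, with an optional @address marker giving the hexadecimal address of the next word. The Python sketch below formats a list of 16-bit words that way; the function name is hypothetical, and the exact layout AIK emits may differ from this.

```python
def memory_image(words, origin=0):
    """Format 16-bit words as a $readmemh-style text image starting at origin."""
    lines = ["@%04x" % origin]  # address marker for the first word
    lines += ["%04x" % (word & 0xFFFF) for word in words]  # one word per line
    return "\n".join(lines) + "\n"
```

For the test case's single data word, `memory_image([42], origin=0x100)` produces the two lines `@0100` and `002a`.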

Your Project

Your project is simply to design the instruction set encoding and implement an assembler using AIK. Here's a simple test case:

	.text
	.origin	0x0000
start:	ad	$1,$2,$3
	al	$4,$5,$6
	an	$sp,$fp,$zero
	br	lt,start
	cl	$ra,$rv
	co	$u0,$u1
	dl	$u2,$u3,$u4
	eo	$u5,$u6,$u7
	jr	ge,$u8
	li	$u9,-1
	lo	$u10,$u0
	.data
	.origin	0x0100
fluff:	.word	42
	.text			; continue where we left off
	ml	$u0,$u1,$u2
	mi	$7,$8		; was ne instruction
	nl	$9,$10
	no	$11,$12
	or	$13,$14,$15
	si	$u0,42
	sr	$u0,$u0+1,$0
	st	$u0,$u1
	sy
	la	$1,42		; just an li
	la	$1,fluff	; forces li, si
	la	$1,-2		; just an li

No, the above isn't a useful program. Worse still, I obviously can't show you sample output without giving away how I've encoded the instructions....

Due Dates

The recommended due date for this assignment is before class, Friday, September 22, 2017. This submission window will close when class begins on Monday, September 25, 2017. You may submit as many times as you wish, but only the last submission that you make before class begins on Monday, September 25, 2017 will be counted toward your course grade. The deadline has now been extended by one class period -- to before class Wednesday, September 27, 2017 -- because of the accidental conflict involving use of ne for both negate and "not equal"; the instruction is now called mi (for minus).

Note that you can ensure that you get at least half credit for this project by simply submitting a tar of an "implementor's notes" document explaining that your project doesn't work because you have not done it yet. Given that, perhaps you should start by immediately making and submitting your implementor's notes document? (I would!)

Submission Procedure

For each project, you will be submitting a tarball (i.e., a file with the name ending in .tar or .tgz) that contains all things relevant to your work on the project. Minimally, each project tarball includes the source code for the project and a semi-formal "implementor's notes" document as a PDF named notes.pdf. It also may include test cases, sample output, a Makefile, etc., but should not include any files that are built by your Makefile (e.g., no binary executables). For this particular project, name the AIK source file logick.aik.

Submit your tarball below. The file can be either an ordinary .tar file created using tar cvf file.tar yourprojectfiles or a compressed .tgz file created using tar zcvf file.tgz yourprojectfiles. Be careful about using * as a shorthand in listing yourprojectfiles on the command line, because if the output tar file is listed in the expansion, the result can be an infinite file (which is not ok).

Advanced Computer Architecture.