Assignment 1: PinKY Encoding And Assembler

Instruction set design is hard. Prof. Dietz has designed dozens of instruction sets in the three decades he's been a professor, and it still isn't easy for him to get things right. Thus, rather than giving you complete freedom to design your own instruction set, we're going to walk through the design logic for a reasonably well-crafted one that he built specifically for Fall 2018 EE480. However, this design is not complete -- each student must devise their own encoding of the instructions and implement their own assembler.

Why PinKY?

Although PinKY was built for Fall 2018 EE480, it wasn't just built for that. Hopefully, you recall taking either EE383 or EE287 -- the embedded systems course. Well, that had you programming ARM processors. However, in EE380 everything suddenly became about MIPS processors. Now, MIPS is a fine processor design to teach, but it isn't anywhere near as commonly used as ARM, so wouldn't it be nice to have everything target ARM? There's a version of the textbook we use for EE380 that is supposedly ARM based, but in fact it simply changes the chapter on assembly language programming to ARM and then uses a MIPS-with-instructions-renamed supposedly ARM-ish thing called LEGv8. Well, ARM is not a simple instruction set, so using it would be a pain, but LEGv8 really doesn't have anything very ARM-ish about it; in fact, the ARM version of the textbook doesn't even redraw the architectural implementation diagrams from MIPS. So, that's why we created PinKY: to use for Fall 2018 EE480 and, maybe, for future offerings of EE380....

PinKY is a somewhat strained acronym for PINkie from KentuckY. PINkie? Well, there's ARM, then there is the Thumb subset, and now there's PinKY. If instead it reminds you of this, well, that's OK too. The key point is that it is a very simple little architecture with a variety of similarities to ARM. Most significantly, compared to ARM, PinKY is a very regular instruction set. In fact, there are really only three instruction formats needed to encode it -- fewer than for MIPS, and way fewer than for ARM.

PinKY is a completely 16-bit machine. Everything is 16 bits wide: instructions, addresses, data. It isn't even byte-addressed for memory. However, PinKY is a fairly beefy processor in that it even supports floating-point arithmetic. Well, 16-bit floats. In short, it's small enough to be feasibly implemented inside a single not-too-exotic FPGA... which we will not make you do, but again, is a nice possibility for the future.

PinKY Assembly Language

I'm not going to repeat PinKY's instruction set here. Instead, I'll simply point you at this PinKY reference. You will need to read and understand that document very thoroughly -- including the encoding hints on that page. However, there are two things that make ARM-style assembly language really annoying to implement using AIK:

The condition is expressed as part of the opcode name. This basically means that ADD, ADDS, ADDNE, and ADDEQ are more naturally specified by four separate patterns. That doesn't sound too bad until you realize that Op2 has three forms, so you naturally get 12 different specifications of ADD. Yuck!
LDR and STR have brackets ([]) around their Op2. This means you can't factor-out these opcodes with a generic pattern that could handle all regular instructions.

It really isn't hard to deal with the assembly language as written, but the obvious AIK specification could be over 100 lines long! Thus, at your option, you may use a slightly simplified format for the assembly language, as described here. The changes are very simple:

Instead of using the suffix on the name, use a suffix after the name. In other words, rather than writing:
```
ADDEQ r2,r1
ADD   r2,r1
```
one would write:
```
ADD EQ r2,r1
ADD AL r2,r1
```
This can greatly simplify the AIK spec. However, to keep things fully regular, you'll need the AL condition, as seen above, to mean "always" executed. You would give AL, S, NE, and EQ values using .const specifications.
LDR and STR would be written without []. In other words, rather than writing:
```
LDRS r2,[r1]
STR r2,[r1]
```
one would write:
```
LDR S r2,r1
STR AL r2,r1
```

Be sure to note which assembly language syntax you implemented.
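To make the two rewriting rules concrete, here is a minimal Python sketch that converts one instruction from the ARM-style syntax to the simplified syntax. It is only an illustration, not part of AIK or the assignment: the function name simplify is hypothetical, the opcode list is taken solely from the test case in this handout (the PinKY reference may list more), and labels and comments are not handled.

```python
# Opcodes that appear in this handout's test case; the full PinKY
# reference may define others (assumption: this list is incomplete).
OPCODES = {"ADD", "BIC", "EOR", "FTOI", "ITOF", "LDR", "MOV", "MUL", "MULF",
           "NEG", "ORR", "PRE", "RECF", "SHA", "STR", "SUB", "SUBF", "SYS"}

# Suffix on the opcode name -> separate condition word.
# No suffix becomes the explicit "always" condition, AL.
SUFFIXES = {"": "AL", "S": "S", "NE": "NE", "EQ": "EQ"}

def simplify(instr: str) -> str:
    """Rewrite e.g. 'ADDEQ r2,r1' as 'ADD EQ r2,r1' (hypothetical helper)."""
    parts = instr.split(None, 1)
    name = parts[0]
    operands = parts[1] if len(parts) > 1 else ""
    # Try longer opcodes first so SUBF is not misread as SUB plus a suffix.
    for opcode in sorted(OPCODES, key=len, reverse=True):
        if name.startswith(opcode) and name[len(opcode):] in SUFFIXES:
            cond = SUFFIXES[name[len(opcode):]]
            break
    else:
        raise ValueError(f"unrecognized opcode: {name}")
    # LDR and STR drop the brackets around Op2.
    if opcode in ("LDR", "STR"):
        operands = operands.replace("[", "").replace("]", "")
    return f"{opcode} {cond} {operands}".rstrip()

print(simplify("ADDEQ r2,r1"))
print(simplify("LDRS r2,[r1]"))
```

Note that the suffix has to be matched against a known opcode list: SYS ends in S and SUBF starts with SUB, so naively stripping a trailing S, NE, or EQ would misparse them.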

PinKY Data

Everything is one 16-bit word long, and integer types are treated as 2's complement signed values. Memory is not byte addressed; the contents of an address is one 16-bit word. The AIK assembler understands 16-bit integers and .word can be used to initialize data in memory, but AIK doesn't understand float data. That's ok -- floating-point constants simply need to be entered as integer values with the same bit pattern. Here's a CGI form that lets you enter a floating-point value and will show you the 16-bit integer value that represents it in hexadecimal. For example, 1.0 is represented by 0x3f80.
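If the CGI form is not handy, the conversion can be approximated offline. The sketch below assumes the 16-bit float is simply the upper 16 bits of the IEEE 754 single-precision encoding, which is consistent with 1.0 mapping to 0x3f80 but should be checked against the PinKY reference. It truncates the low mantissa bits; whether the CGI form rounds instead is not known here, so values that are not exactly representable may differ. The function name float_to_pinky_bits is hypothetical.

```python
import struct

def float_to_pinky_bits(x: float) -> int:
    """Return the upper 16 bits of x's IEEE 754 single-precision encoding.

    Assumption: PinKY's 16-bit float is the top half of a 32-bit float.
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16  # drop the low 16 mantissa bits (truncation, no rounding)

print(hex(float_to_pinky_bits(1.0)))  # 0x3f80, matching the example above
```

The resulting integer is what would go after .word in the data segment.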

In your hardware implementations, you'll have a free choice to have two separate or one shared memory space for code (.text) and data (.data), but your assembler should provide for both segments with a word size of 16 bits, 0x10000 addresses each with a default start address of 0, and generating output machine code in Verilog memory image format. Also be sure to force .lowfirst to be 0 so that bits are packed into 16-bit words starting with the MSB working down.
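As a rough illustration of what "starting with the MSB working down" means, the Python sketch below packs a list of bit fields into one 16-bit word with the first field in the highest bits, then prints words as hexadecimal text, one per line, in a form that Verilog's $readmemh can load. The field widths used (4, 4, 8) are purely hypothetical and are not a suggested PinKY encoding; the helper names are also made up, and the exact layout of the file AIK itself emits is not reproduced here.

```python
def pack_msb_first(fields):
    """Pack (value, width) pairs into a 16-bit word, first field in the top bits."""
    word = 0
    used = 0
    for value, width in fields:
        if value < 0 or value >= (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value  # earlier fields shift toward the MSB
        used += width
    if used != 16:
        raise ValueError(f"fields cover {used} bits, expected 16")
    return word

def to_hex_image(words):
    """Format 16-bit words as 4-digit hex, one word per line."""
    return "".join(f"{w & 0xFFFF:04x}\n" for w in words)

# Hypothetical 4/4/8 split, used only to show the packing order.
word = pack_msb_first([(0x1, 4), (0x2, 4), (0x34, 8)])
print(to_hex_image([word, 42]), end="")
```

With MSB-first packing, the first field listed ends up as the leading hex digit of the word, which makes the memory image easy to compare by eye against your encoding table.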

Your Project

Your project is simply to design the instruction set encoding and implement an assembler using AIK. Here's a simple test case:

	.text
	.origin	0x0000
start:	ADD	r2,r1
	ADDS	r2,r1
	ADDNE	r2,r1
	ADDEQ	r2,r1
	ADD	r2,#1
	ADDS	r2,#1
	ADDNE	r2,#1
	ADDEQ	r2,#1
	ADD	r2,#8
	ADDS	r2,#8
	ADDNE	r2,#8
	ADDEQ	r2,#8
	ADD	r2,#place
	ADDS	r2,#place
	ADDNE	r2,#place
	ADDEQ	r2,#place
	BIC	r2,r1
	EOR	r2,r1
	FTOI	r2,r1
	ITOF	r2,r1
	LDR	r2,[r1]
	MOV	r2,r1
	MUL	r2,r1
	MULF	r2,r1
	.data		; switch to data segment
	.origin	0x0100
place:	.word	42
	.text		; continue where we left off
	NEG	r2,r1
	ORR	r2,r1
	PRE	#0x123
	RECF	r2,r1
	SHA	r2,r1
	STR	r2,[r1]
	SUB	r2,r1
	SUBF	r2,r1
	SYS

And here is the same assembly code in the alternative, simplified, format:

	.text
	.origin	0x0000
start:	ADD AL	r2,r1
	ADD S	r2,r1
	ADD NE	r2,r1
	ADD EQ	r2,r1
	ADD AL	r2,#1
	ADD S	r2,#1
	ADD NE	r2,#1
	ADD EQ	r2,#1
	ADD AL	r2,#8
	ADD S	r2,#8
	ADD NE	r2,#8
	ADD EQ	r2,#8
	ADD AL	r2,#place
	ADD S	r2,#place
	ADD NE	r2,#place
	ADD EQ	r2,#place
	BIC AL	r2,r1
	EOR AL	r2,r1
	FTOI AL	r2,r1
	ITOF AL	r2,r1
	LDR AL	r2,r1
	MOV AL	r2,r1
	MUL AL	r2,r1
	MULF AL	r2,r1
	.data		; switch to data segment
	.origin	0x0100
place:	.word	42
	.text		; continue where we left off
	NEG AL	r2,r1
	ORR AL	r2,r1
	PRE AL	#0x123
	RECF AL	r2,r1
	SHA AL	r2,r1
	STR AL	r2,r1
	SUB AL	r2,r1
	SUBF AL	r2,r1
	SYS AL

No, the above isn't a useful program. Worse still, I obviously can't show you sample output without giving away how I've encoded the instructions....

Due Dates

The recommended due date for this assignment is before class, Monday, September 24, 2018. This submission window will close when class begins on Wednesday, September 26, 2018. You may submit as many times as you wish, but only the last submission that you make before class begins on Wednesday, September 26, 2018 will be counted toward your course grade.

Note that you can ensure that you get at least half credit for this project by simply submitting a tar of an "implementor's notes" document explaining that your project doesn't work because you have not done it yet. Given that, perhaps you should start by immediately making and submitting your implementor's notes document? (I would!)

Submission Procedure

For each project, you will be submitting a tarball (i.e., a file with the name ending in .tar or .tgz) that contains all things relevant to your work on the project. Minimally, each project tarball includes the source code for the project and a semi-formal "implementor's notes" document as a PDF named notes.pdf. It also may include test cases, sample output, a Makefile, etc., but should not include any files that are built by your Makefile (e.g., no binary executables). For this particular project, name the AIK source file pinky.aik.

Submit your tarball below. The file can be either an ordinary .tar file created using tar cvf file.tar yourprojectfiles or a compressed .tgz file created using tar zcvf file.tgz yourprojectfiles. Be careful about using * as a shorthand in listing yourprojectfiles on the command line, because if the output tar file is listed in the expansion, the result can be an infinite file (which is not ok).

Advanced Computer Architecture.