#### Introduction

#### CPE380/CS380, Fall 2023

#### Hank Dietz

http://aggregate.org/hankd/



## **Course Overview**

- You know how to write a simple program... from CS courses
- You know how to build simple combinatorial and sequential logic circuits from ECE courses (especially CPE282 or EE280/EE281)
- This course fills the gap between the two:
  - So you can better **specify & use** that stuff
  - So you can create the stuff in between
  - There will be implementations in Verilog

```
module mul(ready, c, a, b, reset, clk);
parameter BITS = 32;
input [BITS-1:0] a, b;
input reset, clk;
output reg [BITS*2-1:0] c;
output reg ready;
reg [BITS-1:0] d;
reg [BITS-1:0] state;
reg [BITS:0] sum;
always @(posedge clk or posedge reset) begin
  if (reset) begin
    ready \leq 0;
    state <= 1;</pre>
    d <= a;
    c <= {{BITS{1'b0}}, b};</pre>
  end else begin
    if (state) begin
      sum = c[BITS*2-1:BITS] + d;
      c <= (c[0] ? {sum, c[BITS-1:1]} :</pre>
             (c >> 1));
      state <= {state[BITS-2:0], 1'b0};</pre>
    end else begin
      ready \leq 1;
    end
  end
end
endmodule
```

# Verilog 32-bit Multiplier



## Textbook

- The text is: *Computer Organization & Design, 5<sup>th</sup> Edition: The Hardware/Software Interface* by Patterson & <u>Hennessy</u>
- You can use any MIPS edition from 2<sup>nd</sup> 6<sup>th</sup>, but we'll reference sections from the 5<sup>th</sup>
- We will not assign problems from the text
- Lots of additional materials at the course URL and presented in class... text is reference only



# Grading & Such

- Two exams, ~1/8 each
- One final exam, ~1/4
- Material from lectures, the text as cited, canvas, or from the course URL: http://aggregate.org/CPE380/
- Homework and projects, ~1/2
- You are expected to regularly attend class
- I try not to curve much; always in your favor

## **Course Content**

| Торіс                                                                | Lectures | Exam |  |
|----------------------------------------------------------------------|----------|------|--|
| Introduction                                                         | 3        | 0    |  |
| Verilog                                                              | 2        | 0    |  |
| A simple multi-cycle machine                                         | 4        | 0    |  |
| Review for Exam 0                                                    |          |      |  |
| Machine and assembly language (generic, SIMD/MIMD, MIPS)             | 4        | 1    |  |
| Arithmetic integer and floating point, with Verilog implementations  | 4        | 1    |  |
| Single-cycle machine design                                          | 2        | 1    |  |
| Review for Exam 1                                                    |          |      |  |
| Pipelined machine design                                             | 4        | 2    |  |
| Memory hierarchy and I/O                                             | 3        | 2    |  |
| Performance and parallel processing, advanced architecture, lab tour | 2        | 2    |  |
| Review for Final Exam                                                |          |      |  |

#### **Schedule Notes**

- You'll be reading and writing Verilog code
  - You'll see lots of Verilog implementations
  - There are 3 modest team projects
- There will be a few online class meetings
  - October 12: I'll be hosting LCPC23
  - November 14, 16: I'll be at SC23

# Me (and why I'm biased)

- Hank Dietz, ECE Professor and James F. Hardymon Chair in Networking
- Office: 203 Marksbury
- Research in parallel compilers & architectures:
  - Built 1<sup>st</sup> Linux PC cluster supercomputer
  - Antlr, AFNs, SWAR, FNNs, MOG, ...
  - Various awards & world records for best price/performance in supercomputing
- Lab: 108/108A Marksbury I have TOYS!











# Let's Talk About Computers

- Embedded computers, IoT (Internet of Things)
- Personal Mobile Devices (PMDs)... usually "smart phones" and tablets
- Personal Computers (PCs)
- Servers
- Supercomputers
- Clusters, Farms, Grids, and Clouds (Warehouse Scale Computers – WSC, Software as a Service – SaaS)

#### **M-Unit Sales, Global Personal Electronics**



https://www.grandviewresearch.com/industry-analysis/personal-consumer-electronics-market

### What's Inside?





















# Memory Terminology

- Volatile power off, data fades away
- ROM non-volatile Read Only Memory
- PROM, EPROM, OTP, EEROM, Flash, 3DXPoint types of non-volatile programmable memory
- RAM Random Access Memory (mostly volatile)
  - Core non-volatile magnetic RAM technology
  - SRAM Static RAM, fast but big cells
  - DRAM Dynamic RAM, slow but small cells
  - EDO, SDRAM, DDR, RamBus DRAM types
  - CXL Compute eXpress Link
- Registers, Cache fast working memories

# More Memory Terminology

- Punched cards
- Punched paper tape
- Tape, Magtape
- Drum
- Disks:



Floppy, Hard, Magneto-optical, Compact Disc, Digital Video (Versatile?) Disc, Blu-ray

• Solid State Disk, Optane

## **Network Terminology**

- SAN, LAN, MAN, WAN Area Network; System/Storage, Local, Metropolitan, Wide
- Ethernet, DSL (Digital Subscriber Line)
- USB, FireWire
- Hub, Switch, Router
- WiFi, Bluetooth, NFC
- Bandwidth, Latency



# **Other I/O Terminology**





- Keyboard
- Mouse, Trackball, Touchscreen, Lightpen, Touchpad, etc.
- Pixel Picture Element
- Camera: Charge-Coupled Device, CMOS, BackSide Illuminated, Stacked

## **Other I/O Terminology**



 Display: Cathode Ray Tube, Plasma, Liquid Crystal Display, Digital Micromirror Device aka Digital Light Processor, Organic Light Emitting Diode, elnk

## **Processor Terminology**

- CPU Central Processing Unit
- PE, Core Processing Element
- Processor CPU or chip containing PEs
- "Computer Family" same ISA
- x86, IA32, x64/AMD64 Intel 386-based ISAs
- MIPS, ARM, SPARC other common ISAs
- DSP Digital Signal Processor
- GPU Graphics Processing Unit
- Tensor Matrix support for neural networks
- Quantum Combinatorial use of superposition

### Why Multi-Core?



- Hit the "power wall"
- Lower voltage & slower clock reduce power more than performance
- Software companies changed license fees



## Complexity

- Things are much more complex now
- Lots of things you use every day have
   BILLONS of components!
- You don't live long enough to know it all

### Abstraction "Onion"

| Applications       |                                     |
|--------------------|-------------------------------------|
| Operating System   |                                     |
| HLLs               |                                     |
| Assembly Language  | $\setminus$ $\setminus$ $\setminus$ |
| Machine Code       | $\setminus \setminus \setminus$     |
| Fn Units & Modules |                                     |
| Gates              |                                     |
| Transistors        |                                     |
|                    |                                     |
| Materials          |                                     |
|                    |                                     |
|                    |                                     |
|                    |                                     |
|                    |                                     |

# Software Layers

- Applications...
- Operating Systems (OS)...
- High-Level Languages (HLLs) Aka, High Order Languages (HOLs)
  - Designed for humans to write & read
  - Modularity
  - Abstract data types, type checking
  - Assignment statements
  - Control constructs
  - I/O statements

## Instruction Set Architecture

- ISA defines HW/SW interface
- Assembly Language
  - Operations match hardware abilities
  - Relatively simple & limited operations
  - Mnemonic (human readable?)
- Machine Language
  - Bit patterns 0s and 1s
  - Actually executed by the hardware



### Hardware Layers

- Function-block organization
- Gates & Digital Logic (CPE282 stuff)
- Transistors
  - Used as bi-level (saturated) devices
  - Amplifiers, not just on/off switches
- Materials & Integrated Circuits
  - Implementation of transistors, etc.
  - Analog properties

#### Who Does What?

- Instruction Set Design, by Architect
  - Machine & Assembly Languages
  - "Computer Architecture"
  - Instruction Set Architecture / Processor
- Computer Hardware Design, by *Engineer* 
  - Logic Design & Machine Implementation
  - "Processor Architecture"
  - "Computer Organization"

## How To Use Layers

- Things are too complex to "know everything"
- Need to know only layers adjacent
  - Makes design complexity reasonable
  - Makes things reusable
- Can tunnel to lower layers
  - For efficiency
  - For special capabilities

#### **8 Great Ideas**

- Design for Moore's Law
- Abstraction
- Make the common case fast
- Pipelining
- Parallelism
- Prediction
- Hierarchy of memories
- Dependability via redundancy



# Architecture Is Evolving

- Applications DVDs → MMX; Doom → 3DNow! & SSE; embedded systems; cell phones
- Programming Languages  $C \rightarrow call stack$ , flat memory addresses
- OperatingSystems Windows  $\rightarrow$  execute permission
- Technology Power density  $\rightarrow$  power management, multicore

# **Chip Terminology**

- Silicon Ingot sausage-like single crystal
- Wafer slice from above
- Die one chip's area on a wafer
- Chip a mounted die
- Yield fraction that are good
- SSI, MSI, LSI, VLSI, WSI Scale Integration; Small, Medium, Large, Very Large, Wafer

#### **Moore's Law**

"Cramming more components onto integrated circuits," Electronics, Vol. 38, No. 8, April 19, 1965.



Number of Components Per Integrated Circuit

#### Moore's Law – The number of transistors on integrated circuit chips (1971-2018)



Moore's law describes the empirical regularity that the number of transistors on integrated circuits doubles approximately every two years. This advancement is important as other aspects of technological progress – such as processing speed or the price of electronic products – are linked to Moore's law.



Data source: Wikipedia (https://en.wikipedia.org/wiki/Transistor\_count)

The data visualization is available at OurWorldinData.org. There you find more visualizations and research on this topic.

## **Chip Fabrication**



# **Chip Fabrication**

Cost per die =  $\frac{\text{Cost per wafer}}{\text{Dies per wafer } \times \text{Yield}}$ Dies per wafer ≈ Wafer area/Die area  $\text{Yield} = \frac{1}{(1+(\text{Defects per area} \times \text{Die area}/2))^2}$ 

- Moore's Law is primarily about density, not speed
- Fab cost ~ cube of the die area

#### IC Costs: Dies To Chips

IC cost = Die cost + Testing cost + Packaging cost

#### Final test yield

Packaging Cost: depends on pins, heat dissipation

| Chip        | Die   | Package |      |      | Test &   | Total |
|-------------|-------|---------|------|------|----------|-------|
| -           | cost  | pins    | type | cost | Assembly |       |
| 386DX       | \$4   | 132     | QFP  | \$1  | \$4      | \$9   |
| 486DX2      | \$12  | 168     | PGA  | \$11 | \$12     | \$35  |
| PowerPC 601 | \$53  | 304     | QFP  | \$3  | \$21     | \$77  |
| HP PA 7100  | \$73  | 504     | PGA  | \$35 | \$16     | \$124 |
| DEC Alpha   | \$149 | 431     | PGA  | \$30 | \$23     | \$202 |
| SuperSPARC  | \$272 | 293     | PGA  | \$20 | \$34     | \$326 |
| Pentium     | \$417 | 273     | PGA  | \$19 | \$37     | \$473 |

### **Technology Trends**

CapacityLogic2X in 2 yearsDRAM4X in 3 yearsDisk4X in 3 years

Speed 2X in 3 years 1.4X in 10 years 1.4X in 10 years

Different rates mean relationships change; e.g., memory used to be faster than Add logic, now it's ~2000X slower!

# SI Terminology Of Scale

| 1000^1 | kilo | K | 1000^-1 | milli | m |
|--------|------|---|---------|-------|---|
| 1000^2 | mega | Μ | 1000^-2 | micro | U |
| 1000^3 | giga | G | 1000^-3 | nano  | n |
| 1000^4 | tera | Т | 1000^-4 | pico  | р |
| 1000^5 | peta | Ρ | 1000^-5 | femto | f |
| 1000^6 | exa  | Ε |         |       |   |

- 1000^x vs. 1024^x
- 1 Byte (B) is 8-10 bits (b), 4 bits in a Nybble
- Hertz (Hz) is frequency (vs. period)

Frontier Supercomputer: 8,730,112 cores, 1.102 Exaflop/s

IIIII

ENERGY

Hewlett Packard

AMD

FRØNTIER

#### Conclusion

- LOTS of stuff to know about... this course just does the basic stuff around the ISA and its implementation
- New technologies & applications mean new architectures & architectural concepts
- Look at the history references on the WWW: not to memorize who, what, when, & where, but to see trends...