## AUTOMATICALLY EXPLORING COMPUTER SYSTEM DESIGN SPACES

Bobby R. Bruce

### THE GENERAL IDEA




Genetic Improvement

Typically, in GI we optimize software with respect to a static target computer system.


We *could* also use GI to optimize a computer system with respect to a static software target.

Isn't source-code the definition of computer system behavior?


### A SIMPLE MOTIVATING EXAMPLE



these workloads. No need for general purpose systems. They could be specialized.

### OTHER MOTIVATIONS

[Figure: screenshot of the SiFive Core Design Tool configuring an S7 series core. Visible option groups include modes and ISA extensions (multiply, floating point, atomics, bit manipulation, SCIE), on-chip memory, ports, security, debug and trace, interrupts, and branch prediction.]

This is the SiFive Core Design Tool.

Open Architecture projects such as RISC-V are making this easier.

The costs of silicon customization, from design to tape-out, are dropping dramatically.



### AT WHAT LEVEL SHOULD WE WORK?



Logic gates, transistors, circuit-level?


A few cons:

- It's been done!
- Run into scalability problems
- Customizing at this level is very expensive






Clusters, computer networks, etc.?

A few cons:

- Difficult to simulate workloads sufficiently for optimization

- (I'm not really all that interested!)




Computer architecture?

Some pros:

- Common APIs and standards allow for interchanging components (think genes and alleles).
- We can utilize off-the-shelf components and designs.
- Plenty of research opportunity: computer architecture design and optimization is very manual.
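The genes-and-alleles analogy can be made concrete with a small sketch. Everything below is an illustrative assumption, not part of the talk: the slot names, the option lists, and the choice of uniform crossover are all hypothetical. Each component slot is treated as a gene, and each interchangeable option for that slot as an allele.

```python
import random

# Hypothetical design space: each gene is a component slot and each allele is
# an interchangeable option for that slot. Names are illustrative only.
DESIGN_SPACE = {
    "cpu_type": ["atomic", "timing", "o3"],
    "cache_hierarchy": ["no_cache", "private_l1", "private_l1_shared_l2"],
    "memory": ["single_channel", "dual_channel"],
}


def random_design(rng):
    """Pick one allele for every gene to form a complete design."""
    return {gene: rng.choice(alleles) for gene, alleles in DESIGN_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is inherited from one parent at random."""
    return {
        gene: rng.choice([parent_a[gene], parent_b[gene]])
        for gene in DESIGN_SPACE
    }


if __name__ == "__main__":
    rng = random.Random(42)
    mum, dad = random_design(rng), random_design(rng)
    print(crossover(mum, dad, rng))
```

Because every slot shares a common interface, any child produced this way is still a complete, well-formed design, which is what makes this level attractive for search.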



### HOW DO WE EVALUATE ARCHITECTURE DESIGNS?

SIMULATION




### MODERN SIMULATORS


#### They provide a language for GI to modify




#### That language follows a grammar like any other

```python
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.cachehierarchies.classic.no_cache import NoCache
from gem5.components.memory.single_channel import SingleChannelDDR3_1600
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.processors.cpu_types import CPUTypes
from gem5.resources.resource import Resource
from gem5.simulate.simulator import Simulator

# Obtain the components.
cache_hierarchy = NoCache()
memory = SingleChannelDDR3_1600("1GiB")
processor = SimpleProcessor(cpu_type=CPUTypes.ATOMIC, num_cores=1)

# Add them to the board.
board = SimpleBoard(
    clk_freq="3GHz",
    processor=processor,
    memory=memory,
    cache_hierarchy=cache_hierarchy,
)

# Set the workload.
binary = Resource("x86-hello64-static")
board.set_se_binary_workload(binary)

# Setup the Simulator and run the simulation.
simulator = Simulator(board=board)
simulator.run()
```


The "variables" in a design are the components and their properties.

For example, we can swap out the memory system for a different type (single channel or dual channel) or configure its size.
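A minimal sketch of treating these variables as mutation targets is shown below. It is plain Python and does not call gem5; the `Design` class, the option lists, and the `mutate` helper are hypothetical names introduced only for illustration. A mutation changes exactly one property, either the memory type or its size.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical option lists for the memory component (illustrative only).
MEMORY_TYPES = ["single_channel", "dual_channel"]
MEMORY_SIZES = ["1GiB", "2GiB", "4GiB"]


@dataclass(frozen=True)
class Design:
    """A design reduced to the two memory variables named in the text."""
    memory_type: str = "single_channel"
    memory_size: str = "1GiB"


def mutate(design, rng):
    """Return a copy of the design with exactly one variable changed."""
    field = rng.choice(["memory_type", "memory_size"])
    options = MEMORY_TYPES if field == "memory_type" else MEMORY_SIZES
    # Exclude the current value so the mutation always has an effect.
    alternatives = [o for o in options if o != getattr(design, field)]
    return replace(design, **{field: rng.choice(alternatives)})


if __name__ == "__main__":
    print(mutate(Design(), random.Random(7)))
```

In a real setup, each mutated `Design` would be translated into a simulator configuration script before evaluation; that translation step is left out here.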


#### A good target for Grammar-based GP
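To illustrate why a grammar helps, here is a toy sketch of deriving a design from a grammar. The grammar is an assumption made up for this example and is far smaller than anything a real simulator configuration language would need; the `derive` helper is likewise hypothetical. The point is that every derivation is valid by construction, for instance an L1 size only appears when an L1 cache was chosen.

```python
import random

# Toy grammar (illustrative only). Keys are non-terminals; each maps to a
# list of alternative productions, and each production is a list of symbols.
GRAMMAR = {
    "<design>": [["<processor>", "<cache>", "<memory>"]],
    "<processor>": [["cpu=atomic"], ["cpu=timing"], ["cpu=o3"]],
    "<cache>": [["cache=none"], ["cache=l1", "<l1_size>"]],
    "<l1_size>": [["l1_size=16KiB"], ["l1_size=32KiB"], ["l1_size=64KiB"]],
    "<memory>": [["mem=single_channel"], ["mem=dual_channel"]],
}


def derive(symbol, rng):
    """Expand a symbol into a flat list of terminals by random derivation."""
    if symbol not in GRAMMAR:
        return [symbol]  # Terminal: emit as-is.
    production = rng.choice(GRAMMAR[symbol])
    terminals = []
    for child in production:
        terminals.extend(derive(child, rng))
    return terminals


if __name__ == "__main__":
    print(derive("<design>", random.Random(3)))
```

Grammar-based GP would then mutate or recombine derivation subtrees rather than raw text, so offspring remain inside the language the simulator accepts.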





#### WHAT'S NEEDING DONE?

- **STATS:** What stats do we need from our simulator, and what are we optimizing for?
- **ACCURACY:** Can we create simulations with good enough fidelity?
- **COST MODEL:** How do we estimate the cost of a design so we can determine the trade-off?
- **SPEED:** How do we do 1000s of evaluations when one can take hours?
- **BENCHMARKS:** What workloads should we optimize, and are they meaningful?
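One common way to frame the cost trade-off question, offered here only as a sketch and not as the approach the talk commits to, is to keep the set of non-dominated designs instead of a single winner. The example assumes each design has already been reduced to a hypothetical `(runtime, cost)` pair where lower is better on both; how those numbers would be obtained is exactly the open question above.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points):
    """Keep every (runtime, cost) point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Made-up (runtime, cost) pairs, purely for illustration.
    designs = [(10, 5), (8, 7), (12, 4), (9, 9), (10, 6)]
    print(pareto_front(designs))
```

A designer can then pick a point from the front according to budget, instead of the search having to fold runtime and cost into one number up front.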

### ANY (NICE) QUESTIONS?

Bobby R. Bruce

bbruce@ucdavis.edu

https://www.bobbybruce.net

