
INF2270 — Spring 2010

Philipp Häfliger

Lecture 6: Microcode, Cache, Pipelining

Contents

Microarchitecture

Memory Hierarchy
  Cache

Pipelining
  Resource Hazards
  Data Hazards
  Control Hazards
  Conclusion

Microarchitecture

- So far we have considered a so-called hardwired CU architecture, where a hardwired FSM issues the right sequence of control signals in response to a machine code in the IR.

- A more flexible alternative is to use microcode and a simple 'processor' within the processor that simply issues a sequence of control signals stored as microinstructions in the microprogram memory, typically a fast read-only memory (ROM) but sometimes also a flash memory (i.e. an electrically erasable programmable read-only memory, EEPROM).
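
To make this concrete, here is a minimal sketch of a micro-sequencer in Python. The opcodes, control-signal names and microprogram contents are all invented for illustration (the real signal set depends entirely on the datapath): a dispatch table maps the machine code in the IR to a start address in the microprogram ROM, and the sequencer then simply steps through the stored microinstructions, asserting their control signals until an end marker.

```python
# Hypothetical microprogram ROM: each entry is one microinstruction,
# i.e. a tuple of control signals to assert in that microcycle.
MICROPROGRAM_ROM = [
    ("read_reg_A", "read_reg_B"),  # 0: ADD: fetch both operands
    ("alu_add",),                  # 1: run the ALU
    ("write_reg_C", "END"),        # 2: write back, end of sequence
    ("addr_bus_from_IR",),         # 3: LOAD: drive the address bus
    ("mem_read", "latch_data"),    # 4: read memory, latch the data bus
    ("write_reg_C", "END"),        # 5: write back, end of sequence
]

# Dispatch table: machine-code opcode -> start address in the ROM.
DISPATCH = {"ADD": 0, "LOAD": 3}

def run_microcode(opcode):
    """Issue the control-signal sequence for one machine instruction."""
    upc = DISPATCH[opcode]                      # microprogram counter
    while True:
        signals = MICROPROGRAM_ROM[upc]
        active = [s for s in signals if s != "END"]
        print(f"uPC={upc}: assert {', '.join(active)}")
        if "END" in signals:                    # last microinstruction
            break
        upc += 1

run_microcode("LOAD")
```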

Hardwired and Microprogrammed CU

Pros and Cons

               Microprogrammed   Hardwired
Occurrence     CISC              RISC
Flexibility    +                 -
Design Cycle   +                 -
Speed          -                 +
Compactness    -                 +
Power          -                 +

Memory Hierarchy

Cache

Cache is used to ameliorate the von Neumann memory-access bottleneck. Cache refers to a small, high-speed RAM integrated into the CPU or close to the CPU. Access time to cache memory is considerably shorter than to the main memory. Cache is small to reduce cost, but also because there is always a trade-off between access speed and memory size. Thus, modern architectures also include several hierarchical levels of cache (L1, L2, L3, ...).

Locality of Code and Data

Cache uses the principle of locality of code and data in a program, i.e. code/data that is used close in time is often also close in space (memory address). Thus, instead of fetching only a single word from the main memory, a whole block around that word is fetched and stored in the cache. Any subsequent load or write instruction that falls within that block (a cache hit) will not access the main memory but only the cache. If an access is attempted to a word that is not yet in the cache (a cache miss), a new block is fetched into the cache (paying the penalty of a longer access time).
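
The effect can be illustrated with a small, hedged sketch: it assumes 64-byte blocks and, for simplicity, a cache large enough that only cold misses occur. A sequential scan then pays one miss per block, while a scan that strides a whole block per access misses every time.

```python
BLOCK = 64  # assumed block (cache line) size in bytes
WORD = 4    # assumed word size in bytes

def count_misses(addresses):
    """Count cold misses for a stream of byte addresses; this simplified
    cache keeps every block it ever fetched (no capacity limit)."""
    cached_blocks = set()
    misses = 0
    for addr in addresses:
        block = addr // BLOCK            # block the word falls into
        if block not in cached_blocks:   # miss: fetch the whole block
            cached_blocks.add(block)
            misses += 1
    return misses

sequential = [i * WORD for i in range(1024)]  # good locality
strided = [i * BLOCK for i in range(1024)]    # one access per block
print(count_misses(sequential), "of", len(sequential))  # 64 misses
print(count_misses(strided), "of", len(strided))        # 1024 misses
```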

Checking for Hits

Checking for hits or misses quickly is a prerequisite for the usefulness of cache memory.

- associative cache: parallel search (extremely specialized hardware) among the memory-block tags in the cache.

- direct-mapped cache: a hash function assigns each memory block to only one cache slot, so only one tag needs to be checked.

- set-associative cache: a combination of the previous two: each memory block is hashed to one block set in the cache, and the quick search for the tag needs only to be conducted within that set.
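
A minimal sketch of the set-associative case follows; direct-mapped falls out as WAYS = 1 and fully associative as NUM_SETS = 1. All sizes are invented, the 'hash function' is the usual modulo on the block address, and eviction is arbitrary rather than LRU:

```python
import math

BLOCK_BITS = 6                        # assumed 64-byte blocks
NUM_SETS = 128                        # assumed number of sets
INDEX_BITS = int(math.log2(NUM_SETS))
WAYS = 4                              # tags per set (WAYS = 1: direct-mapped)

sets = [{} for _ in range(NUM_SETS)]  # each set maps tag -> cached block

def lookup(addr):
    """Return True on a hit; on a miss, fetch the block into the cache,
    evicting an arbitrary block if the set is full (real caches use LRU
    or similar)."""
    offset = addr & ((1 << BLOCK_BITS) - 1)  # word within block (unused here)
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    cache_set = sets[index]
    if tag in cache_set:             # only WAYS tags to compare: fast
        return True
    if len(cache_set) >= WAYS:       # set is full: evict one block
        cache_set.pop(next(iter(cache_set)))
    cache_set[tag] = "block data"    # fetch from main memory (the penalty)
    return False

print(lookup(0x1234), lookup(0x1234))  # miss, then hit: False True
```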

Cache Coherency

A write operation leads to a temporary inconsistency between the content of the cache and the main memory. Several strategies are used in different designs to correct this inconsistency, with varying delay. The major strategies:

write through: a simple policy where each write to the cache is followed by a write to the main memory. The write operations therefore do not really profit from the cache.

write back: a delayed write-back, where a block that has been written to in the cache is marked as dirty. Only when a dirty block's slot is reused for another memory block is it written back into the main memory.
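
The contrast, in a small hedged sketch (single-level cache, block-granular writes, names invented):

```python
main_memory = {}  # block address -> data
cache = {}        # block address -> [data, dirty flag]

def write_through(block, data):
    """Every write goes to both the cache and the main memory."""
    cache[block] = [data, False]
    main_memory[block] = data      # writes do not profit from the cache

def write_back(block, data):
    """Write only the cache and mark the block as dirty."""
    cache[block] = [data, True]

def evict(block):
    """When a slot is reused, a dirty block must be written back first."""
    data, dirty = cache.pop(block)
    if dirty:
        main_memory[block] = data

write_back(0x10, "new data")  # fast: main memory is not touched yet
evict(0x10)                   # the delayed write-back happens only now
print(main_memory)            # {16: 'new data'}
```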

Repetition: 4-Stage Pipeline

4-Stage Pipeline Simplified Block Diagram

Pipelines with More Stages

The 4-stage pipeline is the shortest pipeline that has been used for CPU design, and modern processors generally use more stages. The Pentium III had 16 and the Pentium 4 has 31 stages.

Effective Speed-Up (1/2)

The speed-up is the ratio of the time T needed to execute a specific program for non-pipelined and pipelined execution. The maximal speed-up of a 4-stage pipeline is not exactly a factor of 4, since the pipeline first needs to be 'filled up' before it finishes 4 times more instructions than a purely sequential execution. For example, if a program contains only one single instruction, the pipelined architecture is obviously no faster at all.

Effective Speed-Up (2/2)

In a k-stage pipeline requiring one clock cycle per stage, the execution of n instructions with a clock cycle time t will be finished in:

T = (k + (n - 1)) · t    (1)

The speed-up is thus:

S = (k · n · t) / ((k + (n - 1)) · t) = (k · n) / (k + n - 1)    (2)

According to this formula, S may approach k for very long programs. Unfortunately, there are other reasons why it never quite gets there: pipelining hazards.
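
Plugging numbers into equation (2) shows how slowly that limit is approached; a quick check:

```python
def speedup(k, n):
    """Ideal speed-up of a k-stage pipeline over purely sequential
    execution of n instructions, i.e. equation (2)."""
    return k * n / (k + n - 1)

for n in (1, 10, 100, 10_000):
    print(f"n = {n:>6}: speed-up = {speedup(4, n):.2f}")
# n = 1 gives 1.00 (no gain at all), n = 100 only reaches 3.88;
# the ratio approaches k = 4 only for very long programs.
```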

Pipelining Hazards

Other causes that limit the pipelining speed-up are called pipelining hazards. There are three major classes of these hazards:

- resource hazards

- data hazards

- control hazards

Hazards can be reduced by clever program compilation. In the following, however, we will look at hardware solutions. In practice, both are used in combination.

Resource Hazard Example: Memory Access

We have earlier referred to the von Neumann bottleneck as the limitation to only one memory access at a time. For pipelined operation, this means that only one instruction in the pipeline can access the memory at a time. Since one of the instructions will always be in the instruction fetch phase, a load or write of data to the memory is not possible without stalling the pipeline.

Improvement 1: Register File

To ameliorate the problem of the memory bottleneck, most instructions in pipelined architectures use local registers, organized in a register file, for data input and output. The register file is in effect a small RAM (e.g. with only a 3-bit address space) with (commonly) two parallel read ports (addresses and data) and (commonly) one parallel write port. It thus allows three parallel accesses at the same time. In addition, it is a specialized, very fast memory within the CPU, allowing extremely short access times. Still, registers in the register file can also be the cause of resource hazards if two instructions want to access the same port in different pipeline stages.
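
As a sketch, such a register file can be modelled as a small array with two read ports and one write port, all usable in the same cycle; the 8 registers below correspond to the 3-bit address space mentioned above:

```python
class RegisterFile:
    """Toy register file: two parallel read ports, one parallel write port."""

    def __init__(self, n=8):  # 3-bit address space -> 8 registers
        self.regs = [0] * n

    def access(self, read_a, read_b, write_addr=None, write_data=None):
        """One cycle: two parallel reads plus an optional parallel write.
        A fourth simultaneous access would be a resource hazard."""
        a, b = self.regs[read_a], self.regs[read_b]
        if write_addr is not None:
            self.regs[write_addr] = write_data
        return a, b

rf = RegisterFile()
rf.access(0, 1, write_addr=2, write_data=42)  # all three ports in one cycle
print(rf.access(2, 2))                        # (42, 42)
```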

Improvement 2: Separate Data and Instruction Cache

Another improvement is the so-called Harvard architecture, which differs from the von Neumann model in that there are again two separate memories for data and instructions, at the level of the cache memory. Thus, the instruction fetch will not collide with data access unless both accesses miss their caches.

About Memory Access

Memory access still constitutes a hazard in pipelining. For example, in the first 4-stage SPARC processors, memory access used 5 clock cycles for reading and 6 for writing, and thus impeded the pipeline speed-up.

Other Resource Hazards

Depending on the CPU architecture, a number of resources may be used by different stages of the pipeline and may thus cause resource hazards, for example:

- memory, caches

- register files

- buses

- ALU

- ...

Data Hazards

Data hazards can occur when instructions that are in the pipeline simultaneously access the same data (i.e. a register). It can then happen that an instruction reads a register before a previous instruction has written to it.

Data Hazard Illustration

A Solution: Stalling

A simple solution is to detect the dependency in the IF stage and stall the execution of subsequent instructions until the crucial instruction has finished its WB stage.
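
A hedged sketch of that bookkeeping, with an invented instruction encoding and the 4 stages (IF, DE, EX, WB) from above: an instruction stalls in IF until every one of its source registers has completed its WB.

```python
PIPELINE_DEPTH = 4  # IF, DE, EX, WB

# Invented encoding: (destination register, source registers).
program = [
    ("r1", ("r2", "r3")),  # r1 <- r2 op r3
    ("r4", ("r1", "r5")),  # reads r1: depends on the instruction above
]

def schedule(prog):
    """Print the issue cycle of each instruction, stalling on dependencies."""
    ready_at = {}  # register -> cycle in which its WB has completed
    cycle = 0
    for dest, srcs in prog:
        # Stall in IF until all source registers have been written back.
        cycle = max([cycle] + [ready_at.get(s, 0) for s in srcs])
        print(f"cycle {cycle}: issue {dest} <- {srcs}")
        ready_at[dest] = cycle + PIPELINE_DEPTH  # WB done k cycles later
        cycle += 1

schedule(program)  # the second instruction is issued 3 cycles late
```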

Improvement: Shortcuts/Forwarding

In this solution there is a direct data path from the EX/WB intermediate-result register to the execution-stage input (e.g. the ALU). If a data hazard is detected, this direct data path supersedes the input from the DE/EX intermediate-result register.
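
A minimal sketch of the shortcut, with the pipeline collapsed into a sequential loop so that only the forwarding decision remains visible (instruction encoding invented): each operand read checks the EX/WB latch first and falls back to the register file.

```python
def execute(prog, regs):
    """Run (dest, op, (src1, src2)) instructions with forwarding."""
    ex_wb = {"dest": None, "value": None}       # EX/WB intermediate latch

    for dest, op, (s1, s2) in prog:
        def read(r):
            if r == ex_wb["dest"]:              # hazard detected:
                return ex_wb["value"]           # forward, no stall needed
            return regs[r]                      # normal register-file read

        value = read(s1) + read(s2) if op == "add" else read(s1) - read(s2)
        ex_wb = {"dest": dest, "value": value}  # result enters EX/WB latch
        regs[dest] = value                      # write-back, one stage later
    return regs

regs = {"r1": 0, "r2": 3, "r3": 4, "r4": 0, "r5": 10}
print(execute([("r1", "add", ("r2", "r3")),    # r1 = 7
               ("r4", "sub", ("r5", "r1"))],   # uses r1 via the shortcut
              regs))                           # r4 = 3
```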

Control Hazards

Pipelining assumes, in a first approximation, that there are no program jumps, and 'pre-fetches' always the next instruction from memory into the pipeline. The target of a jump instruction, however, is usually only computed in the EX stage of that jump. At this time, two more instructions have already entered the pipeline and are in the IF and DE stages. If the jump is taken, these instructions should not be executed and must be prevented from writing their results in the WB stage or accessing the memory in the EX stage.

Control Hazard Illustration

A Solution: Always Stall

Simply do not fetch any more instructions until it is clear whether the branch is taken or not.

Improvement 1: Jump Prediction

Make a prediction and fetch the predicted instructions. Only if the prediction proves wrong is the pipeline flushed. Variants:

- assume branch not taken (also a static prediction)

- static predictions

- dynamic predictions (see the sketch below)
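
As an example of a dynamic prediction (a standard textbook scheme, not taken from these slides), here is a sketch of the 2-bit saturating counter: states 0-1 predict 'not taken', states 2-3 predict 'taken', and the prediction only flips after two consecutive mispredictions, so the single exit of a loop branch costs one misprediction instead of two.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch address."""

    def __init__(self):
        self.counters = {}  # branch address -> state 0..3

    def predict(self, branch_addr):
        return self.counters.get(branch_addr, 1) >= 2  # True = taken

    def update(self, branch_addr, taken):
        c = self.counters.get(branch_addr, 1)  # start weakly not taken
        self.counters[branch_addr] = min(c + 1, 3) if taken else max(c - 1, 0)

p = TwoBitPredictor()
# A loop branch: taken 8 times, one exit, then taken 8 times again.
outcomes = [True] * 8 + [False] + [True] * 8
hits = 0
for taken in outcomes:
    hits += p.predict(0x40) == taken
    p.update(0x40, taken)
print(f"{hits}/{len(outcomes)} correct")  # 15/17: cold start + loop exit
```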

Improvement 2: Hardware Doubling

By doubling the hardware of some of the pipeline stages, one can run two pipelines in parallel, one for each possible instruction address. Once it is clear whether the branch was taken or not, the irrelevant pipeline is discarded/flushed and execution continues with the right one. Of course, if two or more jumps follow each other closely, this method fails on the second jump and the pipeline needs to stall.

Pipelining Conclusion

Pipelining speeds up the instruction throughput, although the execution of a single instruction is not accelerated. The ideal speed-up cannot be reached because of this fill-up time, and because of instruction interdependencies that sometimes require one instruction to finish before another can begin. There are techniques to reduce the occurrence of such hazards, but they can never be avoided entirely.
