Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel...

41
Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany

Transcript of Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel...

Page 1: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

Peter MarwedelInformatik 12

Technische Universität Dortmund

Reconfigurable Hardware

Peter MarwedelInformatik 12TU Dortmund

Germany

Page 2: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 2 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Embedded System Hardware

Embedded system hardware is frequently used in a loop(„hardware in a loop“):

actuators

Page 3: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 3 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Target technologies

Processing units Power efficiency of target technologies ASICs Processors

• Energy efficiency• Code size efficiency and code compaction• Run-time efficiency• DSP processors• Multimedia processors• Very long instruction word (VLIW) & EPIC

machines• MPSoCs

Reconfigurable Hardware Memory

Page 4: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 4 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Energy Efficiency of FPGAs

Cou

rtes

y: P

hilip

s

© Hug

o D

e M

an,

IME

C,

200

7

Page 5: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 5 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Reconfigurable Logic

Full custom chips may be too expensive, software too slow.

Combine the speed of HW with the flexibility of SW

HW with programmable functions and interconnect.

Use of configurable hardware;common form: field programmable gate arrays (FPGAs)

Applications: bit-oriented algorithms like

encryption,

fast “object recognition“ (medical and military)

Adapting mobile phones to different standards.

Include programmable interconnected logic blocks

Page 6: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 6 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Types of logic blocks

Multiplexer-basedSRAM-based

(volatile)

EPROM-based(non-volatile)

Inputs depend on configuration

Page 7: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 7 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Types of interconnections

Anti-fuses: Isolation between conductions removed by short high-voltage peakNon-volatile, non-reversible.

RAM-controlled MOS-switchesVolatile, reversible

Page 8: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 8 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Generating a configurationExample: ISE design flow

htt

p:/

/ww

w.x

ilin

x.co

m/s

up

po

rt/

sw_

ma

nu

als

/xili

nx8

/do

wn

loa

d/q

st.z

ip

Design entry (VHDL, Verilog, State diagrams, pre-designed cores)

Net lists

Placement, routing

Page 9: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 9 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Configuring a volatile FPGA with an EPROM

Configuration can be copied from non-volatile memory at power-up

© Xilinx

Page 10: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 10 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

XILINX VIRTEX II FPGAs

RAM-based FPGAs

Page 11: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 11 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Virtex II Configurable Logic Block (CLB)

Page 12: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 12 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Virtex II Slice (simplified)

Look-up tables LUT F and G can be used to compute any Boolean function of 4 variables.

a b c d G0 0 0 0 00 0 0 1 10 0 1 0 10 0 1 1 00 1 0 0 10 1 0 1 00 1 1 0 00 1 1 1 11 0 0 0 11 0 0 1 01 0 1 0 01 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 11 1 1 1 0

Example:

Page 13: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 13 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Virtex II (Pro) Slice

[© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com]

Page 14: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 14 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

2 carry paths perCLB (Vertex II Pro)

[© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com]

Enables efficient implementation of adders.

Enables efficient implementation of adders.

Page 15: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 15 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Implementing sums of products

Dedicated or chain for computing sum of products

16-in

put

AN

D

gate

Page 16: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 16 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Number of resourcesavailable in Virtex II Pro devices

[© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com]

Page 17: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 17 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Embedded Multipliers

A Virtex-II Pro multiplier block is an 18-bit by 18- signed multiplier.

Device Columns Multipliers

XC2VP2 4 12

XC2VP4 4 28

XC2VP7 6 44

XC2VP20 8 88

XC2VP30 8 136

XC2VPX20 8 88

XC2VP40 10 192

XC2VP50 12 232

XC2VP70 14 328

XC2VPX70 14 308

XC2VP100 16 444

Multipliers are connected to a switch matrix, share some bits with RAM (MAC instruction).

Multipliers are connected to a switch matrix, share some bits with RAM (MAC instruction).

Page 18: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 18 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

InterconnectH

iera

rchi

cal R

outin

g R

esou

rces

Page 19: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 19 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Virtex II Pro Devices includeup to 4 PowerPC processor cores

[© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com]

Page 20: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 20 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Summary

Processing units Power efficiency of target technologies ASICs Processors

• Energy efficiency• Code size efficiency and code compaction• Run-time efficiency• DSP processors• Multimedia processors• Very long instruction word (VLIW) & EPIC machines• MPSoCs

Reconfigurable Hardware Memory

Page 21: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

Peter MarwedelInformatik 12

Technische Universität Dortmund

Memory

Peter MarwedelInformatik 12TU Dortmund

Germany

Page 22: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 22 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Memory

For the memory, efficiency is again a concern: speed (latency and throughput); predictable timing energy efficiency size cost other attributes (volatile vs. persistent, etc)

Page 23: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 23 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Access times and energy consumption increases with the size of the memory

Example (CACTI Model): "Currently, the size of some applications is doubling every 10 months" [STMicroelectronics, Medea+ Workshop, Stuttgart, Nov. 2003]

Page 24: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 24 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Access times and energy consumptionfor multi-ported register files

Rixner’s et al. model [HPCA’00], Technology of 0.18 m

Cycle Time (ns) Area (2x106) Power (W)

1

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

16 32 64 128

Register File Size

0

1

2

3

4

5

6

7

16 32 64 128

GP6M2 GP6M3

0

2

4

6

8

10

12

14

16 32 64 128

Register File Size

Sou

rce

and

© H

. Val

ero,

200

1

Page 25: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 25 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Reducing energy consumption with Sub-banking

shorter wires,lower capacitances,faster access

0

5

10

15

20

25

30

Memory Size (bytes)

Ene

rgy

[uJ]

1-Subbank 2-Subbanks4-Subbanks 8-Subbanks16-Subbanks

http://www.ecse.rpi.edu/frisc/theses/MaierThesis/FIG319.GIF

Page 26: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 26 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

How much of the energy consumption of a system is memory-related?

Mobile PCThermal Design (TDP) System Power

Note: Based on Actual Measurements

600/500 MHz uP37%

LCD 10"19%

HDD9%

Memory+Graphics12%

Power Supply10%

Other13%

Mobile PCAverage System Power

600/500 MHz uP13%

LCD 10"30%

HDD19%

Memory+Graphics15%

Power Supply10%

Other13%

CPU Dominates Thermal Design Power

Multiple Platform Components Comprise

Average Power [Courtesy: N. Dutt; Source: V. Tiwari]

Page 27: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 27 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

„CPU“ Power Dissipation

Power PC

Clock19%

Control L.16%

Cache23%

Data Flow11%

I/O 7%

PLA5%

ROM2%

TLB17%

Strong ARM

Icache26%

Dcache16%

Clock10%

IMMU9%

EBOX8%

DMMU8%

Others5%

Ibox18%

IEEE Journal of SSC Nov. 96

Proceedings of ISSCC 94Based on slide by and ©: Osman S. Unsal, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, University of Massachusetts, Amherst, 2001

42%/40% again memory-related !

Page 28: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 28 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Energy consumption in mobile devices

[O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; Pennwell Portable Design - September 2005;] Thanks to Thorsten Koch (Nokia/ Univ. Dortmund) for providing this source.

Page 29: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 29 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Access-times will be a problem

Speed gap between processor and main DRAM increases

2

4

8

2 4 5

Speed

years

CPU

(1.5

-2 p

.a.)

DRAM (1.07 p.a.)

31

• early 60ties (Atlas):page fault ~ 2500 instructions

• 2002 (2 GHz µP):access to DRAM ~ 500 instructions

penalty for cache miss about same as for page fault in Atlas

• early 60ties (Atlas):page fault ~ 2500 instructions

• 2002 (2 GHz µP):access to DRAM ~ 500 instructions

penalty for cache miss about same as for page fault in Atlas

[P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]

2x every 2 years

10

Page 30: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 30 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Predictability is a problem

Embedded system systems are often real-time systems: Have to guarantee meeting timing constraints.

Predictability: For satisfying timing constraints in hard real-time systems, predictability is the most important concern;pre run-time scheduling is often the only practical means of providing predictability in a complex system [Xu, Parnas] Time-triggered, statically scheduled operating systems

Currently available caches don‘t solve the problem:• Improve the average case behavior• Use „non-deterministic“ cache replacement algorithms

Scratch-pad/tightly coupled memory based predictability

Page 31: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 31 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Hierarchical memoriesusing scratch pad memories (SPM)

Address space

ARM7TDMI cores, well-known for low power consumption

scratch pad memory

0

FFF..

main

SPM

processor

HierarchyHierarchy

ExampleExample

no tag memory

SPM

selectSelection is by an appropriate address decoder (simple!)

SPM is a small, physically separate memory mapped into the address space

Page 32: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 32 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Comparison of currents using measurements

E.g.: ATMEL board with ARM7TDMI andext. SRAM

Current32 Bit-Load Instruction (Thumb)

48,2 50,9 44,4 53,1

11677,2 82,2

1,16

0

50

100

150

200

Prog Main/ DataMain

Prog Main/ DataSPM

Prog SPM/ DataMain

Prog SPM/ Data SPM

mA

Core+SPM (mA) Main Memory Current (mA)

Page 33: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 33 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Comparison of energy consumption

Energy32 Bit-Load Instruction (Thumb)

115,8

51,6

76,5

16,4

0,020,040,060,080,0

100,0120,0140,0

Prog Main/ Data Main Prog Main/ Data SPM Prog SPM/ Data Main Prog SPM/ Data SPM

10 n

J

Energy

Example: Atmel ARM-Evaluation board

Main memory access takes more cyclessavings (86%) larger than for current.

energy reduction:/ 7.06

energy reduction:/ 7.06

Page 34: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 34 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Why not just use a cache ?

[P. Marwedel et al., ASPDAC, 2004]

1. Predictability?Worst case execution time

(WCET) may be large

Page 35: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 35 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Why not just use a cache ? (2)

0

1

2

3

4

5

6

7

8

9

256 512 1024 2048 4096 8192 16384

memory size

En

erg

y p

er

ac

ce

ss

[n

J]

.

Scratch pad

Cache, 2way, 4GB space

Cache, 2way, 16 MB space

Cache, 2way, 1 MB space

[R. Banakar, S. Steinke, B.-S. Lee, 2001]

2. Energy for parallel access of sets, in comparators, muxes.

Page 36: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 36 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Influence of the associativity

Parameters different from previous slides

[P. Marwedel et al., ASPDAC, 2004]

Page 37: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 37 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Flash Memory Based on EPROM/EEPROM-Memory

floating gate FG can be charged by high voltage.Charged transistor non-conducting if row selected.Discharging with UV light.

EPROM storage cell

FG charged

FG not charged

Read amplifierGround

Row (j)

Page 38: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 38 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

EEPROM storage cell

EEPROM= electrically erasable read only memory.EEPROM needs additional transistor per bit

Writing and erasing requires just electrical signals

FG charged

FG not chargedGround

Row (j)

Page 39: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 39 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

NOR- and NAND-Flash

NOR: Transistor between bit line and groundNAND: Several transistor between bit line and ground

[ww

w.s

amsu

ng.c

om/P

rod

ucts

/S

emic

ondu

ctor

/Fla

sh/F

lash

New

s/F

lash

Str

uctu

re.h

tm]

contact

contact

Page 40: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 40 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Properties of NOR-

and NAND-Flash

memories

Type/Property NOR NAND

Random access Yes No

Erase block Slow Fast

Size of cell Larger Small

Reliability Larger Smaller

Execute in place Yes No

Applications Code storage, boot flash, set top box

Data storage, USB sticks, memory cards

[ww

w.s

amsu

ng.c

om/P

rodu

cts/

Sem

icon

duct

or/F

lash

/Fla

shN

ews/

Fla

shS

truc

ture

.htm

]

Page 41: Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

- 41 - P. Marwedel, TU Dortmund, Informatik 12, 2007

TU Dortmund

Summary

Processing units Power efficiency of target technologies ASICs Processors

• Energy efficiency• Code size efficiency and code compaction• Run-time efficiency• DSP processors• Multimedia processors• Very long instruction word (VLIW) & EPIC machines• MPSoCs

Reconfigurable Hardware Memory