Post on 01-Jan-2016
Peter MarwedelInformatik 12
Technische Universität Dortmund
Reconfigurable Hardware
Peter MarwedelInformatik 12TU Dortmund
Germany
- 2 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Embedded System Hardware
Embedded system hardware is frequently used in a loop(„hardware in a loop“):
actuators
- 3 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Target technologies
Processing units Power efficiency of target technologies ASICs Processors
• Energy efficiency• Code size efficiency and code compaction• Run-time efficiency• DSP processors• Multimedia processors• Very long instruction word (VLIW) & EPIC
machines• MPSoCs
Reconfigurable Hardware Memory
- 4 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Energy Efficiency of FPGAs
Cou
rtes
y: P
hilip
s
© Hug
o D
e M
an,
IME
C,
200
7
- 5 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Reconfigurable Logic
Full custom chips may be too expensive, software too slow.
Combine the speed of HW with the flexibility of SW
HW with programmable functions and interconnect.
Use of configurable hardware;common form: field programmable gate arrays (FPGAs)
Applications: bit-oriented algorithms like
encryption,
fast “object recognition“ (medical and military)
Adapting mobile phones to different standards.
Include programmable interconnected logic blocks
- 6 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Types of logic blocks
Multiplexer-basedSRAM-based
(volatile)
EPROM-based(non-volatile)
Inputs depend on configuration
- 7 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Types of interconnections
Anti-fuses: Isolation between conductions removed by short high-voltage peakNon-volatile, non-reversible.
RAM-controlled MOS-switchesVolatile, reversible
- 8 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Generating a configurationExample: ISE design flow
htt
p:/
/ww
w.x
ilin
x.co
m/s
up
po
rt/
sw_
ma
nu
als
/xili
nx8
/do
wn
loa
d/q
st.z
ip
Design entry (VHDL, Verilog, State diagrams, pre-designed cores)
Net lists
Placement, routing
- 9 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Configuring a volatile FPGA with an EPROM
Configuration can be copied from non-volatile memory at power-up
© Xilinx
- 10 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
XILINX VIRTEX II FPGAs
RAM-based FPGAs
- 11 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Virtex II Configurable Logic Block (CLB)
- 12 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Virtex II Slice (simplified)
Look-up tables LUT F and G can be used to compute any Boolean function of 4 variables.
a b c d G0 0 0 0 00 0 0 1 10 0 1 0 10 0 1 1 00 1 0 0 10 1 0 1 00 1 1 0 00 1 1 1 11 0 0 0 11 0 0 1 01 0 1 0 01 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 11 1 1 1 0
Example:
- 13 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Virtex II (Pro) Slice
[© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com]
- 14 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
2 carry paths perCLB (Vertex II Pro)
[© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com]
Enables efficient implementation of adders.
Enables efficient implementation of adders.
- 15 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Implementing sums of products
Dedicated or chain for computing sum of products
16-in
put
AN
D
gate
- 16 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Number of resourcesavailable in Virtex II Pro devices
[© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com]
- 17 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Embedded Multipliers
A Virtex-II Pro multiplier block is an 18-bit by 18- signed multiplier.
Device Columns Multipliers
XC2VP2 4 12
XC2VP4 4 28
XC2VP7 6 44
XC2VP20 8 88
XC2VP30 8 136
XC2VPX20 8 88
XC2VP40 10 192
XC2VP50 12 232
XC2VP70 14 328
XC2VPX70 14 308
XC2VP100 16 444
Multipliers are connected to a switch matrix, share some bits with RAM (MAC instruction).
Multipliers are connected to a switch matrix, share some bits with RAM (MAC instruction).
- 18 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
InterconnectH
iera
rchi
cal R
outin
g R
esou
rces
- 19 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Virtex II Pro Devices includeup to 4 PowerPC processor cores
[© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com]
- 20 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Summary
Processing units Power efficiency of target technologies ASICs Processors
• Energy efficiency• Code size efficiency and code compaction• Run-time efficiency• DSP processors• Multimedia processors• Very long instruction word (VLIW) & EPIC machines• MPSoCs
Reconfigurable Hardware Memory
Peter MarwedelInformatik 12
Technische Universität Dortmund
Memory
Peter MarwedelInformatik 12TU Dortmund
Germany
- 22 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Memory
For the memory, efficiency is again a concern: speed (latency and throughput); predictable timing energy efficiency size cost other attributes (volatile vs. persistent, etc)
- 23 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Access times and energy consumption increases with the size of the memory
Example (CACTI Model): "Currently, the size of some applications is doubling every 10 months" [STMicroelectronics, Medea+ Workshop, Stuttgart, Nov. 2003]
- 24 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Access times and energy consumptionfor multi-ported register files
Rixner’s et al. model [HPCA’00], Technology of 0.18 m
Cycle Time (ns) Area (2x106) Power (W)
1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
16 32 64 128
Register File Size
0
1
2
3
4
5
6
7
16 32 64 128
GP6M2 GP6M3
0
2
4
6
8
10
12
14
16 32 64 128
Register File Size
Sou
rce
and
© H
. Val
ero,
200
1
- 25 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Reducing energy consumption with Sub-banking
shorter wires,lower capacitances,faster access
0
5
10
15
20
25
30
Memory Size (bytes)
Ene
rgy
[uJ]
1-Subbank 2-Subbanks4-Subbanks 8-Subbanks16-Subbanks
http://www.ecse.rpi.edu/frisc/theses/MaierThesis/FIG319.GIF
- 26 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
How much of the energy consumption of a system is memory-related?
Mobile PCThermal Design (TDP) System Power
Note: Based on Actual Measurements
600/500 MHz uP37%
LCD 10"19%
HDD9%
Memory+Graphics12%
Power Supply10%
Other13%
Mobile PCAverage System Power
600/500 MHz uP13%
LCD 10"30%
HDD19%
Memory+Graphics15%
Power Supply10%
Other13%
CPU Dominates Thermal Design Power
Multiple Platform Components Comprise
Average Power [Courtesy: N. Dutt; Source: V. Tiwari]
- 27 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
„CPU“ Power Dissipation
Power PC
Clock19%
Control L.16%
Cache23%
Data Flow11%
I/O 7%
PLA5%
ROM2%
TLB17%
Strong ARM
Icache26%
Dcache16%
Clock10%
IMMU9%
EBOX8%
DMMU8%
Others5%
Ibox18%
IEEE Journal of SSC Nov. 96
Proceedings of ISSCC 94Based on slide by and ©: Osman S. Unsal, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, University of Massachusetts, Amherst, 2001
42%/40% again memory-related !
- 28 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Energy consumption in mobile devices
[O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; Pennwell Portable Design - September 2005;] Thanks to Thorsten Koch (Nokia/ Univ. Dortmund) for providing this source.
- 29 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Access-times will be a problem
Speed gap between processor and main DRAM increases
2
4
8
2 4 5
Speed
years
CPU
(1.5
-2 p
.a.)
DRAM (1.07 p.a.)
31
• early 60ties (Atlas):page fault ~ 2500 instructions
• 2002 (2 GHz µP):access to DRAM ~ 500 instructions
penalty for cache miss about same as for page fault in Atlas
• early 60ties (Atlas):page fault ~ 2500 instructions
• 2002 (2 GHz µP):access to DRAM ~ 500 instructions
penalty for cache miss about same as for page fault in Atlas
[P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]
2x every 2 years
10
- 30 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Predictability is a problem
Embedded system systems are often real-time systems: Have to guarantee meeting timing constraints.
Predictability: For satisfying timing constraints in hard real-time systems, predictability is the most important concern;pre run-time scheduling is often the only practical means of providing predictability in a complex system [Xu, Parnas] Time-triggered, statically scheduled operating systems
Currently available caches don‘t solve the problem:• Improve the average case behavior• Use „non-deterministic“ cache replacement algorithms
Scratch-pad/tightly coupled memory based predictability
- 31 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Hierarchical memoriesusing scratch pad memories (SPM)
Address space
ARM7TDMI cores, well-known for low power consumption
scratch pad memory
0
FFF..
main
SPM
processor
HierarchyHierarchy
ExampleExample
no tag memory
SPM
selectSelection is by an appropriate address decoder (simple!)
SPM is a small, physically separate memory mapped into the address space
- 32 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Comparison of currents using measurements
E.g.: ATMEL board with ARM7TDMI andext. SRAM
Current32 Bit-Load Instruction (Thumb)
48,2 50,9 44,4 53,1
11677,2 82,2
1,16
0
50
100
150
200
Prog Main/ DataMain
Prog Main/ DataSPM
Prog SPM/ DataMain
Prog SPM/ Data SPM
mA
Core+SPM (mA) Main Memory Current (mA)
- 33 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Comparison of energy consumption
Energy32 Bit-Load Instruction (Thumb)
115,8
51,6
76,5
16,4
0,020,040,060,080,0
100,0120,0140,0
Prog Main/ Data Main Prog Main/ Data SPM Prog SPM/ Data Main Prog SPM/ Data SPM
10 n
J
Energy
Example: Atmel ARM-Evaluation board
€
Main memory access takes more cyclessavings (86%) larger than for current.
energy reduction:/ 7.06
energy reduction:/ 7.06
- 34 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Why not just use a cache ?
[P. Marwedel et al., ASPDAC, 2004]
1. Predictability?Worst case execution time
(WCET) may be large
- 35 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Why not just use a cache ? (2)
0
1
2
3
4
5
6
7
8
9
256 512 1024 2048 4096 8192 16384
memory size
En
erg
y p
er
ac
ce
ss
[n
J]
.
Scratch pad
Cache, 2way, 4GB space
Cache, 2way, 16 MB space
Cache, 2way, 1 MB space
[R. Banakar, S. Steinke, B.-S. Lee, 2001]
2. Energy for parallel access of sets, in comparators, muxes.
- 36 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Influence of the associativity
Parameters different from previous slides
[P. Marwedel et al., ASPDAC, 2004]
- 37 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Flash Memory Based on EPROM/EEPROM-Memory
floating gate FG can be charged by high voltage.Charged transistor non-conducting if row selected.Discharging with UV light.
EPROM storage cell
FG charged
FG not charged
Read amplifierGround
Row (j)
- 38 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
EEPROM storage cell
EEPROM= electrically erasable read only memory.EEPROM needs additional transistor per bit
Writing and erasing requires just electrical signals
FG charged
FG not chargedGround
Row (j)
- 39 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
NOR- and NAND-Flash
NOR: Transistor between bit line and groundNAND: Several transistor between bit line and ground
[ww
w.s
amsu
ng.c
om/P
rod
ucts
/S
emic
ondu
ctor
/Fla
sh/F
lash
New
s/F
lash
Str
uctu
re.h
tm]
contact
contact
- 40 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Properties of NOR-
and NAND-Flash
memories
Type/Property NOR NAND
Random access Yes No
Erase block Slow Fast
Size of cell Larger Small
Reliability Larger Smaller
Execute in place Yes No
Applications Code storage, boot flash, set top box
Data storage, USB sticks, memory cards
[ww
w.s
amsu
ng.c
om/P
rodu
cts/
Sem
icon
duct
or/F
lash
/Fla
shN
ews/
Fla
shS
truc
ture
.htm
]
- 41 - P. Marwedel, TU Dortmund, Informatik 12, 2007
TU Dortmund
Summary
Processing units Power efficiency of target technologies ASICs Processors
• Energy efficiency• Code size efficiency and code compaction• Run-time efficiency• DSP processors• Multimedia processors• Very long instruction word (VLIW) & EPIC machines• MPSoCs
Reconfigurable Hardware Memory