Page 1: A New Distributed DSP Architecture Based on the Intel IXS for Wireless Client and Infrastructure

Ernest Tsui
Communications and Interconnect Technology Lab
Corporate Technology Group

Kumar Ganapathy
Edge Access Division
Intel Communications Group

Contributors: Hooman Honary, Rich Nicholls, Tony Chun, and Lee Snyder

HOT CHIPS 14
Session 7: Digital Signal Processors
20 August 2002

Page 2: Outline

– Vision for Wireless Networks: ubiquitous
– Anticipated Issues: plethora of “standards”
– Future Wireless Requirements: “soft” with intelligence to increase capacity
– Architectural Objectives: flexible and low power
– How will we go about it? Distributed at the “right granularity”
– Distributed Architectural Summary: based on power, size, and wireless protocols we can derive a “good” (near optimal?) distributed architecture
– Comparison to other Wireless DSP research: flexible but within 2x of Berkeley Wireless Research Center’s Pleiades Arch.
– Summary: infrastructure architecture is near-optimum in granularity and power
– Next Steps: client architecture next

Page 3: Vision for Wireless Networks

– Ubiquitous Internet Connections for all Mobile Client Devices
    – Handhelds, PDAs, Tablet PCs, and Laptops
    – Always-on
– New Paradigm for Wireless Basestations
    – Proliferation of basestations due to lack of spectrum
    – Agility across Multiple Bands
    – Multi-Network (WLAN, WWAN)

Page 4: Wireless Protocol Plethora

Anticipated Future Issues

– PAN, WLAN, and WAN
    – PAN: Bluetooth (UWB, Wireless USB2)
    – WLAN (4 protocols): 802.11b/a (11g, HiperLAN II)
    – WAN (9 protocols):
        – 2G: IS-95, GSM
        – 2.5G: GPRS/EGPRS, cdma2000
        – 3G: WCDMA (FDD, TDD, SC), CDMA 1xEV-DV

Page 5: Wireless Requirements Summary

– Soft Radios at Basestations (deployed initially)
    – Low Power (<1 W) but highly flexible
    – Large no. of channels per core
    – Scalable
– Reconfigurable Client Radios (deployed later)
    – Seamless Client Roaming
        – Two Concurrent Wireless Protocols
        – Selected 802.11a and WCDMA as the most intensive protocols
    – Variable User Environments require “adaptive” resource allocation
    – Adaptive to Broadband AFE distortions
    – Very Low Power (<< 1 W)
        – Digital Baseband is < 10% of total PHY power
    – Reconfigurable to allow Si Re-use
    – Scalable

Page 6: 802.11a Signal Processing Flow Example

[Figure: receive-chain block diagram annotated with per-block operation counts; block-to-annotation pairings inferred from the slide layout]

Sample/Symbol Rate (IXS):
– Decimation Filter (from Analog Front End): 2 x 13 real MACs
– Fixed IQ Imbalance Correction: 1 complex MULT
– Automatic Frequency Correction (from Timing Synchronization): 1 complex MULT, 1 modulo ADD, 1 bit extract, 2 LUTs
– Guard Interval Removal
– S/P
– 64 Point FFT: 3 stages, 16 Radix-4 butterflies per stage, re-ordering
– P/S
– Channel Correction (FEQ): 48 complex MULTs
– Adaptive IQ Imbalance Correction: 204 complex MULTs; LMS update: 48 x 8 real coefficient updates
– Residual Frequency Correction: 48 complex MULTs, 4 arctan, 8 real ADDs, 1 SUBTRACT, 1 SHIFT
– QAM Demap & Soft Metric Generation
– Deinterleaver

Bit Rate (FEC):
– Viterbi ACS + Traceback: 64 1/2 ACS + 64 level traceback
– Viterbi Path Normalizations
– Data stream Descrambler: 6 register LFSR, 2 XORs
– Decryption/Lower MAC

Rich Nicholls and Hooman Honary
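
The FFT annotation on this slide (3 stages, 16 radix-4 butterflies per stage) follows from the transform size: 64 = 4^3, and each radix-4 butterfly consumes four points. A minimal Python sketch of that arithmetic follows; the function name is illustrative and does not come from the slides.

```python
def radix4_fft_structure(n_points: int) -> tuple[int, int]:
    """Return (stages, butterflies_per_stage) for a radix-4 FFT.

    Assumes n_points is a power of 4; each butterfly takes 4 inputs.
    """
    stages = 0
    remaining = n_points
    while remaining > 1:
        if remaining % 4 != 0:
            raise ValueError("n_points must be a power of 4")
        remaining //= 4
        stages += 1
    return stages, n_points // 4


stages, per_stage = radix4_fft_structure(64)
print(stages, per_stage, stages * per_stage)  # 3 16 48
```

Under these assumptions one OFDM symbol needs 48 radix-4 butterflies in total, before the re-ordering step.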

Page 7: 802.11a Initial Acquisition Flow

[Figure: preamble-processing timeline and block diagram annotated with per-block operation counts; block-to-annotation pairings inferred from the slide layout]

– Short Training (8 µs), with segments labeled 1.6 µs, 4 µs, and 2.4 µs
– Sample Rates: 40 M, 20 M, 250 K; Output Data Rate (Max: 54M)
– AGC (from Analog Front End 1 and from Analog Front End 2): 2 real MULTs, 4.25 COMPAREs, 3.06 real ADDs, .125 SHIFTs, 1.06 LUTs each
– Decimation Filter: 2 x 13 real MACs
– Preamble Detection: 1 complex MULT, 2 complex ADDs, 2 real ADDs, 1 COMPARE
– SNR Calculation: 1 complex MULT, 2 complex ADDs, 1 LUT
– Diversity Selection: 1 COMPARE
– AFC Coarse Tuning (from Preamble Detector): 1 ARCTAN, 1 MULT
– Timing Synchronization: 2 x (8+8) complex MACs, 1 complex MAG, 1 COMPARE
– Automatic Frequency Correction: 1 complex MULT, 1 modulo ADD, 2 LUTs
– Other labels: AGC Lock, start SNR, start Second AGC, Once per Packet

For new Packet Communications schemes, significant processing goes on during very short intervals of the preambles.

Hooman Honary

Page 8: WCDMA Signal Processing Flow

The high data rates in 3G result in multi-code, -antenna, and -despreader (finger) processing requirements.

[Figure: transmit/receive block diagram with per-block rate labels ranging from 16K to 75M]

– Receive path: Receive Filter; Rake Fingers (DPDCH 1, DPDCH 2, DPDCH 3, DPCCH); Channel Estimator (CPICH); Combiner; QPSK Symbol Demapping; Deinterleaving/Demultiplexing; Channel Aggregation; Turbo Decoder; Viterbi Decoder; CRC
– Transmit path: CRC; FEC Encoding; Interleaving/Multiplexing; QPSK Symbol Mapping; Spreading; Transmit Filter
– Control loops: SIR Estimator compared against SIR Reference, producing TPC (+1/-1) to TX Gain; Early-Late Gate (+D/-D)
– Aggregate loads: Sample Rate 318 MIPS (1 per antenna); Chip/Symbol Rate 299 MIPS; FEC 60 MIPS/iteration
– Clock/Voltage Control parameters: Samples/Chip, No. of Fingers, No. of Iterations
    – 400 MHz @ 2 samples/chip; 2 x 400 MHz or 800 MHz @ 4 samples/chip
    – 400 MHz @ 2 samples/chip; 500 MHz @ 4 samples/chip
    – 400 MHz up to 6 iterations; 500 MHz up to 8 iterations

Tony Chun and Hooman Honary
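
The clock settings quoted for the iteration count are consistent with the 60 MIPS/iteration FEC figure if the decoder retires roughly one instruction per clock cycle. That rate is an assumption made here for illustration; the slide does not state it. A small Python check, with hypothetical function names:

```python
FEC_MIPS_PER_ITERATION = 60  # figure quoted on the slide


def fec_load_mips(iterations: int) -> int:
    """Decoder load in MIPS for a given number of iterations."""
    return FEC_MIPS_PER_ITERATION * iterations


def fits(clock_mhz: int, iterations: int, ipc: float = 1.0) -> bool:
    """True if the FEC load fits the clock budget (ipc is an assumed rate)."""
    return fec_load_mips(iterations) <= clock_mhz * ipc


for clock_mhz, iterations in [(400, 6), (500, 8)]:
    print(clock_mhz, iterations, fec_load_mips(iterations), fits(clock_mhz, iterations))
# 400 6 360 True
# 500 8 480 True
```

Under the same assumption, a seventh iteration (420 MIPS) would no longer fit at 400 MHz, which matches the slide's step up to 500 MHz.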

Page 9: Computational Mix for Wireless Protocols

[Figure: bar chart of computational load share (%, axis 0 to 40) for 802.11a and WCDMA across six categories: Filtering, Correl., Sync, FEC, Miscl. DSP, MAC]

• Miscl. DSP ~ 10 separate signal processing threads

Page 10: How will we go about it?

Page 11: Flexibility, Power, and Cost Trades (Pick two only)

[Figure: triangle with corners Flex., Power, and Cost, positioning Dedicated H/W, DSP, and IXS PE]

Page 12: Present Status of Soft Radios

• Prior Infrastructure Approaches
    – DSP + ASIC
        – Inflexible ASIC and Costly DSP
    – DSP + Closely Coupled Accelerators
        – Increased Power and Costly DSP
    – Reconfigurable
        – Hard to Program
        – Costly
        – High Power
        – Granularity problem has not been completely solved
• Need Evolved Architecture

Page 13: Architectural Objectives

– Client:
    – 2-3x Power/Size of Dedicated Hardware for the most intensive protocol as a goal
    – Related to no. of protocols possibly in the client device
– Basestation:
    – 5-10x Power/Size of Dedicated Hardware for the most intensive protocol as a goal
    – Related to no. of protocols possibly in the infrastructure device

Page 14: General Architectural Issues

– Low power requires a highly distributed architecture
    – Low voltage helps quadratically lower power
    – Low clock frequency linearly lowers power
    – Large size penalties associated with distributed elements must be avoided
– What is the low power interconnect strategy?
– How do we simplify the distributed processor programming problem?
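
The quadratic and linear dependencies listed on this slide correspond to the standard dynamic-power relation P = a x C x Vdd^2 x Fclk. The Python sketch below uses illustrative numbers only (none are from the slides) to show why spreading work across more, slower elements can pay off when the lower clock also permits a lower supply voltage; the 0.9 V operating point is a hypothetical assumption.

```python
def dynamic_power_w(activity: float, c_sw_f: float, vdd_v: float, f_clk_hz: float) -> float:
    """Dynamic switching power P = a * C * Vdd^2 * Fclk, in watts."""
    return activity * c_sw_f * vdd_v ** 2 * f_clk_hz


# Illustrative values only (not taken from the slides).
base = dynamic_power_w(0.2, 1e-9, 1.2, 800e6)
half_clock = dynamic_power_w(0.2, 1e-9, 1.2, 400e6)
half_voltage = dynamic_power_w(0.2, 1e-9, 0.6, 800e6)

# Two elements at half the clock keep the throughput; assume (hypothetically)
# that the slower clock allows Vdd to drop from 1.2 V to 0.9 V.
two_slow_units = 2 * dynamic_power_w(0.2, 1e-9, 0.9, 400e6)

print(round(half_clock / base, 3))      # 0.5
print(round(half_voltage / base, 3))    # 0.25
print(round(two_slow_units / base, 2))  # 0.56
```

Without the voltage reduction, two half-speed units would draw the same dynamic power as one full-speed unit, which is why the area penalty of distribution has to be kept small.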

Page 15: Architecture Approach

– Investigate Homogeneous Processing Elements (PE)
    – Easy to Scale and to Program for Basestations
    – Heterogeneous better for Client
– Interconnect with Nearest Neighbor Mesh
    – Eliminates High Speed (and power) buses [J. Rabaey, Silicon Architectures for Wireless, Hotchips 2001 Tutorial]
    – PHY connections are 95% nearest neighbor
– Number of Distributed Processing Elements
    – Driven by:
        • Computational Load
        • Size and Power Constraints
        • Feature parameters, e.g., Average Load Capacitance, Vdd, etc.
– Type of Element
    – General Purpose DSP combined with:
    – Acceleration of “Standard Operations” with the right granularity
– S/W programming via High Level Language
    – Explicitly indicates parallelism and connections
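
To make the nearest-neighbor mesh concrete, the following Python sketch enumerates a PE's mesh neighbors and compares the number of short mesh links with the links an all-to-all interconnect would need. The 4 x 4 array size and the function names are hypothetical, chosen only for illustration.

```python
def mesh_neighbors(row: int, col: int, rows: int, cols: int) -> list[tuple[int, int]]:
    """North, south, west, and east neighbors of a PE inside a rows x cols mesh."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def mesh_link_count(rows: int, cols: int) -> int:
    """Number of nearest-neighbor links in a rows x cols mesh."""
    return rows * (cols - 1) + cols * (rows - 1)


def all_to_all_link_count(n: int) -> int:
    """Number of point-to-point links needed to connect n elements pairwise."""
    return n * (n - 1) // 2


# Hypothetical 4 x 4 array of PEs.
print(mesh_neighbors(1, 1, 4, 4))   # [(0, 1), (2, 1), (1, 0), (1, 2)]
print(mesh_neighbors(0, 0, 4, 4))   # [(1, 0), (0, 1)]
print(mesh_link_count(4, 4), all_to_all_link_count(16))  # 24 120
```

An interior PE together with its four neighbors forms a group of five elements that can exchange data over short local wires only.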

Page 16: System Architecture

[Figure: CMOS AFE blocks and an Embedded Controller attached to an array of PEs]

Page 17: Does a Good (near optimal) PE Solution Exist?

Page 18: Macro-architectural Constraints

– First, must meet Power, Size, and Computational Load constraints
    – Computational Load = Rc (ops/sec)
        • Nop = No. of parallel significant operations (multiplies, etc.) in one cycle [R. Brodersen, ISSCC’02]
        • Fclk = Clock frequency
        • Nop x Fclk > Rc
    – Power Constraint = Po (mW)
        • Power (dynamic, leakage (Pleak), short circuit (Psc)) < Po
    – Size Constraint = Ac (mm^2)
        • Nop x Aop < Ac
        • Aop = Average area of a significant computational unit (e.g., multiplier-memory-address-decoder, etc.) (mm^2)
        • Aop ~ Granularity Factor
    – Constraints on Fclk
        • Rc / Nop < Fclk
        • Rc x Aop/Ac < Fclk
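
The last two inequalities combine the load and size constraints: the area budget caps Nop at Ac/Aop, so the clock must be at least Rc x Aop/Ac. A minimal Python sketch of that lower bound follows. Rc and Aop are the approximate values the deck quotes in its later parameter example; Ac = 30 mm^2 is an assumption made here (50 units x 0.6 mm^2), and the function name is illustrative.

```python
def fclk_lower_bound_hz(rc_ops_per_s: float, aop_mm2: float, ac_mm2: float) -> float:
    """Smallest clock satisfying Nop * Fclk > Rc when Nop is capped by Nop * Aop < Ac."""
    max_parallel_ops = ac_mm2 / aop_mm2      # largest Nop that fits in the area Ac
    return rc_ops_per_s / max_parallel_ops   # equals Rc * Aop / Ac


# Ac = 30 mm^2 is an assumed area budget, not a figure from the slides.
bound = fclk_lower_bound_hz(rc_ops_per_s=20e9, aop_mm2=0.6, ac_mm2=30.0)
print(f"{bound / 1e6:.0f} MHz")  # 400 MHz
```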

Page 19: Clock Rate Bounds

– Fclk is upper bounded by power constraints
    – a x Csw x Vdd^2 x Fclk + Pleak < Po/(b x Ac)
        • where Pleak is the average leakage power density in mW/mm^2
        • Csw is the average switching (load) capacitance per mm^2
        • ‘a’ is the activity factor
        • ‘b’ is the average active area fraction (incl. datapath, cache, cache memory bus, etc. and excl. L2 memory, etc.)
        • ‘b’ varies from ~10% for microprocessors to ~80% for dedicated hardware and also is a function of clock gating strategies
– Fclk is lower bounded by computational and area constraints
    – Rc x Aop/Ac < Fclk < (Po/(b x Ac) - Pleak)/(a x Csw x Vdd^2)
    – Key Issues:
        • Find the Fclk that meets upper and lower bounds
        • Derive the Aop and Nop
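
As a rough illustration of the first key issue, the Python sketch below evaluates the upper bound from the power inequality and checks whether a feasible clock window exists. Every numeric value is a placeholder assumption chosen to give round numbers, not a figure from the slides, and the units are SI (W, W/mm^2, F/mm^2) rather than the slide's mW.

```python
def fclk_upper_bound_hz(po_w: float, b: float, ac_mm2: float,
                        pleak_w_per_mm2: float, a: float,
                        csw_f_per_mm2: float, vdd_v: float) -> float:
    """Upper bound on Fclk from a*Csw*Vdd^2*Fclk + Pleak < Po/(b*Ac)."""
    budget_density = po_w / (b * ac_mm2)  # allowed power per active mm^2
    return (budget_density - pleak_w_per_mm2) / (a * csw_f_per_mm2 * vdd_v ** 2)


# Placeholder technology and budget values (assumptions for illustration).
upper = fclk_upper_bound_hz(po_w=0.75, b=0.5, ac_mm2=30.0,
                            pleak_w_per_mm2=0.005, a=0.2,
                            csw_f_per_mm2=0.5e-9, vdd_v=1.0)
lower = 20e9 * 0.6 / 30.0  # Rc * Aop / Ac with the same assumed Ac

print(round(lower / 1e6), round(upper / 1e6))  # 400 450
print(lower < upper)                           # True: a feasible window exists
```

If the upper bound fell below the lower bound, the design would have to change Aop, Ac, Vdd, or the power budget before any clock could satisfy both constraints.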

Page 20: General Power, Area, Fclk Trends

[Figure: two trend plots. First plot: Power (Po) and Area (Ac) versus Fclk (~50 MHz, ~500 MHz, ~5 GHz; vertical scale 1, 10, 100), with curves for fixed Aop and fixed Area (Ac), Voltage/Power and Nop trends, and a marked Optimum Area. Second plot: Flexibility and Interconnect Power versus Granularity.]

Page 21: Reconfigurable Power Trend Summary

– There is an optimum Fclk for a fixed Aop
    – (Recall that Aop is the fundamental processing size)
    – The optimum meets Size and Computational requirements and minimizes power for the above
    – Higher Fclk increases power and lower Fclk increases area and interconnect power
– Is there a similar optimum as Aop is Varied?
    – As Aop decreases, interconnect Power increases exponentially
        • Simpler elements must be connected in a more complex manner to retain flexibility
    – As Aop increases, the voltage requirement (and Power) increases
        • More complex element requires time-multiplexing
– Thus, is there a globally “good” design?
    – Conjecture:
        • Determine the Minimum Aop (for the flexibility desired) and find the optimum Fclk

Page 22: Example of “Good” Architecture Parameters in the optimum area

– Nop (No. of parallel Significant operations), for 90 nm:
    – Nop ~ 50
    – Aop ~ 0.6 mm^2
        – Is this an optimum Granularity Aop?
    – Fclk ~ 400 MHz
    – Po ~ 750 mW
    – Rc ~ 20 GOPs
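
These example parameters can be cross-checked against the macro-architectural constraints. The Python sketch below takes the approximate slide values as inputs; the total area and the energy per operation are quantities derived here for illustration and are not quoted in the deck.

```python
# Approximate values from the slide (all marked "~" in the original).
n_op = 50            # parallel significant operations
a_op_mm2 = 0.6       # area per significant operation unit
f_clk_hz = 400e6     # clock frequency
p_o_w = 0.750        # power budget, 750 mW
r_c_ops = 20e9       # computational load, 20 GOPs

throughput_ops = n_op * f_clk_hz              # compare with Rc
area_mm2 = n_op * a_op_mm2                    # implied Nop x Aop (derived)
energy_per_op_pj = p_o_w / r_c_ops * 1e12     # derived, not from the slides

print(throughput_ops / 1e9, round(area_mm2, 1), round(energy_per_op_pj, 1))
# 20.0 30.0 37.5
```

The product Nop x Fclk lands on the stated Rc of about 20 GOPs, so with these rounded values the load constraint is met with essentially no margin.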

Page 23: Key Computing Element: IXS Core

Page 24: IXS core

[Figure: IXS core block diagram. Labeled blocks: X-Bus, Y-Bus, Z-Bus, P-Bus, and R-Bus; four repeated groups of MULT, M-ADD, D-ADD, S-ADD, S-MULT with accumulators A0 to A3 and an ALIGN unit; VECTOR REGISTER FILE; DAU-X, DAU-Y, DAU-Z; 32-bit ALU; 16x16 MULT; BARREL SHIFT; Control Registers; Register File; LOOP BUFFER; Program Sequencer (Loop, Branch); Instruction decoder & control unit; Program memory banks; Data memory banks; DMA (64 bits); DMA Bridge; JTAG Controller; Debug Unit; labels LX, LY, AX, AY, AZ]

– Efficient Vector processing architecture
– Octal-MAC architecture with 8/16/32-bit arithmetic
– Quick loop entry/exit mechanisms
– Loop buffer
– Data alignment unit
– Resource management engine
– Integrated address generation and control-processing pipeline

Page 25: Architecture Summary

• IXS Processor Octal MAC units
    – RISC-tightly coupled
    – Acceleration H/W
        – Viterbi/Turbo
        – Correlation, De-spreading, etc.
        – Filter
    – Parameters within the Nop Range (50)
        – 5 PEs x 9 MACs = 45 MACs
        – 32 x 8-bit adders per PE
• Mesh-Connected to Surrounding Processors (5 PEs total)
• Do we have the optimal Aop?
    – Lower Aop will start to increase interconnect Power

Page 26: How does the IXS PE Compare against Dedicated Hardware?

Page 27: Power and Area Efficiency of IXS PE vs Dedicated H/W for WLAN Benchmark

Still 5-7x Dedicated H/W

[Figure: bar chart of Power and Area (axis 0 to 7) comparing the IXS-PE Array with the WLAN Ave. Estim.]

Baseband PHY and lower MAC estimates (all scaled to 90 nm)

Page 28: How do we compare against other Reconfigurable Approaches?

Page 29: How Does our Architecture Compare? Multi-User Detector Benchmark

[Figure: bar chart of Power and Area (axis 0 to 30) for GP-DSP (BWRC), DSP Exten. (BWRC), Intel IXS Homogeneous, Berkeley Pleiades, and Dedicated Hardware]

BWRC and Lee Snyder

Page 30: Die Photo

[Figure: die photo with the DSP Core and RISC Core labeled]

Page 31: Summary

– Homogeneous Mesh-Connected Array of IXS Processing Elements for Infrastructure
    – Low power/size (5-7x dedicated h/w)
    – Flexibility where it’s needed
    – Scalability
    – For given size/power and feature size constraints a “good” solution can be found
    – Key Processing element
        – Minimum Memory
        – “Maximum-Datapath” Units
– Next Steps:
    – “What is the optimal Aop Size?”
    – “What is the right Arch. for the Client?”