NVIDIA Hardware

Karl Hillesland - NVIDIA Hardware - 11/2 - Slide 1: NVIDIA Hardware, Karl Hillesland, November 2, 2000

description

NVIDIA Hardware. Karl Hillesland, November 2, 2000. Major release in fall, improvement in spring. NV10: GeForce 256 (Fall 1999); NV15: GeForce2 GTS (Spring 2000); NV11: GeForce2 MX (Summer 2000); NV16: GeForce2 Ultra (Fall 2000); NV20: ??? (Anandtech: Dec 2000 - April 2001).

Transcript of NVIDIA Hardware

NVIDIA Hardware
Karl Hillesland
Cards discussed
NV10: GeForce 256 (Fall 1999)
NV15: GeForce2 GTS (Spring 2000)
NV11: GeForce2 MX (Summer 2000)
NV16: GeForce2 Ultra (Fall 2000)
NV20: ??? (Anandtech: Dec 2000 - April 2001)
NV25?: X-Box (Fall 2001)
GeForce 256
120 MHz core
128 bit, 166 MHz SDR or 150 MHz DDR, up to 128 MB (64 MB biggest I’ve ever heard of)
AGP 4x with fast writes
350 MHz RAMDAC
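The two memory options listed above can be compared with a little arithmetic. This is a minimal sketch, assuming peak bandwidth is simply bus width times clock times transfers per clock (one for SDR, two for DDR) and that 1 GB means 10^9 bytes; the function name is illustrative and not from the slides.

```python
def memory_bandwidth_gb_s(bus_bits, clock_mhz, transfers_per_clock=1):
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes, an assumption)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9

# GeForce 256 options from the slide: 128-bit bus, 166 MHz SDR or 150 MHz DDR.
sdr = memory_bandwidth_gb_s(128, 166)       # SDR: one transfer per clock
ddr = memory_bandwidth_gb_s(128, 150, 2)    # DDR: two transfers per clock

print(f"SDR: {sdr:.3f} GB/s")
print(f"DDR: {ddr:.3f} GB/s")
```

Under these assumptions the DDR option has the higher peak bandwidth despite its lower clock.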
GeForce 256 Triangles
15 MTris/s (BenMark5 gives 13M. Have seen other references to 14.5M)
Up to 6 triangles “in-flight” at a time
2-matrix vertex skinning
8 lights
BenMark5
Four Independent Pipelined Engines
Extremely efficient
Up to 6 triangles “in flight” at a time
Super-pipelined design
QuadEngine™ Architecture (from summer 99 notes)
The QuadEngine design affords the PC OEM (and subsequently the end user) a number of benefits.
Performance is predictable and insensitive to CPU speed. Sub-$1000 PCs that include NV10 GPUs will score competitively with much more expensive PCs on graphics-intensive benchmarks such as 3DWinbench2000 and GameGauge games based on OpenGL. 3DWinbench99 and other DX6-based benchmarks will benefit from NV10’s rendering speed, but were not designed to use dedicated transform and lighting engines and therefore do not offload as much work from the CPU.
Another benefit of the QuadEngine design is a balanced architecture. Because the engines were explicitly designed at the same time and to the same performance specs, the entire pipeline is tuned for maximum throughput rather than one stage of the pipeline chronically being the bottleneck, as is the case when transform and lighting operations are done on the host CPU.
The final advantage of the single-chip QuadEngine design is that the latencies between the engines can be managed and minimized as part of the design. Discrete chip solutions cannot achieve this level of optimization because of the challenge of die-to-die data transfer latencies and the fact that many multi-chip solutions use a geometry processor designed by a different company altogether.
GeForce 256 pixels/texels
4 pixel pipes, one texture each. Can do 2-texture multi-texturing by coupling pipes
24/8 bit Z/stencil, 32 bit color (note: 4*(24+8+32)=256)
Register Combiners
Texture Compression
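The note 4*(24+8+32)=256 and the pipe count can be turned into rough throughput figures. The sketch below assumes each pipe completes one pixel per core clock and uses the 120 MHz core clock quoted earlier for the GeForce 256; the variable names are illustrative.

```python
PIPES = 4                 # pixel pipes on GeForce 256 (from the slide)
TEXTURES_PER_PIPE = 1     # one texture each (from the slide)
CORE_MHZ = 120            # core clock quoted for GeForce 256
Z_BITS, STENCIL_BITS, COLOR_BITS = 24, 8, 32

# Bits of Z/stencil/color touched per clock if every pipe writes one pixel.
bits_per_clock = PIPES * (Z_BITS + STENCIL_BITS + COLOR_BITS)

# Peak rates, assuming one pixel per pipe per clock (an assumption).
pixel_fill_mpix_s = PIPES * CORE_MHZ
texel_rate_mtex_s = pixel_fill_mpix_s * TEXTURES_PER_PIPE

print(bits_per_clock, pixel_fill_mpix_s, texel_rate_mtex_s)
```

Coupling pipes for 2-texture multi-texturing would, under the same assumption, halve the number of pixels finished per clock while leaving the texel rate unchanged.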
GeForce 256 -> GeForce2 GTS
2 textures per pipe
25M Tris/s (BenMark5 gives 24M Tris/s)
Flat panel
GeForce2 GTS -> GeForce2 MX
Remove two pixel pipes (left with 2, 2 textures each)
Dual head support
“Digital Vibrance Control”
Either 64 or 128 bit memory possible
Cheaper (intended for the ~$100 range)
GeForce2 GTS -> GeForce2 Ultra
Faster memory: 225 MHz DDR (“450” MHz)
Expensive: ~ $500
GeForce Quadro
Bandwidths
PCI: 132 MB/s
Vertex Bandwidth
texture coords, 2 textures: 2 * 2 * 2 = 8 bytes
color: 4 bytes
position: 2 * 3 = 6 bytes
total: 8 + 4 + 6 = 18 bytes per vertex
Vertex Bandwidth, Q3
PCI: 132 MB/s / 18 = 7.3 M Verts/s
Host: PC100 with SDRAM: 1.6 GB/s / 18 = 88 M Verts/s
Add indices
Each triangle -> 3 indices, 1 new vertex
18 + 2 bytes/index * 3 indices/tri = 24 bytes/tri
indices and vertices may come across different buses
Vertex cache can save some bandwidth
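The vertex bandwidth estimates can be reproduced in a few lines. This is a sketch under stated assumptions: the factors are read as 2 textures * 2 coordinates * 2 bytes and 3 position components * 2 bytes, the bus figures are the peak numbers quoted on the slides, and all names are illustrative. Evaluating 18 + 2 * 3 gives 24 bytes per indexed triangle.

```python
# Per-vertex payload, following the slide's breakdown (factor meanings assumed).
TEXCOORD_BYTES = 2 * 2 * 2   # 2 textures * 2 coords * 2 bytes each
COLOR_BYTES = 4
POSITION_BYTES = 2 * 3       # 3 components * 2 bytes each
BYTES_PER_VERTEX = TEXCOORD_BYTES + COLOR_BYTES + POSITION_BYTES

INDEX_BYTES = 2
INDICES_PER_TRI = 3

PCI_BYTES_PER_S = 132e6      # PCI peak, as quoted
HOST_BYTES_PER_S = 1.6e9     # host memory peak, as quoted

def per_second(bus_bytes_per_s, bytes_per_item):
    """Items per second the bus can carry at peak rate."""
    return bus_bytes_per_s / bytes_per_item

pci_verts = per_second(PCI_BYTES_PER_S, BYTES_PER_VERTEX)
host_verts = per_second(HOST_BYTES_PER_S, BYTES_PER_VERTEX)

# Indexed triangles: one new vertex plus three indices per triangle.
bytes_per_tri = BYTES_PER_VERTEX + INDEX_BYTES * INDICES_PER_TRI
pci_tris = per_second(PCI_BYTES_PER_S, bytes_per_tri)

print(f"PCI:  {pci_verts / 1e6:.1f} M verts/s")
print(f"Host: {host_verts / 1e6:.1f} M verts/s")
print(f"PCI, indexed: {pci_tris / 1e6:.1f} M tris/s at {bytes_per_tri} bytes/tri")
```

These are upper bounds only; they ignore bus overhead and any savings from the vertex cache.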
Texture Compositing
Register Combiners
Provides configurable 8-bit, signed math per-pixel operations
Cascading of register combiners for more sophisticated computations (Hardware limit on levels. Currently 2)
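To make the cascading idea concrete, here is a rough scalar model of two general combiner stages. It assumes each stage computes A*B, C*D and A*B + C*D on values in [-1, 1]; Python floats stand in for the hardware's signed fixed-point values, and the function and variable names are hypothetical, not part of any API.

```python
def clamp(x, lo=-1.0, hi=1.0):
    """Clamp to the signed range the combiners are assumed to work in."""
    return max(lo, min(hi, x))

def general_combiner(a, b, c, d):
    """One stage: returns (A*B, C*D, A*B + C*D), each clamped to [-1, 1]."""
    ab = clamp(a * b)
    cd = clamp(c * d)
    return ab, cd, clamp(ab + cd)

# Illustrative per-pixel inputs (made-up values).
tex0, tex1 = 0.5, 0.25
diffuse, constant = 0.5, 1.0

# Stage 0: spare0 = tex0 * diffuse + tex1 * constant
_, _, spare0 = general_combiner(tex0, diffuse, tex1, constant)

# Stage 1 (cascaded): reuses spare0 from stage 0 as one of its inputs.
_, _, result = general_combiner(spare0, 1.0, 0.75, 1.0)

print(spare0, result)
```

The second stage saturates at 1.0 here, which shows why the clamped range matters when stages are chained.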
Register Combiners
[Diagram: register combiner data flow. Labels: Spare 0, Fragment Color; Input/Output mappings; General Combiner (RGB) -or- General Combiner (Alpha) -or- Final Combiner; inputs E, F]
X-Box (Abrash in Dr. Dobb's)
Intel PIII/733 with 238 KB cache
250-300 MHz Core
DVD, hard disk
X-Box Transform/lighting
+ one infinite light: 62.45 MTris/sec
8 local lights: 8 MTris/sec
125 M particles/s (single color front-facing squares)
Vertex Programs
Surface engine “works with CPU” for Catmull-Clark, Bezier, Loop, and uniform B-splines at 50 MTris/sec
Vertex Programs
Create a transformed vertex
Vertex Programs cont.
17 Instructions:
ARL, MOV, MUL, ADD, MAD, RCP, RSQ, DP3, DP4, DST, MIN, MAX, SLT, SGE, EXP, LOG, LIT
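As an illustration of how a few of these instructions combine to create a transformed vertex, the sketch below emulates DP4 and MAD on 4-component vectors in Python. It is a simplified model: registers are plain lists, write masks and swizzles are omitted, and the helper names are hypothetical.

```python
def dp4(a, b):
    """DP4: 4-component dot product."""
    return sum(x * y for x, y in zip(a, b))

def mad(a, b, c):
    """MAD: component-wise multiply-add, a * b + c."""
    return [x * y + z for x, y, z in zip(a, b, c)]

def transform_position(matrix_rows, position):
    """Four DP4s, one per matrix row, produce the transformed position."""
    return [dp4(row, position) for row in matrix_rows]

# Illustrative matrix: translate by (2, 3, 4).
matrix = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]
vertex = [1.0, 1.0, 1.0, 1.0]

out_position = transform_position(matrix, vertex)

# MAD example: scale a color and add an ambient term (made-up values).
out_color = mad([0.5, 0.5, 0.5, 1.0], [0.5, 0.5, 0.5, 1.0], [0.25, 0.25, 0.25, 0.0])

print(out_position, out_color)
```

In this model a position transform costs four instructions, which is consistent with the idea that execution time grows with program length.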
Vertex Program Registers
Using Vertex Programs (OpenGL)
Created/managed similar to texture objects
No penalty for switching in and out of vertex program mode
Execution time ~proportional to length of program
X-Box memory bandwidth
64 MB, 128 bit, 200 MHz DDR RAM
1 GPix/sec fill rate + “occlusion circuitry”
“automatic z compression”
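The memory figures above imply a per-pixel bandwidth budget at the quoted fill rate. The following is a back-of-the-envelope sketch, assuming DDR means two transfers per clock, 1 GB = 10^9 bytes, and that a naive pixel write costs 4 bytes of color plus 4 bytes of Z/stencil; the names are illustrative.

```python
BUS_BITS = 128
CLOCK_HZ = 200e6
TRANSFERS_PER_CLOCK = 2      # DDR (assumed: two transfers per clock)
FILL_RATE_PIX_S = 1e9        # 1 GPix/sec, as quoted

peak_bytes_per_s = BUS_BITS / 8 * CLOCK_HZ * TRANSFERS_PER_CLOCK
bytes_per_pixel_budget = peak_bytes_per_s / FILL_RATE_PIX_S

# Naive cost of one pixel: 32-bit color write + 24/8-bit Z/stencil write.
naive_bytes_per_pixel = 4 + 4

print(f"Peak bandwidth: {peak_bytes_per_s / 1e9:.1f} GB/s")
print(f"Budget at 1 GPix/s: {bytes_per_pixel_budget:.1f} bytes/pixel")
print(f"Naive write cost:   {naive_bytes_per_pixel} bytes/pixel")
```

Under these assumptions the budget falls short of the naive cost even before Z reads and texture fetches, which is one plausible reason the slide lists occlusion circuitry and Z compression alongside the fill rate.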
X-Box bandwidth diagram
X-Box Textures
4 textures per pixel (but takes two clocks for >2)
One texture can be used as lookup to next texture
8 general register combiners + final combiner
3D Textures
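The two-clock cost for more than two textures can be expressed as an effective fill rate. This sketch assumes the 1 GPix/sec figure quoted for the X-Box applies when at most two textures are used, and that three or four textures simply double the clocks per pixel; the function names are illustrative.

```python
BASE_FILL_GPIX_S = 1.0   # quoted fill rate, assumed to hold for <= 2 textures
MAX_TEXTURES = 4         # 4 textures per pixel (from the slide)

def clocks_per_pixel(num_textures):
    """One clock for up to 2 textures, two clocks for 3 or 4."""
    if not 0 <= num_textures <= MAX_TEXTURES:
        raise ValueError("expected between 0 and 4 textures")
    return 2 if num_textures > 2 else 1

def effective_fill_gpix_s(num_textures):
    """Fill rate after accounting for the extra clock."""
    return BASE_FILL_GPIX_S / clocks_per_pixel(num_textures)

for n in range(1, MAX_TEXTURES + 1):
    print(n, "textures:", effective_fill_gpix_s(n), "GPix/s")
```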
Texture compression (OpenGL)