8 Vertex Shader Pipelines

Zooming in on the geometry pipelines, G70 has two more vertex
shaders than the NV40/45, for a total of 8 vertex shaders handling geometry
processing. This speeds up geometry processing in polygon-intensive scenes.
NVIDIA has also re-architected the vertex shader unit so that each one can
process more ops per clock than on the NV40/45: to be precise, 8 FP MADD
(multiply-add) operations per clock, each completed in a single cycle. The
single-cycle MADDs can improve scalar math performance by up to 30%. NVIDIA also
mentioned that texture fetch efficiency is significantly improved, especially
for large textures, thanks to new hardware algorithms and better caching that
speed up filtering and blending operations.
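Since MADDs come up throughout this review, here is what one computes: d = a × b + c, issued as a single instruction rather than as a separate multiply and add. The short Python sketch below is purely illustrative. The function names madd and madd4 are made up for this example, and it says nothing about how NVIDIA implements the operation in silicon. It shows a scalar MADD and a four-component (vec4) MADD, which is simply four scalar MADDs applied component-wise.

```python
# Illustrative only: what a MADD (multiply-add) computes, d = a * b + c.
# On a GPU this is one instruction; here it is plain Python arithmetic.
# The names madd/madd4 are hypothetical, chosen for this example.

def madd(a: float, b: float, c: float) -> float:
    """Scalar multiply-add: one multiply and one add issued as a single op."""
    return a * b + c


def madd4(a, b, c):
    """Four-component (vec4) MADD: four scalar MADDs applied component-wise,
    e.g. to an (x, y, z, w) position or an (r, g, b, a) color."""
    return tuple(madd(x, y, z) for x, y, z in zip(a, b, c))


# A typical vertex-shader pattern: scale a position, then add an offset.
position = (1.0, 2.0, 3.0, 1.0)
scale = (2.0, 2.0, 2.0, 1.0)
offset = (0.5, 0.5, 0.5, 0.0)
print(madd4(position, scale, offset))  # (2.5, 4.5, 6.5, 1.0)
```

Counting conventions vary, but a MADD is usually counted as two floating-point operations (one multiply, one add), so a four-component MADD amounts to eight.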

24 Pixel Shader Pipelines


(Figure: NV40/45 and G70 pixel shader pipeline diagrams)

G70 is based on a SIMD (single-instruction, multiple-data)
architecture that delivers massive parallelism by executing shaders in parallel
across all 24 of its pixel pipelines. On the G70, the 24 pixel pipelines are
divided into 6 quads, each with 4 pixel shader units. Pipelines clearly play an
important role in the performance of today's games, so NVIDIA and ATI will keep
adding more of them in their next-generation GPUs. ATI, however, has taken the
unified shader road, as is evident in the Xenon GPU with its 48 shader pipes:
there are no discrete pixel or vertex shader units, but instead a set of
general execution units that can serve either pixel shader or vertex shader
instructions. The advantage of a unified shader approach is still unproven, but
it should generally benefit geometry processing more, since vertex shader units
are always fewer in number than pixel shader units: 24 pixel shader units vs. 8
vertex shader units in the case of G70.
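To see why a unified design should help geometry-heavy scenes most, consider a deliberately simplified toy model. It assumes perfect load balancing, ignores every real-world overhead, and uses made-up workload numbers. The unified case is a hypothetical 32-unit pool (the same total as G70's 8 + 24 units), not a model of the Xenon GPU, and the function names are hypothetical.

```python
# Toy model (hypothetical): frame time when vertex and pixel work run on
# fixed, separate pools of units versus one shared (unified) pool.
# Assumes perfect load balancing; ignores all real-world overheads.

VERTEX_UNITS = 8    # G70 vertex shader units
PIXEL_UNITS = 24    # G70 pixel shader units


def discrete_time(vertex_work: float, pixel_work: float) -> float:
    """Each pool runs only its own kind of work; the slower pool sets the pace."""
    return max(vertex_work / VERTEX_UNITS, pixel_work / PIXEL_UNITS)


def unified_time(vertex_work: float, pixel_work: float) -> float:
    """Hypothetical unified design with the same total unit count (32),
    where any unit can take either kind of work."""
    return (vertex_work + pixel_work) / (VERTEX_UNITS + PIXEL_UNITS)


# Work amounts are made-up "unit-cycles" chosen only to show the trend.
for name, v, p in [("balanced 1:3", 80, 240), ("geometry-heavy", 160, 240)]:
    print(f"{name}: discrete={discrete_time(v, p)}, unified={unified_time(v, p)}")
# balanced 1:3: discrete=10.0, unified=10.0
# geometry-heavy: discrete=20.0, unified=12.5
```

When the vertex-to-pixel workload happens to match the 1:3 ratio of the fixed units, both designs finish together. When geometry dominates, the 8 vertex units become the bottleneck while pixel units sit idle, and the shared pool pulls ahead.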

NVIDIA claims to have modeled 1,300 common shader algorithms from
games to determine the best usage model for the shader units on G70, so as to
spot and eliminate bottlenecks. As a result, G70 gets a redesigned pixel shader
unit that delivers twice as many FP ops, and much more math throughput, than the
NV40/45. On the NV40, the pixel shader pipeline consists of two shader units
with a texture unit in between; each shader unit can process 4 ops per pixel,
which translates to up to 8 ops per pixel. On the G70, by comparison, each
shader unit can process 10 ops, which translates to 20 ops per pixel. As such,
each pixel pipeline in the 7800GTX is optimized to deliver 50% more efficiency,
clock for clock, than the NV40/45. G70 can perform two four-component MADDs per
fragment per clock, compared with NV40's one four-component MADD per fragment
per clock.
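The per-pixel and per-clock figures above reduce to simple multiplication. The sketch below uses only numbers quoted in this article (two shader units per pipeline, 16 pixel pipelines on NV40, 24 on G70) and gives theoretical peaks. The helper names are hypothetical, and real shaders rarely sustain these rates.

```python
# Back-of-the-envelope peak shader math, using only figures quoted in this
# article. These are theoretical peaks, not measured performance.

def ops_per_pixel(shader_units_per_pipe: int, ops_per_unit: int) -> int:
    """Peak ops one pixel pipeline can apply to a pixel per pass."""
    return shader_units_per_pipe * ops_per_unit


def vec4_madds_per_clock(pipes: int, madds_per_pipe: int) -> int:
    """Peak four-component MADDs issued per clock across all pixel pipes."""
    return pipes * madds_per_pipe


nv40_ops = ops_per_pixel(2, 4)    # two shader units x 4 ops = 8 ops per pixel
g70_ops = ops_per_pixel(2, 10)    # two shader units x 10 ops = 20 ops per pixel

# NV40: 16 pixel pipes x 1 vec4 MADD; G70: 24 pixel pipes x 2 vec4 MADDs.
nv40_madds = vec4_madds_per_clock(16, 1)   # 16 vec4 MADDs per clock
g70_madds = vec4_madds_per_clock(24, 2)    # 48 vec4 MADDs per clock

# Counting each vec4 MADD as 4 scalar multiply-adds = 8 floating-point ops.
print(nv40_ops, g70_ops)                      # 8 20
print(nv40_madds * 4 * 2, g70_madds * 4 * 2)  # 128 384
```

These are peak ratios; they are not the same measure as the 50% clock-for-clock efficiency figure NVIDIA quotes per pipeline.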

16 ROP Pixel Pipelines


(Figure: NV40/45 and G70 ROP pipeline diagrams)

The ROPs' (raster operators') task is to convert fragments into
pixels, apply multisample AA if needed, perform color and Z compression, and
send the completed pixel data to the frame buffer. In NV40, there are 16 ROPs
to 16 pixel shaders; each ROP contains a Z ROP and a C (color) ROP and can write
either one color+Z pixel or two Z/stencil values per clock cycle. In G70,
however, there are 16 ROPs to 24 pixel shaders. Perhaps NVIDIA realized from the
6600GT, which has 4 ROPs to 8 pixel shaders, that having fewer ROPs than pixel
shaders does not create a bottleneck. Double-speed Z is especially useful in
shadow calculations, which is partly why NV40 excels in Doom 3.
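The ROP arithmetic can be laid out the same way. This sketch uses the unit counts quoted above; the per-ROP write rates are the NV40 figures, and the helper names are hypothetical. Memory bandwidth, which can also limit ROP throughput, is ignored.

```python
# Pixel-shader-to-ROP ratios and peak NV40 ROP output per clock, using only
# the unit counts quoted in this article. Peak figures; bandwidth is ignored.

CHIPS = {
    # name: (pixel shader units, ROPs)
    "NV40": (16, 16),
    "G70": (24, 16),
    "6600GT": (8, 4),
}


def shaders_per_rop(shaders: int, rops: int) -> float:
    """How many pixel shader units share each ROP."""
    return shaders / rops


def rop_writes_per_clock(rops: int, z_only: bool) -> int:
    """NV40 figures: each ROP writes one color+Z pixel per clock, or two
    Z/stencil values per clock when only Z/stencil is written (double-speed Z)."""
    return rops * (2 if z_only else 1)


for name, (shaders, rops) in CHIPS.items():
    print(f"{name}: {shaders_per_rop(shaders, rops)} pixel shaders per ROP")
# NV40: 1.0, G70: 1.5, 6600GT: 2.0

print(rop_writes_per_clock(16, z_only=False))  # 16 color+Z pixels per clock
print(rop_writes_per_clock(16, z_only=True))   # 32 Z/stencil values per clock
```

G70's 1.5 pixel shaders per ROP sits between NV40's 1:1 and the 6600GT's 2:1, which is consistent with the argument that 16 ROPs should not bottleneck 24 pixel shaders.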