NVIDIA GeForce 8800GTX Review
It is here! The first DirectX 10 graphics card, the NVIDIA GeForce 8800 GTX! The GeForce 8800 GTX GPU implements a massively parallel, unified shader design consisting of 128 individual stream processors running at 1.35 GHz. Because the shaders are unified, each processor can be dynamically allocated to vertex, pixel, geometry, or physics operations. We take a thorough look at the architecture, performance, overclocking, and image quality of this powerhouse!
With its unified pipeline and shader architecture, the GeForce 8800 GPU design significantly reduces the number of pipeline stages and changes the sequential flow to a more loop-oriented one. Inputs are fed to the top of the unified shader core, outputs are written to registers, and those results are then fed back into the top of the shader core for the next operation. By contrast, the classic pipeline uses discrete shader types, represented in different colors, with data flowing sequentially down the pipeline through each shader type. The illustration on the right depicts a unified shader core with one or more standardized, unified shader processors.
Data entering at the top left of the unified design (such as vertices) is dispatched to the shader core for processing, and the results are sent back to the top of the shader core, where they are dispatched again, processed again, looped to the top, and so on until all shader operations are performed and the pixel fragment is passed on to the ROP subsystem.
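The looping flow described above can be sketched in a few lines of Python. This is only a toy model, not NVIDIA's hardware logic: the function name run_unified_core, the stage functions, and the integer "data" are all hypothetical stand-ins chosen for illustration. Each work item goes through the same shader core once per stage and is fed back to the top of the queue until no stages remain, at which point it is handed to the ROP output list.

```python
from collections import deque


def run_unified_core(work_items, stages):
    """Toy model of a looping unified shader core (illustrative only).

    work_items: list of input values (stand-ins for vertices).
    stages: ordered list of (name, function) pairs, one per shader operation.
    Returns (rop_output, trace), where trace lists the stage run on each pass.
    """
    # Each queue entry is (data, index of the next stage to run).
    queue = deque((item, 0) for item in work_items)
    rop_output = []
    trace = []
    while queue:
        data, stage_index = queue.popleft()
        if stage_index == len(stages):
            # All shader operations are done: pass the result on to the ROP.
            rop_output.append(data)
            continue
        name, shader = stages[stage_index]
        trace.append(name)
        # Write the result back and loop it to the top of the shader core.
        queue.append((shader(data), stage_index + 1))
    return rop_output, trace


# Hypothetical stage functions; real shaders would transform vertices,
# emit primitives, and shade fragments.
def vertex_stage(x):
    return x + 1


def geometry_stage(x):
    return x * 2


def pixel_stage(x):
    return x - 3


STAGES = [("vertex", vertex_stage), ("geometry", geometry_stage), ("pixel", pixel_stage)]

if __name__ == "__main__":
    output, trace = run_unified_core([1, 2], STAGES)
    print(output)
    print(trace)
```

The point of the sketch is that there is one pool of work and one core: the stage an item is in is just state carried with the item, rather than a fixed position in a hardware pipeline.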
The GeForce 8800 design team realized that extreme amounts of hardware-based shading horsepower would be necessary for high-end DirectX 10 3D games. While DirectX 10 specifies a unified instruction set, it does not demand a unified GPU shader design. NVIDIA's GeForce 8800 engineers nevertheless believed a unified GPU shader architecture made the most sense, because it allows effective DirectX 10 shader program load-balancing, efficient GPU power utilization, and significantly improved GPU architectural efficiency.
Note that the GeForce 8800 unified shaders can also be used with DirectX 9, OpenGL, and older DirectX versions. No restrictions or fixed numbers of unified shading units need to be dedicated to pixel or vertex processing for any of the API programming models.
Numerous challenges had to be overcome with such a radical new design over the four-year GeForce 8800 GPU development timeframe. Looking more closely at graphics programming, we can safely say that, in general, pixels outnumber vertices by a wide margin, which is why prior fixed-shader GPU architectures had far more pixel shader units than vertex shader units.
But different applications do have different shader processing requirements at any given point in time—some scenes may be pixel-shader intensive and other scenes may be vertex shader-intensive. In a GPU with a fixed number of specific types of shader units, restrictions are placed on operating efficiency, attainable performance, and application design. For illustration, the figure below shows a theoretical GPU with a fixed number of four vertex shader units and eight pixel shader units, or a total of 12 shader units altogether.
The top scenario shows a scene that is vertex shader intensive; it can only attain performance as fast as the maximum number of vertex units, which in this case is “4”. In the bottom scenario, the scene is pixel shader intensive, which might be due to various complex lighting effects for the water. Here the scene is pixel shader limited and can only attain a maximum performance of “8”, equal to the number of pixel shader units, which are the bottleneck. Neither situation is optimal, because hardware sits idle and performance is left on the table, so to speak. It is also inefficient from a power (performance/watt) or die size and cost (performance/sq. mm) perspective.
With a unified shader architecture, whenever an application is vertex shader intensive, the majority of unified shader processors can be applied to processing vertex data, and in this case the overall performance increases to “11”. Similarly, if the scene is pixel shader heavy, the majority of unified shader units can be applied to pixel processing, also attaining a score of “11” in the example below.
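The arithmetic behind the “4”, “8”, and “11” scores can be reproduced with a small back-of-the-envelope model. The sketch below is our own simplification, not something taken from NVIDIA's material: it assumes a hypothetical 11:1 work mix for the intensive shader type (chosen because it reproduces the figures quoted above), treats every unit as doing one unit of work per time step, and reports "performance" as the number of units kept busy on the dominant shader type. The function names are made up for illustration.

```python
from fractions import Fraction


def fixed_throughput(vertex_units, pixel_units, vertex_work, pixel_work):
    """Frames per time step when each unit type can only run its own shader type.

    The slower of the two stages limits the whole frame.
    """
    rates = []
    if vertex_work:
        rates.append(Fraction(vertex_units, vertex_work))
    if pixel_work:
        rates.append(Fraction(pixel_units, pixel_work))
    return min(rates)


def unified_throughput(total_units, vertex_work, pixel_work):
    """Frames per time step when any unit can run any shader type."""
    return Fraction(total_units, vertex_work + pixel_work)


def busy_units(throughput, vertex_work, pixel_work):
    """Average number of units busy on (vertex, pixel) work at that throughput."""
    return throughput * vertex_work, throughput * pixel_work


if __name__ == "__main__":
    # Assumed work mixes per frame: (vertex work, pixel work).
    scenes = {"vertex-heavy": (11, 1), "pixel-heavy": (1, 11)}
    for name, (v_work, p_work) in scenes.items():
        fixed = busy_units(fixed_throughput(4, 8, v_work, p_work), v_work, p_work)
        unified = busy_units(unified_throughput(12, v_work, p_work), v_work, p_work)
        print(name, "fixed:", max(fixed), "unified:", max(unified))
```

Under these assumptions, the fixed 4+8 design keeps only 4 units busy on the vertex-heavy scene and 8 on the pixel-heavy one, while the 12-unit unified design keeps 11 busy on the dominant type in both cases, with the remaining unit handling the other shader type.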
Unified streaming processors (SPs) in GeForce 8800 GPUs can process vertices, pixels, or geometry; they are effectively general-purpose floating point processors. Different workloads can be mapped to the processors, including physics and other possible workloads we may see in the near future. Note that geometry shading is a new feature of the DirectX 10 specification. The GeForce 8800 unified stream processors can process geometry shader programs, permitting a powerful new range of effects and features while reducing dependence on the CPU for geometry processing.
The GPU dispatch and control logic can dynamically assign vertex, geometry, or pixel operations to available SPs without worrying about fixed numbers of specific types of shader units. This feature is just as important to developers, who need not worry as much that certain aspects of their code might be too pixel shader intensive or too vertex shader intensive. Then again, many developers will still be mindful of what type of hardware the majority of gamers are running...
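To make the idea of dynamic assignment more concrete, here is one possible allocation policy, sketched in Python. NVIDIA does not describe the actual scheduling policy of the dispatch logic in the material we have, so everything here is an assumption: the function name allocate_sps is hypothetical, and the proportional split (with leftover processors going to the largest remainders) is simply one plausible way to divide 128 SPs among pending vertex, geometry, and pixel work.

```python
def allocate_sps(pending, total_sps=128):
    """Split stream processors across shader types in proportion to pending work.

    pending: dict mapping a shader type name to its amount of queued work.
    Returns a dict mapping each shader type to a whole number of SPs.
    This proportional policy is an illustrative assumption, not the
    documented behavior of the GeForce 8800 dispatch logic.
    """
    total_pending = sum(pending.values())
    if total_pending == 0:
        return {kind: 0 for kind in pending}

    allocation = {}
    remainders = {}
    for kind, amount in pending.items():
        # Integer arithmetic avoids floating point rounding surprises.
        allocation[kind], remainders[kind] = divmod(total_sps * amount, total_pending)

    # Hand out the processors lost to rounding, largest remainder first.
    leftover = total_sps - sum(allocation.values())
    for kind in sorted(pending, key=lambda k: remainders[k], reverse=True)[:leftover]:
        allocation[kind] += 1
    return allocation


if __name__ == "__main__":
    print(allocate_sps({"vertex": 3, "geometry": 0, "pixel": 1}))
    print(allocate_sps({"vertex": 1, "geometry": 1, "pixel": 10}))
```

With a fixed-function design, the split between vertex and pixel units is decided when the chip is built; in this sketch the split is recomputed from whatever work happens to be queued, which is the property the unified design is after.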
Not only does a unified shader design assist in load-balancing shader workloads, it actually helps redefine how a graphics pipeline is organized. In the future, it is possible that other types of workloads can be run on a unified stream processor.
















