Following the release of ATI’s flagship Hemlock, we saw the first picture of a working Fermi-based GeForce system, and now at SC ’09, Nvidia have demonstrated Fermi going up against GT200. All in one day!

The demonstration is an N-body simulation running on CUDA with 20,480 body interactions in double precision. It was designed to show off Fermi’s massive DP performance increase over GT200. Of course, this being a supercomputing conference, these results are hardly relevant for gaming. The GT200 GPU was a Tesla C1060 and the Fermi was a Tesla 20 sample. The results? 3.5 fps for the C1060, 21.72 fps for Fermi.

The end result is a 6.2x increase in a DP simulation, not far off previous expectations of 8x. That 8x figure, however, was relative to the GTX 285, whereas the C1060 is based on the slower GTX 280, so we should have been expecting a 9-10x improvement (768 vs. 78 GFlops). Even allowing for scaling inefficiencies, it is clear Nvidia have fallen short of their earlier expectations by missing their clock target, as 6.2x is nowhere near 9-10x. Do remember that this is A2 silicon, and the final A3 silicon might be a tad faster.

However, what this demonstration really does is show GT200’s weakness in double precision computation. The Tesla C1060 boasts 78 GFlops, whereas Tesla 20 reportedly brings DP performance to over 600 GFlops (short of the 768 GFlops target). AMD has always been stronger in double precision, however: the modest, 18-month-old HD 4870 manages 240 GFlops. The current generation HD 5870 performs at 544 GFlops, not far from Fermi in what is supposed to be one of Fermi’s strong points.
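For anyone who wants to check the ratios quoted above, here is a minimal Python sketch of the arithmetic. It uses only the figures from this article; the constant and function names are made up for illustration, and comparing a frame-rate ratio against a ratio of peak GFlops assumes the demo's frame rate scales roughly with DP throughput, which is a simplification.

```python
# Figures quoted in the article. All names here are illustrative only.
C1060_FPS = 3.5
FERMI_FPS = 21.72
C1060_DP_GFLOPS = 78.0
FERMI_DP_TARGET_GFLOPS = 768.0
FERMI_DP_REPORTED_GFLOPS = 600.0  # quoted as "over 600", so a lower bound


def speedup(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


# Measured speedup in the N-body demo (frames per second).
measured = speedup(FERMI_FPS, C1060_FPS)
# Peak-throughput ratios (assumption: fps scales with DP GFlops).
target = speedup(FERMI_DP_TARGET_GFLOPS, C1060_DP_GFLOPS)
reported = speedup(FERMI_DP_REPORTED_GFLOPS, C1060_DP_GFLOPS)

print(f"measured N-body speedup:  {measured:.2f}x")
print(f"peak ratio at 768 GFlops: {target:.2f}x")
print(f"peak ratio at 600 GFlops: {reported:.2f}x")
print(f"measured vs. target:      {measured / target:.0%}")
```

The measured speedup comes out at about 6.2x, against peak ratios of roughly 9.8x at the 768 GFlops target and 7.7x at the reported 600 GFlops, which is why the demo looks closer to the reported figure than to the original target.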

So what does double precision have to do with gaming graphics? Almost nothing. The fact remains that in raw theoretical (single precision) compute performance, Fermi is miles behind the HD 5870 (<1.5 TFlops vs. 2.72 TFlops). And unlike RV770, which had a significant ROP and TMU disadvantage against GT200, the gap has narrowed with Cypress vs. Fermi. The only gaming-related advantage Fermi holds right now is memory bandwidth. Normally we could count Nvidia’s more efficient driver team (thanks in part to TWIMTBP) as an advantage, but AMD have had too long a lead in which to mature their drivers. We cannot rule out GeForce Fermi differing from Tesla Fermi and carrying specific gaming optimizations, but we are not aware of any such features as of now.

On the bright side, we have seen Fermi, in both Tesla and GeForce form, up and running today, and that is a real relief! These are likely to be A2 silicon, whereas the final shipping products will be A3. GeForce products are expected in late Q1 2010, and Tesla products in Q2 2010, assuming everything goes smoothly.

You can watch a video of the N-body demonstration on YouTube.

Reference: VizWorld