We all know Fermi has been delayed again and again, but at the end of all those delays, we were expecting a stellar product from Nvidia. However, today’s press release reveals certain inconvenient details. Let’s forget about the delays for now and just consider the product itself.

The first Fermi GPU, GF100, is, as we have known for a while now, a 3 billion transistor giant with a die size of around 500 mm². Compare this with the 2.15 billion transistor, 330 mm² Cypress on the same 40nm TSMC process, and you would expect a different class of product. Unfortunately, the details revealed today cast an uncertain shadow over this basic assumption.

The first thing worth noticing is a complete and total absence of Single Precision performance figures or any comparison to direct competition – i.e. ATI’s GPGPUs. It is clear that Fermi’s real performance advantage would be Double Precision performance – had it hit the right clock speeds.

However, today’s press release suggests Nvidia have missed their target speeds by a lot. To be fair, Tesla products do clock lower than their Geforce counterparts, though not by much. In fact, GTX 280 and Tesla C1060 were clocked the same. Even allowing a generous increase for Geforce products, things are still uncertain. DP performance is rated at only 520 GFlops to 630 GFlops. Suddenly, ATI’s Radeon HD 5870, which wasn’t even supposed to be a direct competitor, is right on par at 544 GFlops in what was supposed to be Fermi’s strong point.

Consider Single Precision, which is far more important for gaming graphics, and things turn rather ugly. GF100’s target speeds were reported to be 1.5 GHz for the shaders. Assuming 512 CUDA cores running DP at half the SP rate, the 520 and 630 GFlops figures imply shader clocks of only about 1015 MHz and 1230 MHz respectively.
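For anyone who wants to check the clock estimate, here is a minimal sketch of the arithmetic. It assumes all 512 CUDA cores are enabled and that DP runs at half the SP rate, which works out to one DP flop per core per clock. Neither assumption comes from the press release, and the function name is ours.

```python
# Back out the shader clock implied by a peak double-precision rating.
# Assumptions (ours, not the press release's):
#   - all 512 CUDA cores are enabled
#   - DP runs at half the SP rate: one fused multiply-add (2 flops)
#     per core every two clocks, i.e. 1 DP flop per core per clock
CUDA_CORES = 512
DP_FLOPS_PER_CORE_PER_CLOCK = 1.0


def shader_clock_mhz(dp_gflops: float) -> float:
    """Shader clock in MHz implied by a peak DP rating in GFlops."""
    clock_ghz = dp_gflops / (CUDA_CORES * DP_FLOPS_PER_CORE_PER_CLOCK)
    return clock_ghz * 1000.0


for rating in (520, 630):
    print(f"{rating} GFlops DP -> about {shader_clock_mhz(rating):.0f} MHz")
```

Both results land well short of the reported 1.5 GHz target. If the shipping parts have some cores disabled, the implied clocks would be higher than this sketch suggests.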

The theoretical SP performance from 512 CUDA cores? Between roughly 1.04 TFlops and 1.26 TFlops, and less than that at the low end, considering the lesser part is likely to have units disabled. Now, no amount of overclocking can bridge the enormous gap to the smaller, already available HD 5870, which sits pretty at 2.72 TFlops. Even the mainstream HD 5770 clocks in at 1.36 TFlops! Barring a different clock speed for SP units, or other technology we are unaware of, this is a dismal showing from the Fermi shaders. Sure, Nvidia’s shaders are much more efficient, but this is just too massive a gap to claw back.
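The SP figures above can be reproduced with the usual peak-throughput formula: shader units times two flops (one multiply-add) times clock. This is a rough sketch. The GF100 clocks are the estimates derived from the DP ratings, not official numbers, and the Radeon shader counts and 850 MHz clocks are AMD's published specifications rather than anything in the press release.

```python
# Peak single-precision throughput:
#   shader units x 2 flops per clock (one multiply-add) x clock speed
def sp_tflops(shader_units: int, clock_mhz: float) -> float:
    """Theoretical peak SP throughput in TFlops."""
    return shader_units * 2 * clock_mhz / 1e6


# (shader units, shader clock in MHz)
# GF100 clocks are estimates backed out of the 520 / 630 GFlops DP ratings.
parts = {
    "GF100, low estimate": (512, 1015.625),
    "GF100, high estimate": (512, 1230.469),
    "Radeon HD 5770": (800, 850.0),
    "Radeon HD 5870": (1600, 850.0),
}

for name, (units, clock) in parts.items():
    print(f"{name:22s} {sp_tflops(units, clock):.2f} TFlops")
```

Peak numbers like these say nothing about how well each architecture keeps its units busy, so treat the output as a ceiling, not a benchmark.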

Then comes the price. The previous-gen C1060 was released at $1699, falling to $1199. Compare this with its Geforce sibling, the GTX 280, released at $649, falling quickly to $500, and finally $300. The price of a next-gen C2070 is a whopping $3999, more than double the launch price of the previous generation C1060. Clearly, these are expensive products to make, so how much can Nvidia sell a Geforce version of Fermi for? Even the cheapest Tesla 20 variant, the C2050, costs $2499, nearly 50% more than the GT200 based C1060 flagship. Can Nvidia sell the $3999 Tesla product at $399 as a Geforce product?
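To put the price gaps in proportion, here is a small sketch using only the launch prices quoted above. The last figure is purely hypothetical: it naively applies the C1060 to GTX 280 launch price ratio to the C2070, which is our own illustration and not anything Nvidia has announced.

```python
# Launch prices quoted in this article, in US dollars.
LAUNCH_PRICE_USD = {
    "Tesla C1060": 1699,
    "Tesla C2050": 2499,
    "Tesla C2070": 3999,
    "Geforce GTX 280": 649,
}


def price_ratio(new: str, old: str) -> float:
    """How many times more expensive `new` is than `old` at launch."""
    return LAUNCH_PRICE_USD[new] / LAUNCH_PRICE_USD[old]


c2070_vs_c1060 = price_ratio("Tesla C2070", "Tesla C1060")
c2050_vs_c1060 = price_ratio("Tesla C2050", "Tesla C1060")

# Premium the last-gen Tesla carried over its Geforce sibling at launch.
tesla_premium = price_ratio("Tesla C1060", "Geforce GTX 280")

# Hypothetical only: what a Geforce Fermi would cost if the same premium held.
implied_geforce_usd = LAUNCH_PRICE_USD["Tesla C2070"] / tesla_premium

print(f"C2070 vs C1060: {c2070_vs_c1060:.2f}x")
print(f"C2050 vs C1060: {c2050_vs_c1060:.2f}x")
print(f"C1060 vs GTX 280: {tesla_premium:.2f}x")
print(f"Implied Geforce price at the same premium: ${implied_geforce_usd:.0f}")
```

On these numbers the C2070 comes out at roughly 2.35 times the C1060, the C2050 at about 47% more, and the naive Geforce extrapolation lands somewhere around $1,500. Nobody expects that price, which is exactly why the margin question matters.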

So far, we have been comparing GF100 to Cypress, when in reality GF100 should be compared to Hemlock. CrossFire limitations and ATI’s less efficient shaders aside, 4.64 TFlops vs. 1.26 TFlops is not much of a comparison at all.
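One way to read that gap is to ask what fraction of its theoretical peak Hemlock would need to deliver just to match GF100's best-case peak. The sketch below uses only the two peak figures quoted above. The "break-even fraction" is our own framing, and it lumps CrossFire scaling and shader efficiency into a single number.

```python
# Theoretical peak SP throughput, in TFlops, as quoted in this article.
HEMLOCK_SP_TFLOPS = 4.64
GF100_BEST_CASE_SP_TFLOPS = 1.26  # estimate at the higher implied clock

# How many times larger Hemlock's peak is.
peak_ratio = HEMLOCK_SP_TFLOPS / GF100_BEST_CASE_SP_TFLOPS

# Fraction of its own peak Hemlock must deliver to equal GF100 running
# at 100% of peak (a deliberately generous assumption in GF100's favour).
break_even_fraction = GF100_BEST_CASE_SP_TFLOPS / HEMLOCK_SP_TFLOPS

print(f"Hemlock peak is {peak_ratio:.2f}x GF100's best case")
print(f"Break-even at {break_even_fraction:.0%} of Hemlock's peak")
```

In other words, Hemlock could lose well over two thirds of its paper throughput to scaling and efficiency losses and still be level with a GF100 that somehow ran at full peak.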

The other, much less common rumour is the possibility of a dual-GPU Fermi product. Well, considering the “typical” power of Tesla 20 is 190W, this is highly unlikely, at least for a while. Not to mention, Geforce products might end up clocked higher. An HD 5870’s peak TDP is 188W, lower than GF100’s “typical” power! GF100’s TDP is expected to be at least 220W, and that is just too hot for a dual-GPU product.

And we have not even factored in the fact that GF100 is nowhere to be seen, and is unlikely to be on shelves in quantity for at least 4-5 months. Any further delays, and we will be looking at new products from AMD.

In the end, AMD have a solid product already available that is efficient, economical and scalable. Nvidia have ink on paper, and even that is not looking as promising as we had hoped. At this moment, we can only hope for “hidden” or “magical” gaming features which might bring about a revolution in how GPUs work. Short of that, all signs point to Fermi being in real trouble.

Reference: Nvidia Press Release, Tesla 20 PDF