Nvidia Kepler GPUs – back to games, away from compute?
Nvidia's first 'Kepler' generation GPU, the GK104-based GeForce GTX 680, is finally officially out. The benchmarks show high game performance, but has GPU compute performance been pushed aside here, at least compared to the old 'Fermi'?
Some months ago in Shenzhen, I chatted with Simon Moy, the ex-Nvidia man who designed the company's first GPU vertex shader and brought the GPU compute idea into reality. We discussed the impact of replacing hardwired 3-D acceleration with fully programmable, increasingly CPU-like graphics processors that are also suited to compute tasks. One impact we agreed on was that, at the beginning, some 3-D performance metrics tied to fully hardwired accelerators would suffer, but there would be gains in new routines, such as complex effects and physics, that may in turn improve the 3-D gaming experience.
The overall consensus was that, in the long run, that extra compute performance will bring more aggregate benefit than pure 3-D gaming hardware alone. And, I guess, that is what to some extent guided the designers of Nvidia's 'Fermi' generation (GeForce 4xx and 5xx) a few years ago.
It seems that, despite its even more fine-grained architecture, the new 'Kepler' generation is back to a gaming acceleration focus. Nvidia has been under heavy pressure to deliver a new benchmark winner ever since AMD took a clear GPU lead with the HD7970, the first 28nm high-end graphics processor. The GK100, successor to the GF100 and the GF110 of the GTX 580, was supposed to be the high-end Kepler chip that would bring the performance lead back to Nvidia on most fronts, but for reasons known only to Nvidia itself, that chip never materialised. Instead, the GK104 we see now, the 'mid-high end' successor to the GF104 and GF114 found in the GTX 460 and GTX 560 cards, had to take the high-end battle position.
You've surely already seen the endless slides published all over the web today about the brand new gaming features, effects and, yes, very decent game benchmark results for the new part. Our Lennard covered those in detail earlier today, and most sites have shown Nvidia's architecture slides. The move from SM shader blocks to the more fine-grained SMX is, obviously, the most noticeable change at the hardware architecture level.
However, something else is worth observing: the GPU compute performance of the new part is lacking. In fact, it is slower than the GTX 580 on just about all compute benchmarks. That is kind of understandable, since the GK104 chip was originally meant to replace the GF104 (GTX 460), which, not being the high-end offering at the time, also didn't focus on compute. So the crippled double precision FP rate, 1/24 of single precision, could be seen as OK when compared with the similarly reduced ratio on the HD7870, which was supposed to be its main competitor. Of course, the other problem is that its single precision FP performance also seems nearly identical to that of the HD7870, a mid-range US$349 card.
However, Nvidia is now positioning the GTX 680 squarely against the HD7970, and that's a big difference. The HD7970 has way faster single precision FP and many times faster DP FP, because AMD keeps the DP FP rate at 1/4 of SP FP instead of crippling it to force users into plonking down thousands of bucks for an otherwise similar 'pro' card.
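To make the ratio arithmetic concrete, here is a minimal Python sketch. It assumes peak DP throughput is simply peak SP throughput multiplied by the DP:SP rate ratio. The function names, the 1000 GFLOPS figure and the list of capped ratios are illustrative assumptions, not measured or vendor numbers; only the 1/4 ratio comes from the text above.

```python
from fractions import Fraction


def dp_throughput(sp_gflops, dp_to_sp_ratio):
    """Peak DP throughput implied by a peak SP figure and a DP:SP rate ratio."""
    return sp_gflops * dp_to_sp_ratio


def dp_advantage(ratio_a, ratio_b, sp_a=1, sp_b=1):
    """How many times faster card A is than card B at DP.

    With the default sp_a == sp_b, both cards are assumed to have the
    same SP throughput, so only the ratios matter.
    """
    return (sp_a * ratio_a) / (sp_b * ratio_b)


full_rate = Fraction(1, 4)  # the HD7970-style ratio from the text

# Illustrative capped ratios (assumptions, chosen to show the trend).
for capped in (Fraction(1, 12), Fraction(1, 16), Fraction(1, 24)):
    print(f"1/4 vs {capped}: {dp_advantage(full_rate, capped)}x at equal SP")

# Hypothetical card with 1000 GFLOPS of SP at a 1/4 ratio.
print(dp_throughput(1000, full_rate), "GFLOPS DP")
```

Even at equal single precision speed, a 1/4 card comes out several times ahead in double precision, and any single precision lead multiplies that gap further.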
Since GPU compute apps have become increasingly popular (just look at Adobe and other big names), curtailing FP performance just to push overpriced pro parts amounts to ripping off customers. What if, for instance, Intel (or AMD) crippled the FP performance of their consumer Core i7 processors just to ensure that whoever needs more math capability MUST buy the otherwise identical, but more expensive, Xeons? Everyone would cry foul, wouldn't they?
So, while blocking the extra FP in a US$200+ card may be acceptable, a top-end 'cream of the crop' card at US$400 or more should not have such crippling 'features'. One may say 'who cares, this card is only for gamers', but the point is that, thanks in large part to Nvidia and its own GPU compute (CUDA) promotion efforts, many PC apps these days do benefit from this acceleration, and an increasing number of them use double precision FP for the sake of, ehm, extra precision.
I am quite sure that the planned GK110 part, which should now be the true follow-on to the GF110 from the GTX 580 card, will have better GPU compute performance when it comes out. The point is, don't cripple the FP. The users who really want Quadros and Teslas will buy them anyway for their larger memory, expert support and validation, but those users who want the FP benefits of the GPU in their consumer cards should be allowed to enjoy them, since it is all already in the hardware, and it will create some extra goodwill as well. Why cripple it?