Nvidia Tegra 4 to Get GPGPU, i.e. GPU Computational Capabilities: Kepler Inside?

In the discussions surrounding the launch of NVIDIA's Kepler architecture (GK104 – GTX 680 and GK107 – GT 640M), we learned a lot of interesting details from NVIDIA engineers and marketing staff. Interested in GPGPU in a smartphone?

During and after the recent Kepler launch, we decided to stay in Silicon Valley for as long as possible and talk to silicon manufacturers such as TSMC and Common Platform, as well as to NVIDIA engineers and other employees.

One subject of our discussions was Project Denver and the pace of its development, along with the development of Tegra 4, i.e. T40. As we all know, Tegra 2 and Tegra 3 use what is essentially an NV4x-class GPU (GeForce 6000 and 7000 Series, RSX from the Sony PlayStation 3). As such, we were not surprised to hear that benchmark results are not ideal against the competition. Truth be told, if Imagination Technologies and ARM took one more year to beat what is an eight-year-old PC architecture, it would be borderline ridiculous.

In an email that Jen-Hsun Huang recently sent to all NVIDIA employees, the outspoken co-founder and CEO of NVIDIA stated the following:

"—–Original Message—–

From: ******@nvidia.com
Sent: Thursday, March 22, 2012 9:48 AM
To: Employees
Subject: Kepler Rising

Today, the first Kepler – GTX 680 – is on shelves around the world!

Three years in the making.  The endeavor of a thousand of the world's best engineers.  One vision – build a revolutionary GPU and make a giant leap in efficient-performance.

Achieving efficient-performance, great performance while consuming the least possible energy, required us to change our entire design approach.  Close collaboration between architecture-design-VLSI-software-devtech-systems, intense scrutiny on where energy is spent, and inventions at every level were necessary. The results are fantastic as you will see in the reviews. 

Kepler also cultivated a passion for craftsmanship – nothing wasted, everything put together with care – with a goal of creating an exquisite product that works wonderfully.  Let's continue to raise the bar and establish extraordinary craftsmanship as a hallmark of our company.

Today is just the beginning of Kepler.  Because of its super energy-efficient architecture, we will extend GPUs into datacenters, to super thin notebooks, to superphones.  Not to mention bring joy and delight to millions of gamers around the world.

I want to thank all that gave your heart and soul to create Kepler.  You've created something wonderful.

Congratulations everyone!

Jensen"

 

As you can see, NVIDIA has plans to extend Kepler and subsequent architectures into the superphone space. That future might not be far away: the situation will change with the 28nm T40 SoC, i.e. "Tegra 4", for the superphone/tablet/clamshell market and the first Project Denver (8-Core PD + SMX) for ultra-thin notebooks, laptops and All-In-One desktops.

According to sources in the know, Tegra 4 will utilize a GPGPU-compliant, i.e. CUDA-compliant, graphics core that is very power-efficient. Since a single Kepler-based 20W GPU is enough to run Battlefield 3 with Ultra details at 1366×768 resolution, a 1W GPU should be more than good enough to run Angry Birds in current and next-gen smartphone and tablet designs. Also, with the breakthrough in smartphone, i.e. superphone, design, you can expect the next generation of superphones, like Motorola's Droid Razr MAXX, to carry 2500-3300 mAh batteries, which will get you both the desired thinness and long battery life.

Using multiple ARM Cortex-A15 cores and a reduced Kepler SMX should enable the company not only to match the performance of 2013 SoCs powered by the competing Mali-T600 Series and PowerVR SGX54x, but to leapfrog the competition in the same manner its mainstream GPU chip beat a high-end competitor from AMD. (History lesson: GK104 was originally scheduled to be the GeForce GTX 670, then the GTX 670 Ti, and was ultimately renamed GeForce GTX 680, with the price boosted by $150 to $499; originally, it was $299 for 1GB and $349 for 2GB.)

When it comes to computational capabilities, we doubt that we are going to see native support for double precision, as it is pretty unnecessary for the targeted platform (smartphones and tablets with FP64 DP, seriously?). The fact of the matter is that Fermi and Kepler have separate CUDA units for double precision, and should you remove those "invisible cores", power consumption will go down considerably (NVIDIA is not showing the special FP64 CUDA cores on its architectural diagrams, otherwise you'd see more than 512 or 1536 units per die).

The number of cores is limited by thermals, but we were assured that we can expect 32, 48 or 64 CUDA cores in a mobile chip. Read: Fermi core arrangement (an SM had 32/48 cores in GF100/GF104) with the power-saving Kepler execution architecture. The actual T40 chip was taped out very late last year, and NVIDIA has had silicon since the Holiday Season.
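To put those core counts in perspective, here is a rough back-of-the-envelope sketch in Python. It uses the common rule of thumb that theoretical single-precision peak equals cores × 2 FLOPs (one fused multiply-add per cycle) × clock. The GTX 680 figure uses its 1006 MHz base clock; the 0.5 GHz mobile clock is purely our assumption for illustration and does not come from NVIDIA or our sources, and the function names are ours as well.

```python
# Back-of-the-envelope theoretical FP32 peak: cores x 2 FLOPs (one FMA) x clock.
# This is a rule-of-thumb estimate, not a measured benchmark result.

def peak_gflops(cuda_cores: int, clock_ghz: float) -> float:
    """Theoretical single-precision peak in GFLOPS, assuming one FMA per core per cycle."""
    return cuda_cores * 2 * clock_ghz

# Desktop reference point: GTX 680 (GK104), 1536 CUDA cores at 1.006 GHz base clock.
GTX_680_GFLOPS = peak_gflops(1536, 1.006)

# HYPOTHETICAL mobile GPU clock, chosen only for illustration (not a sourced figure).
ASSUMED_MOBILE_CLOCK_GHZ = 0.5

def mobile_estimates(clock_ghz: float = ASSUMED_MOBILE_CLOCK_GHZ) -> dict:
    """Peak GFLOPS for the rumored 32/48/64-core configurations at the assumed clock."""
    return {cores: peak_gflops(cores, clock_ghz) for cores in (32, 48, 64)}

if __name__ == "__main__":
    print(f"GTX 680: {GTX_680_GFLOPS:.1f} GFLOPS")
    for cores, gflops in mobile_estimates().items():
        print(f"{cores} cores @ {ASSUMED_MOBILE_CLOCK_GHZ} GHz: {gflops:.1f} GFLOPS")
```

Change the assumed clock and the estimates scale linearly, so treat the output as an order-of-magnitude guide rather than a prediction of what T40 will deliver.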

All in all, 2013 looks to be a real cutthroat year in mobile, this time on the chip side of things.

The 28nm battle so far looks like this:

  • NVIDIA T40 (4-5x Cortex-A15, 32-64? CUDA core Kepler ULP)
  • Apple A6 (4x Cortex-A15, 4x PowerVR SGX54xMP4)
  • Qualcomm MSM8974 (Quad-Core Krait 2-2.5GHz, Adreno 320 3D GPU)
  • TI OMAP6 (4x Cortex-A15, Mali T600)
  • Samsung Exynos (4x Cortex-A15, PowerVR)

You can expect Tegra 4 to debut by the end of 2012 or in the first quarter of 2013, with real availability to follow in 2013. The reason for the late debut is simple – Icera integration (wireless baseband) takes time…