Epic Games’ Tim Sweeney and AMD don’t see eye-to-eye on hUMA
As AMD pushes hUMA as the next revolution in processing, one of the gaming sector’s top developers is questioning the value of the new architecture.
For AMD and its true believers, the forthcoming heterogeneous Uniform Memory Access (hUMA) architecture will be a revolution in processing on the scale of the system-on-a-chip design found in its Accelerated Processing Units (APUs).
Within an APU, the CPU and GPU share the same die, increasing computing efficiency; pairing the CPU’s serial workloads with the GPU’s parallel throughput can, on paper, provide an enormous boost to overall floating-point performance.
There is, however, an inherent problem in tying the CPU and GPU together: the chain is only as fast as its slowest link, and each processor has its own discrete memory pool. When working in parallel, the CPU and GPU must shuttle data back and forth between two separate memory address spaces, creating inherent inefficiencies.
AMD’s hUMA aims to erase this inefficiency by creating a unified memory address space shared by the CPU and GPU. Hypothetically, a unified address space would let the two processors work in harmonious parallelism, with each taking on only the tasks best suited to it.
At Computex 2013, AMD’s Lisa Su extolled the virtues of hUMA, promising that it would bring an enormous performance boost to gaming applications, one of the most common examples of CPU-GPU computing for the everyman.
One developer is not necessarily convinced. In an email to VR-Zone, Epic Games’ Tim Sweeney said that the non-uniformity of programming languages between the GPU and CPU will remain a barrier even with hUMA.
Uniform memory access to a cache coherent shared address space is a very welcome improvement to current CPU/GPU model. On the PC platform, the challenge is how to expose it. DirectX? OpenGL? Their pace of adopting new technology is mighty slow for this significant a change. And the bigger source of non-uniformity, the programming model uniformity (C++ on CPU vs OpenCL/CUDA on GPU) remains and is an enormous barrier to the productive movement of code among processing units. Ultimately, I would love to see vector computing hardware addressed in mainstream programming languages such as C++ through loop vectorization and other compiler-automated parallelism transformations, rather than by writing code in separate languages designed for GPU.
VR-Zone reached out to AMD for a response to Sweeney’s statement, and AMD corporate fellow Phil Rogers, seen as the company’s go-to guy for all things heterogeneous computing, had this to say:
Like AMD, it seems Tim Sweeney clearly sees the future that HSA is aiming at: single source, high level language programs that can run both on the CPU and GPU. This is ultimately what will allow Tim, and hundreds of thousands of other software developers, to easily write software that accelerates on HSA platforms. This is exactly why we are developing HSA – unifying the addressing, providing full memory coherency and extending the capability of the GPU parallel processor to fully support C++. We are re-inventing the heterogeneous platform to eliminate all of the barriers that currently prevent a C++ or Java program offloading its parallel content to the GPU.
Tim is correct that in addition to the arrival of the HSA platform, the programming model must evolve to give easy access to GPU acceleration. OpenCL 2.0 and C++ AMP are very good steps in this direction. OpenCL 2.0 brings unified addressing and memory coherency options. C++ AMP is a single source approach that adds just two new keywords, restrict and arrayview, to allow particular methods to be compiled to both CPU and GPU and offloaded opportunistically. The HSA platform will allow both of these programming models to evolve still further towards the pure C++ model that is natural to the programmer.
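For readers unfamiliar with the model Rogers describes, the constructs he refers to are the restrict(amp) specifier and the array_view class. The sketch below shows the canonical single-source shape of C++ AMP; note that it is a Microsoft extension, compiles only with the Visual C++ toolchain, and is offered here purely as illustration.

```cpp
#include <amp.h>      // C++ AMP: Microsoft-specific, not standard C++
#include <vector>
using namespace concurrency;

// Single-source offload: the lambda is compiled for both CPU and GPU,
// and restrict(amp) marks it as eligible to run on the accelerator.
void double_all(std::vector<float> &v) {
    array_view<float, 1> av(static_cast<int>(v.size()), v);
    parallel_for_each(av.extent, [=](index<1> i) restrict(amp) {
        av[i] *= 2.0f;            // runs on the GPU when one is available
    });
    av.synchronize();             // copy results back (a no-op under hUMA-style
                                  // unified memory, where no copy is needed)
}
```

The final synchronize call is where the discrete-memory copy cost lives today; a coherent unified address space is what would make that step free.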
The economics of code
A legitimate question that Sweeney asks and AMD dodges is whether hUMA is worth it considering the economics of coding for such a platform.
As this slide points out, the performance payoff for the added complexity of coding for hUMA is marginal. The inclusion of a hUMA-like architecture in the PlayStation 4 is being cited as a competitive advantage the console has over the competition. But considering some developers’ hesitation to cite a definite performance advantage from hUMA, the idea that it represents a revolution in processing comparable to the introduction of the APU is dubious at best.