Fusion APUs to continue to be memory speed-sensitive?
The current AMD Fusion APUs, Brazos and Llano, offer a good balance of integrated CPU and GPU performance, but system designers and PC assemblers should note that their performance depends heavily on memory speed, especially in the case of the quad-core Llano. This dependence is expected to be even stronger for 'Trinity', the upcoming successor to Llano.
AMD's Fusion philosophy has come to its initial fruition this past year, with the first two families, the entry-level Brazos and the mid-range Llano, now successfully launched. The challenge of combining two distinctly different types of processing unit, with nearly opposite performance and bandwidth requirements, affects component choices at the board level, especially when it comes to memory modules.
The entry-level Brazos and mid-range Llano APU families make up the current spread of AMD Fusion offerings. Each offers a balanced CPU-GPU combination: Brazos pairs a very low-power dual-core CPU with an entry-level GPU, while Llano pairs a quad-core Phenom-class CPU with a mid-range Radeon 6500-class GPU.
From the system architecture point of view, neither Brazos nor Llano is a very demanding part, and both fit well within the expected limitations of their respective target markets. However, since Llano doesn’t have an L3 cache, benchmarks in the reviews so far have shown quite a bit more load on the memory system from the combined CPU and GPU access pattern. You will therefore very likely notice more substantial overall performance improvements from faster memory: lower latency helps CPU-bound apps, and higher bandwidth helps wherever the CPU and GPU parts share the access paths. Think of it as performance headroom of 20% or more when moving from DDR3-1333 CL7 to DDR3-1866 CL8.
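To put rough numbers on that jump, here is a minimal Python sketch of the theoretical figures for the two memory grades. It assumes a standard 64-bit DDR3 channel, dual-channel operation and the nominal transfer rates; the helper names are purely illustrative. These are peak figures on paper, not measured results, so real applications will see only part of the difference.

```python
# Back-of-the-envelope DDR3 figures; all helper names are illustrative.

BUS_BYTES = 8  # one DDR3 channel is 64 bits (8 bytes) wide


def peak_bandwidth_gbs(rate_mts, channels=2):
    """Theoretical peak bandwidth in GB/s for `channels` DDR3 channels."""
    return rate_mts * BUS_BYTES * channels / 1000


def cas_latency_ns(rate_mts, cl):
    """CAS latency in nanoseconds; the memory clock is half the transfer rate."""
    clock_mhz = rate_mts / 2
    return cl / clock_mhz * 1000


for name, rate, cl in (("DDR3-1333 CL7", 1333, 7), ("DDR3-1866 CL8", 1866, 8)):
    print(f"{name}: {peak_bandwidth_gbs(rate):.1f} GB/s dual-channel, "
          f"{cas_latency_ns(rate, cl):.1f} ns CAS latency")
```

On these assumptions the faster modules offer roughly 40% more peak bandwidth and roughly 18% lower CAS latency, despite the higher CL number.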
As the first Bulldozer core for the desktop doesn't seem to have had the expected impact on the market due to its performance issues, the next Fusion APU from AMD will have an updated ‘Piledriver’ core which, hopefully, will address some of the initial Bulldozer's problems. If we do get some 20% extra performance over the current Bulldozer, the CPU portion of Trinity should show a good performance jump over the current Llano one. Else, we may have a problem…
The more critical aspect of the Trinity APU is the higher bandwidth demand of the CPU portion, as well as the tighter interconnect to the GPU portion, which by itself should be up to 50% faster than the one in Llano. At the same time, socket compatibility with Llano ties Trinity to the very same dual-channel memory system, so any performance improvement there has to come from faster DDR3 modules, above DDR3-2000, to provide the extra bandwidth needed, especially by the GPU portion. I would in fact recommend supporting memory speeds as fast as DDR3-2400 CL10 on this platform to maximise the graphics performance gains in particular. Why? Well, since socket compatibility limits us to just two memory channels, speeding that memory up is the only way to obtain more juice.
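As a rough illustration of why even DDR3-2400 is not excessive here, the Python sketch below asks what transfer rate would keep bandwidth growing in step with a GPU that is 50% faster. It rests on two loudly stated assumptions that are not from any AMD specification: that the Llano baseline runs DDR3-1866, and that GPU performance scales linearly with memory bandwidth, which is a simplification. All names are illustrative.

```python
# Rough sketch: how far do faster modules go on a fixed dual-channel socket?

BUS_BYTES = 8   # 64-bit DDR3 channel
CHANNELS = 2    # fixed by socket compatibility

BASELINE_MTS = 1866  # assumption: Llano paired with DDR3-1866
GPU_SPEEDUP = 1.5    # the "up to 50% faster" GPU portion


def dual_channel_gbs(rate_mts):
    """Theoretical peak bandwidth in GB/s over two DDR3 channels."""
    return rate_mts * BUS_BYTES * CHANNELS / 1000


def proportional_rate_mts(baseline_mts, speedup):
    """Transfer rate needed if bandwidth must grow in step with the GPU."""
    return baseline_mts * speedup


needed = proportional_rate_mts(BASELINE_MTS, GPU_SPEEDUP)
print(f"Rate needed for proportional scaling: DDR3-{needed:.0f}")

for rate in (1866, 2000, 2133, 2400):
    gain = rate / BASELINE_MTS
    print(f"DDR3-{rate}: {dual_channel_gbs(rate):.1f} GB/s, "
          f"{gain:.2f}x the baseline bandwidth")
```

Under these assumptions, proportional scaling would call for roughly DDR3-2800, so DDR3-2400 at about 1.29x the baseline bandwidth still leaves the faster GPU somewhat short.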
The same bandwidth issue would affect the eventual Bulldozer-based high-end APUs for the server and HPC markets even more, although those are still far off, likely not before 2014, an eternity in the computer world. The key benefit of the Fusion approach is that it will ultimately make the graphics processor just a tightly attached coprocessor, with in-line standard programming and a shared main memory space, just like any other execution unit within the CPU. The result is that the CPU and GPU share one single main memory pool at the physical level. So, if you have both a fast multi-core CPU and a many-core GPU on the same die, with a common memory management unit and memory accesses, how fast does the memory have to be to maximise performance? As a speculative guess, a 4-module Piledriver setup plus the equivalent of Radeon HD7700 graphics (with extra double-precision FP) on a single die would need six-channel DDR3-2000 memory for truly good performance that makes full use of the execution resources.
If an efficient and large L3 cache were in place (not an AMD strength recently, though), handling both random CPU accesses and buffering of GPU streaming loads, then four-channel DDR3-2400 could still do a fair job. Either way, new sockets are a must if AMD is to prepare the infrastructure for such future high-end APUs upfront. Else, we get stuck, as we are right now, with AM3-generation dual-channel memory due to socket compatibility pressure.
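For a sense of scale, the Python sketch below compares the theoretical peak bandwidth of the multi-channel configurations discussed here with a dual-channel DDR3-2400 setup. It assumes standard 64-bit DDR3 channels and nominal transfer rates, and the function and variable names are illustrative only. These are paper figures, not benchmarks.

```python
# Theoretical peak bandwidth for the memory configurations discussed above.

BUS_BYTES = 8  # one DDR3 channel is 64 bits (8 bytes) wide


def peak_bandwidth_gbs(rate_mts, channels):
    """Theoretical peak bandwidth in GB/s for `channels` DDR3 channels."""
    return rate_mts * BUS_BYTES * channels / 1000


configs = [
    ("dual-channel DDR3-2400", 2400, 2),
    ("four-channel DDR3-2400", 2400, 4),
    ("six-channel DDR3-2000", 2000, 6),
]

for name, rate, channels in configs:
    print(f"{name}: {peak_bandwidth_gbs(rate, channels):.1f} GB/s")

# Share of the six-channel figure that the four-channel setup delivers;
# the remainder is what a large L3 cache would have to absorb.
ratio = peak_bandwidth_gbs(2400, 4) / peak_bandwidth_gbs(2000, 6)
print(f"four-channel DDR3-2400 reaches {ratio:.0%} of six-channel DDR3-2000")
```

On paper, the four-channel option delivers about 80% of the six-channel bandwidth, and a dual-channel socket manages only 40% of it even with DDR3-2400.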