Mainstream desktop CPUs future evolution: more performance or just more integration?

Even after AMD's Bulldozer roll-out, Intel looks set to keep a clear processor performance lead on the desktop for some time to come. So, as refinements to the existing processes and new semiconductor process nodes create headroom for improvement, how should that headroom be spent? On performance, power or price (read: integration)?

The LGA1155 mainstream desktop platform already has a decent level of integration: 16 PCIe lanes are on the CPU die (in fact, the Xeon versions of LGA1155 Sandy Bridge have 4 extra PCIe lanes, a rarely noted fact!), and of course the graphics is there too for those who want to use it. The I/O hub is relegated to just that: a bunch of general I/O interfaces connected to the CPU via dedicated DMI2 lanes, which are basically PCIe v2. So, the total is a two-chip solution.

Now, the March-timeframe Ivy Bridge desktop refresh can't integrate more despite the benefit of the 22 nm process migration, because it has to stay compatible with the LGA1155 plug-in socket. So the focus is, to some extent, on performance enhancements: a per-core speed jump of more than 20% once the slight IPC improvements and frequency gains are counted, plus at least double the graphics performance. To a much greater extent, though, the focus will be on reduced power consumption, so that desktops, and especially laptops and ultrabooks, gain extra power savings at a given performance level.
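To see how a slight IPC improvement and a frequency gain combine into one per-core figure, here is a minimal sketch. The function name and the sample percentages are purely illustrative assumptions, not Intel figures; the only point is that the two gains multiply rather than add.

```python
def per_core_speedup(ipc_gain: float, clock_gain: float) -> float:
    """Relative per-core speedup when IPC and clock frequency both improve.

    Both gains are fractions (0.05 means +5%). Performance scales with
    IPC x frequency, so the gains compound multiplicatively.
    """
    return (1.0 + ipc_gain) * (1.0 + clock_gain) - 1.0


# Hypothetical split, chosen only for illustration: +5% IPC, +15% clock.
example = per_core_speedup(0.05, 0.15)
print(f"combined per-core gain: {example:.2%}")
```

With these made-up inputs the combined gain comes out slightly above the simple sum of the two percentages, which is why even small IPC gains matter on top of a clock bump.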

So, with extra integration ruled out by socket compatibility, the Ivy Bridge generation some six months from now will put some focus on performance, but more on power saving. Watch for some really lovely TDP figures there.

After that, Haswell will stay on the same 22 nm process in early 2013, of course with some refinements as Intel tunes the new process over time. However, mainstream Haswell will have new sockets (LGA1150 for desktops, rPGA947 and BGA1364 for the mobile market), so the door is open for more on-die integration this time. At the same time, if AMD's "Piledriver" and other coming cores don't improve performance dramatically, Intel is bound to still have a massive performance advantage by then. So, what should Haswell focus on?

As of now, if you rate the advancement in each category on a scale of one to five pluses, the deep throats say Haswell will focus on Performance (++), Integration (+++) and Power (++++)! Basically, the performance improvements are expected to be limited to increased instruction-per-cycle parallelism, with new execution ports and better prefetching and branch handling for per-thread performance, as well as AVX2 improvements like fused multiply-add (which doesn't always speed things up), with doubled cache bandwidth to feed all that. This by itself is very good and could lead to another same-clock per-core performance jump of over 20% even in many current apps, but it is still less than the effort put into the other two departments. After all, the other key performance-related aspects, like the 8 MB L3 cache and the dual-channel DDR3-1600 (or faster) memory, are expected to stay the same.
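For readers unfamiliar with fused multiply-add, the sketch below illustrates what "fused" means numerically: a*b + c is computed exactly and rounded once, instead of rounding the product first and the sum second. This is a software emulation using Python's `fractions` module, written only to show the semantics; the function names are made up for this example, and it says nothing about how fast the hardware instruction is.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply, then add: the product is rounded before the add."""
    return a * b + c


def emulated_fma(a: float, b: float, c: float) -> float:
    """Emulate fused semantics: exact a*b + c, rounded to a float once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so the exact product, 1 - 2**-60, is not representable
# as a double and rounds to 1.0 when the multiply is done on its own.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(unfused_multiply_add(a, b, c))  # product rounded first
print(emulated_fma(a, b, c))          # single rounding at the end
```

Here the unfused version returns 0.0 while the fused one keeps the tiny residual of -2**-60. So the fused form saves one instruction and one rounding step per multiply-add pair, but code only benefits if it is dominated by such pairs and is recompiled to use them, which is in line with the caveat that it doesn't always speed things up.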

On the integration front, there are system-level enhancements that should help board and system designers get things done faster, better and, should I say, cheaper, the magic word for cost-conscious Taiwanese board makers. The voltage regulators are to be fully integrated into the CPU, which greatly simplifies power design for everyone, not just the overclockers. The display outputs are also right there on the CPU die, with no need to route them through outside I/O hubs. These two changes may look minor, but they help system design quite a bit: the integrated VR reduces the required board area, and high-speed video routing gets simpler because HDMI and DisplayPort run directly from the CPU to the external connectors, with only VGA routed via the Lynx Point chipset. The chipset itself obviously becomes simpler too, freeing die area for more USB3, SATA3 and similar interfaces, including built-in Ethernet. Here, of course, extra integration also means lower costs and therefore a lower end-user price.

The most interesting development, though, is on the power consumption front. As Intel mentioned during past IDF keynotes, idle power consumption on Haswell will be reduced by an order of magnitude. That doesn't benefit only tablets and Ultrabooks; it is very useful for desktops as well, which are often left running when hundreds of millions of company employees worldwide go out for their lunch breaks, for instance.

This alone would mean untold gigawatts of power saved at the global level. And yes, you could have an iPad-sized, fully PC-capable tablet too. And there is more: the converged platform power management, or Intel Power Optimizer, now includes new, even deeper power states for further savings, with rapid transitions between them thanks to that integrated voltage regulator.
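The "gigawatts" claim can be sanity-checked with a back-of-the-envelope calculation. The sketch below is only that: the fleet size and the per-machine idle power are hypothetical placeholders, not measured or published figures, and the function name is invented for this example. The factor of ten stands in for the "order of magnitude" reduction mentioned above.

```python
def idle_savings_gigawatts(machines: int,
                           idle_watts_before: float,
                           reduction_factor: float = 10.0) -> float:
    """Power saved (in GW) while `machines` PCs all sit idle at once.

    idle_watts_before: assumed idle draw per machine today (hypothetical).
    reduction_factor:  10.0 models an order-of-magnitude reduction.
    """
    idle_watts_after = idle_watts_before / reduction_factor
    saved_watts = machines * (idle_watts_before - idle_watts_after)
    return saved_watts / 1e9


# Hypothetical inputs: 200 million idle desktops, 10 W idle draw each.
print(idle_savings_gigawatts(200_000_000, 10.0))
```

With these placeholder inputs the result lands in the low single-digit gigawatt range while the machines are idle; plugging in different assumptions scales the answer linearly.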

The massive reduction in idle power, accompanied by a noticeable reduction in full-load power for a given performance level, will also help make Haswell a very 'green' CPU. That is not usually expected of X86 processors, with their awful baggage from 30 years of backward compatibility with what was, even in the '80s, considered the most awkward CPU architecture around, at least compared to the Motorola 68K of the day, which powered Steve Jobs's first Mac, remember?

Another important and related point: as the AVX instruction extensions, with their RISC-like 3-address instruction format and much more elegant programming model, are extended to integer operations in the Haswell generation, next-generation X86 code using AVX can in fact look less and less like X86 code and more and more like elegant RISC code, the way PowerPC, Alpha and MIPS did it all those years ago. This would be a very smooth way for Intel (and AMD) to gradually de-emphasise the X86 baggage and use the extended AVX as the base of a new 'X86' instruction set, with those SSE registers as the main register pool.
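The difference between the classic two-operand form and the 3-address form can be shown with a toy register machine. This is a simplified model written for illustration, not real X86 encoding: the opcode names `add2` and `add3` and the register names are invented here. They loosely mirror the destructive SSE style (`addps xmm0, xmm1` overwrites its first operand) versus the non-destructive AVX style (`vaddps xmm2, xmm0, xmm1` writes to a separate destination).

```python
def run(program, regs):
    """Execute a toy register program and return the resulting registers."""
    regs = dict(regs)  # work on a copy so the caller's state is untouched
    for op, dst, *srcs in program:
        if op == "mov":
            regs[dst] = regs[srcs[0]]
        elif op == "add2":   # two-operand, destructive: dst = dst + src
            regs[dst] = regs[dst] + regs[srcs[0]]
        elif op == "add3":   # three-operand: dst = src1 + src2
            regs[dst] = regs[srcs[0]] + regs[srcs[1]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


# Goal: r2 = r0 + r1 while keeping r0 and r1 intact.
two_address = [("mov", "r2", "r0"), ("add2", "r2", "r1")]
three_address = [("add3", "r2", "r0", "r1")]

start = {"r0": 3, "r1": 4, "r2": 0}
print(run(two_address, start), len(two_address))
print(run(three_address, start), len(three_address))
```

Both programs leave the same register state, but the two-operand version needs an extra copy to preserve its inputs. Removing those copies is a large part of what makes 3-address code read more like RISC code.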

Why is that so important? It's not just about performance or design simplicity, but also about power saving. Look at how well the Chinese RISC CPUs, based on MIPS and Alpha respectively, perform even when two process generations behind Intel or AMD. They deliver similar performance, at least in FP, and, if you look at their recent supercomputers, consume far less power than their X86-based US counterparts, partly because their CPUs save millions of gates by not carrying the X86 oddities and complexities. A future X86 PC rehash where AVX, not old X86, is the base, and legacy X86 is left to a 'compatibility box' on the side, would provide the next much-needed push in power, performance and parallelism scaling for mainstream PC CPUs.