Intel's Haswell to push the desktop PC CPU packaging frontiers

Got used to the simple packaging migration from LGA1156 to LGA1155? Intel's next generation, Haswell, is widely expected to up the ante in PC CPU packaging excellence and offer a spread of choices.

For a typical mainstream or mid-to-high end desktop user, Intel's CPU packaging policy has been pretty clear: the LGA1156 socket for Nehalem and Westmere based quad-core processors was replaced by the LGA1155 socket for the Sandy Bridge and Ivy Bridge generation, and the LGA1150 socket (another five pins or, more correctly, contact pads, gone) will follow for the Haswell and Broadwell processors in 2013.

So, every microarchitecture gets its own socket, with an associated chipset and board platform. With BIOS updates and reasonable advance design provisions allowing easy upgrades, that platform should also span the process shrink within each microarchitecture, for a lifespan of about two years.

While Haswell keeps a single package and socket standard for the microarchitecture, there is much more to it than just the outside package. Namely, Haswell will have more than one die variant, as discussed here before. For the desktop, the usual dual core and quad core die choices are obvious, just like they were on Sandy Bridge and Ivy Bridge. However, there is a third and most exciting die choice, which we also covered: the very high end quad core plus GT3 graphics with an additional L4 cache die, very likely a large (16+ MB) eDRAM cache connected to the main CPU die via a separate backside bus.

The purpose of this cache wouldn't be just to minimise contention between the CPU and GPU, both more powerful than before, for the bandwidth of the usual dual channel DDR3 memory. Of course, that is still a very important factor, since even dual channel DDR3-2500 can't easily satisfy both CPU and GPU together. On reflection, the cache could also deliver the extra sustained bandwidth for AVX-enabled high throughput floating point, where the FMA (Fused Multiply Add) support arriving in Haswell doubles the peak FP execution rate. To truly double the FP rate actually obtained, extra memory bandwidth is needed and, in the absence of higher bandwidth main memory, this large cache would be VERY helpful in achieving that goal.
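
To put rough numbers on that argument, here is a back-of-the-envelope sketch in Python. It compares the theoretical peak of dual channel DDR3 with the peak double precision FP rate before and after FMA. The core count, the 3.5 GHz clock and the per-cycle FLOP figures are our own illustrative assumptions, not confirmed Haswell specifications, and the helper names are made up for this example.

```python
# Back-of-the-envelope sketch: peak FP rate versus dual channel DDR3 bandwidth.
# Every constant below is an illustrative assumption, not an Intel specification.

def ddr3_bandwidth_gbs(transfer_rate_mts, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s: channels x 64-bit bus x MT/s."""
    return channels * bus_bytes * transfer_rate_mts / 1000.0

def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak GFLOPS across all cores."""
    return cores * clock_ghz * flops_per_cycle

def bytes_per_flop(bandwidth_gbs, gflops):
    """Memory bytes available per floating point operation at peak."""
    return bandwidth_gbs / gflops

CORES = 4              # assumed quad core desktop part
CLOCK_GHZ = 3.5        # hypothetical clock, chosen only for the arithmetic
DP_FLOPS_AVX = 8       # assumed: 4-wide DP add + 4-wide DP multiply per cycle
DP_FLOPS_AVX_FMA = 16  # assumed: FMA doubles the peak, as the article states

if __name__ == "__main__":
    bandwidth = ddr3_bandwidth_gbs(2500)  # dual channel DDR3-2500
    for label, fpc in (("AVX", DP_FLOPS_AVX), ("AVX + FMA", DP_FLOPS_AVX_FMA)):
        gflops = peak_gflops(CORES, CLOCK_GHZ, fpc)
        ratio = bytes_per_flop(bandwidth, gflops)
        print(f"{label}: {gflops:.0f} GFLOPS peak, "
              f"{bandwidth:.1f} GB/s, {ratio:.3f} bytes per FLOP")
```

Under these assumptions, doubling the peak FP rate while main memory stays at the same speed halves the memory bytes available per floating point operation, which is the gap a large on-package cache would help to cover.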

Back to the packaging: fitting all this in one socket means going back to the future, to the MCM (multi chip module) approach seen in the Pentium Pro back in 1995, then in the Presler Pentium D and, of course, the initial Westmere entry level CPUs. Here, the CPU and L4 cache dies would sit on a single high speed substrate, allowing a half clocked or possibly even same clocked external L4 cache. The outside world will not see the L4 cache, only the usual CPU interfaces, just like on other Haswell CPUs. No big problems here, since the power savings from the 22 nm tri-gate process should easily leave plenty of TDP headroom for the L4 cache to be accommodated, without the processor having to sacrifice any clock speed.

Do keep in mind that MCMs are always seen as a sort of high end 'technology frontier', and such 'packaging excellence' will add a bit to the price. Now, imagine how many choices there would be on the mobile side of the story.