Socket 2011 Futures: A Difficult Road to Perfection

Socket 2011 will be Intel's new solution for high-end desktop, workstation and server processors. While the full potential of this platform will be unleashed only with next year's 22nm Ivy Bridge-E processors, Sandy Bridge-E is still likely to dominate the top-end desktop market till then.

LGA2011: The Biggest Socket Ever

When news of Socket 2011 first appeared over a year ago, many were quite excited. This was hardly surprising since Socket 2011 will be the new solution for high-end desktop, workstation and server processors. This behemoth of a socket is so huge it requires two retention levers, one on each side, for the very first time. Despite this, Socket 2011 will retain compatibility with LGA1366 cooling solutions. The competition didn't – and still doesn't – have anything comparable, save trying to put multiple dies onto a single chip in the G34 socket.

Why such a huge socket? Sandy Bridge-E is a huge chip: the full-flavour edition will have eight cores with 20 MB of shared L3 cache on a ring bus, a quad-channel DDR3 memory controller and two QPI links. Socket 2011 also has to ensure platform scalability beyond eight cores and 20 MB of cache. Even on the 32 nm process, all this needs lots of space, not to mention power.

The new CPUs, which had originally been expected to arrive around now (Q3 2011), have supposedly been pushed back to the end of 2011, with some sources even estimating early 2012. It seems unlikely that Intel would further delay Sandy Bridge-E since AMD (after all those delays) is expected to launch Bulldozer no more than seven weeks hence.

Only Hexa-Core for Desktop SNB-E?

In any case, the complexity of these 'ultimate 32nm generation processors' seems to have hit the ceiling at least in one target market. Word has it that, despite all the delays, Intel is not likely to enable full 8-core functionality in LGA2011 extreme desktop processors, at least for the Sandy Bridge iteration. Apparently the initial high-end part, Core i7-3960X, will stick with a hexa-core configuration like the Westmere-based Core i7-990X. It will have 15 MB of working L3 cache (¾ of the maximum 20 MB on-die L3, since only ¾ of the cores are enabled), while retaining a full quad-channel DDR3 controller and 40 PCIe 3.0 lanes.
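The cache figure follows from simple proportion. As a rough sketch, assuming the L3 is disabled in step with the cores (which the ¾ ratio above suggests, though Intel has not confirmed how the cache is partitioned), the arithmetic looks like this. The function name is made up for illustration.

```python
from fractions import Fraction


def enabled_l3_mb(total_l3_mb, total_cores, enabled_cores):
    """L3 left enabled, ASSUMING cache is fused off in proportion to cores."""
    if not 0 < enabled_cores <= total_cores:
        raise ValueError("enabled_cores must be between 1 and total_cores")
    # Fraction keeps the ratio exact (6/8 -> 3/4) before converting to MB.
    return float(Fraction(enabled_cores, total_cores) * total_l3_mb)


# Full die as described in the article: 8 cores, 20 MB of shared L3.
print(enabled_l3_mb(20, 8, 6))  # 15.0, the rumoured Core i7-3960X figure
print(enabled_l3_mb(20, 8, 8))  # 20.0, the fully enabled die
```

Under the same assumption, each core accounts for 2.5 MB of L3, so any other cut-down SKU can be estimated the same way.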

This decision would be understandable if there were initial yield issues: completely functional 8-core dies would go to the Xeon E5 dual-CPU platform lineup, where they would be immediately snapped up by workstation and server OEMs for as much as US$2000 apiece. The defective parts would end up as Extreme Edition desktop chips, which typically go for half that amount (US$1000).

Also, the need for 8 cores per socket is questionable even at the top end of the desktop market. While AMD Bulldozer desktop parts have 8 integer cores, these effectively share 4 floating-point cores. Based on the expected performance of Bulldozer and the estimated 15% to 20% performance increase from Core i7-990X to Core i7-3960X (from early leaks), it should be safe for Intel to stick with 6-core dies on the desktop for a while. Besides, the reduced power and thermal load from disabling two cores should increase overclocking headroom on the remaining cores.

Our sources confirmed during Computex that overclocking all eight cores and 20 MB cache was a challenge due to speed path issues, among others, and that the dual socket Xeon E5 variety will allow only preset Turbo Boost but no overclocking. Of course, a dual-Xeon E5 16-core beast running at 3.2+GHz (with 3.8+GHz Turbo Boost) would hardly need any overclocking.

A Note on Four-Processor Setups

A rarely reported feature of high-end Socket 2011 platforms is support for four-processor configurations. Due to the presence of two QPI links on high end CPUs, it is possible to put together four processors in a 'square' interconnect layout. In such a setup, each CPU could talk directly with its two adjacent neighbours, but would have to hop over an adjacent CPU to communicate with the CPU at the opposite corner.
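To make the hop counts concrete, here is a minimal sketch of the 'square' layout as a graph in which every CPU has exactly two links. The socket numbering (0 to 3) and the helper name are invented for illustration and do not reflect any real board layout.

```python
from collections import deque

# Hypothetical socket labels 0-3 arranged in a square;
# each CPU has exactly two QPI links, to its two neighbours.
SQUARE = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def qpi_hops(links, src, dst):
    """Minimum number of link traversals from src to dst (breadth-first search)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("destination not reachable")


print(qpi_hops(SQUARE, 0, 1))  # 1: adjacent CPU, direct link
print(qpi_hops(SQUARE, 0, 3))  # 1: the other adjacent CPU
print(qpi_hops(SQUARE, 0, 2))  # 2: opposite corner, via a neighbour
```

With only two links per CPU, one of the three peers is always two hops away, which is the penalty a third QPI link would remove.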

Intel is expected to artificially create two differently priced SKUs from the same CPU: the Xeon E5 2xxx series that uses the two QPI channels to link two processors together at double width, and the Xeon E5 4xxx series that enables four-way setups using these same two QPI links.

It is quite possible that Socket 2011 will allow for this capability well into the Haswell generation. Socket 2011's huge pin count might even allow it to accommodate a third QPI channel for better connectivity. The other new high-end socket for the Haswell timeframe, supposedly called B3, is said to cater to the 'mid high end' segment instead of the very top, similar to Sockets 1366 and 1356.

Either way, this would result in a small overlap with some of Intel's high-end Xeon E7 'EX' MP offerings. Of course, the E5 series is aimed more at raw performance than the mainframe-style RAS features that the pricier E7 offers. And considering that there is little likelihood of a Sandy Bridge-generation 'EX' part, this overlap might be mostly avoided until the 22nm Ivy Bridge EX redesign.


The Road to Ivy Bridge-E

Desktop Socket 2011's maximum potential will most likely be reached only with 22nm Ivy Bridge-E, where, instead of just bolting on additional Ivy Bridge cores, further cache optimisation and enlargement can hopefully help to feed the CPUs. Until then, the initial Sandy Bridge-E parts will provide a platform for early adopters and, to a certain extent, a solution for the X79 Patsburg chipset rollout issue by offering different SKUs with some features initially disabled.

While the current chipset problems appear to be confined to the storage features, Intel may need to focus on enabling all possible features for such a high-end platform. In fact, I would like to see Intel integrate a proper XOR parity/ECC computation engine for hardware-assisted RAID 5 into the chipset during the time taken for the SAS RAID variant of the chipset to arrive. Such an implementation would be superior to relying on expensive CPU cores for software RAID. How much would adding the XOR engine cost on the chip level, anyway? 20 cents, perhaps?
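For readers unfamiliar with what such an engine would compute, here is a minimal sketch of RAID 5 style XOR parity in software. It is a generic illustration of the technique, not Intel's or any chipset's implementation, and the block contents and function name are invented.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR across equal-length blocks (the RAID 5 parity operation)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks striped across three drives.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)  # stored on a fourth drive

# Simulate losing the second drive: XOR the survivors with the parity block.
rebuilt = xor_parity([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

The operation is one XOR per byte with no dependency between bytes, which is why it is cheap to implement in fixed-function hardware and wasteful to run on general-purpose cores.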

In summary, my assessment is that Socket 2011 will likely dominate the top-level desktop platform, even with just 6-core parts initially. The combination of fast cores, large cache, very wide memory path for unmatched bandwidth, and plenty of PCIe lanes for any conceivable GPU configuration should give it comfortable headroom for at least another CPU generation in the same socket. On the other hand, we might see further stepping changes at the chipset level in the near future, and of course a 'recommendation' to upgrade to the Ivy Bridge-E generation a year later to unleash the full potential of this platform.