Interconnection dilemma: should the Socket 2011 successor get one more QPI link to spare?
The full-blooded Socket 2011 CPUs have two QPI links per socket, which can either both connect to a single other CPU or each connect to a different neighbouring processor. With so many pins available, is there a use for, say, one more QPI channel? Yes!
As mentioned here before, the dual-QPI Socket 2011 CPUs will come in two flavours: the dual-socket Xeon E5-2600 series will use the two QPI channels in parallel to link the two processors at double bandwidth, while the otherwise identical quad-socket E5-4600 series will let each of the two channels connect to a different neighbouring processor, for a total of four processors in a square layout.
Note that, with only two QPI channels, there will be no direct connection between the processors in opposite corners of the quad-CPU configuration, so any traffic between them has to pass through one of the in-between CPUs, adding latency – up to 50 ns extra – as well as the risk of congesting the QPI links if there is a lot of inter-processor traffic. The Xeon E7 series on Socket 1567, which in the current 10-core Westmere-EX generation has four QPI links per chip, doesn't have that problem even in 8-socket configurations. However, it has much higher memory latency and somewhat lower memory bandwidth per CPU, due to its use of memory bridge/hub chips between the CPU and the DDR3 channels to maximise capacity. On the other hand, the E7 offers – besides a much higher price – more RAS features, not required for workstations or supercomputing but critical for servers.
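The hop penalty of the two-link square versus a fully connected quad can be illustrated with a tiny breadth-first-search sketch (the topology dictionaries below are illustrative, not taken from any Intel documentation):

```python
from collections import deque

def hops(links, src, dst):
    """Breadth-first search: minimum number of QPI hops from src to dst."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            return dist
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))

# Quad-socket E5-4600-style square: each CPU has only two QPI links,
# so the diagonal pairs (0<->2 and 1<->3) are not directly connected.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(hops(square, 0, 2))  # 2 hops – traffic crosses an intermediate CPU

# With a hypothetical third QPI link per socket, the diagonals close
# and every pair of CPUs is a single hop apart.
mesh = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(hops(mesh, 0, 2))  # 1 hop
```

Besides the extra latency, every diagonal transfer in the square also consumes bandwidth on two links instead of one, which is where the congestion risk comes from.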
So, anyone wanting a good quad-socket Intel system in mid-2012 will have to choose between the E5, with fast cores and good memory bandwidth at low latency but potentially insufficient interprocessor bandwidth for heavy-duty SMP jobs, and the E7, with more but slower cores and a somewhat slower memory system, yet darn good interprocessor bandwidth and more RAS features.
I haven't yet seen the pin-out of the dual-QPI Socket 2011 Xeon E5, but it has been bugging me whether enough unconnected pins remain – remember, power and ground easily account for many hundreds of pins these days – to fit just one more QPI channel into some successor CPU.
It actually may not matter, since a Haswell-based midrange Xeon would probably get yet another new socket. Then again, since Haswell desktop parts will still use DDR3 memory, and Haswell server and workstation parts may do the same rather than move to DDR4 yet, the justification for keeping Socket 2011 may still be there.
Either way, the addition of a third QPI channel would accomplish two good things. First, for the midrange quad-socket parts, it would enable full direct point-to-point connectivity between all four processors, ensuring good interprocessor bandwidth no matter what. Second, for tightly coupled dual-socket systems in workstation and HPC use, where heavy inter-core communication is likely with the new generation of finely multithreaded software, the option of all three QPI channels connecting the two CPUs could be put to good use.
Think it's overkill? Think again – even the Ivy Bridge-EP Xeon E5 chips will have 10 cores, and the Haswell ones will obviously have no fewer. A total of 20 cores wanting to talk to each other a lot, and frequently, will use all those channels, especially in a multi-GPU workstation or HPC node where GPUs attached to either CPU communicate with each other across the QPI.
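To put rough numbers on it, here is a back-of-the-envelope bandwidth sketch. The figures are assumptions for the Sandy Bridge-EP era – 8.0 GT/s signalling with a 2-byte payload per transfer in each direction – so treat the results as ballpark only:

```python
# Assumed QPI parameters (Sandy Bridge-EP era, not from a datasheet):
gt_per_s = 8.0              # giga-transfers per second
bytes_per_transfer = 2      # 16-bit payload per direction per transfer

per_link_one_way = gt_per_s * bytes_per_transfer  # GB/s in one direction

for links in (1, 2, 3):
    print(f"{links} link(s): {links * per_link_one_way:.0f} GB/s each way")
```

Under these assumptions, a third link would take the CPU-to-CPU pipe from 32 GB/s to 48 GB/s per direction – a meaningful jump when 20 cores and a couple of GPUs are all contending for it.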
There's actually one more important potential usage model, even in those dual-socket systems. This third benefit is obvious – Intel is likely to have a QPI-based successor to its Knights Corner accelerator someday soon, enabling true coherent shared-memory processing between the multi-core CPU and the many-core accelerator. The accelerator could then address all of the CPU's system memory directly and at low latency, as if it were another CPU – assuming it has compatible memory-management hardware, of course.
In this case, the third QPI link would serve that purpose: each CPU would have one direct QPI connection to the accelerator, much as the Xeon 5500 and 5600 dual-socket 1366 processors have separate, parallel, direct QPI links to the north bridge. At the same time, the other two QPI links would still connect the two CPUs in parallel at high bandwidth.
In the end, such an addition wouldn't impair sales of the top-end enterprise E7 CPUs, as those have their own commercial market anyway – and, well, they could gain even more QPI channels in the future if need be. But for the workstations and supercomputing nodes using mostly the E5 series chips and their successors, it could be quite beneficial, as we have seen here.