
So, Intel's mainstream desktop CPUs don't have QPI links anymore? Well, the high-end ones certainly do, and will have more… Intel is evolving QPI further, speeding it up and adding more fun chips to it, using a co-processor model akin to the old 80×87 FPU days.

Intel QuickPath Interconnect, or QPI, is critical to linking multiple CPUs tightly together in a low-latency, very high bandwidth, cache-coherent manner, enabling smooth scaling of applications across two or more sockets and sharing the memory between them as one – even though, in effect, it is a NUMA system in the end. It is similar to AMD's HyperTransport, as both evolved from the Alpha EV7 interconnect, which had it all some 12 years ago, before the world's best CPU was brutally murdered by the failed HP CEO and aspiring politician Carly Fiorina and her best 'friend' at the time, Compaq CEO Michael Capellas. The technology casualty needed to castrate Compaq and make it edible for the HP of the time set back high-end computing by a decade – what to do…

Back to QPI: unlike HyperTransport, Intel never brought its interconnect into open-market visibility or made an attempt to standardise it – rather, it remained an internal high-end interconnect, seen on the top desktop CPUs of the Nehalem and Westmere era, as well as on all new Xeons from Sandy Bridge onwards. The Sandy Bridge and onwards Core i7 single-socket CPUs don't use QPI anymore, though – does that mean Intel is quietly giving it up, one may ask?

Not at all – in fact, Intel is pushing the QPI speed further, from the initial 6.4 Gbit/s per pin, or 25.6 GByte/s per bidirectional 16-bit link, to 8 Gbit/s per pin, or 32 GByte/s per bidirectional 16-bit link, in the new Xeon E5 offerings you see out now – therefore outpacing HyperTransport 3.1, which still runs at the same speed as the original QPI. The Ivy Bridge EP and, later, Ivy Bridge EX (there is no Sandy Bridge EX, keep in mind) follow-ons will run QPI at that same speed as well.
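If you want to sanity-check those figures, the arithmetic is simple: a QPI link carries 16 payload bits in each direction per transfer, so per-pin rate × 2 bytes × 2 directions gives the quoted link bandwidth. A minimal Python sketch (the function name is mine, nothing official):

```python
# Minimal sketch of the QPI link bandwidth arithmetic quoted above.
# A QPI link carries 16 payload bits per direction per transfer,
# so: per-pin rate (GT/s) x 2 bytes x 2 directions = GB/s per link.

def qpi_link_bandwidth_gbs(rate_gt_s, payload_bits=16):
    """Bidirectional payload bandwidth of one QPI link, in GB/s."""
    one_way = rate_gt_s * payload_bits / 8   # GB/s in one direction
    return one_way * 2                       # both directions at once

for rate in (6.4, 8.0, 9.6):
    print(f"{rate} GT/s -> {qpi_link_bandwidth_gbs(rate):.1f} GB/s per link")
# 6.4 GT/s -> 25.6 GB/s, 8.0 -> 32.0 GB/s, 9.6 -> 38.4 GB/s
```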
 
Now, Intel is going a step further in the Haswell EP and Haswell EX generation: QPI will be sped up to 9.6 Gbit/s per pin, or 38.4 GByte/s per bidirectional 16-bit link, and there will be more than just CPUs attached to it. While the Haswell EP is expected to have two QPI links, just like the current Xeon E5 it will replace, the Haswell EX will have three QPI links, not four like the current Westmere EX, since the focus seems to be four-socket configurations where three QPI links are enough. That's 77 GB/s and 115 GB/s of total interprocessor bandwidth per socket respectively – not bad at all! New features like a 'directory cache' will also help handle MP transactions quicker over QPI.
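Those per-socket totals are just the per-link number multiplied out – a quick check using the same arithmetic as above:

```python
# Per-socket QPI bandwidth at the 9.6 GT/s Haswell speed grade.
per_link = 9.6 * 16 / 8 * 2      # 38.4 GB/s per bidirectional link

print(f"Haswell EP, 2 links: {2 * per_link:.1f} GB/s")  # 76.8, ~77 GB/s
print(f"Haswell EX, 3 links: {3 * per_link:.1f} GB/s")  # 115.2, ~115 GB/s
```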
 
What's interesting is that, in that timeframe, the MIC – the follow-on to the PCIe-based 'Larrabee' compute accelerator derivative – should be running on that same QPI as its connection to the CPUs. In a sense, you'll have a multi-teraflop co-processor sharing the main memory with all the other CPUs as its own, yet still having a huge local memory, maybe 16 GB or more, at very high bandwidth, just like the GPUs do. Of course, it would have at least two QPI links, so as to connect to at least two CPUs simultaneously for faster memory access, to get those 2+ teraflops actually used.
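To see why the big local memory still matters even with two QPI links into host memory, compare the bandwidths – a rough sketch, where the 320 GB/s local figure is purely my assumption for a GPU-class memory subsystem, not anything Intel has stated:

```python
# Rough comparison: shared host memory over QPI vs. assumed local memory.
per_link = 9.6 * 16 / 8 * 2        # 38.4 GB/s per QPI link at 9.6 GT/s
host_path = 2 * per_link           # 76.8 GB/s via two QPI links
local_mem = 320.0                  # assumed GPU-class local bandwidth (GB/s)

print(f"host memory via QPI: {host_path:.1f} GB/s")
print(f"local memory (assumed): {local_mem:.0f} GB/s, "
      f"{local_mem / host_path:.1f}x the QPI path")
```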
 
So, Intel is going 'back to the future' with the co-processor model last seen some 25 years ago in the i386 + i387 pair, but now with SIMD or vector-type co-processing on offer in the 2014 timeframe. It would make for some very, very interesting workstation and HPC server monsters, methinks… but Intel, please don't be thrifty with QPI channels – let's add more of them, so that, why not, AMD GPUs or third-party accelerators can use it too, and be the extra co-processors to the main CPUs for many other uses as well.
 