Intel's Xeon family finally lets Larrabee in: Xeon Phi and its future
At the ISC supercomputing conference in Germany, Intel finally gave its MIC accelerator a name: Xeon Phi. How does it stand up against the competition, both the outside GPUs and its own Xeon brethren?
Intel's Many Integrated Core accelerator chip, also known as MIC or Knights Corner depending on which name you prefer, is the compute offspring of the shelved Intel Larrabee high-end GPU effort. Its 50+ simple, two-way in-order, Pentium-like cores (yes, 1995 Pentium!) each feed a 512-bit wide SIMD FP unit, delivering around 1 TFLOPS peak in double precision at around 1 GHz. That holds as long as you work out of the 8 GB of on-board RAM that the initial pre-production cards in the first deployments carry. Once you cross the PCIe connection to the mainboard, data transfer latency bogs things down, but then all GPUs suffer from the same ailment anyway.
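For readers who want to see where a figure like "around 1 TFLOPS" comes from, here is a minimal back-of-the-envelope sketch. It assumes each 512-bit SIMD unit holds eight 64-bit doubles and retires one fused multiply-add per lane per cycle (counted as two FLOPs); the function name and the exact core counts are illustrative assumptions, since only "50+" cores has been stated.

```python
def peak_dp_gflops(cores, simd_bits, clock_ghz, flops_per_lane_per_cycle=2):
    """Theoretical peak double-precision GFLOPS of a SIMD chip.

    flops_per_lane_per_cycle=2 is an assumption: one fused
    multiply-add per lane per cycle, counted as two FLOPs.
    """
    lanes = simd_bits // 64  # number of 64-bit doubles per SIMD register
    return cores * lanes * flops_per_lane_per_cycle * clock_ghz


if __name__ == "__main__":
    # Hypothetical core counts around the "50+" mark, all at 1 GHz.
    for cores in (50, 60, 64):
        print(cores, "cores:", peak_dp_gflops(cores, 512, 1.0), "GFLOPS")
```

Under these assumptions, 50 cores at 1 GHz come out at 800 GFLOPS, so the 1 TFLOPS mark implies either a core count in the low 60s or a clock slightly above 1 GHz.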
Intel has now officially christened the new kid on the block Xeon Phi. Phi, or the golden ratio, is the famous 1.618… math constant that, together with the more famous Pi, defines our body proportions and the overall order of nature, as the pyramid builders and Leonardo da Vinci would have it, among others. Yes, Phi is also roughly the screen proportion of 16:10 monitors, making them far more aesthetically pleasing to look at than the somewhat disturbing 16:9 ones.
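The monitor claim is easy to check with a few lines of arithmetic. The sketch below computes the golden ratio from its closed form, (1 + sqrt(5)) / 2, and measures how far each aspect ratio sits from it; the helper name aspect_gap is purely illustrative.

```python
import math

# Golden ratio from its closed form.
PHI = (1 + math.sqrt(5)) / 2


def aspect_gap(width, height):
    """Absolute distance between a width:height aspect ratio and PHI."""
    return abs(width / height - PHI)


if __name__ == "__main__":
    for width, height in ((16, 10), (16, 9)):
        ratio = width / height
        gap = aspect_gap(width, height)
        print(f"{width}:{height} ratio {ratio:.3f}, distance from phi {gap:.3f}")
```

16:10 lands within about 0.02 of Phi, while 16:9 is off by roughly 0.16.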
Xeon Phi has one huge advantage over the OpenCL or CUDA-bound GPU compute solutions. It's an X86 chip. So, ultimately, you can program it inline like an X86 co-processor alongside your ordinary Xeon chippery, without having to resort to fancy programming interfaces and such. That flattens the learning curve considerably, according to users and vendor alike. The ready pool of supported apps should be impressive at the official launch later this year.
That being said, Xeon Phi also has one huge disadvantage compared with the OpenCL or CUDA-bound GPU compute solutions. It's an X86 chip. That means an outdated core, loaded with ISA baggage, that is less transistor-efficient and harder to scale than the custom-tailored, more modern GPU architectures such as AMD GCN or Nvidia Kepler. And, at least behind closed doors, both AMD and Nvidia GPUs have been shown booting Linux on their own, without requiring a CPU.
So, how does it stand performance-wise? Its double precision FP throughput is the same as that of a typical AMD Radeon HD 7970 card, which costs a quarter as much but has much less memory, 3 GB, and no ECC. The FirePro W9000, the 'proper' workstation version with ECC memory and such, will likely be much closer in price. Both AMD offerings reach 4 TFLOPS peak in single precision FP, twice that of the initial Xeon Phi.
The current Nvidia offerings based on Kepler GK104 are far behind in both SP and, especially, DP FP. Only the year-end GK110, with its expected 3 TFLOPS SP and 1.5 TFLOPS DP, will close the gap. But by that time AMD will have its Sea Islands GPUs, and Xeon Phi might end up faster than it is now.
Phi needs that speed-up sooner rather than later, because Intel's own high-end Xeon CPUs are getting faster at FP too. The 3.3 GHz 10-core Xeon E5-2600 V2 Ivy Bridge EP will give you over 270 DP GFLOPS peak per socket around year-end, while, a year later, Xeon E5 V3 Haswell EP should push that beyond 600 DP GFLOPS for a 12-core version when using the FMA optimisations. To stay attractive, the Phis should be at least 3x as fast per socket as the general purpose CPUs they augment, meaning at least 2 TFLOPS DP by 2014. And yes, since it's an X86 after all, Phi could boot Linux on its own, making Phi-only clusters a possibility someday.
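The per-socket CPU figures and the 3x target can be sanity-checked with the same kind of peak-rate arithmetic. This is a rough sketch under stated assumptions: 256-bit AVX registers (four doubles), one add plus one multiply per lane per cycle on Ivy Bridge (2 FLOPs), and two FMAs per lane per cycle on Haswell (4 FLOPs). The Haswell clock is not given above, so the code works out what clock a 600 GFLOPS part would need instead of guessing one.

```python
def peak_dp_gflops(cores, simd_bits, clock_ghz, flops_per_lane_per_cycle):
    """Theoretical peak double-precision GFLOPS per socket."""
    lanes = simd_bits // 64  # 64-bit doubles per SIMD register
    return cores * lanes * flops_per_lane_per_cycle * clock_ghz


# Ivy Bridge EP: 10 cores at 3.3 GHz, assumed add + multiply per cycle.
ivy_bridge = peak_dp_gflops(10, 256, 3.3, 2)

# Haswell EP: 12 cores, assumed two FMAs per lane per cycle.
haswell_per_ghz = peak_dp_gflops(12, 256, 1.0, 4)
haswell_clock_for_600 = 600 / haswell_per_ghz  # GHz needed to hit 600 GFLOPS

# The 3x-per-socket rule of thumb applied to a 600 GFLOPS CPU.
phi_target_gflops = 3 * 600

if __name__ == "__main__":
    print(f"Ivy Bridge EP peak: {ivy_bridge:.0f} GFLOPS")
    print(f"Haswell EP clock for 600 GFLOPS: {haswell_clock_for_600:.3f} GHz")
    print(f"Phi target at 3x: {phi_target_gflops} GFLOPS")
```

At the base clock this simple formula gives about 264 GFLOPS for the Ivy Bridge part, a little under the figure quoted above, which presumably counts some turbo headroom. The 3x rule lands at 1.8 TFLOPS, which rounds to the 2 TFLOPS target.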
Either way, you'll see Xeon Phi in quite a few upcoming supercomputers, including some candidates for the world No. 1 spot this year and next. And yes, since it comes in a typical GPU card format, it can fit in your standard high-end desktop or workstation PC too.