Intel Xeon Phi

Sometime in the next 18 months, Intel will power the world's most powerful supercomputer. Consisting of approximately 100,000 Ivy Bridge-EP based Xeon E5s and 100,000 Xeon Phi boards (Knights Landing), the Chinese system should reach no less than 100 PFLOPS of compute power.

The target for the Chinese supercomputer is not a secret – even with the limitations of Amdahl's Law, as well as incurred latencies which reduce the peak compute power, the supercomputer should reach 100 PFLOPS, approximately 10 times faster than the fastest supercomputer in 2011 and five times faster than the current fastest supercomputer.

This project is heavily backed by the Chinese government and the Ministry of Sciences, which view it as the system needed to aid China's space exploration and to raise health research to a new level. The government expects that within the next couple of decades it will have to grapple with an aging population, predict where to build cities, and plan the highway system (by 2020, for example, China plans to build more new highways than the current total in the United States, and to have twice as many highways as the U.S. by 2030) and the public transportation system, as well as pursue wild projects such as intelligent license plates and real-time traffic calculation that suggests a preferred speed to reduce stoppages on the roads.

However, the dark cloud over an otherwise fantastic design win is one of the reasons why Intel was chosen in the first place. Anyone who has had to compete against the 800-pound gorilla nicknamed "Chipzilla" knows that the company will do anything to win the deal, and if the deal has a budget that would not pass muster with accounting, there are special budgets (marketing, academic, research etc.) where those projects can be placed and, ultimately, won.

Thus, we were not surprised to hear that the world's largest (public) installation of Xeons and Xeon Phis will be located in Southwest China. What did surprise us was the overall budget for the system. This project was budgeted at around $100 million for the processing power, which is a large sum indeed. However, given that Intel is supplying almost the top-end Ivy Bridge-EP processors and top-of-the-line Xeon Phis, the budget starts to look small. In commercial terms, a Xeon E5-2567W v2 should set you back $2,500 (note: this is a future part, and the pricing is an estimate based on the current line-up). The top-end Xeon Phi, the 5110P with 8GB of GDDR5 memory, should set you back between $1,500 and $2,500.

If we take that into account, 100,000 Xeon E5s should retail for $250 million. The Xeon Phis would add an additional $250-350 million, bringing the total for just the processing silicon north of half a billion dollars. This excludes motherboards, memory, storage, enclosures, cooling and just about anything else.
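
The retail arithmetic can be checked with a few lines of Python. This is only a rough sketch: the unit counts and list prices are the ones quoted above, and the variable and function names are made up for illustration.

```python
# Figures quoted in the article; all names here are illustrative.
CPU_COUNT = 100_000                      # Xeon E5 processors
PHI_COUNT = 100_000                      # Xeon Phi boards
CPU_LIST_PRICE = 2_500                   # estimated USD per Xeon E5
PHI_LIST_PRICE_RANGE = (1_500, 2_500)    # USD per Xeon Phi 5110P, low and high

def retail_silicon_cost():
    """Return (cpu_total, (phi_low, phi_high), (total_low, total_high)) in USD."""
    cpu_total = CPU_COUNT * CPU_LIST_PRICE
    phi_low = PHI_COUNT * PHI_LIST_PRICE_RANGE[0]
    phi_high = PHI_COUNT * PHI_LIST_PRICE_RANGE[1]
    return cpu_total, (phi_low, phi_high), (cpu_total + phi_low, cpu_total + phi_high)

cpu_total, phi_range, total_range = retail_silicon_cost()
print(f"Xeon E5:  ${cpu_total / 1e6:.0f} million")
print(f"Xeon Phi: ${phi_range[0] / 1e6:.0f}-{phi_range[1] / 1e6:.0f} million")
print(f"Total:    ${total_range[0] / 1e6:.0f}-{total_range[1] / 1e6:.0f} million")
```

With the unit prices exactly as quoted, the Xeon Phi line comes to $150-250 million, somewhat under the $250-350 million figure given above, which puts the total at $400-500 million. The gap presumably reflects different assumed unit prices; either way, the retail value is several times the $100 million budget.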

Following the TACC deal, where Intel sold the Xeon Phi boards for a mediocre $400, the calculation for this deal is even worse. According to sources in the know, the processors are going for less than $300, while the Xeon Phis will also fit the sub-$300 bracket. This deal will net less than $250 per piece of silicon. When you look at the pricing model, it is clear as day that Intel won't even cover the manufacturing cost – but this is a prestige deal, after all. Also, the company will earn some nice dollars from the special warranty arrangements.
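
To put those per-unit figures in perspective, here is a back-of-the-envelope sketch. It assumes the ceilings quoted above ($300 per part, $250 net per piece), the roughly $100 million budget and the estimated $2,500 Xeon E5 list price mentioned earlier; the names are illustrative and the results are upper bounds, not actual deal terms.

```python
# Upper-bound estimates from the figures quoted in the article (names are illustrative).
PIECES = 200_000               # 100,000 Xeon E5s + 100,000 Xeon Phis
DEAL_PRICE_CEILING = 300       # USD, "less than $300" per part
NET_PER_PIECE_CEILING = 250    # USD, "less than $250" net per piece
BUDGET = 100_000_000           # USD, quoted budget for the processing power
CPU_LIST_PRICE = 2_500         # USD, estimated Xeon E5 list price

max_deal_value = PIECES * DEAL_PRICE_CEILING        # ceiling on what the silicon sells for
max_net = PIECES * NET_PER_PIECE_CEILING            # ceiling on what the deal nets
budget_per_piece = BUDGET / PIECES                  # budget spread evenly over every part
cpu_discount = 1 - DEAL_PRICE_CEILING / CPU_LIST_PRICE  # minimum discount off list

print(f"Deal value ceiling: ${max_deal_value / 1e6:.0f} million")
print(f"Net ceiling:        ${max_net / 1e6:.0f} million")
print(f"Budget per piece:   ${budget_per_piece:.0f}")
print(f"Xeon E5 discount:   at least {cpu_discount:.0%} off list")
```

Even at the ceiling, the silicon would bring in no more than about $60 million, and a Xeon E5 at under $300 is being sold at a discount of at least 88 percent off the estimated list price.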

When it comes to the memory used with this beast, we're talking about several million DIMMs (each CPU will be outfitted with between 16 and 24 DIMMs), which is more memory than some memory manufacturers make in a year. This is Registered ECC DDR3L memory, worth over a quarter of a billion dollars at retail value. At present, we do not know how much the memory setup will go for, but one thing is certain – this will be the fastest supercomputer for some time.

Naturally, progress never stops and the 100 PFLOPS machine won't stay on top for too long. Back at the GPU Technology Conference 2009, we talked with researchers from Australia involved with building the 1 EFLOPS supercomputer (Exascale Computing: 1,000 PFLOPS, or 1,000,000 TFLOPS, or 1,000,000,000 GFLOPS – the magic "billion billion" floating point operations per second) which will accompany the Square Kilometre Array (SKA) radio telescope, set to go into operation in the next couple of years. Last time we checked on that project, it was based on the architecture which succeeds Maxwell, combining ARM and GPU cores on a single die. The projected cost of such a system was set in the region of a billion dollars just for the compute parts. From our conversations with the scientists, Nvidia had the pole position in this setup, but if Intel offers $100 per Xeon Phi board (a Tesla K10 can be bought for $1,500 in quantities larger than 10,000 units), we wonder at what point we can start discussing uncompetitive practices and price dumping. For the record, the EFLOPS system is planned to use at least 250,000 GPGPU boards in the initial deployment, with an update of as many as 250,000 additional units five years after the initial launch of the exacomputer.
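
The unit conversions and the board pricing above can be sanity-checked with a short Python sketch. The $100 Xeon Phi price is the hypothetical offer floated above, not a confirmed figure, and the names are illustrative.

```python
# FLOPS prefixes as powers of ten (operations per second).
PREFIX = {"G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}

one_eflops = PREFIX["E"]
eflops_in_pflops = one_eflops // PREFIX["P"]
eflops_in_tflops = one_eflops // PREFIX["T"]
eflops_in_gflops = one_eflops // PREFIX["G"]

# Initial deployment size and per-board prices quoted in the article.
INITIAL_BOARDS = 250_000
TESLA_K10_PRICE = 1_500          # USD, volume price quoted for 10,000+ units
HYPOTHETICAL_PHI_PRICE = 100     # USD, the speculative Xeon Phi offer

tesla_cost = INITIAL_BOARDS * TESLA_K10_PRICE
phi_cost = INITIAL_BOARDS * HYPOTHETICAL_PHI_PRICE

print(f"1 EFLOPS = {eflops_in_pflops:,} PFLOPS = {eflops_in_tflops:,} TFLOPS "
      f"= {eflops_in_gflops:,} GFLOPS")
print(f"250,000 boards at Tesla K10 pricing:     ${tesla_cost / 1e6:.0f} million")
print(f"250,000 boards at $100 per Xeon Phi:     ${phi_cost / 1e6:.0f} million")
```

At the quoted volume price, 250,000 Tesla K10 boards would cost $375 million, against $25 million for the same number of boards at $100 each, a fifteen-fold difference.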

If we leave the financials aside, the Chinese supercomputer will be a majestic rig indeed, with over 200,000 pieces of silicon processing weather simulations, running calculations for Chinese space projects (manned missions to the Moon by 2020, Mars by 2025), and much more.

However, one thing is clear – Xeon Phi is not going anywhere and Intel's sales team will fight with every means possible to increase its market share, even if that means giving the parts away completely for free, which is what they did with an unnamed Wall Street financial institution. The only problem with that deal was that the analyst involved accidentally dropped that news bomb.

Thus, the sales strategy is simple – offer a simpler programming model than GPUs, offer an unbeatable price and then start charging once the competition is gone.