Chinese high end CPUs are now in the game – details: Part 1

Last week's report on CPUs, mentioning the Chinese new-generation entries, did raise some waves on various online forums. Here's a bit more on some of those processors.

China has now officially gone deep into the core of high end computing, all the way to the deepest level – designing and manufacturing its own CPUs – to complete the whole vertical stack from the processor to the application. That includes having its own designs covering everything from smartphones to supercomputers, based on three main architectural families: ARM, MIPS and Alpha.



Last week's report, and its coverage of the Chinese CPUs, sparked quite a few comments on various online forums, ranging from encouragement and calls for a more diversified CPU future to outright dismissal of these chips as copies or inferior designs, or for not using, of all things, the X86 architecture – widely regarded as the worst ever CPU architecture from a design point of view – as 'proof of true capability'.



Well, let's take a look at the three chosen main architectures here. ARM, MIPS and Alpha are all native RISC architectures – meaning simple, symmetric, orthogonal instruction sets with only a few addressing modes and options, a uniform instruction format, and easy scalability to wide cores, multi-core designs and a range of speeds from low power to top performance, all with a much lower gate count than any X86. The winning card of X86 is software compatibility with hundreds of thousands of past applications. Since China doesn't want to depend on a Western software stack for its public and, especially, government use, that compatibility matters little there, and so there is no need to rely on X86.
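To make the "uniform instruction format" point concrete, here is a minimal Python sketch that splits a 32-bit MIPS R-type instruction word into its fixed-width fields. The field layout is the standard MIPS32 R-type encoding. The function name `decode_r_type` and the sample word are picked purely for illustration and do not come from any Loongson documentation.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its six fields.

    Every R-type instruction uses the same fixed layout:
    opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
    """
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# Sample word (illustrative): the encoding of `add $t0, $t1, $t2`,
# where $t0, $t1, $t2 are registers 8, 9 and 10.
fields = decode_r_type(0x012A4020)
print(fields)
```

Because every field sits at a fixed bit position, decoding needs nothing more than shifts and masks. Variable-length X86 instructions, by contrast, have to be parsed byte by byte before the decoder even knows where the next instruction starts.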



So, why bother with the X86 complexities – both technical and legal – then? The internal market, coupled with Linux and other open source stacks, is more than large enough to provide complete solutions and the volumes required to justify these processors even commercially over the long run.



Talking about legality: no, these are not fakes or illegal copies. The ARM and MIPS processors made in China are fully licensed from the respective ARM and MIPS IP owners, while the Shenwei Alpha-compatible chip is based on Digital (DEC) IP that is well over 15 years old now – quite ironic for a CPU that matches the best current X86 processors, which are based on 2010 IP and built in a process two generations newer.



MIPS – Dragon's Progeny



Loongson (Godson) is the name of the Chinese MIPS processors, developed by the Institute of Computing Technology (ICT) at the Chinese Academy of Sciences (CAS) in Beijing, with Prof Hu Weiwu as the design leader. Prof Hu also happens to be a deputy to the National People's Congress, which surely helps in gaining support for the overall project. For the past 9 years, the effort has been run as a joint venture between the government and private enterprise through a company called BLX, a partnership between CAS and Jiangsu Zhongyi Group.



There have been 3 major generations of these processors so far, with the latest one – Loongson 3B – being an 8-core 1.05 GHz CPU, with each core having a 256-bit vector FP unit as well. Despite the low clock and 65 nm process, the efficient 4-way out-of-order cores and vector units, with dual 256-bit FP ops per core per cycle, allow Loongson 3B to reach 16 GFLOPs per core at 1 GHz, or some 130 GFLOPs peak double precision (DP) FP rate across all 8 cores at the 1.05 GHz clock. For comparison, the 3.3 GHz Core i7 3960X with AVX would achieve some 160 GFLOPs peak in DP, while the Westmere (Core i7 990X) and Bulldozer CPUs would be at roughly two-thirds of this or less – Core i7 990X is at 90 GFLOPs peak, and AMD FX8150 at some 110 GFLOPs peak, all in DP. And, oh yes, the Loongson 3B achieves this performance at just 40 watts TDP, less than one third of that of the above competing CPUs.
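The peak figures quoted for Loongson 3B can be reproduced with simple arithmetic. The Python sketch below is a back-of-the-envelope check, not a vendor formula. It assumes each 256-bit vector op covers four 64-bit doubles and that each lane performs a fused multiply-add counted as 2 FLOPs. The second assumption is inferred so that two 256-bit ops per cycle add up to the quoted 16 GFLOPs per core at 1 GHz; the article does not state it. The helper name `peak_gflops` is hypothetical.

```python
def peak_gflops(cores, ghz, vector_ops_per_cycle, vector_bits,
                flops_per_lane, element_bits=64):
    """Theoretical peak GFLOPs: cores x clock x FLOPs per core per cycle."""
    lanes = vector_bits // element_bits  # doubles per vector op
    flops_per_cycle = vector_ops_per_cycle * lanes * flops_per_lane
    return cores * ghz * flops_per_cycle


# Figures from the article: 8 cores, 1.05 GHz, dual 256-bit FP ops per cycle.
# Assumption (not stated in the article): 2 FLOPs per lane, i.e. multiply-add.
per_core_at_1ghz = peak_gflops(1, 1.0, 2, 256, 2)
chip_peak = peak_gflops(8, 1.05, 2, 256, 2)
gflops_per_watt = chip_peak / 40  # 40 W TDP, as quoted in the article

print(f"per core at 1 GHz: {per_core_at_1ghz:.1f} GFLOPs")  # 16.0
print(f"8 cores at 1.05 GHz: {chip_peak:.1f} GFLOPs")       # 134.4
print(f"per watt: {gflops_per_watt:.2f} GFLOPs/W")          # 3.36
```

Under these assumptions the whole chip comes out at about 134 GFLOPs, in line with the "some 130 GFLOPs" quoted above, or roughly 3.4 GFLOPs per watt at the stated 40 W TDP.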



Something even more interesting is that Loongson 3B has over 200 extra instructions, implemented in a separate block that doesn't affect the integrity of the main core, which speed up execution of X86 software under the QEMU translator. The benefit of this, at a 5% die area cost, is running lots of X86 software at near native speeds – an approach that Alpha perfected over a decade ago with the FX!32 software, which enabled Alpha Windows NT to run many X86 titles of the time at high speed.



Anyway, since the core is reasonably efficient already, the next step for Loongson 3 is a 16-core version in a 28 nm process, expected sometime in mid 2012. Minor core improvements will come along with a much higher clock rate, around 1.6 GHz, and an L2 cache larger than the current 4 MB. The 2 x 64 KB per-core L1 caches are expected to stay.



What about the software? Several major Linux distros do run – including Debian, Gentoo, Mandriva and China's own Red Flag. The BSD ports were completed quite a while ago, as was a Windows CE port. Since there are quite a few consumer devices based on the previous Loongson / Godson processors, who knows, one day we may even see Android and Windows 8 ports, although the Chinese side doesn't seem to feel much pressure about it.



In the second part, we cover Alpha, ARM and China's own instruction set attempts.