Intel Sandy Bridge-E Core i7-3960X Review: High End CPU roundup
A bit about CPUs
As for AMD, the older cream-of-the-crop CPU carries the title Phenom II X6 1100T. It’s a hexa-core CPU made in the “old” 45 nm production process. It has 3 MB of L2 and 6 MB of L3 cache, works at a default clock of 3.3 GHz, and reaches up to 3.7 GHz in Turbo Core mode. It is also the cheapest of all models in this roundup, which makes it an attractive buy. The second AMD product is, you’ve guessed it, FX-8150, which again offers the most cores for the money. AMD FX-8150 is built on the Bulldozer chip, made in the new 32 nm process, and works at 3.6 GHz by default and 4.2 GHz in Turbo Core mode. With eight cores to feed, FX-8150 carries 16 MB of cache, evenly split between L2 and L3 at 8 MB each. Its architecture is a very controversial subject; although it looks fantastic “on paper”, the general impression of Bulldozer isn’t too good, and that view is shared by media and disappointed customers alike. True, the world may not be ready for AMD’s fledgling just yet, and things are bound to improve as time passes, but unfortunately that doesn’t mean much at the present moment.
On the other side of the ring, Intel has seen little but praise ever since the first Nehalem came out. Sandy Bridge reinforced its position as the provider of successful and popular architectures. We picked the very expensive Core i7 990X Extreme Edition as the old model. This hexa-core chip from the Nehalem family had a price of over 1000€ when it first came out, and it hasn’t dropped all that much since. With the aid of hyperthreading, it can handle a total of 12 parallel threads. The default clock of this CPU is 3.46 GHz, and Turbo Boost stretches this number up to 3.73 GHz. Unlike AMD’s, the oldest model on Intel’s side is made in 32 nm. The large cache of 12 MB, called Smart Cache by Intel, provides all six cores with ample breathing room.
Next up is Core i7 2700K, a stronger version of the enormously popular 2600K, one of the best CPUs in history in terms of price/performance ratio for anyone looking to build a really strong system. It’s based on Sandy Bridge architecture, and until recently it was the strongest model based on this core, but it was still only a quad-core CPU. After this CPU generation came out, it became clear that Intel wasn’t planning to stick with the then-current socket for too long, as the new Extreme series required a new one. This is why the LGA 2011 platform was created: to round out the product range and create clear segmentation in Intel’s lines.
Intel Core i7 3960X
As a reminder, the original Sandy Bridge CPUs have four cores each, with i5 models containing 6 MB of L3 cache, while i7 models have the whole 8 MB as well as hyperthreading. So far, the desktop segment’s undisputed cream of the crop has been Intel’s i7 990X, codenamed Gulftown and based on the older Nehalem microarchitecture. The time has come for a generation shift, and in this iteration Intel’s socket strategy brings a new CPU platform: Intel’s new beast uses the brand new LGA 2011 socket.
This basically means that we now have two Sandy Bridge platforms: LGA 1155, with a dual-channel memory controller inside the CPU, and LGA 2011, which doubles that bandwidth thanks to a quad-channel controller. In other words, if you want maximum performance, make sure you place four modules in your motherboard. This sort of memory bandwidth is mostly helpful when executing HPC code, as such applications are replete with vector calculations and, for the most part, extremely bandwidth-intensive.
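To put the dual-channel versus quad-channel difference into numbers, here is a minimal sketch of the usual peak-bandwidth arithmetic. It assumes a 64-bit (8-byte) DDR3 channel and the same DDR3-1600 modules on both platforms, so that only the channel count differs; the function name is ours, and the figures are theoretical peaks rather than measured results.

```python
# Theoretical peak DRAM bandwidth = transfers per second x bytes per transfer x channels.
# Assumptions (not measurements): each DDR3 channel is 64 bits (8 bytes) wide,
# and both platforms run DDR3-1600 so that only the channel count differs.

def peak_bandwidth_gb_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Return the theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_per_s * 1_000_000 * bus_bytes * channels / 1e9

dual = peak_bandwidth_gb_s(1600, channels=2)  # dual-channel, LGA 1155 style
quad = peak_bandwidth_gb_s(1600, channels=4)  # quad-channel, LGA 2011 style

print(f"dual-channel: {dual:.1f} GB/s")
print(f"quad-channel: {quad:.1f} GB/s")
print(f"ratio: {quad / dual:.1f}x")
```

Real applications will see less than these peaks, but the ratio between the two platforms is what matters here.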
With a large number of cores, narrow memory bandwidth may easily become a bottleneck, which makes quad-channel access the logical option. Sandy Bridge E, which means the entire LGA 2011 platform, is actually a server platform adjusted to desktop standards. In its very essence, Sandy Bridge E is Intel’s Xeon E5, raised to a much higher frequency, and with two of its eight cores shut off. Each Sandy Bridge E CPU has its cores and L3 cache connected to the ring bus. This series’ L3 cache is split into 2.5 MB segments – a single ring port contains one core with 2.5 MB of L3 cache, which amounts to 20 MB of L3 cache for a total of eight cores. Shutting off two cores, along with their cache segments, has consequently provided a CPU with six cores and 15 MB of L3 cache. The standard desktop version uses 2 MB chunks (2600K has 4 x 2 MB of L3 cache).
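The cache arithmetic above is simple enough to sketch in a few lines. The helper name is ours, and the model assumes that each active core keeps exactly one L3 segment, as the paragraph describes.

```python
# L3 size as "one cache segment per active core", following the description above.
# The function and constant names are illustrative, not Intel terminology.

SANDY_BRIDGE_E_SLICE_MB = 2.5  # segment size on the LGA 2011 die
DESKTOP_SLICE_MB = 2.0         # segment size on the standard desktop die

def l3_total_mb(active_cores, slice_mb):
    """Total L3 in MB when every active core contributes one segment."""
    return active_cores * slice_mb

print("full 8-core die:", l3_total_mb(8, SANDY_BRIDGE_E_SLICE_MB), "MB")
print("i7 3960X, 6 cores:", l3_total_mb(6, SANDY_BRIDGE_E_SLICE_MB), "MB")
print("i7 2600K, 4 cores:", l3_total_mb(4, DESKTOP_SLICE_MB), "MB")
```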
This new i7 3960X is the top model, with a price of around $1,000 on the US market. Its brother, i7 3930K, has an unlocked multiplier as well, but a smaller L3 cache of “only” 12 MB; needless to say, its price is much lower, almost half that of the stronger version. There’s also a quad-core i7 3820 in the making, with 10 MB of L3 cache, but its price remains a mystery for now, while its performance is expected to be somewhat better than that of the current LGA 1155 model, 2700K.
Core i7 3960X has a maximum Turbo Boost frequency of 3.9 GHz, just like i7 2700K. However, its default clock is 3.3 GHz for all cores, which is 200 MHz lower than the 3.5 GHz default clock of i7 2700K. For comparison’s sake, i7 980X, also “ticking” at 3.3 GHz, or more precisely, 3.33 GHz, has a Turbo Boost frequency higher by only 133 MHz, so it works at 3.33-3.46 GHz in most cases.
Conceptually, this is an improved Nehalem/Westmere, but the microarchitecture itself has undergone a few very important changes that affect both power consumption and performance. Here are the most important ones:
- Decoded microoperations cache (similar to trace cache in Netburst architecture)
- Physical register file
- Two 128-bit loads per cycle
- Larger buffers and more instructions in flight – better out-of-order execution
- Improved memory controller that officially supports DDR3-1600 memory, here as quad-channel
- 256-bit ring bus architecture connecting L3 cache, the cores and the GPU system controller
- Turbo Boost 2.0
The GPU/IGP has been left out of Sandy Bridge E, as the chip was initially designed for server platforms.
As for other features, everything that you know about the existing range of LGA 1155 CPUs applies here as well. These improvements were made in order to additionally speed up execution of traditional code without recompiling.
The new product is also complemented by a new set of SIMD instructions – AVX, short for Advanced Vector Extensions. This isn’t an extension of the previous SSE instruction set, though; it’s an entirely new format altogether. AVX uses extended 256-bit registers to work with 256-bit vectors. In applications that use AVX, a performance improvement of up to 50% can be observed compared to standard code.
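What the wider registers mean in practice is easiest to see by counting how many values fit into a single register. The sketch below is plain arithmetic, not actual AVX code; the 128-bit SSE register width is added by us for comparison, and the helper name is made up for illustration.

```python
# Number of elements ("lanes") that fit into one SIMD register.
# The 128-bit SSE width is our addition for comparison; 256 bits is the AVX width above.

def lanes(register_bits, element_bits):
    """How many elements of a given size fit into a single vector register."""
    return register_bits // element_bits

for name, bits in (("SSE", 128), ("AVX", 256)):
    print(f"{name}: {lanes(bits, 32)} single-precision floats, "
          f"{lanes(bits, 64)} double-precision floats per register")
```

Twice as many values per instruction is the theoretical ceiling; how much of it a given program sees depends on the code and on memory bandwidth.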
The X79 Express chipset doesn’t contain anything essentially new, except perhaps for the platform’s larger number of PCI Express lanes and PCI-E 3.0 support. Surprisingly enough, the new chipset doesn’t support USB 3.0, although the technology has been around for a while. The chipset isn’t exactly known for running cool, so active cooling has to be implemented. For instance, ASUS’ Rampage IV Extreme motherboard, the one that was used for testing purposes, has a fan on the PCH. PCH, or Platform Controller Hub, is Intel’s name for the single chip that takes over from the older Northbridge/Southbridge pair.
X79 chipset’s internal codename is Patsburg. Unlike the X58 chipset, X79 isn’t using QPI links for CPU communication, instead relying on a DMI2 interconnect. DMI (Direct Media Interface) is used to connect the CPU to the PCH, and essentially works like a PCI Express link. X79’s DMI2 uses exactly the same communication protocol as the PCI Express bus, with four lanes supporting speeds of up to 5 GT/s each. This is more than enough for any multi-GPU system to communicate with the RAM.
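For a rough idea of what four lanes at 5 GT/s add up to, here is a small sketch of the link arithmetic. It assumes 8b/10b line encoding, as used by PCI Express 2.0 at this transfer rate, so only 8 of every 10 transferred bits carry data; the function name is ours, and the result is a theoretical figure per direction.

```python
# Usable bandwidth of a serial link: lanes x transfer rate x encoding efficiency.
# Assumption: 8b/10b encoding (8 payload bits per 10 line bits), as in PCI Express 2.0.

def link_bandwidth_mb_s(lanes, gt_per_s, payload_bits=8, line_bits=10):
    """Theoretical usable bandwidth per direction, in MB/s (1 MB = 10**6 bytes)."""
    usable_bits_per_s = gt_per_s * 1e9 * payload_bits / line_bits  # per lane
    return lanes * usable_bits_per_s / 8 / 1e6                     # bits -> bytes -> MB

per_lane = link_bandwidth_mb_s(lanes=1, gt_per_s=5)
dmi2 = link_bandwidth_mb_s(lanes=4, gt_per_s=5)

print(f"one lane: {per_lane:.0f} MB/s per direction")
print(f"DMI2, four lanes: {dmi2:.0f} MB/s per direction")
```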
Similarly to QPI and X58, X79 also has an I/O bus used by peripherals, PCI-E and the rest of the system to communicate with the memory, the only difference being that the PCI Express interface is now closer to the CPU core. The memory controller is situated inside the CPU itself, so all communication takes place indirectly through it. The main difference in system architecture between Nehalem and Sandy Bridge CPUs is that the DMI controller is integrated into the CPU in Sandy Bridge, while X58 and Nehalem had DMI located in a separate chip; as a result, X79 only has a single chip, while X58 has two: X58 IOH and the ICH10R Southbridge.
BCLK overclocking is limited, just like with P67 and Z68 chipsets. LGA 1155 CPUs with Sandy Bridge architecture introduced a clock generator integrated into the chipset. Since there’s no external clock generator anymore, raising the BCLK value also raises the frequency of all other components in the system, such as PCI-E and SATA links. This makes the familiar BCLK-based overclocking much more difficult. However, K-suffix CPUs have an unlocked multiplier, and the same goes for the tested Extreme Edition marked with X, which makes overclocking that much simpler.
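The two overclocking routes can be compared with a bit of arithmetic. The sketch below assumes a nominal BCLK of 100 MHz and a default multiplier of 33 for the 3.3 GHz default clock; the function names and the overclocking values are ours and purely illustrative, not tested settings.

```python
# Core clock = BCLK x multiplier. Assumptions: nominal BCLK of 100 MHz and a
# default multiplier of 33 (3.3 GHz). The overclocking values are examples only.

NOMINAL_BCLK_MHZ = 100.0

def core_clock_mhz(bclk_mhz, multiplier):
    """Resulting core frequency in MHz."""
    return bclk_mhz * multiplier

def bclk_offset_percent(bclk_mhz, nominal_mhz=NOMINAL_BCLK_MHZ):
    """How far every BCLK-derived clock (PCI-E, SATA, ...) is pushed off nominal."""
    return (bclk_mhz - nominal_mhz) * 100 / nominal_mhz

# Route 1: raise only the multiplier, other clocks stay at nominal.
print(core_clock_mhz(100.0, 40), "MHz core,", bclk_offset_percent(100.0), "% off nominal")

# Route 2: raise BCLK, every derived clock moves with it.
print(core_clock_mhz(105.0, 33), "MHz core,", bclk_offset_percent(105.0), "% off nominal")
```

The multiplier route changes only the core clock, which is why unlocked K and X models are so much easier to push.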
At factory-default settings, i7 3960X uses much more power than its previous-generation counterpart, i7 980X/990X, and its measured consumption was higher than that of AMD’s flagship FX-8150 CPU too. In overclocked mode, the new CPU fares a bit better, but from a power consumption standpoint Sandy Bridge E is definitely not an eco-friendly solution. Clock-for-clock performance is around 16% better than that of i7 980X, which is a respectable advancement. In certain applications, Core i7 3960X holds an advantage of more than 200% over AMD’s Bulldozer-based FX-8150, and of around 40-45% over i7 980X.