Intel's Direct I/O in Xeon E5 – Workstations and servers today, gaming systems tomorrow?

The new-generation dual Socket 2011 Xeon E5 platform has one rarely mentioned but important capability – receiving I/O data from, say, storage or the network directly into the CPU cache, rather than going to and from main memory. Data Direct I/O (DDIO) could be useful beyond workstations and servers, though… how about low-latency, high-end gameplay?

The DDIO idea obviously comes from the server world, where thousands of network requests, or blocks of data from storage arrays, flow in and out of a server every second. In the old days, on-chip caches were limited in capacity, and priority had to be given to the code and data used locally by the CPU.

However, the new generation of high-end processors has large caches – 20 MB of L3 in the case of the Xeon E5 parts – so a lot more can fit in there. This opens up an alternative to the old process of sending arriving I/O data that needs to be used first to main memory, and then retrieving it from that same main memory right away. These two steps alone can waste up to a microsecond per transfer – which adds up to quite a bit at thousands of transactions every second.
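To get a feel for how those per-transfer savings accumulate, here is a back-of-the-envelope sketch. The ~1 microsecond saved per transfer comes from the figure above; the transaction rates are illustrative assumptions, not measured values.

```python
# Rough estimate of time DDIO can reclaim by skipping the round trip
# through main memory. The 1 us/transfer figure is the "up to" estimate
# from the article; the rates below are hypothetical workloads.

SAVED_PER_TRANSFER_US = 1.0  # assumed saving per I/O transfer, in microseconds

def saved_ms_per_second(transfers_per_second: float) -> float:
    """Total time saved per second of operation, in milliseconds."""
    return transfers_per_second * SAVED_PER_TRANSFER_US / 1000.0

for rate in (1_000, 10_000, 100_000):
    print(f"{rate:>7} transfers/s -> up to {saved_ms_per_second(rate):.1f} ms saved per second")
```

At 100,000 transfers per second – not unusual for a busy network server – that is up to a tenth of every second spent purely on the avoidable memory round trips.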

So, with DDIO – at least with Intel I/O adapters and the new E5s – the adapter sends the requested data directly into the CPU's L3 cache, for immediate use by the processor, as you can see in this illustration:

[Figure: inbound data flow – the I/O adapter writes incoming data directly into the CPU L3 cache]

[Figure: outbound data flow – the I/O controller reads outgoing data directly from the L3 cache]

By comparison, the classic approach takes at least two to three memory trips for each write or read operation, as you can see in the comparison picture above for the other direction – sending data out to the I/O controller.

Basically, the data accesses involved in creating the I/O packet are satisfied from within the cache, sparing the software running on the CPU the cache misses and the associated main-memory fetches, which of course cost time. The I/O controller's read request is then served from that same cache, without evicting the data – eviction would cause cache misses later if the software re-uses it.

Now, all that is well and good for servers and workstations, especially latency-sensitive applications like high-frequency trading, as well as general network server workloads. But the 10% to 20% latency reduction on offer could find a use in our home PCs too – how about massively multiplayer gaming scenarios? A combination of reduced network I/O latency and reduced storage I/O latency from dedicated PCIe SSD controllers, for instance, could be noticeable in a very intensive online gaming setup.

The good thing is, DDIO is completely transparent, requiring no software enablement as long as compliant I/O adapter ICs are used. And since the Core i7-39xx parts come from the same die as the Xeon E5 series, Intel could enable DDIO on them as well – it would be a good extra capability to preserve their market niche in the face of the arriving mainstream quad-core Ivy Bridge processors. How about that?