Intel held a full-day event last week to lay out its strategy and products for growth in the datacenter. Senior executives took the opportunity to talk about CPUs, ASICs, FPGAs, memory, and networking, while sprinkling in a healthy dose of real-world customer success stories. Investors were hoping the event would shed some light on how the company plans to compete with NVIDIA in AI and with AMD in datacenter CPUs. In addition to a few product announcements, we learned more about the company’s AI strategy (which I initially outlined here back in May after its inaugural AI DevCon event). My colleague Patrick Moorhead, President of Moor Insights & Strategy, covered the broader datacenter topics here in this blog, but here’s my take on the AI side of things.
What’s new for Intel AI?
Naveen Rao, Intel’s SVP for AI, articulated the company’s strategy for AI: to provide the full range of general-purpose and specialized devices for AI, supported by a unified suite of optimizing development software. As Mr. Rao correctly pointed out, running AI apps is not a market where one size fits all, and Intel has a broad range of performance, latency, and power envelopes tailored for a wide variety of AI processing. This portfolio includes Xeon for the data center, Movidius for embedded vision, Mobileye for automotive, and Altera FPGAs for edge and datacenter inference. The missing piece of the puzzle remains the Nervana ASIC for training, which should arrive next year.
Intel started out by sharing its internal analysis of its current AI business, saying it sold an estimated $1 billion of silicon into datacenter AI in 2017. It is not clear exactly what that figure counts (inference on AWS, CPUs attached to NVIDIA GPUs, etc.), but it sounds reasonable to me. Intel also raised its projected total addressable market (TAM) for AI silicon to $10 billion in 2022 (up from its previous estimate of $8 billion), and adjusted its TAM estimate for all datacenter silicon (servers, accelerators, memory, networking, and storage) to $200 billion by 2022 (up from $160 billion). That’s a big jump, and it reflects Intel’s bullishness on these new segments’ growth rates.
Intel announced at the event that it will release the Cascade Lake Xeon update this fall. The update adds a feature called “DL Boost” to the AVX512 vector unit, enabling support for int8 (8-bit integer) math operations, which the company says can improve inference performance by up to a factor of 11. In a follow-on update due in 2019, called Cooper Lake, DL Boost will add bfloat16 support to AVX512 to help training performance. Bfloat16, which comes from Google TensorFlow, is simply a float32 whose mantissa is truncated to 7 bits, keeping all 8 exponent bits. These numbers therefore have the same dynamic range as 32-bit floating point, unlike IEEE’s float16. Clearly, Intel is trying to keep Google happy by supporting this fast-emerging requirement.
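To make the bfloat16 idea concrete, here is a minimal sketch in Python using only the standard `struct` module. It shows the truncation described above: keep the sign bit, all 8 exponent bits, and the top 7 mantissa bits of a float32, discarding the low 16 bits. The function names are my own for illustration; real hardware would round rather than simply truncate, but truncation is enough to show the precision/range trade-off.

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    # Pack x as an IEEE-754 float32, then keep only the top 16 bits:
    # 1 sign bit + 8 exponent bits + 7 mantissa bits = bfloat16.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16

def bfloat16_to_float32(bits16: int) -> float:
    # Re-expand to float32 by zero-filling the discarded 16 mantissa bits.
    (x,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return x

# Precision drops to roughly 3 decimal digits...
pi_bf16 = bfloat16_to_float32(float32_to_bfloat16_bits(3.14159265))
# ...but the full float32 dynamic range survives: 1e38 is still
# representable, whereas IEEE float16 overflows above ~65504.
big_bf16 = bfloat16_to_float32(float32_to_bfloat16_bits(1e38))
```

Run through this sketch, pi comes back as 3.140625 rather than 3.14159..., which illustrates why bfloat16 works for training: gradients need range far more than they need mantissa precision.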
The addition of bfloat16 will help those who use CPUs for training neural networks, a process that demands more memory capacity than what’s available on GPUs and TPUs (albeit today that’s a pretty small market). Intel’s Nervana Neural Network Processor, the NNP L-2000, will go after the larger training market when it launches in 2019. This may be Intel’s first production chip to challenge NVIDIA’s GPUs, which are currently the gold standard for datacenter training. The stakes are high, and Intel needs to get this chip right after shelving the highly anticipated first-generation Nervana product.
Figure 1: Intel’s long-awaited Nervana AI engine will supposedly be 3-4 times the speed of the unreleased 1st-generation device, and will support the Nervana Fabric for scaling to at least 8 accelerator nodes. (Source: Intel)
The real Nervana news, not mentioned in the presentation, is the fabric depicted in Figure 1. To my knowledge, this is the first time Intel has disclosed that it will indeed productize the Nervana fabric. It may take the form of a set of PCIe form-factor boards that enable a high degree of scaling across interconnected Nervana chips. Yes, Intel could enable partners to build these boards, and it may do so. However, these are probably not simple PCBs; the designs will require sufficient signal integrity to handle the high data rates the L-2000 will likely drive, and going through an OEM/ODM channel would also delay time to market. These are the same drivers behind NVIDIA’s push into selling DGX AI servers directly, where its boxes compete with its partners’ products; selling systems drives higher gross profit than selling chips.
As for performance, Intel disclosed that the L-2000 would be “3-4 times faster” than the initial design, which is out in the field now as an unannounced trial platform for hyperscale customers. If you take that at face value, Nervana could presumably be 4-7 times faster than NVIDIA’s Volta. Keep in mind, though, that NVIDIA is almost certainly planning to beat Nervana to market with a Volta follow-on. I expect that unannounced NVIDIA chip will include more ASIC-like AI functions, much like the Tensor Cores that surprised everyone (including Nervana, I suspect). After all, moving to 7nm would give NVIDIA lots of new die area to play with!
As for high-performance inference beyond what Xeon can crank out, Rao disclosed that Intel plans to produce an inference version of the Nervana engine used in the L-2000. This makes a lot of sense, since an inference engine usually does not need expensive floating-point hardware or high-bandwidth memory. However, Rao did not disclose any other details about this product.
Intel continues to advance its hardware and software inference capabilities while going back to the drawing board for Nervana training chips. By building out its AI software portfolio, the company has a good chance to ramp quickly and compete with NVIDIA once Nervana hits the market. Meanwhile, enhancements to Xeon for inference processing should help keep Intel ahead of AMD in that market, since AMD has yet to announce any AI features for the 7nm EPYC server processor it is expected to ship next year.