Intel Shows Off Its AI Chips And Chops

Intel held its inaugural Artificial Intelligence (AI) Developers Conference in San Francisco on May 23-24, presenting its leadership, technologies, and customers to a capacity audience of some 800 AI geeks and media. The company now has a rich portfolio of AI technologies, having acquired Movidius and Mobileye for real-time processing, Altera for reprogrammable FPGA acceleration hardware, and Nervana for the training workloads currently served by NVIDIA GPUs. Intel's primary focus on inference processing, the production use of trained neural networks, is a sound strategy, as inference is likely to become a much larger market than the training segment over the next few years. While Intel does not yet have a brawny ASIC in its portfolio for training AI networks, it can carve out a sizable position in inference, alongside companies such as Apple, Qualcomm, Xilinx, and NVIDIA.

That being said, Intel has not abandoned the AI training market, where NVIDIA is enjoying tremendous success with a $3B run rate. Intel stressed Xeon's strengths in training at the event, while pointing to the future, where it hopes to leverage Nervana to compete more directly with NVIDIA's big silicon. Unfortunately for Intel, Nervana now appears to be at least 18 months away, the result of an even larger redesign, which I forecasted here. Here's what I learned.

At the event keynote, Naveen Rao, Intel's SVP for AI, articulated the company's strategy for AI: essentially, to provide the full range of general-purpose and specialized devices for AI, supported by a unified suite of optimizing development software. As Mr. Rao pointed out, running AI apps is not a market where one size fits all, and Intel has products offering a broad range of performance, latency, and power envelopes for inference processing.

Figure 1: Intel SVP Naveen Rao opened the event targeting AI app developers with an informative keynote including appearances by some of Intel’s largest customers.

The event also gave the company the opportunity to share some customer progress with Intel hardware for AI, including Google, Amazon, Microsoft, Novartis, and Facebook. Novartis is a good example of a customer whose large datasets can outgrow GPU memory, forcing thrashing between the GPU's HBM and the CPU-controlled DDR4 memory. This also points to the benchmarking issues that Intel is complaining about. Specifically, the performance of a chip in training on the ImageNet database, where images are only 224x224 pixels with 3 color channels, is irrelevant if you, like Novartis, are processing 1024x1280x3 images of molecules. Novartis also recently scaled its AI training to eight nodes using OmniPath, reducing its training time from 11 hours to just 31 minutes. Intel noted that the Skylake-generation Xeons have specific instructions, such as reduced-precision math operations, that help Xeon performance for AI.
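To make the memory argument concrete, here's a rough back-of-envelope comparison of input-tensor sizes for the two image shapes mentioned above. The FP32 activations and the batch size of 32 are my assumptions for illustration:

```python
# Rough per-batch memory footprint of the input layer alone, comparing
# ImageNet-sized crops with the larger microscopy images Novartis described.
BYTES_PER_VALUE = 4    # FP32 activations (assumed)
BATCH = 32             # batch size (assumed, for illustration)

def input_mb(height, width, channels, batch=BATCH):
    """Input tensor size in MB for one batch of images."""
    return height * width * channels * BYTES_PER_VALUE * batch / 2**20

print(f"ImageNet 224x224x3:   {input_mb(224, 224, 3):6.1f} MB per batch")
print(f"Novartis 1024x1280x3: {input_mb(1024, 1280, 3):6.1f} MB per batch")
```

The larger images are roughly 26x bigger per batch, and intermediate layer activations scale similarly, so a GPU's HBM (16 GB on the V100) can fill up quickly at even modest batch sizes.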

Facebook shared some interesting data showing that it uses (Xeon) CPUs for all inference work and select training jobs, and GPUs for training Convolutional Neural Networks (CNNs) for image processing and Recurrent Neural Networks (RNNs) for speech and language translation. It is workloads such as these for which Intel needs Nervana, but more on that in a minute. Per the subtitle in Figure 2, Facebook makes use of CPUs in part because it already has lots (millions) of them.

Figure 2: Facebook has published a blog showing for the first time where it uses CPUs and where it uses GPUs (in green) for its extensive AI processing.

Intel quite rightly pointed out that most enterprises have surplus CPU capacity, especially at night. Intel has eased the software burden to use those existing resources for many machine learning workloads. As Mr. Rao said, enterprises can run AI on the chips they already have.
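One reason existing Xeons can handle inference is the reduced-precision math mentioned earlier: inference tolerates low-precision arithmetic well. Here is a minimal NumPy sketch of the idea, using symmetric per-tensor int8 quantization; this is illustrative only, not Intel's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)   # activations
w = rng.standard_normal((8, 3)).astype(np.float32)   # weights

def quantize(a):
    """Symmetric per-tensor quantization to int8 with a float scale."""
    scale = np.abs(a).max() / 127.0
    return np.round(a / scale).astype(np.int8), scale

xq, xs = quantize(x)
wq, ws = quantize(w)

# int8 multiplies accumulate into int32, then rescale back to float --
# the pattern that reduced-precision CPU instructions accelerate.
y_quant = (xq.astype(np.int32) @ wq.astype(np.int32)) * (xs * ws)
y_fp32 = x @ w

print("max abs error:", np.abs(y_quant - y_fp32).max())
```

The quantized result closely tracks the FP32 one, which is why trading precision for throughput is usually a good deal at inference time.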

In my mind, the biggest news of the event was an update on the highly anticipated Nervana Neural Network Processor (NNP) roadmap. Prior to its acquisition by Intel, Nervana was expected to deliver a fabric-enabled NNP accelerator that might beat an unspecified GPU by a purported 10X. Intel has been sampling that first-generation chip to major AI customers and plans to incorporate their input and enhancement requests into the first production NNP part, due in late 2019. I expected Nervana to ship its first part to generate revenue last year, but Intel now feels it can afford the time to get the first commercial product right. Intel gave us tidbits we can use to project where the company might land with the NNP in Figures 3 and 4.
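Peak-throughput claims like that purported 10X can be sanity-checked with back-of-envelope arithmetic from published core counts and clocks. Here is a quick sketch for NVIDIA's V100, using the specs from NVIDIA's public datasheet (640 TensorCores, 5,120 CUDA cores, ~1,530 MHz boost clock):

```python
# Back-of-envelope peak throughput for the NVIDIA Volta V100.
TENSOR_CORES = 640        # each does a 4x4x4 matrix FMA per clock
FLOPS_PER_TC = 128        # 64 multiply-accumulates = 128 FLOPs
CUDA_CORES = 5120
BOOST_CLOCK_HZ = 1.53e9   # ~1530 MHz boost (SXM2 part)

# Mixed-precision peak via TensorCores -- the headline number.
tensor_peak = TENSOR_CORES * FLOPS_PER_TC * BOOST_CLOCK_HZ

# FP16 peak without TensorCores: FMA (x2) and packed FP16 (x2) per core.
fp16_peak = CUDA_CORES * 2 * 2 * BOOST_CLOCK_HZ

print(f"TensorCore peak:            {tensor_peak / 1e12:5.0f} TFLOPS")
print(f"FP16 peak (no TensorCores): {fp16_peak / 1e12:5.1f} TFLOPS")
```

The two numbers differ by roughly 4x, which is why it matters so much whether a workload's matrix shapes actually map onto the TensorCore instructions.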

Figure 3: Intel left a trail of breadcrumbs we can use to estimate the eventual performance of the NNP L-1000 AI accelerator.

In Figure 3, Intel made the case that "Chip X," which I assume is the NVIDIA Volta GPU, dramatically overstates its performance. While Intel may have a point here, I'd note that the 125 TOPS number is only relevant for the 4x4 matrix operations performed by the NVIDIA TensorCore instructions. Without TensorCores, the Volta V100 is probably in the 30-40 TOPS range. That is roughly comparable to what Intel claims for the Lake Crest chip, so it's pretty obvious why Intel has decided to postpone commercial availability until the 2nd generation. Note the claim on the far right of Figure 3: the initial Nervana fabric delivers 2.4 terabits of bandwidth at < 800 ns latency. That claim is very impressive, and important, since low-latency networking is vital for scale-out parallel processing when training very large neural networks.

Figure 4 shows where Intel expects the production Spring Crest to land: tripling the performance of the current NNP to become quite competitive with the NVIDIA Volta GPU. I would be quite surprised if NVIDIA's next chip doesn't provide a more general-purpose TensorCore capability, but at this point the company hasn't provided any hints of what that next chip might look like (in my 2018 AI Predictions blog, I forecasted that we might hear about the post-Volta chip at the SC18 conference in November 2018).

Conclusions

Intel understands the strategic importance of catching the AI wave and is focusing on inference processing in the datacenter and at the edge. For training large models and networks demanding large memory capacity, it promotes Xeons, because, well, that's what it has. The company has decided to wait until late 2019 for its 2nd Nervana chip, in an attempt to match or beat NVIDIA head-to-head.
Importantly, the company is investing over $1B in the AI ecosystem and has funded research and educational programs at over 100 universities around the world. Intel has added unifying software and AI features to Xeon, while forging ahead with FPGAs, Mobileye, and Movidius for application-specific requirements. As a result, the company has a fairly robust AI portfolio, with the notable exception of big-iron training. This is all impressive progress, and I will be watching closely to see how it materializes in more customer wins and success stories. Like everyone else who isn't a Facebook or Microsoft, I will just have to wait for the 2nd Nervana chip to properly gauge how Intel might compete with NVIDIA and the many startups readying silicon for training AI networks.