In NVIDIA’s fiscal Q1 2019, the company once again exceeded expectations, reporting 66% growth in total revenue, including 71% growth in its red-hot datacenter business (reaching $701M for the quarter). For NVIDIA, the “Datacenter” segment includes High-Performance Computing (HPC), datacenter-hosted graphics, and AI acceleration. While that is certainly an impressive growth rate, it is smaller than the 2-3x year-over-year growth the company has enjoyed over the last few years. This raises a few interesting questions we will examine here. Is this slowdown in growth a sea change or just the law of large numbers catching up with the business? Will the emergence of custom in-house chips like Google’s Tensor Processing Unit (TPU) threaten NVIDIA’s dominant position in Deep Learning Training? Can Intel, AMD, and all the startups in the sector catch up? This blog is longer than the ones I typically author, but there’s a lot to consider here.
Figure 1: NVIDIA revenue by segment shows growth in nearly all parts of their business
What did NVIDIA announce?
As you can see above, the company had a monster quarter. The stock sold off ~2% but it is still up roughly 33% year-to-date and nearly seven-fold over the last 2 years. A few analysts point to the “miss” on datacenter revenue, coming in at $701M vs. the forecasted $703M, but it is just plain silly to think that a delta of roughly 0.3% is significant. I suspect that some traders are just taking profits while some investors are increasingly concerned that the competitive landscape is about to take a turn for the worse. I would note that the previous four quarters all showed more than 100% growth in the datacenter business and that 71% growth is not too shabby for a business that will probably generate revenue in excess of $3B in the coming year.
I would also note that the previous quarters included a very large chunk of HPC business from the 27,600 Volta GPUs installed in Oak Ridge National Lab’s Summit Supercomputer. We don’t know what ORNL paid for each Volta but if you assume it is in the $5,000-$8,000 range, this would equate to roughly $140-220M in revenue for NVIDIA. If you subtract that and consider the underlying strength of the business, the ~70% growth rate could be indicative of the organic market growth rate and may, therefore, be sustainable. That is, if NVIDIA can fend off the competitors who are gearing up to try to cut them down a notch or two. I’ll touch on how NVIDIA may choose to adapt to competition later on in my conclusions.
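For readers who want to check the Summit math, here is a minimal sketch of the back-of-the-envelope estimate. The GPU count and the per-GPU price range come from the paragraph above; the price range is my assumption, not a disclosed figure, and the function name is purely illustrative.

```python
# Back-of-the-envelope estimate of NVIDIA's revenue from Summit's Volta GPUs.
# The per-GPU price range is an assumption; ORNL's actual pricing is not public.
SUMMIT_GPU_COUNT = 27_600  # Volta GPUs installed in Summit, as cited in the text


def summit_revenue_range(low_price, high_price, gpu_count=SUMMIT_GPU_COUNT):
    """Return (low, high) estimated revenue in millions of dollars."""
    return (gpu_count * low_price / 1e6, gpu_count * high_price / 1e6)


low, high = summit_revenue_range(5_000, 8_000)
print(f"Estimated Summit GPU revenue: ${low:.0f}M to ${high:.0f}M")
```

Swapping in a different assumed price range shows how sensitive the estimate is to what ORNL actually paid.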
Where’s the competition?
NVIDIA’s impressive growth in AI has attracted a lot of attention and potential competitors, many of whom claim to be working on chips that will be 10 times faster than NVIDIA while using less power. That being said, there are only a few companies that might have chips out this year or next. It turns out that designing a chip that is 10X better than what the thousands of NVIDIA engineers can imagine and produce is pretty difficult, and takes a lot of time and money. Except for AMD’s GPUs, all are betting that a chip designed solely for processing neural networks is the way to go. Let’s look at the field.
Intel purchased Nervana (and Mobileye, Movidius, and Altera) to build out an accelerator portfolio. The original Nervana Engine part discussed prior to the 2016 Intel acquisition was supposed to be released last year, but so far all we’ve heard are crickets. It is possible that the company decided to rework its design after NVIDIA surprised everyone with Volta’s Tensor Cores, which increased performance by up to six-fold over Pascal (NVIDIA’s previous generation GPU). Nervana was supposed to be 10x Pascal, so one can see why Tensor Cores might have given Intel pause. If the V100 Volta is 6x Pascal for key AI operations, that doesn’t make the “10x” claim sound very impressive, especially since Nervana’s performance was supposed to include software tweaks. That being said, NVIDIA also regularly increases application performance with software optimization work. If, in fact, Intel went back to the drawing board, I suspect that the earliest it could have a Nervana part in volume production would be late 2018, just in time for NVIDIA to announce whatever comes after Volta, perhaps at SC’18 in Dallas.
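To make the point concrete, here is a small sketch of how the “10x” claim shrinks once Volta, rather than Pascal, is the baseline. Both multipliers are the claimed figures quoted above (Volta’s is an “up to” number for key AI operations), so treat the result as a rough illustration, not a benchmark. The helper name is hypothetical.

```python
# Re-baseline a claimed speedup: if a new part is X times Pascal and Volta is
# Y times Pascal, the new part is only X / Y times Volta.
# Both inputs are vendor claims quoted in the text, not measured results.
def relative_speedup(claimed_vs_baseline, rival_vs_baseline):
    """Return the claimed part's speedup relative to the rival part."""
    return claimed_vs_baseline / rival_vs_baseline


nervana_vs_pascal = 10.0  # Nervana's claimed advantage over Pascal
volta_vs_pascal = 6.0     # Volta's "up to" advantage over Pascal with Tensor Cores

nervana_vs_volta = relative_speedup(nervana_vs_pascal, volta_vs_pascal)
print(f"Implied Nervana advantage over Volta: {nervana_vs_volta:.2f}x")
```

In other words, a part pitched as an order of magnitude faster would land at well under 2x against the chip it would actually have to compete with.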
The discussion above is all about training deep neural networks, or DNNs, which is where NVIDIA has garnered much of its success in AI. However, Intel makes the good point that excellent performance can be achieved in inference work by pairing good software design and their standard Intel Xeon datacenter processors. The company claims to enjoy over 80% share of the market for inference processing and I have no reason to doubt it. At a recent event, Intel’s customers in health care also spoke of the advantages of running training and inference processing on the same Intel platform.
In addition, Microsoft has been quite vocal about its success with Intel Altera FPGAs, which can be reprogrammed continually to accelerate a wide range of demanding applications. Note that Xilinx is also making progress here, using the Amazon AWS Marketplace and F1 acceleration instances to ease the on-ramp to FPGA application acceleration. With some data types and latency requirements, such as those in drones and automobiles, a dedicated low-power accelerator will be required (hence the Intel acquisition of Mobileye and Movidius).
Google TPU and other in-house ASICs:
Google has two working Application Specific Integrated Circuits (ASICs) for AI: one for inferencing and a second generation part for training. Google markets a “TPU” as one accelerator, but it is actually built with four identical ASIC parts, each delivering about 45 TOPS (Trillion Operations Per Second). For reference, NVIDIA Volta delivers up to 125 TOPS per chip. This naming is confusing and, in my opinion, very poor marketing. That being said, the chip has a few benchmarks that show that it has very good performance if, and only if, you a) do not need to run your AI outside the Google Cloud, b) are happy using unoptimized TensorFlow models, and c) do not want or need to have direct control over the ASICs like most scientists enjoy with NVIDIA GPUs. That, frankly, is a very small niche market today, but this might be beside the main point: Google is likely to move a large chunk of its internal GPU work to TPUs over time.
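The per-chip versus per-“TPU” distinction is easier to see with the numbers laid out. The sketch below uses only the peak figures quoted above; peak TOPS from different vendors may not be measured at the same precision or under the same conditions, so the ratios are indicative at best.

```python
# Peak throughput figures quoted in the text, in TOPS
# (trillions of operations per second). These are vendor peak numbers and
# may not be directly comparable across architectures.
TPU_ASIC_TOPS = 45   # per second-generation TPU ASIC
ASICS_PER_TPU = 4    # identical ASICs in one marketed "TPU"
VOLTA_TOPS = 125     # per NVIDIA Volta chip ("up to")

tpu_device_tops = TPU_ASIC_TOPS * ASICS_PER_TPU
per_chip_ratio = TPU_ASIC_TOPS / VOLTA_TOPS
per_device_ratio = tpu_device_tops / VOLTA_TOPS

print(f"One marketed TPU (4 ASICs): {tpu_device_tops} TOPS")
print(f"Single TPU ASIC vs. one Volta chip: {per_chip_ratio:.2f}x")
print(f"Marketed TPU vs. one Volta chip: {per_device_ratio:.2f}x")
```

Depending on which unit you count, the same hardware looks either slower or faster than a Volta, which is exactly why the “TPU” label muddies comparisons.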
Google recently announced its next-generation TPU 3.0, with few details and confusing performance claims that do not make it clear whether the company is touting the performance of a much larger “pod” or an individual TPU chip. It looks to me like the TPU 3.0 is primarily an impressive system redesign, with water cooling to enable much more density. Keep in mind that TPU 2.0 is still available only as single units and only for beta testing, a full year after it was announced; “pod” clusters are not expected until late 2018. The implication here is that I doubt we will see TPU 3.0 in production anytime soon.
Stanford University recently published benchmarks that demonstrate that no one solution dominates the spectrum of AI workloads. It all depends on what you are doing. For clouds, a GPU may be a better choice since the cloud customer usage pattern is constantly in flux, demanding a wide variety of models and using different software frameworks. For this reason, I expect Google will continue to offer NVIDIA GPUs for the foreseeable future or risk losing business to Amazon AWS and Microsoft Azure.
As for other companies such as Facebook and Amazon that are rumored to be heading down this same road, I for one remain skeptical (read my reasoning here). I’m not saying it won’t happen, I just doubt it will anytime soon.
While AMD has done a good job of preparing its software stack to compete with NVIDIA for Machine Learning workloads, its current chip (Vega) is a generation behind NVIDIA’s Volta in terms of peak performance (25 TOPS vs. Volta’s 125). I suspect AMD may catch up to Volta later this year or next, perhaps by using a redesigned 7nm GPU part. Still, AMD will have its work cut out for it to develop the market and ecosystem to be able to compete with NVIDIA. Having a fast chip is necessary but insufficient to ensure success.
I can count over a dozen startups around the world with plans to compete for machine learning workloads, including a few who already have chips ready. China’s Cambricon looks to be extremely well-funded and backed by the Chinese government, which has clearly tired of US tech companies enjoying all the profits from AI silicon. Cambricon has working parts now, although they do not target DNN training. Like many others, Cambricon is focused on running trained neural networks (inference), not building them (training).
Silicon Valley-based Wave Computing is probably the farthest along with a chip for training models. Wave features a novel design called “DataFlow Architecture,” which it claims will eliminate many of the bottlenecks of traditional accelerators that connect to CPUs via PCIe. With Wave, there is no CPU; the data flow processors directly train and process the neural network. Unlike the Google TPU, Wave will support Microsoft CNTK, Amazon MXNet, and TensorFlow software for deep learning. From what the company has said publicly, I expect these systems will ship sometime in the second half of 2018. Note that I said “systems,” not just chips: Wave intends to build custom platforms and appliances for cloud and in-house Enterprise AI work.
Other high-profile firms like Cerebras, GraphCore, and Groq remain in stealth mode but have raised significant venture capital to build custom AI accelerators. I don’t foresee them delivering working systems until perhaps 2019, so we will all have to stay tuned.
It’s of note that the British events company Kisaco Research is holding the first AI Hardware Summit, at the Computer History Museum in San Jose on September 18th and 19th. This event will be the first of its kind to focus on AI silicon and systems. The agenda for the conference won’t be live for another week or so, but I expect the event will give us a better idea of what a lot of these companies have been cooking up.
With all that in mind, let’s return to NVIDIA’s prospects. Clearly, the company is firing on all cylinders, from gaming to crypto to AI, and I don’t see any material roadblocks that will slow them down in the near future. Beyond the company’s drive for innovation, led by CEO Jensen Huang, and its awesome chips, software, and platforms, NVIDIA enjoys a near-rabid installed base of developers and cloud service users who love what they are getting, even if they may balk at the high prices.
NVIDIA CEO Jensen Huang leads the company with hands-on management and a vision for innovation.
Frankly, I think the largest threat may be the Google TPU, although the scope is limited in the short-term to internal consumption of AI training in Google. Google will probably continue to buy and use many GPUs for workloads for which the TPU is not well-suited, such as recurrent neural networks for language processing. I believe that Wave has a good shot with enterprises that don’t want to use public clouds for their AI development and deployment, and don’t want the hassle of setting up their own GPU infrastructure. Finally, if Intel can get to market with Nervana and is willing to invest to fully support it, it could represent a threat in 2019, but only on the margins; it will need at least 3 years and a solid roadmap to develop a viable ecosystem. One factor to consider is that NVIDIA will be able to add significant die area for AI features as it moves to 7-nanometer fabrication. As a result, the percentage of the die focused on AI may increase to the point that the part effectively becomes an ASIC that also does graphics.
I don’t think of NVIDIA as a GPU company; I think of it as a platform company with an insatiable appetite for growth. Keep in mind that nobody in the industry has the depth and breadth of AI hardware and software expertise that NVIDIA enjoys. If NVIDIA foresees a threat from AMD, Intel, or ASICs, it can and will design a better AI mousetrap. It already did this with the Deep Learning Accelerator (DLA), which I covered here. If GPUs become threatened, NVIDIA can and will pivot to whatever is next. In the meantime, it has a clear path to significant growth and market leadership in AI training chips. When it comes to inference processing, NVIDIA is focused on demanding datacenter workloads and vision-guided systems for applications such as autonomous vehicles. While the auto market will not become material for the next few years, I have little doubt that it will eventually add significant growth, perhaps at just about the same time the AI training market begins to slow or see more significant competition.