NVIDIA surprised the market last Thursday with earnings that beat expectations, driving its stock up over 15% the following day. The Automotive and Datacenter market segments were especially strong, driven in large part by demand for NVIDIA’s accelerators for Deep Learning (DL) applications for Artificial Intelligence (AI). NVIDIA has strong products and roadmaps and few serious competitors. However, when fast-growing markets get large enough, a gold rush ensues; startups like Nervana and large players like Intel and Qualcomm are developing their own products to join the fray.
NVIDIA Tesla P100: Twice the performance plus High Bandwidth Memory (Source: NVIDIA)
This is NVIDIA’s house, at least for now
While its traditional GPU business for desktop PCs and workstations felt the softness of the desktop market that has plagued the likes of Intel and Advanced Micro Devices, NVIDIA has now successfully diversified its business to address the datacenter and automotive markets. These segments delivered revenues of $143M and $113M in the latest quarter, an eye-popping 63% and 47% year-over-year growth respectively. The company has also just rolled out its next-generation chips for these markets and is well positioned to continue on its growth path.
NVIDIA co-founder and CEO Jen-Hsun Huang (Source: NVIDIA)
NVIDIA officially launched Pascal at GTC 2016 and has recently begun the rollout of its new architecture, which specifically targets the High Performance Computing and AI markets. Perhaps equally important is the enviable position NVIDIA has built in the emerging ecosystem for Deep Learning. Its cuDNN software appears to have become pervasive, while open source AI frameworks such as Torch, Caffe, Theano and Google TensorFlow all support NVIDIA accelerators. NVIDIA also provides hardware, assistance and expertise to entrench itself with universities, software developers, OEMs, and large end users.
But Deep Learning is different
The arithmetic behind deep learning is similar to the traditional GPU compute workloads NVIDIA has enabled with the NVIDIA Tesla product line, in that it requires parallelization of matrix operations. However, there is a key difference: the bulk of the processing in DL does not require double (64-bit), or even single (32-bit) precision floating point. In fact, half precision (16-bit) calculations are adequate for the vast majority of the work. NVIDIA’s new GPU architecture, Pascal, is the first to our knowledge to support “half-float” instructions in hardware and should therefore be at least twice as fast as its predecessors in calculating the variables of deep neural networks. This is one reason why many innovators and early adopters in the machine learning space are eager to get their hands on the Tesla P100 chips. The other is the HBM2 High Bandwidth Memory. It’s not often someone breaks Moore’s Law, but that is just what NVIDIA has accomplished with these products.
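To make the precision trade-off concrete, here is a minimal sketch using only Python’s standard library. It stores one value in half, single and double precision and compares the storage cost and rounding error of each. The weight value and the one-million-weight layer are hypothetical figures chosen for illustration, and nothing here is NVIDIA code or runs on a GPU.

```python
import struct

# Format codes from Python's struct module:
# 'e' = IEEE 754 half (16-bit), 'f' = single (32-bit), 'd' = double (64-bit).
FORMATS = {"half": "<e", "single": "<f", "double": "<d"}


def round_trip(value, fmt):
    """Store a value in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def bytes_for(count, fmt):
    """Bytes needed to hold `count` values in the given format."""
    return count * struct.calcsize(fmt)


weight = 0.1234567  # hypothetical network weight, for illustration only
for name, fmt in FORMATS.items():
    stored = round_trip(weight, fmt)
    print(f"{name:>6}: {struct.calcsize(fmt)} bytes per value, "
          f"rounding error {abs(stored - weight):.2e}")

# Storage for a hypothetical layer holding one million weights.
n_weights = 1_000_000
for name, fmt in FORMATS.items():
    print(f"{name:>6}: {bytes_for(n_weights, fmt):,} bytes")
```

Half precision halves the storage and memory traffic of single precision, at the cost of a larger rounding error per value. Whether that error is acceptable depends on the workload; the article’s claim is that for most deep learning work it is.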
Enter the challengers
Attracted by the impressive growth of AI, startup Nervana Systems, based in Palo Alto and San Diego, CA, and founded by Qualcomm AI veteran Naveen Rao, has raised over $24M in venture capital to create a hardware and software platform for AI. The company is approaching this opportunity holistically. First, it has developed an optimized software library called Neon for Deep Neural Networks, which the company claims is twice as fast on existing NVIDIA hardware as the popular Caffe and Torch open source frameworks. Delivering this platform on its own AI-as-a-service cloud with NVIDIA GPUs today, Nervana has landed customers such as Blue River Technology, which is building agricultural robots that allow farmers to assess their crops plant by plant.
But Nervana’s real weapon could be the Nervana Engine, a neural network accelerator implemented in TSMC 28nm technology that the company hopes to ship in early 2017. Instead of supporting the single- and double-precision floating point needed to accelerate a wide range of scientific codes, Nervana’s chip only accelerates half-precision operations. While this narrows its target market compared to a general purpose GPU like the NVIDIA Tesla, it could allow Nervana to cram more half-float arithmetic units for AI onto its chip. The company believes this will enable it to deliver 10 times the performance of NVIDIA’s current Tesla K80 accelerator. Given that Pascal should be at least twice as fast as its predecessors, that implies the Nervana Engine could be around 5 times faster than the new Pascal chips, which is plenty fast enough to steal some valuable customers if Nervana delivers on schedule and convinces customers to use the company’s proprietary Neon stack. That last point could become an issue for Nervana, unless it can also provide support for running existing codes without change.
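The 5x figure follows from simple division, sketched below. The inputs are the claims quoted in this article, not measurements: Nervana’s stated 10x over the Tesla K80, and the assumption that Pascal is 2x the K80 on half-precision work (the article only says “at least twice as fast”, so 5x is an upper estimate under that assumption). The function name is hypothetical.

```python
def relative_speedup(challenger_vs_baseline, rival_vs_baseline):
    """Speedup of the challenger over the rival, given each one's
    speedup over the same baseline chip."""
    return challenger_vs_baseline / rival_vs_baseline


# Claims quoted in the article, both relative to the Tesla K80.
nervana_vs_k80 = 10.0  # Nervana's own projection
pascal_vs_k80 = 2.0    # assumed: "at least twice as fast" taken as exactly 2x

print(relative_speedup(nervana_vs_k80, pascal_vs_k80))
```

If Pascal turns out to be more than twice as fast as the K80 on these workloads, Nervana’s advantage shrinks proportionally; at 2.5x, for example, the same division gives 4x.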
Established players are also keen to get in on the AI party. Intel has recently been touting the potential to use Altera FPGAs for deep learning. While the company has not yet shared any performance projections, the combination of OpenCL programming and the efficiency of FPGAs has the potential to pose a credible threat to NVIDIA’s dominant position. It’s important to note, however, that Intel accelerators have not been able to slow NVIDIA’s traction to date.
On the “inference engine” side of the Deep Learning process, Qualcomm’s new Snapdragon 820 and the company’s “Zeroth” SDK have positioned the company to provide high-performance chips and software for smart mobile and embedded applications in the Internet of Things. The Zeroth SDK will enable a multitude of Snapdragon-based devices to perform cognitive computing functions on pre-trained neural nets.
Perhaps those neural networks will be trained on NVIDIA hardware as they are today, or perhaps they will be trained on Intel- or Nervana-based accelerators in the future. One thing is for certain: it is shaping up to be a fight over a very large prize. And NVIDIA is now the 800-pound gorilla the competition will have to tame.