NVIDIA Launches Ampere A100 GPU For Data Center Computing And AI

Figure 1: NVIDIA shared performance comparisons across a range of precisions. Note the huge performance increase for sparse matrix operations. Note the banks of 6 HBM chips. 

While many of us missed watching Jensen Huang on stage in his trademark leather jacket, he did not disappoint his online audience at this week's virtual GTC keynote. In a session lasting over two and a half hours, the CEO and founder of NVIDIA announced new hardware to fend off a slew of potential competitors and new software to open up lucrative markets for GPU acceleration. The cornerstone of the announcement was the new Ampere GPU architecture, with the flagship A100 platform available through practically all hardware and cloud channels. Let's take a deeper dive into the announcement and consider the implications for AI, NVIDIA and even AMD.

Star of the show: the Ampere A100 GPU

Almost exactly three years ago, NVIDIA launched its Volta architecture and the V100 GPU with Tensor Cores, which dramatically accelerated the mixed FP16/FP32 tensor operations used in training AI deep neural networks (DNNs). Rarely does a new technology maintain market leadership for that long, but the V100 remained the industry's flagship AI chip, until now.

NVIDIA released surprisingly few details about the A100. However, the 7nm chip, with over 54 billion transistors, appears to break the mold in performance, as measured in TOPS. Furthermore, it is already in “full production,” and is available now in a new 5 Petaflop NVIDIA DGX A100 server. The A100 GPU will soon be available from practically every server OEM and cloud service provider, according to NVIDIA. Here are a few highlights:

  1. The new 3rd-generation Tensor Cores can process data in a variety of floating-point and integer precisions, an important feature for the mixed-precision work that is beginning to dominate AI processing. The new "TF32" precision can increase performance by a factor of ten over the V100, and up to twenty-fold when processing sparse matrices that exploit what NVIDIA calls structural sparsity (common in many AI applications). NVIDIA's performance comparisons appear in Figure 1, though I would note that these are based on TOPS (trillions of operations per second), not real-world applications or benchmarks such as MLPerf (which I suspect will be forthcoming later this year).
  2. The new DGX A100 features 3rd-generation NVLink, which doubles inter-chip network bandwidth, and six NVSwitch modules. NVIDIA didn't call attention to this, but the slide Jensen shared clearly shows that the DGX sports dual AMD Rome EPYC CPUs. That marks the first time NVIDIA has partnered with its primary GPU competitor, presumably to gain access to the excellent I/O, memory and computational bandwidth of the 2nd-generation AMD EPYC CPU.
  3. Finally, the new A100 supports something called "multi-instance GPU," or MIG, which allows a single chip to act like up to seven distinct GPUs. This could be great for cloud service providers: they can offer the A100 as a single GPU for big training jobs, or carve it into a farm of MIG instances for inference processing, reducing the need for many smaller GPUs and simplifying their infrastructure.
Figure 2: The new DGX A100 has a little surprise in addition to the 8 GPUs and 6 NVSwitches: it comes with dual AMD Rome EPYC CPUs, not Intel Xeons.
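To make the structural sparsity claim above concrete: the A100's sparse Tensor Core mode relies on a 2:4 pattern, in which every group of four consecutive weights contains at most two non-zero values, letting the hardware skip the zeros. A minimal NumPy sketch of that pruning step follows; the function name and the magnitude-based selection rule are illustrative assumptions, not NVIDIA's actual tooling.

```python
import numpy as np

def prune_2_4(weights):
    """Apply 2:4 structured sparsity: in each group of four values,
    keep the two largest-magnitude entries and zero out the rest."""
    w = np.asarray(weights, dtype=np.float64).reshape(-1, 4).copy()
    # Indices of the two smallest-magnitude entries in each group of 4
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(np.shape(weights))

w = np.array([0.9, -0.1, 0.05, 0.7, 0.2, -0.8, 0.3, 0.01])
print(prune_2_4(w))  # exactly half of each group of four is now zero
```

A weight tensor pruned this way loses half its values, which is why NVIDIA can quote up to a 2x throughput gain on top of the dense TF32 speedup.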

But wait, there’s more

Of course, NVIDIA is also updating its venerated CUDA-X software stack with new versions of over 50 CUDA-X libraries for graphics, simulation and AI. That is just table stakes. However, NVIDIA went beyond that with a few other announcements. First and foremost, it announced support for accelerating Apache Spark, the industry's largest open-source data analytics platform. This move opens up a huge market for NVIDIA GPUs, extending the company's role in the data center from exclusively AI, HPC and graphics workloads to the broad market for data analytics.
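As a rough illustration of what GPU-accelerated Spark looks like to an operator, the fragment below follows the conventions of the open-source RAPIDS Accelerator for Apache Spark; the plugin class, property names and script name are assumptions about the eventual packaging, not configuration NVIDIA announced on stage.

```shell
# Hypothetical spark-submit settings routing SQL/DataFrame work to GPUs
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  my_analytics_job.py
```

The appeal is that existing Spark jobs would need configuration changes like these rather than code rewrites, which is what makes the analytics market addressable at all.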

Jensen also announced two new AI services frameworks to address growing market opportunities. The first is "Jarvis," named after the gregarious "Just A Rather Very Intelligent System" from Marvel's Iron Man and Avengers movie series. Conversational AI, a field led in part by Microsoft and Baidu, is the next big frontier in AI, allowing computers to better understand and communicate with their human counterparts. The performance and memory capacity of the A100 could make it an ideal platform for natural language interfaces, and Jarvis simplifies the development and deployment of conversational AI on it.

The second framework is called Merlin, and it eases development of recommender systems on NVIDIA GPUs. Recommendation engines are potentially another big market for NVIDIA, as every e-commerce and social networking site has huge farms of CPUs to recommend products and services from massive data lakes. This vast quantity of user data has made the internet and online shopping a personalized service—now GPUs can be used to accelerate the process.


As usual, Jensen Huang's GTC keynote was a tour de force of hardware and software innovations that likely left most of the audience breathless. Like the V100 in 2017, the A100 appears to surpass even the most optimistic expectations of many industry observers. In fact, this is NVIDIA's first launch that featured commitments from practically every OEM and cloud vendor to support the chip right out of the gate. We await real-world benchmarks to properly assess NVIDIA's claims, as well as the manufacturing yields needed to ship such a huge chip in the volumes demand will require. However, the early commitments from the industry's major players, including Google, AWS, Microsoft, Alibaba, Dell, Lenovo, HPE and many others, are an excellent indicator that this architecture will be a boon to the AI, HPC and analytics markets. Furthermore, it also stands to form a powerful platform for robotics, autonomous vehicles and graphics. These topics will be covered further by my Moor Insights & Strategy colleagues Patrick Moorhead and Anshel Sag—stay tuned.