Traditionally, NVIDIA introduces the company’s latest GPU architecture at its GPU Technology Conference (GTC), which I have been attending since its inception in 2009. Over the years, NVIDIA has launched its enterprise, datacenter and AI GPU products, as well as the architectures behind them, at GTC. This year is no different, except for the fact that the announcement is happening virtually and a bit later than the traditional March timeframe. That said, many announcements that were slated for March were delayed until companies and employees figured out the whole work-from-home situation. Ampere is the next in the series of NVIDIA’s GPU products named after some of the world’s most famous scientists (a nod to NVIDIA’s GPU compute origins in the scientific community, working with GPGPU and CUDA).
The NVIDIA A100
Typically, NVIDIA’s GPU architectures for enterprise and datacenter target AI acceleration, with an initial focus on training. That said, NVIDIA is now significantly boosting the company’s performance in inference as well, to fend off its many competitors who chose to focus on that area because of NVIDIA’s primacy in training. NVIDIA claims some pretty significant AI training and inference performance improvements with the A100, on the order of “up to 20x,” which the company says is its largest leap in performance in eight generations. The new architecture of the NVIDIA A100, Ampere, is what NVIDIA calls an elastic and multi-instance GPU (MIG), which unifies data analytics, training and inference into a single chip.
The NVIDIA A100 GPU is capable of being split into seven different instances on the same chip, which NVIDIA calls MIG. Alternately, it can be combined with many different A100 GPUs via NVLink to create a single massive virtual GPU. So, how big is this new GPU exactly? Well, NVIDIA hasn’t given the exact dimensions of the chip itself, but we know that it is a 7nm chip with over 54 billion transistors, making it the largest 7nm chip on earth. By comparison, the last generation NVIDIA GV100 was a 21 billion transistor chip, and it previously held the record for the highest transistor count in a GPU. The GV100 was built on 12nm, so going from 12 to 7nm should buy NVIDIA some space for doubling the transistor count. That said, I still expect that transistor density went up significantly. Regardless, this is the highest transistor count of any processor ever created, and that includes all the different CPUs, GPUs, FPGAs and other accelerators out there. The NVIDIA A100 GPU also features 40GB of HBM2 memory per GPU, which, when combined with 7 additional GPUs inside of the new DGX A100, can total up to 320GB of GPU memory. That’s also an interesting memory capacity, because it could indicate a different memory controller configuration or a different number of HBM chips or densities—one would expect 48GB rather than 40GB based on previous architectures.
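One speculative explanation for the unusual 40GB figure (not confirmed by NVIDIA in this announcement) is the stack count: 40GB would fit five active 8GB HBM2 stacks with a sixth stack disabled, perhaps for yield, while six active stacks would give the 48GB one might have expected. A quick sketch of that arithmetic:

```python
# Speculative HBM2 stack math for the A100's 40GB capacity.
# Assumption (not from NVIDIA's announcement): 8GB per HBM2 stack.
STACK_GB = 8

def total_memory_gb(active_stacks: int) -> int:
    """Total HBM2 capacity for a given number of active stacks."""
    return active_stacks * STACK_GB

print(total_memory_gb(5))  # 40 -> matches the announced A100 capacity
print(total_memory_gb(6))  # 48 -> the capacity one might have expected
```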
The A100 GPU also features NVIDIA’s 3rd generation of Tensor Cores, which add significant performance improvements to AI inference. The new Tensor Cores include support for TF32, which is how NVIDIA is able to claim the 20x AI performance improvement for FP32 workloads without any changes to code. Additionally, NVIDIA’s Tensor Cores finally support FP64, which the company claims will deliver up to 2.5x more compute than the previous generation for HPC applications. The A100 also features NVLink 3.0, which doubles the bandwidth between GPUs in order to improve scaling across multiple GPUs in the same node or cluster via Mellanox InfiniBand. NVIDIA says that the A100 GPU is already in full production and is immediately available directly through NVIDIA or its many partners.
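The reason TF32 requires no code changes is that it keeps FP32’s 8-bit exponent (so the numeric range is unchanged) while reducing the mantissa to 10 bits, the same precision as FP16. A minimal sketch of that precision reduction, using simple truncation rather than the hardware’s actual rounding:

```python
import struct

def tf32_round(x: float) -> float:
    """Simulate TF32 precision by truncating an FP32 mantissa to 10 bits.

    TF32 keeps FP32's 8-bit exponent (same range) but only 10 of the
    23 mantissa bits (same precision as FP16). Real hardware rounds;
    truncation here is a simplification for illustration.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)  # zero the low 13 of 23 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# pi survives FP32 as ~3.14159274, but TF32 keeps only ~3 decimal digits
print(tf32_round(3.14159274))  # prints 3.140625
```

Inputs stay FP32 in memory and only lose precision inside the Tensor Core math units, which is why existing FP32 code runs unmodified.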
Microsoft is already a customer of the A100, along with some other early adopters. These include Indiana University, Jülich Supercomputing Centre, Karlsruhe Institute of Technology and the Max Planck Computing and Data Facility in Germany. Notably, the US DoE’s Lawrence Berkeley National Laboratory is also using it. Many CSPs also plan to offer A100-based services, including Alibaba, AWS, Baidu, Google, Oracle and Tencent. Additionally, A100-based servers are expected from the usual suspects: Atos, Dell Technologies, Fujitsu, Gigabyte, H3C, HPE, Inspur, Lenovo, Quanta and Supermicro.
Edge compute powered by Ampere
Thanks to the NVIDIA A100 GPU, NVIDIA is also launching a multitude of other products based on the Ampere architecture. One of those products is the EGX A100, which is designed to be slotted into existing edge data centers to help manage the vast volumes of data coming in near the edge. When paired with an NVIDIA Mellanox ConnectX-6 Dx network card, the EGX A100 can receive up to 200 Gbps of data and send it directly to the GPU memory for AI or 5G signal processing. With the introduction of NVIDIA Mellanox’s 5T (time-triggered transport technology for telco) for 5G, the EGX A100 is a cloud-native, software-defined accelerator for low-latency 5G applications. The EGX A100 will be available at the end of the year, at a price that has not been disclosed. I expect there will be others who try to offer similar solutions with the A100 in the future. In addition to the EGX A100, NVIDIA also announced the EGX Jetson Xavier NX, which isn’t based on Ampere, but is designed to enable lower power edge compute solutions. At 10 Watts, the EGX Jetson Xavier NX delivers 14 TOPS of performance, and at 15 Watts, it delivers 21 TOPS. Products based on the EGX Jetson Xavier NX are available now, for those looking to use a lower power solution at the edge.
NVIDIA’s Ampere-based A100 GPU is without a doubt the company’s largest, most complex and most powerful GPU ever. While we don’t know all the architectural details of the GPU quite yet, we do know that it is going to set a new standard for enterprise compute, AI performance and HPC. While NVIDIA is still very much a gaming company, it is quite clear that many different derivative GPUs will come out with similar capabilities for cloud, gaming and professional graphics. NVIDIA is injecting AI into all of these applications, and by accelerating AI performance, I believe that NVIDIA is differentiating itself from its competitors and widening the performance gap in certain workloads that can be accelerated by AI. I’m genuinely excited to see what NVIDIA’s partners can do with the A100, because we desperately need all the compute power we can get to beat diseases and viruses like COVID-19. There’s so much runway still left in AI for so many different applications. I believe we’re still in the early days of what’s possible, and I believe the A100 is absolutely a big leap forward.