Ever since its March announcement at GTC 2022, the technical community has been looking forward to seeing how NVIDIA’s monster Hopper H100 Tensor Core GPU would perform. No one was disappointed when the results of its artificial intelligence tests, run under MLPerf v2.1, were published last week.
MLPerf is a benchmark suite from MLCommons. It benchmarks machine learning models, software, and hardware, with energy consumption as an optional measurement. It is the industry benchmark for deep learning, covering AI training, AI inference, and HPC. This specific test, MLPerf Inference v2.1, measures inference performance: how fast a system can process inputs and produce results using a trained model.
Each benchmark test is defined by its dataset and quality target. The table above summarizes the benchmarks used in MLPerf v2.1. Also shown in the NVIDIA results is DLRM (Deep Learning Recommendation Model), a recommendation model introduced by Facebook.
The H100 is NVIDIA’s ninth-generation data center GPU. Compared to the previous-generation A100 GPU, the H100 provides an order-of-magnitude greater performance for large-scale AI and HPC. Although the H100 brings substantial improvements in architectural efficiency and software, its major design focus has been carried over from the A100.
In the Data Center category, the NVIDIA H100 Tensor Core GPU delivered the highest per-accelerator performance across every workload for both the Server and Offline tests. It had up to 4.5x more performance in the Offline scenario and up to 3.9x more in the Server scenario than the A100 Tensor Core GPU.
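Per-accelerator comparisons like these normalize a system's total throughput by the number of accelerators it used. A minimal Python sketch of that arithmetic follows; the function names and the throughput figures are hypothetical placeholders chosen for illustration, not values taken from the MLPerf results.

```python
def per_accelerator_throughput(system_throughput: float, num_accelerators: int) -> float:
    """Throughput (e.g., samples/second) attributed to a single accelerator."""
    if num_accelerators <= 0:
        raise ValueError("num_accelerators must be positive")
    return system_throughput / num_accelerators


def speedup(new_per_accel: float, old_per_accel: float) -> float:
    """How many times faster the new accelerator is than the old one."""
    if old_per_accel <= 0:
        raise ValueError("old_per_accel must be positive")
    return new_per_accel / old_per_accel


# Hypothetical figures for illustration only (not MLPerf results):
# a 1-GPU system versus an 8-GPU system on the same benchmark.
new_gpu = per_accelerator_throughput(9_000.0, 1)
old_gpu = per_accelerator_throughput(16_000.0, 8)
print(f"per-accelerator speedup: {speedup(new_gpu, old_gpu):.1f}x")
```

Normalizing this way lets a single-GPU submission be compared fairly against a multi-GPU one.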
NVIDIA attributes part of the superior performance of the H100 on the BERT NLP model to its Transformer Engine. The new engine, combined with NVIDIA Hopper FP8 Tensor Cores, delivers up to 9x faster AI training and up to 30x faster AI inference on large language models compared to the prior generation.
Speed is crucial because huge AI models can have trillions of parameters. Models that large can take months to train. NVIDIA’s Transformer Engine provides additional speed by combining 16-bit floating-point precision with a new 8-bit floating-point data format, which increases Tensor Core throughput by 2x and reduces memory requirements by 2x compared to 16-bit floating point.
Those improvements, plus advanced Hopper software algorithms, speed up AI performance and capabilities and allow models to be trained within days or hours instead of months. The faster a model becomes operational, the earlier its returns begin and the sooner operational improvements can be implemented.
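The 2x memory reduction cited for the 8-bit format follows directly from the storage width of each value. The sketch below works through that arithmetic for model weights only. The helper name and the one-trillion-parameter example are illustrative assumptions, and real training also needs memory for activations, gradients, and optimizer state.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in gigabytes (10**9 bytes)."""
    total_bytes = num_params * bits_per_param // 8
    return total_bytes / 1e9


# Assumed model size for illustration: one trillion parameters.
params = 1_000_000_000_000
fp16_gb = weight_memory_gb(params, 16)  # 2 bytes per parameter
fp8_gb = weight_memory_gb(params, 8)    # 1 byte per parameter
print(f"FP16: {fp16_gb:,.0f} GB, FP8: {fp8_gb:,.0f} GB, ratio: {fp16_gb / fp8_gb:.1f}x")
```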
NVIDIA A100 continues high-level superior performance
Although the H100 is the latest GPU generation, a check of MLPerf v2.1 results confirms that NVIDIA’s prior-generation A100 GPU is still producing record results and high performance:
- A100 GPUs won more tests than any other submission in data center and edge computing categories and scenarios.
- Three months ago, the A100 delivered overall leadership in MLPerf training benchmarks, demonstrating its performance capabilities across the AI workflow.
- It has continued to improve. Since the A100 made its first appearance in the July 2020 MLPerf results, NVIDIA has increased its performance by 6x.
- NVIDIA AI was the only platform to run all MLPerf inference workloads and scenarios in data center and edge computing.
Orin at the edge continues energy improvements
Edge computing is key to the success of many emerging applications with exponential growth. Orin is built for edge AI and robotic applications. In the previous MLPerf round, Orin performed up to 5x faster than its prior-generation Jetson AGX Xavier module. At the same time, Orin delivered an average of 2x better energy efficiency.
For MLPerf v2.1, NVIDIA Orin ran every MLPerf benchmark in edge computing, winning more tests than any other low-power system-on-a-chip. Although it was not the most energy-efficient, Orin has shown additional energy efficiency improvements of up to 50% compared to its earlier MLPerf results in April, thanks to full-stack improvements. Its efficiency is expected to continue to improve.
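Energy efficiency in this context is throughput per unit of power, which is equivalent to inferences per joule. The sketch below shows how a 50% improvement would be computed under that definition. The function names and the throughput and wattage numbers are hypothetical, not measured Orin figures.

```python
def samples_per_joule(samples_per_second: float, watts: float) -> float:
    """Energy efficiency: throughput divided by power draw (1 W = 1 J/s)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return samples_per_second / watts


def efficiency_gain_percent(old_eff: float, new_eff: float) -> float:
    """Relative improvement of new_eff over old_eff, in percent."""
    if old_eff <= 0:
        raise ValueError("old_eff must be positive")
    return (new_eff / old_eff - 1.0) * 100.0


# Hypothetical numbers for illustration only: same power budget,
# higher throughput after software (full-stack) optimizations.
earlier = samples_per_joule(1_000.0, 20.0)
later = samples_per_joule(1_500.0, 20.0)
print(f"efficiency gain: {efficiency_gain_percent(earlier, later):.0f}%")
```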
In MLPerf Inference v2.1, despite several significant model and dataset changes from v2.0, the first NVIDIA H100 submission set new per-accelerator performance records on all workloads in the data center scenario and delivered up to 4.5x higher performance than the A100. Its increased performance resulted from many Hopper architectural breakthroughs and software optimizations that leveraged the new capabilities. We look forward to seeing its results in the next round of MLPerf testing.
- MLPerf provided a first look at the impressive performance and power of the NVIDIA H100. Although the H100 is not currently available, it will likely be released later this year.
- It is unknown to what extent government export restrictions will affect NVIDIA’s overall production and product availability.
- NVIDIA is planning continued participation of the H100 in future rounds of MLPerf benchmarking.
- The complete results for MLPerf v2.1 are available from MLCommons.
Paul Smith-Goodson is Vice President and Principal Analyst for quantum computing, artificial intelligence and space at Moor Insights & Strategy. You can follow him on Twitter for more current information on quantum, AI, and space.
Note: Moor Insights & Strategy writers and editors may have contributed to this article.