I attend hundreds of events a year as an analyst, and one way I rate them is by the number and industry impact of the announcements. I watched NVIDIA’s GTC 2020 virtual keynote today, and the company scored high on both. Architecture is everything when you are a platform provider, and NVIDIA rolled out its new architecture, Ampere, across nearly all of its markets. The only exception was gaming, but I’m sure that will follow at the next gaming show.
NVIDIA rolled out its announcements in nine video chapters, and I thought it made sense to align my coverage of GTC 2020 with most of those chapters. I’ve included links to the most relevant parts.
Clara healthcare platform updated and “Data-Center-Scale Accelerated Computing” introduced
From his kitchen in 4K splendor, NVIDIA CEO Jensen Huang kicked off this section by thanking front-line COVID-19 workers and citing numerous examples where GPU-accelerated computing is helping in the fight. The two that stood out to me were Oxford Nanopore Technologies, which sequenced the virus in just seven hours, and Oak Ridge National Laboratory (ORNL) together with the Scripps Research Institute, which screened a billion potential drug combinations in a single day. It’s important for all keynotes these days to start by talking about COVID-19, as I believe this matters a lot to the audience.
The company took the time to provide an update on Clara, NVIDIA’s healthcare platform. It introduced Clara Guardian to power smart hospitals, new AI models to advance COVID-19 research and improve detection, and a new world record for analyzing the human genome in under 20 minutes. As I said here, supercomputing will only get more important as we confront today’s and future pandemics.
Huang continued on to NVIDIA’s architecture for the datacenter, which he coined “Data-Center-Scale Accelerated Computing.” He went on to make the case that the datacenter is the new unit of computing. This was a very provocative statement, but it basically says that the future is all about converged solutions containing CPU, GPU, networking (DPU), and a comprehensive software stack. It was an effective way for the company to reinforce why acquiring Mellanox made sense for NVIDIA and its customers.
Omniverse and tip of the hat to NVIDIA RTX, DLSS 2.0 and ray tracing
This segment was a bit of a victory lap for NVIDIA on ray tracing and AI-based upsampling (DLSS), which I believe it deserves, and it led into the Omniverse announcement. I believe the company has led the industry in accelerated ray tracing and AI for both gaming and workstations. RTX was announced at SIGGRAPH 2018, and it’s safe to say ray tracing is now a mainstream feature.
Jensen showed some cool eye-candy demos of how Unreal Engine was upsampling a 540p scene to 1080p with DLSS 2.0. Then he showed how ray tracing and AI come together in Minecraft. I wrote about the Minecraft with RTX beta (here).
Huang then moved into the announcement that “Omniverse”, a collaboration platform for graphics and simulation workflows, was now available to early access customers. Huang showed how different developers in AEC (architecture, engineering, construction) working on different tools (Rhino, Max, Revit, AR) could work on the same project, via the cloud.
To me, the biggest significance is seeing NVIDIA develop a cloud-based collaboration platform that micro-verticals like AEC can leverage.
Apache Spark 3.0 acceleration for big data
Huang then dove into GPU acceleration for the biggest of big data in HPC and scientific computing, where NVIDIA already performs well. NVIDIA’s software library already supports over 700 CUDA-accelerated applications, and Spark 3.0 acceleration is a huge addition.
Many data scientists use Apache Spark. Adobe is one of the first companies working with a preview release of Spark 3.0 running on Databricks and said it achieved a 7x performance improvement and 90 percent cost savings in an initial test.
With his classic phrase that we have learned to appreciate, Huang said, “the more you buy, the more you save.” I have not scoured every inch of the cost analysis, but similar prior claims he’s made have passed my scrutiny.
Huang ended this segment by saying Databricks and Google Cloud Dataproc will soon offer Spark with GPU acceleration. I find this announcement fascinating given just how much revenue opportunity there is in ETL (extract, transform, load). This news, coupled with NVIDIA RAPIDS support on Google Cloud AI and AWS SageMaker, means NVIDIA is moving upstream in data workflows.
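To make “Spark with GPU acceleration” a bit more concrete: the acceleration is delivered as a Spark plugin that rides on Spark 3.0’s new GPU resource scheduling. A rough sketch of the kind of spark-submit configuration involved follows; the jar names and resource amounts here are illustrative placeholders, not NVIDIA’s documented defaults.

```shell
# Illustrative only: launch a Spark job with the RAPIDS Accelerator plugin.
# Jar file names and GPU resource amounts are placeholders for this sketch.
spark-submit \
  --jars rapids-4-spark.jar,cudf.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.25 \
  my_etl_job.py
```

The point is that existing Spark SQL and DataFrame jobs can pick up GPU acceleration through configuration rather than a rewrite, which is exactly why the ETL opportunity is so large.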
Merlin recommendation application framework
Next, Huang jumped into “Merlin,” a new application framework for recommendation systems. You’ve likely encountered recommendation systems thousands of times if you’ve used Amazon (“Customers who bought this item also bought…”) or Netflix (suggested videos). He said, “Merlin slashes the time needed to create a recommender system from a 1-terabyte dataset from a couple of days to just a few minutes.”
I will be watching this one very closely as I believe most real-time, or near real-time recommendation systems use CPUs today. The frameworks for recommenders are now more complex than ever and I believe, in need of acceleration.
Jarvis conversational AI application framework
All my readers have used Alexa, Siri, Cortana, or Google’s Assistant at some point. But what about application developers who want to use something else in their applications? Enter NVIDIA Jarvis.
Jarvis is NVIDIA’s application framework for multi-modal (speech and vision) and conversational AI applications. The company says it can recognize vision and sound so companies can create real-time translations, closed captioning of everyone speaking, transcriptions of video calls in real time or power a number of other applications such as smart speakers, call centers, interactions with robots and cars, and retail services.
I find this incredibly interesting, as NVIDIA just got into the chatbot-enabling business. The latest conversational applications use AI to be humanlike in intelligence and speech, and it is impossible to run such systems in real time without GPUs. Jarvis opens the door for every enterprise to build and retrain such language-based applications for its use cases.
NVIDIA A100 datacenter GPU and DGX A100 integrated system
While the previous announcements were interesting, Huang put the biggest reveal in the middle of the program, the A100 datacenter GPU. This would be part six, for those counting at home. For the deepest dive on the A100, head on over to Moor Insights & Strategy AI analyst Karl Freund’s post on the A100.
Huang kicked it off big by saying the A100 is in full production and shipping to customers worldwide. As Freund wrote, the company has “early commitments from the industry’s major players, including Google, AWS, Microsoft, Alibaba, Dell, Lenovo, HPE.” While Freund did a great job in his article, I wanted to point out my main takeaways on the NVIDIA A100:
- Desire to unify both training and inference on a single chip
- 20X performance boost using new TF32 over V100 using FP32
- MIG, or multi-instance GPU, to make many GPUs look like one to the programmer
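On the TF32 point above, it helps to see what the format actually is: TF32 keeps FP32’s 8-bit exponent (so FP32’s dynamic range) but carries only 10 mantissa bits, the same precision as FP16, which is what lets the Tensor Cores run so much faster. Here is a back-of-the-envelope sketch of that rounding in plain Python; the function name and rounding details are my own illustration, not NVIDIA code.

```python
import struct

def tf32_round(x: float) -> float:
    """Round an FP32 value to TF32 precision (illustrative sketch):
    keep the 8-bit exponent, but only 10 of FP32's 23 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 1 << 12               # round to nearest on the first dropped bit
    bits &= ~((1 << 13) - 1)      # zero out the 13 discarded mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(tf32_round(1.0))                 # 1.0 -- exactly representable
print(tf32_round(3.141592653589793))   # 3.140625 -- pi at 10 mantissa bits
```

The takeaway: a relative precision of about one part in a thousand is plenty for deep learning math, so frameworks can get the speedup without code changes.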
For those who pay close attention to this space, the most interesting takeaway is training and inference on the same chip. While the industry has coalesced around NVIDIA GPUs for training, there are currently hundreds of inference options shipping or on the way. I wonder how many VCs will be losing sleep tonight.
It’s one thing to do a chip; it’s another to create an entire system based on it. Huang also announced what we at the firm consider “converged infrastructure,” containing CPU, GPU, networking, memory, and storage. NVIDIA says the new platform, called the DGX A100, is the first system to deliver 5 PetaFLOPS in a single “node.” I checked, and it’s the only one I can find.
A single DGX A100 contains eight A100s, six NVSwitches, nine Mellanox NICs, two AMD Rome-based EPYC processors, and 15TB of NVMe SSD storage. What a beast. And yes, it can likely play Crysis.
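As a sanity check on that 5 PetaFLOPS figure: NVIDIA rates each A100 at 312 TFLOPS of dense FP16 Tensor Core throughput, doubled by its new structural-sparsity feature, so eight of them land right at the claimed number. The arithmetic below is my own back-of-the-envelope check, using NVIDIA’s published per-GPU rates.

```python
# Back-of-the-envelope check on the DGX A100's 5 PetaFLOPS claim,
# using NVIDIA's published per-GPU rates (my arithmetic, not NVIDIA's).
a100_fp16_tensor_tflops = 312   # dense FP16/BF16 Tensor Core throughput
sparsity_speedup = 2            # structural sparsity doubles effective rate
gpus_per_dgx = 8

total_pflops = gpus_per_dgx * a100_fp16_tensor_tflops * sparsity_speedup / 1000
print(total_pflops)  # 4.992 -- right at the claimed ~5 PetaFLOPS
```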
What a way to kick off an ecosystem.
EGX A100 edge computing with Ampere architecture and BMW factory robotics win
After that high point with A100 and DGX A100, Huang wasn’t finished. What’s good for the datacenter is good for the edge, right?
Huang announced the NVIDIA EGX A100 for larger edge servers and the smaller EGX Jetson Xavier NX for what NVIDIA calls “micro-edge servers.” The two products let developers target different performance and price points, as well as different form factors.
The EGX A100 consists of an NVIDIA Ampere GPU and a Mellanox ConnectX-6 DX SmartNIC and can support “hundreds of cameras,” while the EGX Jetson Xavier NX supports a couple of cameras.
It was nice to see NVIDIA bring a major design win to the table: BMW, which committed to NVIDIA’s robotics platform, Isaac, to power many robots at its factories. Huang talked about numerous robots that split, pick, place, transport, and sort raw materials and work-in-progress assemblies.
Bigger picture, I see NVIDIA raising the bar by delivering increasingly higher performance at different price points, with a software platform that spans all the designs. While there weren’t hard numbers on the total number of robots or the revenue size, it sure is a positive feather in NVIDIA’s cap. BMW is a very precise manufacturer, and you can imagine how hard it would have been to win this business.
Moor Insights & Strategy analyst Anshel Sag hit some points on EGX here.
Automotive goes Ampere with Orin
If you have ever attended an NVIDIA CES event, you know that NVIDIA is serious about the automotive market. CEO Jensen Huang once spent 90 minutes of a CES keynote on the state of autonomous driving.
Just as what was good for the datacenter is good for the edge, what is good for the edge is good for automotive. With Orin, NVIDIA is bringing the Ampere architecture to automotive. What’s new here is that NVIDIA’s offerings now span ADAS (10 TOPS at 5W) to L2+ autopilot (200 TOPS at 45W) to full L5 robotaxis (2,000 TOPS at 800W).
Like the EGX family, the benefit of having a scaled architecture is that as an automaker or Tier 1, you can leverage design resources across hardware and software. In the past, OEMs and Tier 1s would have to qualify two separate platforms: one for ADAS and one for self-driving. Now they have only one platform to qualify.
While GTC 2020 was virtual this year, that certainly didn’t diminish the number of announcements or the excitement. Too many events I attend now are devoid of product and service announcements, and I think they’re real snoozers. Having attended nine GTCs, I can say this one had the most announcements I can recall.
What I wanted to do was net out over 90 minutes of video from Jensen Huang’s kitchen and eight press releases and talk about what it means for NVIDIA. With NVIDIA’s chips based on the Ampere architecture and Mellanox networking, the company is bringing up to 20x AI performance in roughly the same power envelope. With the A100, it is very emphatically saying that you should be doing both training and inference on the same chip. That’s a really big deal because, while NVIDIA does have competitive inference solutions, there are at least 50 competing solutions out there already or coming.
The event also showcased Mellanox’s value-add and its integration into NVIDIA’s grand plan, which Huang hit first and described as “Data-Center-Scale Accelerated Computing.” With Mellanox in the datacenter, the nodes and GPUs it connects become the datacenter. I think that’s bold.
Overall, it was an impressive showing for NVIDIA. Now, I’m looking forward to seeing how NVIDIA will integrate Ampere into its GeForce gaming solutions.