AWS Unveils The Latest On Its Custom Silicon-Based Graviton3, Inferentia, Trainium Trn1, And Nitro SSD Instances At Re:Invent 2021

AWS re:Invent 2021 is the Amazon Web Services (AWS) annual user conference that focuses on cloud strategies and operations, security and developer productivity, and IT architecture and infrastructure. The significant change is that the event is back live in Las Vegas between Nov. 29 and Dec. 3, along with the option to attend online. I am attending in person.

AWS re:Invent is arguably the industry’s most important cloud conference because of Amazon’s dominance in the public cloud IaaS and PaaS market, its massive ecosystem, and its tendency to launch dozens of products and highlight its roadmap for innovation.

This article will attempt to synthesize what I consider the biggest EC2 announcements related to AWS’s custom silicon and put some practical use cases around them to give you insight into what you can expect in the coming months.


I have written several articles covering how Amazon invests in custom silicon to differentiate on price-performance. The new Amazon Elastic Compute Cloud (Amazon EC2) instances announced this week powered by AWS-designed chips are yet another example. AWS isn’t abandoning AMD, Intel or NVIDIA, as the company also announced many instances with their silicon.

If you believe that cloud compute is becoming a commoditized service for many infrastructure workloads, the information below may change your perspective.

The evolution of Graviton

AWS continues to push the price-performance envelope with Graviton2. To date, AWS has launched 12 EC2 instance families powered by Graviton2 processors, including general-purpose, compute-optimized, memory-optimized, storage-optimized, burstable, and accelerated computing instances. The adoption has been across the spectrum of customers, from small startups to large enterprises.

AWS has been on a three-year journey, and some initially doubted whether Graviton and Arm in the cloud would succeed. The response has been incredible, starting with the “Graviton Challenge,” launched at our very own Six Five Summit 2021. The challenge helped developers move workloads to a Graviton-based instance by providing a blueprint, or step-by-step guide. More than 1,000 developers took part, from large enterprises to small startups and even a few individual open-source developers.

AWS just announced the winners at re:Invent 2021. The enterprise category winner was the VMware vRealize team, whose project migrated 60 microservices and realized a 48% latency improvement with 22% cost savings. In the startup category, Kasm Technologies achieved 48% better performance and a potential 25% cost savings for its container streaming platform. Even though the contest is over, AWS will keep the four-day plan to help developers move to Graviton. You can find the Graviton Challenge winners here.

AWS is still working to make Graviton easy to use, including building ecosystem support for applications running on Arm-based processors. Since Graviton was released in 2018 to spark that ecosystem, there has been an incredible response across Linux-based operating systems and software providers in categories such as containers, monitoring and management, security, and development software.

At re:Invent, AWS announced the Graviton Ready Program, enabling software partners to offer certified solutions and letting customers know which applications are Graviton ready.

At re:Invent, AWS announced three new instances based on Graviton2: two storage-optimized instances, Im4gn and Is4gen, featuring new AWS Nitro SSDs, and the first Graviton2-based instance paired with an NVIDIA GPU.

The new Graviton3

It is incredible to see Graviton move from a limited-function processor in its first iteration to a full-service instance family today, and AWS continues to make it easy for customers to move workloads.

But with the announcement of Graviton3, AWS is not done yet. Graviton2 arrived with a 40% price-performance improvement over what was available in the cloud at the time. Graviton3 pushes the performance envelope even higher, with new C7g instances delivering what AWS says is 25% higher compute performance compared to Graviton2. AWS didn’t say it, but I believe Graviton3 is using Arm’s Neoverse V1 core, with support for SVE (Scalable Vector Extension) and BFloat16. AWS also measures 2x higher floating-point performance, essential for scientific and some machine learning and media encoding workloads; 2x faster performance on cryptographic workloads, an area of focus for Graviton3; and a 3x performance increase on certain ML inference workloads. C7g instances also have the latest DDR5 memory, providing 50% higher memory bandwidth versus Graviton2-based instances to improve the performance of memory-intensive applications like scientific computing.
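It is worth spelling out what a performance claim like this means for cost. If a C7g instance really does 25% more work than its predecessor, then at equal pricing the cost per unit of work drops by 20%, not 25%. A quick back-of-envelope sketch (the 25% figure is AWS's claim; equal hourly pricing is my simplifying assumption, not something AWS stated):

```python
# Back-of-envelope: how a performance uplift translates into cost per unit
# of work. The 25% uplift is AWS's stated claim for C7g vs. Graviton2;
# equal hourly pricing is an assumption for illustration only.

def cost_per_work(rel_perf, rel_price=1.0):
    """Relative cost per unit of work versus the baseline instance."""
    return rel_price / rel_perf

c7g = cost_per_work(rel_perf=1.25)          # 25% more work per hour
print(f"Relative cost per unit of work: {c7g:.2f}")   # 0.80
print(f"Effective savings: {1 - c7g:.0%}")            # 20%
```

The general point: an X% performance gain at flat pricing yields an X/(100+X) percent cost reduction, which is why a 25% uplift saves 20%, not 25%.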

I’m looking forward to understanding better the improvements made to the chip’s fixed function accelerators.

There has been a focus on reducing the carbon footprint. AWS says Graviton3 uses 60% less energy to achieve the same level of performance versus comparable EC2 instances.

The new C7g instances powered by Graviton3 target compute-intensive workloads such as high-performance computing (HPC), gaming, video encoding, and CPU-based machine learning. 

Custom silicon for artificial intelligence (AI) and machine learning (ML)

Today, nearly every AWS customer is doing some form of AI and ML, across financial services, healthcare, manufacturing, and retail. Customers realize that AI and ML are essential to remaining competitive and providing their own customers with an enhanced experience. One of the challenges with AI and ML is the high cost involved.

There are two sides to AI and ML: you train models, and then you perform inference, the phase of ML and deep learning in which you apply those trained models to new data. The cloud has been a fantastic enabler for AI and ML, providing access to high-performance computing, high-speed networks, and vast amounts of storage, all readily available on demand.

Most customers would love to do more AI/ML, as it has positively impacted the business and the customer experience. The thing that’s slowing them down is the cost of training models and running inference. AWS looked at improving performance and reducing the cost of AI/ML, which led to its custom AI and ML silicon.

AWS released its first machine learning chip, called Inferentia, targeted at inference, the real-time application of a trained model to incoming data. AWS started there because about 90% of the cost of ML goes into performing inference. Inferentia delivers the high performance and throughput needed for machine learning inference at a significantly lower price than GPU-based instances.
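The arithmetic behind starting with inference is worth making explicit: when inference dominates the ML bill, even a modest per-inference cost reduction moves the total substantially. A toy model illustrates this (the 90% inference share is AWS's figure; the 30% cost reduction is a hypothetical parameter I chose for illustration, not an AWS claim):

```python
# Toy cost model: the effect on total ML spend of cutting inference cost
# only. The 90% inference share is AWS's figure; the 30% reduction is a
# hypothetical illustration, not an AWS claim.

def total_ml_cost(inference_share, inference_reduction):
    """Relative total ML cost after reducing only the inference portion."""
    training_share = 1 - inference_share
    return training_share + inference_share * (1 - inference_reduction)

new_cost = total_ml_cost(inference_share=0.90, inference_reduction=0.30)
print(f"Relative total cost: {new_cost:.2f}")  # 0.73, i.e. 27% overall saving
```

By contrast, the same 30% reduction applied only to the 10% training share would shave just 3% off the total, which explains why AWS went after inference first.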

AWS delivered the AWS Neuron SDK, making it simple for developers to go from a GPU-based inference model to Inferentia using frameworks like TensorFlow and PyTorch.

Like inference, ML training can be costly. It requires a great deal of high-performance, parallel computing. Time to train is a critical metric, and the cost to train is obviously essential as well. Customers constantly gather new data and go back to retrain their models, which increases cost.

AWS can speed up the process with highly parallel math operations and high compute power for training ML models. AWS has doubled the networking throughput from 400 gigabits per second on its GPU-based instances to 800 gigabits per second, providing high-speed throughput on both the network and the interconnects between training chips, bringing down latency and providing what AWS calls the fastest ML training available in the cloud.

With this high-speed networking, customers can create EC2 UltraClusters, bringing thousands of training accelerators together over the 800-gigabit networking in a petabit-scale, non-blocking cluster. These are essentially mini supercomputers that can dramatically reduce the time to train complex models.
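To put the 800-gigabit figure in perspective, here is a back-of-envelope calculation of how long it would take to move a petabyte of training data at that rate. The numbers are derived purely from the stated bandwidth, assuming (unrealistically) a fully utilized link:

```python
# Back-of-envelope: time to move 1 PB over an 800 Gb/s link, assuming the
# link is fully utilized. Real-world throughput will be lower, so treat
# this as a best-case bound, not a benchmark.

PETABYTE_BITS = 1e15 * 8        # 1 PB in bits (decimal units)
LINK_BITS_PER_SEC = 800e9       # 800 gigabits per second

seconds = PETABYTE_BITS / LINK_BITS_PER_SEC
print(f"{seconds:,.0f} s (~{seconds / 3600:.1f} hours)")  # 10,000 s (~2.8 hours)
```

At the previous 400 Gb/s rate the same transfer would take twice as long, which is why interconnect bandwidth matters so much when training runs shuttle petabyte-scale data sets between accelerators.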

New Trn1 instances powered by AWS Trainium chips offer the “fastest and lowest cost of machine learning training in the cloud,” providing up to 40% lower cost to train deep learning models than the latest P4d instances with NVIDIA A100s. AWS says Trainium chips deliver the highest teraflops (TFLOPS) of performance, training machine learning models up to 50% faster than the latest P4d instances. These are huge claims, and as soon as I get more details on the test methodologies, I will share them.

Amazon EC2 I4/Im4gn/Is4gen instances featuring new AWS Nitro SSDs 

Amazon’s current EC2 I3/I3en instances offer Non-Volatile Memory Express (NVMe) SSD-backed instance storage optimized for low latency, high I/O performance, and high throughput at a low cost. I3/I3en are storage-optimized instances for applications that require direct access to data sets on local storage, such as scale-out transactional and relational databases, NoSQL databases, big data, and data analytics workloads.

As workloads evolve to process more complex read and write access patterns and larger data sets, the demand for even higher compute performance and faster access to data, without higher costs, increases.

The new I4/Im4gn/Is4gen instances are architected to maximize the storage performance of I/O-intensive workloads with up to 30 TB of NVMe storage from AWS-designed AWS Nitro SSDs.

When I asked AWS for some details on how exactly these are custom SSDs, I received the following: “What we’ve learned from operating commodity SSD and NVMe technology is that the FTL (flash translation layer) is the key to providing high performance, performance consistency, and reliability. This firmware component has historically been a source of data durability, security, and performance variability in the commodity storage devices produced in industry. A FTL produced by vendor A can have radically different performance and operational characteristics compared to vendor B, even when the storage media used in each device is exactly the same. The FTL technology we have been building since at least February 2016 (maybe earlier), is the fundamental building block here.” There we have it: AWS is optimizing the FTL for speed and consistency.
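For readers unfamiliar with what an FTL actually does: at its core it is a logical-to-physical address map that lets the drive redirect each write to a fresh flash page (flash cannot be overwritten in place) while spreading wear across the media. A highly simplified sketch of that idea, purely illustrative and in no way based on AWS's Nitro SSD firmware:

```python
# Minimal illustration of a flash translation layer (FTL): a logical-to-
# physical map plus per-page write counts for wear-leveling. Purely
# illustrative; a real FTL also handles garbage collection in erase
# blocks, error correction, and much more.

class TinyFTL:
    def __init__(self, num_pages):
        self.mapping = {}                      # logical block -> physical page
        self.free_pages = list(range(num_pages))
        self.wear = [0] * num_pages            # writes per physical page

    def write(self, logical_block, data):
        # Flash pages cannot be overwritten in place: each write goes to a
        # fresh page, chosen here as the least-worn free page (wear-leveling).
        page = min(self.free_pages, key=lambda p: self.wear[p])
        self.free_pages.remove(page)
        old = self.mapping.get(logical_block)
        if old is not None:
            self.free_pages.append(old)        # old page reclaimed later
        self.mapping[logical_block] = page
        self.wear[page] += 1
        # (the data itself would be written to `page` here)

ftl = TinyFTL(num_pages=4)
ftl.write(0, b"a")
ftl.write(0, b"b")                             # rewrite lands on a new page
print(ftl.mapping[0], ftl.wear)                # 1 [1, 1, 0, 0]
```

The quote above is essentially saying that the policies inside this layer, which page to pick, when to reclaim old ones, how to keep latency consistent while doing so, are where vendors differentiate, and AWS chose to own that firmware itself.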

Compared to the previous generation I3/I3en instances, AWS says you can expect 60% lower I/O latency and 75% lower latency variability. AWS Nitro SSDs tightly integrate with the AWS Nitro System. As mentioned earlier, Im4gn instances feature AWS Graviton2 processors and provide up to 40% better price performance and up to 44% lower cost per TB of storage than I3 instances. Is4gen instances also use AWS Graviton2 processors and provide up to 15% lower cost per TB of storage and up to 48% better compute performance than I3en instances. I4 instances feature 3rd generation Intel Xeon Scalable processors (Ice Lake), delivering up to 55% better compute performance than current generation I3 instances.

Wrapping up

Just when you thought silicon was getting boring and software was the answer to everything, along comes Amazon to turn that idea on its head. Again. And again. And again. AWS best exemplifies the potential strategic advantage of first-party silicon.

I’ve always been fond of companies that understand the economics of choice, and that is something AWS has done extraordinarily well. AWS made many announcements this week that included AMD, Intel and NVIDIA silicon, and I think these three companies will remain important to AWS and its customers for a long time.

I have followed AWS for several years, and I’ve been impressed that the value-proposition commitments made at launch have come to fruition. I went back and looked at my notes on past claims made around Graviton and Inferentia. I was glad to see that the claims were more than just marketing; the positioning held over time, and as a former product person, I know how hard that can be.

So, in this article we have covered Graviton as a first-class citizen for general-purpose compute, Inferentia for machine learning inference, and Trainium for machine learning training.

How do you get started with these AWS silicon innovations?

Well, simply try it out. If you want to move to Graviton but are not sure your application can support it (this is exactly where the Graviton Challenge came from), give it a try. The same is valid on the ML side: if you have models built with frameworks like TensorFlow or PyTorch, move them over and let them run on Inferentia, or try training on the Trn1 instances powered by Trainium.
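In practice, the first step of "trying it out" on Graviton is often just confirming that your code is actually running on an Arm host once you launch the instance. A trivial runtime check (generic Python, nothing AWS-specific; on a Graviton instance `platform.machine()` reports `aarch64`):

```python
# Quick smoke test when porting a workload: report whether we are running
# on an Arm (aarch64) host, e.g. a Graviton-based EC2 instance, or on x86.
import platform

arch = platform.machine()
if arch in ("aarch64", "arm64"):
    print(f"Running on Arm ({arch}), e.g. a Graviton instance")
else:
    print(f"Running on {arch}, e.g. an x86-based instance")
```

If your application and its dependencies import and pass their tests after a check like this, the heavy lifting of the migration is usually already done; interpreted languages and most Linux packages now ship arm64 builds.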

I think you’ll be pleasantly surprised at how simple and quick it is to improve performance. And you might save money while you’re doing it.

Note: Moor Insights & Strategy writers and editors may have contributed to this article.