Amazon AWS Ramps New Hardware For AI

AWS’s annual re:Invent conference this year included big news from CEO Andy Jassy and AI VP Swami Sivasubramanian. AWS launched two new hardware platforms for AI and significantly expanded its SageMaker software to simplify running AI models on AWS at maximum performance. When I look at the breadth of AI software, the depth of AI hardware, and the broad range of clients adopting AI on AWS, I believe that AWS has surpassed Google and Microsoft for leadership in cloud AI services. What started at AWS as simple services for chatbots and for text, image, and speech recognition has grown into the world’s most comprehensive cloud for AI.

Figure 1: The AWS stack of hardware choices and software tools is perhaps unmatched by any other cloud service provider. (Image: AWS)

Amazon continues to build its AI software stack, adding over 200 new features every year for the last three years. Amazon SageMaker Studio makes AI as easy as point and click, reducing the need for in-depth AI expertise. 

The scope of client engagements is also impressive, as the slide behind Swami attests in Figure 2.

Figure 2: This slide shows that AWS has over 100,000 customers using AWS for machine learning (AI). (Image: AWS)

New AI hardware from NVIDIA, Intel and AWS itself

Amazon has built the industry’s most comprehensive infrastructure services, all based on openness and customer choice. AWS EC2 instances are available with every style of processor and accelerator its clients could desire, including instances with Intel, AMD and Arm CPUs, as well as Xilinx FPGAs, NVIDIA GPUs and AWS’s own AI accelerators. We will focus on the last of these in this column.

Last year, AWS launched the home-grown Inferentia inference processor, which appears to be gaining traction outside of internal applications at Amazon. Inferentia delivers excellent price/performance and latency, according to AWS—a purported 35% better throughput at a 40% lower price than GPUs. Notably, AWS did not state which GPUs it put up against Inferentia.  
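If we assume those two figures apply to the same workload, they compound into a larger cost advantage than either number suggests on its own. A quick back-of-the-envelope calculation (my arithmetic, not an AWS claim) shows why:

```python
# Back-of-the-envelope combination of AWS's two Inferentia claims,
# assuming both apply to the same workload vs. the unnamed GPU baseline.
throughput = 1.35  # 35% higher throughput (GPU baseline = 1.0)
price = 0.60       # 40% lower instance price (GPU baseline = 1.0)

# Cost per inference scales with price divided by throughput.
cost_per_inference = price / throughput
print(f"{(1 - cost_per_inference):.0%} lower cost per inference")
```

On those assumptions, the claims work out to roughly 56% lower cost per inference, though without knowing which GPU baseline AWS used, the comparison is hard to verify.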

Responding to demand for NVIDIA’s A100 GPUs, AWS also launched the P4d instance with eight A100 GPUs, available in EC2 UltraClusters of 4,000-plus GPUs. This offering seeks to provide integrated, elastic infrastructure that can tackle even the largest training jobs. Built on the NVIDIA HGX design, the P4d is a testament to the system design business NVIDIA has been promoting over the last three years as it moves up the value stack. It also demonstrates AWS’s willingness to offer NVIDIA GPUs even as it makes alternatives available to its clients.

While AWS customers are comfortable with NVIDIA GPUs, Amazon keeps pushing the envelope to find more cost-effective and higher-performing alternatives. At re:Invent, AWS CEO Andy Jassy announced two new training platforms, both of which will become available next year. The first is the long-awaited Intel Habana Gaudi accelerator, which Mr. Jassy said would provide up to a 40% improvement in price/performance. During discussions with Intel, I learned that this claim originated from a basket of benchmarks accounting for some 80% of the AI work running on AWS.

Figure 3: AWS CEO Andy Jassy surprised his audience when he announced that AWS would support the Intel Habana Gaudi chip. (Image: AWS)

Mr. Jassy also announced that AWS is developing the big brother to Inferentia, called Trainium, which he said would have “the most teraflops of any ML instance in the cloud.” Expect more details when the service launches later in 2021, probably in the second half of the year. Like Google, AWS intends to differentiate its cloud AI services by fostering the use of its own AI chips, which, of course, are available only on its cloud. So don’t expect this chip to be offered for on-premises computing any time soon.

Figure 4: AWS Trainium could be a game-changer. (Image: AWS)

Updated AI software

AWS also announced an extension of SageMaker to support data and model parallelism for distributed training. This enhancement will deliver nearly linear performance gains when adding GPU capacity, automatically spreading the work across the cluster. 
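In the SageMaker Python SDK, these distributed-training libraries are switched on through a `distribution` argument on the framework estimator. The sketch below shows the shape of those configuration dictionaries as plain Python; the parameter values are illustrative assumptions, and no AWS account is needed to inspect them:

```python
# Sketch of SageMaker distributed-training configuration (SageMaker
# Python SDK v2 style). Values here are illustrative assumptions.

# Data parallelism: replicate the model, shard each batch across GPUs.
data_parallel = {
    "smdistributed": {"dataparallel": {"enabled": True}}
}

# Model parallelism: partition one large model across GPUs.
model_parallel = {
    "smdistributed": {
        "modelparallel": {
            "enabled": True,
            "parameters": {"partitions": 2, "microbatches": 4},
        }
    },
    "mpi": {"enabled": True, "processes_per_host": 8},
}

# Either dict would be passed when constructing a framework estimator,
# e.g. sagemaker.pytorch.PyTorch(..., distribution=data_parallel),
# and SageMaker handles spreading the work across the cluster.
print(data_parallel["smdistributed"]["dataparallel"]["enabled"])
```

The point of the design is that scaling out becomes a launch-time configuration choice rather than a rewrite of the training script.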

This announcement marks a continuation of AWS’s efforts to enhance the Amazon SageMaker software services, focusing on ease of development and deployment. The company now has tens of thousands of clients using SageMaker, reducing AI model development time from weeks to hours.

Figure 5: Amazon SageMaker is garnering end-user traction for its ease-of-use while delivering solid performance. (Image: AWS)

Often, one has to trade off performance for ease of development; data scientists typically resort to hand-coding to an accelerator’s instruction set to maximize throughput and enable parallelism. With SageMaker, however, Amazon seems to have done an excellent job of delivering performance without demanding that effort.

Figure 6: Amazon SageMaker has added data and model parallelism to the AI platform to speed training times, with near-linear performance gains. (Image: AWS)

Conclusions

I’ve always known that AWS had excellent GPU infrastructure services for AI, so I watched the Inferentia chip launch with interest. However, the company’s AI tools like Lex and Textract seemed somewhat limited. Fast forward three years, and all that has changed, along with my opinion of AWS as an AI powerhouse. AWS has added more choice with Intel Habana, becoming the first cloud provider to do so, but probably not the last. With Trainium, AWS will soon compete directly with Google’s TPUs as well as GPUs. Meanwhile, the SageMaker development team has been busy, dramatically expanding and enhancing its easy on-ramp for AI application development and deployment.

With this steady development and recent announcements, in my opinion, Amazon AWS has vaulted into a leadership position in AI.