Arm Neoverse Keeps On Rolling With Marvell ThunderX3 Design


For those who still think of Arm-based servers as AMD’s Seattle, Applied Micro’s X-Gene, Calxeda, or even Qualcomm, think again. While the specifications of Arm’s Neoverse portfolio looked impressive at launch last year, only time could tell what the implementation of the N1 microarchitecture would look like in the real world. In a word: wow. In two words: holy heck. The next few paragraphs will bring you up to speed on where N1 stands in the real world and what companies like Marvell are doing to deliver solutions to the market.

First – a refresher on Neoverse

Neoverse refers to the Arm CPU architectural family designed from the ground up for the cloud, edge and datacenter. Neoverse marks the first time Arm married its expertise in performance and power to deliver intellectual property (IP) to the server market in such a focused way. In the past, partners had to design and develop their CPUs on top of less optimized platforms. This is what gave us the likes of AMD’s Seattle, Applied Micro’s X-Gene and even the not-so-distant launch of Qualcomm’s Centriq CPUs (yes, I’m aware the list goes on; shoutout to the old Calxeda folks). These very capable companies were trying to design enterprise-grade CPUs with Arm IP that was not enterprise server-grade.

To further complicate matters, the software (ISV) ecosystem just wasn’t ready to support Arm. Progress was being made (Qualcomm showed real promise in its launch of Centriq), but developing the ecosystem is kind of like cooking roux. It’s a delicate and time-consuming process with no shortcuts.

The launch of Neoverse sent a message to the market that has maybe gone underappreciated. That message is this: Arm is invested in the datacenter, from cloud to edge to enterprise. N1 represents Arm’s designs for higher compute capacity platforms (lots of big cores with supporting architecture), while E1 represents the company’s design for lower-powered platforms (e.g., networking and edge).  

For a deeper dive on Neoverse, check out my coverage here.

Second – let’s not forget Graviton2 from AWS

First out of the gate with an N1-based platform was none other than AWS with Graviton2. To me, this epitomizes what Arm is about: giving organizations the ability to customize compute platforms for their specific needs. This is precisely what the team at AWS appeared to do, building out a platform that delivers the best performance per watt per dollar for its environment. In turn, the company offered lower-cost, highly performant cloud instances to its customers:

·     EC2 M6g – General Purpose

·     EC2 C6g – Compute Optimized

·     EC2 R6g – Memory Optimized

AWS delivered on the promise of Neoverse with Graviton2. While adoption numbers are not known, I’ve heard anecdotal evidence that the platform is finding a strong customer base.

For a deep dive on Graviton2, check out my review here.

Third – Marvell pulls back the covers on ThunderX3

The specifications on ThunderX3 are beyond impressive: 96 cores and 384 threads. That is not a typo. Let it sink in – 96 cores and 384 threads, which works out to four threads per core. All stuffed into a monolithic design to enable the lowest possible memory latency, with integer performance that rivals the best of x86, and floating point performance that further positions this chip for the HPC market.
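For anyone who wants to check the thread math, here is a minimal Python sketch. The function name `threads_per_core` is my own for illustration, and the sketch assumes every core exposes the same number of hardware threads, which the headline figures imply but do not state outright.

```python
def threads_per_core(total_threads: int, total_cores: int) -> int:
    """Return hardware threads per core, assuming threads are spread evenly."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    if total_threads % total_cores != 0:
        raise ValueError("threads do not divide evenly across cores")
    return total_threads // total_cores


# Headline ThunderX3 figures quoted in the article: 96 cores, 384 threads.
smt_ways = threads_per_core(total_threads=384, total_cores=96)
print(f"{smt_ways} hardware threads per core")
```

The even-division check is deliberate: if a future part mixed core types, a single threads-per-core number would no longer describe it.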

Anybody who thinks this is just another entry in the long list of pre-Neoverse chips is crazy. The ThunderX3 from Marvell is for real and should satisfy the range of workloads that power today’s business – from virtualized to cloud-native to analytics and applications with high compute and/or memory performance requirements. Check out some of the performance comparisons Marvell has published:

ThunderX3 performance across cloud workloads

The above chart is a good indicator of how ThunderX3 would support traditional enterprise workloads. I’m especially impressed by the integer and database performance. The improved integer performance demonstrates that servers based on ThunderX3 will support virtualized (virtual machines) and cloud-native (microservices, container-based) environments quite well. Additionally, the database performance shows the strength of the ThunderX3 core, memory and caching architecture.

When ThunderX2 landed in the market, the Microsoft Azure team adopted it as an internal development platform. Given AWS’s success with Graviton2, I expect Azure will feel some pressure to respond with Arm-based instances. I believe ThunderX3 would be a strong candidate for the underlying chip.

HPC is another workload Marvell targets with ThunderX3:

ThunderX3 performance in HPC

The ThunderX2 (released by Cavium before Marvell’s acquisition) has the distinction of being the CPU that powers the highest-ranked Arm-based cluster on the TOP500 list (Sandia’s Astra – #198). The ThunderX3 doubles down even further on HPC. In two key measures of HPC performance (floating point and memory bandwidth), the ThunderX3 is particularly strong. The company achieved this performance profile by packing four 128-bit SIMD units into each core and increasing the number of memory channels (versus ThunderX2). I fully expect to see Marvell expand its footprint in the TOP500 list with ThunderX3.
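To make the SIMD point concrete, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions the figures above do not spell out: I treat the four 128-bit SIMD units as a per-core resource, and I assume each unit can issue one fused multiply-add (counted as two floating point operations) per lane per cycle. The function names are mine, and the 1.0 GHz clock is a placeholder so the result reads as "per GHz", not a ThunderX3 specification.

```python
def peak_flops_per_cycle(simd_units: int, simd_width_bits: int,
                         element_bits: int, fma: bool = True) -> int:
    """Theoretical peak floating point operations per cycle for one core."""
    if simd_width_bits % element_bits != 0:
        raise ValueError("element size must divide the SIMD width")
    lanes = simd_width_bits // element_bits   # elements processed per unit
    ops_per_lane = 2 if fma else 1            # ASSUMPTION: FMA counts as 2 ops
    return simd_units * lanes * ops_per_lane


def peak_gflops(cores: int, flops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical chip-level peak in GFLOPS at a given clock."""
    return cores * flops_per_cycle * clock_ghz


# Four 128-bit units, as quoted in the article (per-core is my assumption).
fp64_per_core = peak_flops_per_cycle(simd_units=4, simd_width_bits=128, element_bits=64)
fp32_per_core = peak_flops_per_cycle(simd_units=4, simd_width_bits=128, element_bits=32)

# PLACEHOLDER clock of 1.0 GHz: the result is "GFLOPS per GHz", not a spec.
fp64_per_ghz = peak_gflops(cores=96, flops_per_cycle=fp64_per_core, clock_ghz=1.0)

print(f"FP64 per core per cycle: {fp64_per_core}")
print(f"FP32 per core per cycle: {fp32_per_core}")
print(f"FP64 GFLOPS per GHz across 96 cores: {fp64_per_ghz}")
```

This is a theoretical ceiling only. Real HPC codes are usually limited by memory bandwidth well before they hit it, which is why the extra memory channels matter as much as the SIMD units.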

One area where I was surprised that Marvell did not capitalize is PCIe support. ThunderX3 ships with 64 PCIe v4 lanes per socket. Comparatively, AMD’s EPYC ships with 128 PCIe v4 lanes per socket. Why does this matter? Many servers ship with workload accelerators that help improve performance (e.g., GPUs). These accelerators are slotted into servers and use PCIe to communicate with the CPU. With so many cores delivering so much horsepower, the ability to stuff more accelerators into a ThunderX3-based server seems natural. For some reason, the company chose not to fully implement the N1 architecture.
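A quick sketch of why the lane count matters, assuming each accelerator takes a full x16 link (typical for GPUs, but an assumption on my part) and that some lanes may be set aside for networking and storage. The function name `max_accelerators` and the `reserved_lanes` parameter are hypothetical, introduced only for illustration.

```python
def max_accelerators(lanes_per_socket: int, lanes_per_device: int = 16,
                     reserved_lanes: int = 0) -> int:
    """Count full-width accelerator links that fit in a socket's PCIe lane budget."""
    if lanes_per_device <= 0:
        raise ValueError("lanes_per_device must be positive")
    usable = lanes_per_socket - reserved_lanes
    if usable < 0:
        raise ValueError("reserved lanes exceed the socket's lane count")
    return usable // lanes_per_device


# Lane counts quoted in the article: 64 for ThunderX3, 128 for EPYC.
thunderx3_slots = max_accelerators(lanes_per_socket=64)
epyc_slots = max_accelerators(lanes_per_socket=128)

# ASSUMPTION: 16 lanes held back for NICs and NVMe, purely illustrative.
thunderx3_with_io = max_accelerators(lanes_per_socket=64, reserved_lanes=16)

print(f"ThunderX3 x16 devices per socket: {thunderx3_slots}")
print(f"EPYC x16 devices per socket: {epyc_slots}")
print(f"ThunderX3 with 16 lanes reserved for I/O: {thunderx3_with_io}")
```

Under these assumptions the lane budget halves the number of full-width accelerators per socket, and any lanes reserved for I/O cut into the smaller budget proportionally harder.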

Fourth – the ecosystem is strong

As the CPU is such a foundational part of any datacenter deployment, a vibrant ecosystem is vital. And, as previously mentioned, ecosystem development is a long and complex process. There aren’t many shortcuts in developing hardware and software support that enables all of the target markets (i.e., datacenter, cloud, edge). Arm’s team began building the ecosystem in the early days of its jump into the server market, and the momentum picked up under the leadership of Drew Henry and Mohamed Awad. The launch of Neoverse signaled a real commitment to the datacenter, and the team wisely focused on strategic partners such as Red Hat, Docker, Kubernetes, MongoDB and NGINX. By going after these more prominent players, the team gave competing ISVs a reason to support and optimize for Arm as well. The strategy has worked, as the Arm ecosystem rivals that of its x86 competition. Companies like Marvell benefit, and also contribute to the ongoing success.

Marvell ThunderX IHV & ISV Ecosystem

What is most intriguing about the ecosystem and ecosystem development, in general, is the potential that Arm and companies like Marvell, in particular, have to lead the market in terms of breadth and depth. Because Neoverse (N1 & E1) can be taken and architected by many silicon and hardware partners for particular industry use cases, the software ecosystem that supports these deployments would naturally follow in support. This dynamic can lead, over time, to a range of partners supporting usage models and associated applications that are unrivaled.

Is this a little too much of looking into a crystal ball? Perhaps. But it’s done to underscore the flexibility of the Neoverse architecture, and the potential market reach for developers of industry solutions. Furthermore, Arm-based platforms have a real opportunity in nascent markets such as edge computing.

Fifth – What does Marvell need to do?

There are a few things Marvell has to do to build on the success of the ThunderX brand when ThunderX3 launches. 

1.   Ship it! For all of the goodness in the ThunderX3 platform, none of it matters until the product ships. Given the Corona craziness, the expectations for a major launch any time soon are gone. But given the splash that ThunderX3 made in the press, Marvell should aim for a launch event shortly after a return to normal for the market (whatever normal means, anymore).

2.   Stay focused on what matters. If the team at Marvell listens too much to pundits like me, it will be tempted to take on the world all at once. ThunderX3 has a performance profile that makes it a very good fit for specific markets and workloads, and the potential ThunderX customer has a very specific persona. The company should focus on chasing these customers and workloads and be patient. The wins will follow.

3.   Invest in the entire strategy. Building a good chip is step one. Now the hard work begins in enabling and driving momentum in the market. It is easy for a chip company to ignore the banal world of go-to-market (GTM). Ironically, strong partnerships are crucial to driving sell-through. Do not underestimate (or under-invest in) the very critical relationships and campaigns that will lead to success in the market.

Closing thoughts

There are Arm cynics who believe the Arm server market will never happen, and the skepticism is understandable. It seems like just yesterday that AMD and Applied Micro were fighting for time on the Open Compute Summit stage to push Seattle and X-Gene, respectively (by the way, that was six years ago).

This is a different era, and a different chip. The wholesale adoption of cloud, and of the tools that drive cloud optimization, has made the processor argument less relevant. IT operators and administrators want flexibility and reduced cost. The software developer supporting the cloudified environment is working in languages and tools that are interpreted and no longer have an underlying architectural requirement or affinity.
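As a small illustration of that architectural neutrality, the Python sketch below runs unchanged on an Arm or x86 host; only the reported architecture string differs. The helper names `host_architecture` and `count_words` are my own, and the strings `platform.machine()` returns (for example `aarch64` or `x86_64`) depend on the operating system.

```python
import platform
from collections import Counter


def host_architecture() -> str:
    """Return the CPU architecture string reported by the operating system."""
    return platform.machine() or "unknown"


def count_words(text: str) -> dict:
    """A tiny workload with no architecture-specific code paths."""
    return dict(Counter(text.lower().split()))


if __name__ == "__main__":
    # The same source file runs on either architecture without a rebuild.
    print(f"Running on: {host_architecture()}")
    print(count_words("Arm x86 arm"))
```

The caveat is that the interpreter itself, and any native extensions an application pulls in, still need builds for the target architecture, which is where the ecosystem work described earlier comes in.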

Arm’s gains as a compute engine in the datacenter should continue to be incremental in nature. However, areas such as cloud and HPC are ripe for Neoverse and ThunderX3, where chip customization can equal significant performance gains and millions of dollars saved. Stay tuned…