AMD took a big step forward today in the datacenter with its launch of the 2nd Gen EPYC processor and platform. I attended the AMD EPYC Horizon launch in San Francisco, CA, and want to hit the highlights. I will focus more on the high-level messages and meanings while Moor Insights & Strategy server and compute analyst Matt Kimball will do a deeper dive on the technology here.
First off, the 2nd Gen EPYC platform was a bigger leap forward than I had expected three months ago. AMD addressed most of its Gen 1 shortcomings, such as single-thread performance (+15%) and core scaling, and added new RAS (uncorrectable DRAM error entry) and security (Secure Memory Encryption, Secure Encrypted Virtualization, 509 keys) capabilities in addition to substantial multi-core performance gains.
Early indications from AMD and its OEM partners show that 2nd Gen EPYC is performing exceptionally well and has advantages in many, but not all, workloads. AMD looks strong in Hadoop RT analytics (AMD says world record), Java throughput (AMD says 83% better), fluid dynamics (AMD says 2X better), and virtualization (AMD says up to 50% lower TCO, Twitter said 25%). Overall, AMD says it has achieved 80 world performance records. Intel will likely have advantages on low-latency ML inference workloads that take advantage of Intel’s DL Boost instructions and will also look very good in in-memory database workloads utilizing Optane DC.
Before coming to any final conclusions on specific numeric advantages, though, I typically like to see detailed, third-party performance benchmarks, including application-level ones, and hear from enterprises on their performance experiences. Nearly every one of AMD’s event benchmarks was published by OEMs and ODMs like HPE (HPE says “37 world records”), Gigabyte, Lenovo (Lenovo says “16 world records”), and Supermicro, not AMD. Twitter said it improved its TCO by 25% going EPYC. Surprisingly, there are more third-party benchmarks by the likes of ServeTheHome here and AnandTech here.
AMD has done a great job with EPYC’s chip architecture, single-core IPC, and multi-core scalability leveraging its massive L3 caches and its hybrid, multi-die architecture with 2nd Gen Infinity Fabric.
Enterprises care about roadmaps
In addition to performance, enterprises want confidence that AMD has a high-probability, compelling 3-5-year roadmap. Enterprises aren’t interested in point solutions, and they want long-term partners. AMD has already proven itself to public cloud providers and HPC and needs to shift that momentum to enterprises.
As part of that future roadmap, AMD needs to disclose how it will optimize latency-sensitive ML inference workloads as well as traditional big data and how it stays ahead of Intel, and it needs to do so in a low-risk, high-confidence fashion. While two years ago I felt AMD had baggage from withdrawing from the market with Opteron, I feel that fear has mostly subsided. I should note that this is a new AMD with a heavier focus on execution rather than big and fast wins, and unlike in the Opteron days, it isn’t risking everything with a brand-new architecture like Bulldozer.
AMD didn’t share a lot of future details, but it did share that the Milan with Zen 3 “design is complete” and Genoa with Zen 4 is “in design.”
An array of CSPs, OEMs, ODMs, end customers, ISVs, and IHVs showed their support for the new EPYC platform. It was an impressive showing at the event with support from CSPs (Azure, GCP), OEMs (Cray, Dell Technologies, Hewlett Packard Enterprise, Lenovo, Cisco, H3C) with “2X the platforms”, end customers (Google, Twitter, U.S. Air Force), ISVs (VMware, Canonical, Red Hat, SUSE), ODMs (Gigabyte, Quanta, Supermicro), and IHVs (Broadcom, Micron, Xilinx, Samsung). Ecosystem supporters are important, especially when you are out-resourced like AMD.
HPE CTO Mark Potter said it has three platforms available delivering “37 world records”, with twelve in total in the future. Dell EMC Infrastructure CTO Robert Hormuth didn’t give a lot of model or SKU details, but said it “will expand its offerings in the fall.” Lenovo Datacenter COO Doug Fisher announced two new EPYC platforms with PCIe 4, big-time NVMe, high-speed memory, and 16 world record performance scores including SPECpower, coining “the world’s most energy-efficient server.” Fisher also talked about full solution stacks for smart cities and security. Microsoft Azure announced new HPC instances (HBv2), virtual desktop instances, and general-purpose series (Da_v3 and Ea_v3) available for preview today. Google’s VP Engineering Bart Sano says it is the first to deploy Rome in its production datacenter. Why? “High core count, high performance shared memory, and PCIe4 for our accelerators.” Sano ended by saying EPYC will be available in GCP later with the “largest VMs ever and for HPC.”
Matt Kimball and I will be writing and researching a lot more on the partners over the next few weeks.
Google is an interesting end customer that has shown it is willing to go big if it sees better performance and price. Google was AMD’s largest Opteron customer back in the day. I will be keeping my eye on this one. While AWS and Alibaba Cloud were not at the event in person, AWS appeared on video, and I fully expect both to support the new platform. I would have expected a few non-cloud enterprises and resellers to be on stage or in the solutions booth, but maybe that is in the cards later.
Breadth of solutions
AMD and Intel are taking different approaches to the datacenter; AMD’s is narrow, Intel’s is wide. AMD is focused on optimizing many workloads, but fewer than Intel. AMD also does not provide as many elements of the solution to ODMs and OEMs.
For instance, to a carrier, Intel can offer a CPU, motherboard, NAND, Optane DC, networking cards, silicon photonics, FPGA acceleration cards, security accelerators, and an optimized software stack. In 2020, Intel will add two Nervana ML cards, one for ML training and one for ML inference, and a datacenter GPU. I bring this up because those who embrace AMD will be OK with piece-parting solutions together, and those who choose an Intel solution will want the solution wrapped in a pretty blue bow.
Not a walk in the park
I believe AMD will be more successful with CSPs and HPC with 2nd Gen EPYC running the same sales, marketing, and ISV plays it did with 1st Gen EPYC. The enterprise is different. AMD has invested a lot more in the enterprise value chain (ISV, field sales, resellers, ads, SIs) over the last two years, but nowhere near the degree to which Intel has invested over the last decade. AMD will need to lean on OEMs like HPE, Dell, and Lenovo, none of whom have a recent track record of creating demand for AMD. I need to spend a lot of time better understanding AMD’s go-to-market, as the product speaks for itself.
Every ODM, OEM, CSP, and enterprise I have talked to in the last ten years wants more competition in the space to accelerate innovation and lower costs. With that said, none of the customers listed above would adopt AMD if it didn’t have some advantages. AMD gained low, single-digit share with 1st Gen EPYC, but I expect the company to gain more (up to approximately 10%) market share with 2nd Gen EPYC with CSPs, enterprises, and HPC. Enterprises don’t mass deploy any first-gen product. They didn’t deploy 1st Gen EPYC, but they will deploy 2nd Gen EPYC. Gaining that share won’t be easy. Intel is a fierce competitor that has woken up to the threat. Intel has a wider datacenter product array it can leverage, delivers more of the solution, has the processor AI inference lead, holds 10-20X the sales and marketing resources, and will have its 10nm solution out in a year. AMD needs to aggressively show it has a compelling roadmap and step on the gas for the next year to steal as much share as it can. This is a different AMD, one that I don’t think you can bet on to fail.