Google’s Workload Optimized Infrastructure at Next ’24 – Six Five On the Road

By Patrick Moorhead - April 24, 2024

On this episode of Six Five On the Road, hosts Daniel Newman and Patrick Moorhead are joined by Google Cloud’s Mark Lohmeyer, Vice President & GM, Compute and ML Infrastructure, for a conversation on Google Cloud’s latest innovations and strategic direction, with a particular focus on workload-optimized infrastructure.

The discussion covers:

  • The holistic approach of Google Cloud toward system-level design optimization across essential technology stacks.
  • The extensive range of TPUs and GPUs available through Google Cloud and partners, enhancing infrastructure flexibility.
  • An in-depth look at Google Cloud’s AI Hypercomputer architecture designed to supercharge AI operations.
  • The introduction of Axion, a new custom-designed Arm-based CPU by Google.
  • Exciting expansions in Google Cloud’s storage capabilities, including the launch of Hyperdisk ML for AI applications.

Learn more at Google Cloud.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.


Patrick Moorhead: The Six Five is on the road here at Google Cloud Next 2024 in Las Vegas. There have been a boatload of announcements here. In fact, I think the analyst Book of News was over 200 pages long, including the decks. Dan, where do you even start?

Daniel Newman: Well, you’re going to do that one page roundup post, but you’re probably going to have to use Vertex and Gemini. I saw that workspace app, Pat, and I’m pretty sure that’s what we need exactly right now, is consolidating this thing down to 10 bullet points. But you and I, we love chips to SaaS, we always say that, but there is a special place in our heart for infrastructure.

Patrick Moorhead: That’s right.

Daniel Newman: And it was a really big set of news and announcements in Thomas Kurian’s keynote this morning around that.

Patrick Moorhead: No, that’s right and I love the diversity of options out there for enterprises. Merchant silicon folks, custom silicon from everybody. But let’s bring in our guest. Mark, welcome to The Six Five.

Mark Lohmeyer: Thank you, great to be here.

Patrick Moorhead: What a great… Imagine a guest just pops in to talk about exactly what the preamble was, but great to have you.

Daniel Newman: Yes, I hate when we do that so well. It’s almost like it was done on purpose.

Patrick Moorhead: It’s almost like planned.

Daniel Newman: Mark, you should be smiling. It’s been a big day. Lots of announcements. Of course, the headlines and the news will probably focus a lot on the new chip announcements because Pat and I, we said chips would be cool again. Not potato chips.

Patrick Moorhead: We did. We called it. We called it.

Daniel Newman: We called it early, but in this world, if you’re going to make all these AI things happen, there’s really a bevy of infrastructure considerations that every enterprise needs to be thinking about and Google addressed a lot of those today. Start maybe, let’s start broad. Let’s start macro. Talk a little bit about the systems and infrastructure approach that Google’s taking very holistically, how you’re deciding about what you’re building, how you’re approaching, and how you’re delivering it to customers.

Mark Lohmeyer: Sure, absolutely. First of all, as a longtime infrastructure person, I think like both of you, it’s a really exciting time. I think infrastructure really matters again, and I think a lot of this is being driven by our customers looking to drive a digital transformation of their businesses. As they’re doing that, they’re looking across their application portfolio, whether it’s cloud native apps or traditional enterprise apps, and they’re taking those existing applications and infusing AI into them. Plus they’re creating whole new experiences that are, let’s say, AI native, if you think about some of the agent experiences that Thomas talked about this morning.

You have these amazing new services and experiences, but the interesting thing from an infrastructure perspective is that those AI workloads are placing incredible demands on the infrastructure, frankly, like we’ve never seen before. And so it becomes a real challenge for our customers and for technologists in general. How do you provide the right performance, the right scale, the right cost, the right security for this next generation of AI workloads? And so that’s a problem that we’re keenly focused on and looking forward to sharing more about that with both of you today.

Patrick Moorhead: I’ve heard some phraseology used this week around workload optimized that’s part of your strategy and maybe the name says it all, which is, “Hey, I’m optimizing specific infrastructure for a certain workload.” And I also know that when it comes to, let’s say compute, you have your own version, whether it’s the TPU or your new Axion that you brought out today, but illustrate what workload optimized infrastructure means.

Mark Lohmeyer: Sure. As you said, it’s really all about working back from the needs of each and every workload. And so we do that for all those different types of workloads that I talked about before, but probably the best example of that is AI-specific workloads, and there we have our AI Hypercomputer architecture. The core idea here is we bring together performance-optimized hardware across compute, storage, and networking. On top of that, open, flexible software, so things like support for all the major AI frameworks like JAX and PyTorch, and orchestration frameworks like GKE on top of that, that really help you maximize the utilization of that underlying hardware.

And then on top of that, new flexible consumption models that are specifically designed for AI workloads. You might’ve heard Amin Vahdat speak earlier today about Dynamic Workload Scheduler, which is a great example of that. And so we’re really enabling that architecture at a systems level, end-to-end, integrated from top to bottom, but also open. We also embrace a rich ecosystem of partners that can come and work with us at any level of that stack that makes sense for them. We think ultimately it’s this approach that allows us to deliver against those results that our customers care about.

Patrick Moorhead: It’s amazing how far the cloud has come. I remember 15 years ago when it was anything you want, as long as it’s one size of VM and one specific processor, and now it is just getting so much more mature, which makes sense as you get down here, but that is what enterprises want, is they want flexibility. They don’t want to over-provision, they don’t want to under-provision and things change and they want the system to be able to make recommendations to them on the fly of maybe doing something different that could maybe save them money or give them better performance and results that impact their applications.

Daniel Newman: We’re seeing the mass customization moment, the Model T, you could have any color you want as long as you wanted it black-

Patrick Moorhead: 15 years ago, yes.

Daniel Newman: In the cloud, as long as you wanted a single VM and a single processor, you could do anything you wanted. You started down this path, you mentioned the hyperscale… Hypercomputer, sorry, got to get that right everybody.

Mark Lohmeyer: Got it.

Daniel Newman: But the hypercomputer and what you’re doing there open, but one of the things that’s really interesting in the cloud space right now, Mark, is the silicon diversity. Obviously today in the TK keynote, he mentioned Intel, he mentioned NVIDIA, he mentioned, and of course the TPU. You have the new Axion. How are you guys thinking about overall how you propose, how do you lead, how do you make everyone happy, and how do you decide? I know you’re going to say workload optimized, but I’d love for you to deal. Just give me a little bit, give us a little bit more on how you’re thinking about that, keeping all those people happy.

Mark Lohmeyer: Let me go a little deeper there. Definitely we work back from the customer needs and the workloads they have, but within that, we see some very specific workload types and how that maps to the underlying infrastructure. First of all, you probably heard Thomas mention our announcements around our support for the next generation of Intel Xeon, the fifth-generation Emerald Rapids processors. We are the first cloud to support those publicly, which is great.

Daniel Newman: It’s great.

Mark Lohmeyer: But it’s not just about those CPUs. It’s also about how we deliver them with software on top as services to our customers, and so we announced our Gen 4 VM families around those Intel processors. The C4 instance type is designed with a slice-of-hardware architecture. What that means is we carve up that CPU into dedicated slices of hardware and then we provide those to the applications. That enables us to give the highest level of performance and the most consistent level of performance, and we can also deliver a very highly controlled maintenance experience, which for traditional enterprise apps, SAP, et cetera, is obviously critical.

That’s a fantastic option for the most demanding enterprise workloads, let’s say for x86. But then we complement that with the N4 VM family, and N4 is designed at a software level to balance cost and performance. And so we have software capabilities that allow us to maximize the underlying utilization of those CPUs, and then we pass on the savings from those utilization gains to our customers in the form of lower prices for those instances. Now if you think about a customer who has a broad range of general purpose workloads, they can mix and match those C4 and N4 instances to give just the right performance to each application type while effectively driving down the total cost of the fleet, so really exciting things we have with Intel. But then we also of course announced some big news in the ARM space.

You heard about Axion processors, and these are the first Google-designed custom ARM processors. Obviously in close partnership with ARM, we’re leveraging their latest Neoverse V2 core designs, and so we’re really excited about bringing those Axion-based instances to Google Cloud as well. There’s an ever-expanding portfolio of ISVs and customers that have apps that are optimized for ARM, and now they can bring those apps into Google Cloud in a very effective way. Maybe a long-winded way of answering your question, but back to your earlier point, there’s just this broad range of workloads and we want to have the right option for each and every customer.
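As a rough illustration of the C4 and N4 "mix and match" idea Mark describes in this answer, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the price units, the application names, and the `pick_family` rule are hypothetical and do not reflect Google Cloud pricing or any Google placement logic.

```python
# Hypothetical hourly price units per VM; placeholders, not Google Cloud pricing.
HOURLY_PRICE = {"C4": 100, "N4": 70}


def pick_family(needs_consistent_peak_performance):
    """Assumed rule: demanding apps go to C4, everything else to N4."""
    return "C4" if needs_consistent_peak_performance else "N4"


def fleet_cost(apps):
    """apps is a list of (name, needs_consistent_peak_performance, vm_count)."""
    return sum(HOURLY_PRICE[pick_family(peak)] * count for _, peak, count in apps)


# Hypothetical fleet: one demanding enterprise app, two general purpose apps.
apps = [
    ("sap-erp", True, 4),
    ("web-frontend", False, 10),
    ("batch-reports", False, 6),
]

mixed = fleet_cost(apps)
all_c4 = sum(HOURLY_PRICE["C4"] * count for _, _, count in apps)
print(f"mixed fleet: {mixed} units/hour, all-C4 fleet: {all_c4} units/hour")
```

Under these made-up numbers, the mixed fleet costs less per hour than putting every application on C4, which is the trade-off the answer points at.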

Daniel Newman: It’s definitely not going to be a zero-sum. Doesn’t it feel like Pat, and you probably see it too across the interwebs, just that everything is like, “NVIDIA is going to be all of AI.” We’re like, “Okay, yes, nothing will be on ethernet.” Or everyone’s going to homegrown chips, nobody’s going to ever use Intel again. It’s like, “Yep, that’s not right either people.” And so it is really interesting though to hear how you’re thinking about it.

Patrick Moorhead: Congratulations on all that. I did want to do the double-click on Axion. What was the need there that really got you to do this? Because, sure, developing an SoC is less expensive these days, but it’s still a commitment of tens of millions of dollars, if not more, per design. By the way, that’s my data, not yours, but I remember what it was when I was at AMD, it was in the billions. What’s driving you to do that?

Mark Lohmeyer: Google has a rich history, as you know, of designing custom silicon and custom infrastructure specifically for certain workload types. TPUs are a great example. Video coding units are another, and even the Tensor processors that go into your Pixel phones are another great example. In some respects, this is the next major step in that journey of something that Google’s been doing for many years now. But I think the thing that makes this unique is we’re really tackling a broad problem space, which is general purpose data center computing infrastructure.

We saw an opportunity, leveraging these great capabilities from ARM combined with our engineering prowess, to create CPUs that, as you heard, deliver 30% better performance than the best ARM-based CPUs in the cloud today, 50% higher performance than comparable x86-based instances, and 60% better energy efficiency. And so for the workloads that are optimized for those environments, you think about interpretive languages like Java or Go, you think about web servers or app servers, this is a fantastic benefit we can offer to our customers. Given everything they have to do to serve all of their workloads, we felt it was the right time to provide this type of option that gives them the ability to drive higher performance and lower costs together.

Patrick Moorhead: Exciting stuff.

Daniel Newman: It’s a trend line. You and I have talked about it so much, Pat, the trend line towards the cloud providers and what ARM has enabled to the ecosystem. Of course with all the capabilities, the EDA providers, there’s so much more simplicity that enables, and again, it’s disrupting, creating innovation, creating competition. We love that as more companies for us to work with. Of course, making everybody happy on volume is always a trick, but you’ve been in the infrastructure business a while, you know how that goes.

Mark Lohmeyer: Well, and if you think about the trend of computing, having more options and lowering costs actually expands the market.

Daniel Newman: It does.

Mark Lohmeyer: When the cost goes down, you can serve more applications, you can serve more consumers, you can drive more business outcomes. And so we think these capabilities ultimately expand the market for all of these different types of options that are out there.

Daniel Newman: 100%. If the AI market is not done, the compute market is not done. We’re not in the last instance of software. The one person billion dollar companies are coming, of course, but-

Patrick Moorhead: I’m waiting for it.

Daniel Newman: I want… Well, can we do 2?

Patrick Moorhead: Okay, we can do that.

Daniel Newman: But it’s got to be 2 billion.

Patrick Moorhead: Okay. Yeah, we’re in.

Daniel Newman: So storage. I know nobody talks about it anymore.

Patrick Moorhead: We got to.

Daniel Newman: It’s part of the quadrangle that I talk about on every podcast-

Patrick Moorhead: It is. It’s a big thing.

Daniel Newman: But storage quietly became boring old infrastructure, but it’s getting to be cool again, because storage, data, data management, you’re seeing all these new software-defined storage, where you got the way we’re tagging data metadata, creating it where storage and data layers are getting really close and enabling the AI era. You made a bunch of announcements on storage. Talk a little bit about how Google’s thinking about innovation and storage.

Mark Lohmeyer: Sure. We’re really excited about some of the things that we announced from a storage and data perspective here at Google Cloud. Actually, in a prior life, I spent some time in the storage industry as well, so this is an area that’s near and dear to my heart too. I think one of the interesting things, if you look at AI workloads in particular, you think about those GPUs, those TPUs, those are incredibly data-hungry resources and also incredibly valuable resources. And so you want to make sure those hungry and valuable resources are always very well-fed with high-performance data and high-performance storage because that determines the performance of serving these models, that determines the cost in many cases of serving these models, so storage matters a lot.

And so what you’re seeing us doing is, across all aspects of our existing storage portfolio, we’re adding new capabilities that are designed specifically to optimize those resources for AI. For example, with Filestore, we added caching capabilities that allow us to serve storage in a more high-performance way to those GPUs and CPUs. With our Google Cloud Storage, which is an object-based storage system, we added a file system interface on top of it that gives customers the low-cost benefit of Google Cloud Storage plus an easy-to-use file system interface.

So we’re making enhancements to our existing products, but then we’re also creating entirely new products specifically for AI. We announced a new product called Hyperdisk ML. This is a block storage service optimized specifically for AI workloads. If you think about inferencing, you might have hundreds or thousands of GPU or TPU instances that are running inferencing workloads, serving workloads. And so you want to have a shared storage pool that you can pull from to load models into those instances and then serve them cost effectively at high performance. That’s exactly what Hyperdisk ML does. As a result, we’re able to deliver 12 times higher performance than previous approaches with that technology.

Patrick Moorhead: Now, I love the innovation, and by the way, for those in our audience who haven’t heard about the quadrangle, my premise is that over the last 50 years of computing, every major milestone or major inflection point puts some pressure on the quadrangle, which is memory, storage, network and compute. Sometimes we didn’t have enough compute, sometimes we didn’t have the right networking, the memory subsystem and then storage, but all four of them need to be working in coordination and we’re seeing a ton of innovation on the storage side.

I’m glad you talked about the caching portion because sometimes that gets forgotten about, but when you’re trying to increase throughput, increase performance, you can’t necessarily do it natively, so caching is the next best thing. Heck, HBM on a GPU is essentially caching as opposed to going into system memory over PCI, so it makes sense. Final question here, Mark, is I know you love all of your new announcements, your new babies the same, but what are some hidden gems? Maybe you’ve already talked about them, but maybe there’s something we haven’t heard about that is going to make a huge impact.

Mark Lohmeyer: Well, so one thing we haven’t really talked about much yet, but to your point of having a perfectly balanced system across all those aspects of infrastructure, is the network. The network is absolutely critical. It’s always been critical, but probably even more so in the age of AI, and we’re doing a bunch of really exciting things here too. Let me just call out two I think are interesting. The first is if you think about AI model inferencing or serving, you’re going to have a set of, like I said before, hundreds or thousands of compute instances serving those models. But you want to make sure that the requests that are coming in are nicely balanced across that pool and the pool is perfectly sized based on the amount of workloads that are coming in, and then we spread those workloads across those instances. And so with our load balancing capability and networking, we’re able to do that.

We can perfectly size that pool of instances and then spread those requests across them to drive up utilization, drive down costs, and improve performance. That is pretty key. And then the second is our Cross-Cloud Network, and so this is run by a colleague of mine, Sachin Gupta. It’s a critical part of what we’re doing in AI as well, because if you think about data and data gravity, some customers are going to have data they want to keep on-prem, some other customers might have data that sits in some other cloud, but they want to be able to get access to that data and then be able to use it in the context of Google AI cloud services. And so our Cross-Cloud Network can create that high-performance, highly secure data fabric that allows access to that data. You can leave it where it is if you want, or you can bring it into Google Cloud. We give you that flexibility, and so those are maybe two hidden gems that we think are going to be really important to our customers.
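The two steps Mark describes for inference serving, sizing the pool to the incoming load and then spreading requests across it, can be sketched in a few lines of Python. This is a generic illustration, not Google Cloud’s load balancing algorithm: the function names, the target-utilization parameter, and the least-loaded policy are all assumptions chosen for the sketch.

```python
import heapq
import math


def pool_size(requests_per_sec, per_instance_rps, target_utilization=0.75):
    """Smallest instance count that keeps each instance at or below the target utilization."""
    if requests_per_sec <= 0:
        return 0
    return math.ceil(requests_per_sec / (per_instance_rps * target_utilization))


def spread_requests(num_requests, num_instances):
    """Send each request to the instance that has received the fewest so far."""
    # Heap of (current_load, instance_id); the least-loaded instance is on top.
    heap = [(0, i) for i in range(num_instances)]
    heapq.heapify(heap)
    loads = [0] * num_instances
    for _ in range(num_requests):
        load, idx = heapq.heappop(heap)
        loads[idx] = load + 1
        heapq.heappush(heap, (load + 1, idx))
    return loads


# Hypothetical numbers: 1,000 requests/s, each instance handles 40 requests/s.
n = pool_size(requests_per_sec=1000, per_instance_rps=40, target_utilization=0.75)
loads = spread_requests(1000, n)
print(f"{n} instances, busiest handles {max(loads)}, quietest handles {min(loads)}")
```

A real serving system would also account for request cost, queue depth, and instance health; the sketch only shows why sizing and even spreading together raise utilization without overloading any one instance.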

Patrick Moorhead: I’m really glad you brought up networking in there, and I have to ask, is this ethernet based subsystem that you’re doing this on?

Mark Lohmeyer: Primarily based around ethernet, and so we’ve got our Jupiter fabric network, all these wide area networks are obviously driven. If you look at some of the networking architectures within the TPU clusters themselves, those are actually optical circuit switch networks, so OCS networking. I probably should have mentioned that earlier, but that is a really key technology because, if you take a TPU v5p pod, we’re able to network together 8,960 TPU chips into what is a single, larger virtual AI computing cluster using OCS. We’re able to do that not only with high performance, but with very high levels of efficiency, so we can drive up to 40% better energy efficiency because we’re using optical circuit switching. We’re picking the right networking technology based on the topology and the needs of the workload, but there’s a lot of innovation there as well.

Patrick Moorhead: I love it.

Daniel Newman: There’s optical and there’s ethernet, and there’s CPUs and GPUs. There’s, by the way, more than one cloud. I know. Nobody wants to admit that. I like that you talked about-

Patrick Moorhead: I’m super glad he brought that up. Yes.

Daniel Newman: Sometimes it’s funny because you live in your own little world, but the enterprise looks exactly as you described it. We spent a lot of time talking about that. Mark, thanks so much for joining us on The Six Five. This was a lot of fun. We really appreciate it. Let’s do it again sometime.

Mark Lohmeyer: Thank you too. Sounds good.

Daniel Newman: Thanks. All right, everybody, thank you so much for tuning in. We’re here at Google Cloud Next in Las Vegas, Nevada. 2024 is the year. It’s been a great show. We appreciate you all tuning in, but for this episode, it’s time to say goodbye. We’ll see you all later.

Patrick Moorhead

Patrick founded the firm based on his real-world technology experiences with the understanding of what he wasn’t getting from analysts and consultants. Ten years later, Patrick is ranked #1 among technology industry analysts in terms of “power” (ARInsights) and in “press citations” (Apollo Research). Moorhead is a contributor at Forbes and frequently appears on CNBC. He is a broad-based analyst covering a wide variety of topics including the cloud, enterprise SaaS, collaboration, client computing, and semiconductors. He has 30 years of experience, including 15 years of executive experience at high tech companies (NCR, AT&T, Compaq (now HP), and AMD) leading strategy, product management, product marketing, and corporate marketing, including three industry board appointments.