Intel Gaudi Performance – Beats NVIDIA?

By Patrick Moorhead - April 1, 2024

The Six Five team discusses Intel Gaudi Performance – Beats NVIDIA?

If you are interested in watching the full episode you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.


Daniel Newman: All right man. Listen, I tweeted something out the other night. I think this is probably where this topic came from, and I kind of said something along the lines of, “We don’t talk about Gaudi enough.” The last several months there’s been kind of this weird gap that’s been created. We talk about NVIDIA, the H100, now the B series and the Grace Blackwell, and then we talk about homegrown silicon being provided by the cloud providers, and it’s NVIDIA and AMD. It’s like NVIDIA has AMD looking at them, and that’s the competition. And then over here we’re looking at… But we do talk a lot about accelerators. You actually had a great tweet this week about ASICs and the need to create standards so that we can scale development in that particular area, Pat. But one of the things that we haven’t talked a lot about is Intel and whether or not… I know we talk about 2025 and their potential GPU, but Pat, we’ve talked a lot on the show about how ASICs and even the XPU can be very competitive with NVIDIA in certain cases.

And this week Intel put out a newsroom post. This probably isn’t like a 20 minute discussion, but it’s a few minutes here. They basically talked about MLCommons publishing new results for the industry-standard MLPerf benchmark for inference. And it basically noted that Gaudi2 and fifth-gen Intel Xeon with AMX, the Advanced Matrix Extensions, can essentially be a very good alternative to H100s for generative AI inference performance, Pat. And I guess I was just thinking to myself when I’m looking at this, “Gosh, why does nobody talk about Intel? Why is Intel being written off?”

Now, of course, I can give you a quick argument for that, because they haven’t talked about enough big cloud wins yet. But I think the fact is that we’ve heard about Gaudi, and we’re seeing its performance. By the way, this is a really strong performance with their Gaudi2, and guess what’s coming?

Patrick Moorhead: Oh gosh, Gaudi3.

Daniel Newman: Gaudi3.5. No, I’m kidding. That’s GPT. Gaudi3. So the point is, with their almost-last generation, you know how we love to do the generations thing, Pat. We love to talk about, “Well gosh, NVIDIA’s chip that isn’t even shipping yet is kicking AMD’s butt.” Well, hold on a second. H100s were outperformed in many ways by the new AMD parts. And now, yes, NVIDIA’s answered that with a product that’s going to ship in the future, but same thing here. So now we have an Intel product that’s coming that’s more performant in certain inference cases than the NVIDIA chip. Now, that said, Pat, you and I have to be very, very clear, because we know a lot of people in the chip space listen to us: this is not a GPU, and it does not have the flexibility and programmability of a GPU. But in cases where language inference is super important, this is a really efficient, performant alternative with strong specs and strong metrics. And they talked about it on Llama, on Stable Diffusion, on Hugging Face text generation. So on a number of different workloads, this particular chip performed.

So the moral of my story is the world loves to write off Intel, and I’m sure Pat Gelsinger loves what he calls the permabears. I just think between now, Gaudi3, and then ’25 when they start to deliver their GPUs, if there really is a $250 billion and upwards of potentially $400 billion TAM for GPUs over the next four or five years, which is what we’re hearing, I think there’s a real shot Intel is going to get a piece of that business. And I know I’m a little too positive on Intel. I hear it sometimes from people. But people like to always tell me why they’re right, and I like to mark it as this date, 3/29/2024, when I told them I think they’re wrong.

Patrick Moorhead: Wow, you left me a little oxygen. Let me take a little bit of a different angle. So first off, the claim was not that Gaudi2 had better raw performance; it was that it had the best price performance, by about 40%. And when I stand back and say, “Hey, would I shift for 40%?” I probably wouldn’t if I needed three years of different types of models, but if it’s a steady-state workload, 40% is a ton. The one thing that got a little bit buried in the lede was that Intel Xeon was the only processor, or SOC, tested with, like you said, AMX extensions. Think of AMX as a little accelerator that sits on the Xeon SOC. And I think that’s a major accomplishment, in that we didn’t see anything from AMD. Now, AMD does not have an acceleration capability like AMX. It does have a massive FPU, and then a massive matrix engine that’s leveraged by AVX-512, but that’s very different and less efficient for many workloads compared to AMX.
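[Editor’s note: Patrick’s description of AMX as an on-die accelerator can be sanity-checked on a Linux host, where the kernel advertises the AMX feature flags of 4th/5th-gen Xeons in /proc/cpuinfo. A minimal sketch; the helper name amx_support is illustrative, not part of any official API.]

```python
# Sketch: report which AMX feature flags the Linux kernel exposes for this CPU.
# Sapphire/Emerald Rapids Xeons advertise "amx_tile", "amx_bf16", and "amx_int8".

AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}

def amx_support(cpuinfo_text: str) -> set:
    """Return the AMX feature flags present in a /proc/cpuinfo dump."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return AMX_FLAGS & flags

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(amx_support(f.read()) or "no AMX flags found")
    except FileNotFoundError:
        print("not Linux; /proc/cpuinfo unavailable")
```

On a non-AMX machine this simply prints “no AMX flags found”; the point is that AMX is a distinct ISA extension the OS enumerates, not just a faster vector unit.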

Dan, we have debated on this show that if only two people showed up for a gunfight, was there really a gunfight? And one thing I did appreciate from MLCommons, this is David Kanter, we’ve all been on briefing calls together, and he said, “Submitting to MLPerf is quite challenging and a real accomplishment. Due to the complex nature of ML workloads, each submitter must ensure that both their hardware and software stacks are capable, stable, and performant for running these types of ML workloads.” And that message was directed at Dell, Fujitsu, NVIDIA, and Qualcomm, which submitted data-center-focused power numbers, and those power numbers had to be captured while running the ML inference.

So I think first of all, it’s good to acknowledge why others weren’t on there, but I still kind of question that if you only have two people show up for a certain benchmark, what’s the value of that? So, I mean we’ve already debated that I think on these MLCommons benchmarks, but I think it is a reflection of the difficulty of AI in totality. So Dan, let’s move to the next topic.

Daniel Newman: Can I say one thing?

Patrick Moorhead: Please.

Daniel Newman: I’m glad you called it out. I want to make sure I’m correct. When I said it, I said on par, not equal. And I believe it’s A100s that it actually outperformed, and H100s that it was near par with. So I should say “near par,” not “outperformed.” If I said outperformed, I was wrong. I’m correcting myself.

Patrick Moorhead

Patrick founded the firm based on his real-world technology experiences with the understanding of what he wasn’t getting from analysts and consultants. Ten years later, Patrick is ranked #1 among technology industry analysts in terms of “power” (ARInsights) and in “press citations” (Apollo Research). Moorhead is a contributor at Forbes and frequently appears on CNBC. He is a broad-based analyst covering a wide variety of topics including the cloud, enterprise SaaS, collaboration, client computing, and semiconductors. He has 30 years of experience including 15 years of executive experience at high tech companies (NCR, AT&T, Compaq, now HP, and AMD) leading strategy, product management, product marketing, and corporate marketing, including three industry board appointments.