Broadcom and AMD’s Open Architecture Vision for AI – Six Five On the Road at Dell Technologies World

By Patrick Moorhead - May 28, 2024

On this episode of Six Five On the Road, hosts Dave Nicholson and Lisa Martin are joined by Jas Tremblay from Broadcom and Robert Hormuth from AMD for an in-depth conversation on their collaborative efforts in supporting Open Architecture for AI. The discussion sheds light on how their partnership is setting the stage for more power-efficient and scalable AI solutions.

Their discussion covers:

  • The strategic vision behind Broadcom and AMD’s support for Open Architecture in AI.
  • The benefits of Open Architecture for businesses and developers in the AI space.
  • Achieving power efficiency in AI applications through collaborative innovation.
  • The role of scalability in future-proofing AI deployments.
  • Insights into Broadcom and AMD’s roadmap for AI development and support.

Learn more at Broadcom and AMD.


Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.

Transcript:

Lisa Martin: Hey, everyone. Welcome back to Six Five On the Road from Las Vegas. Lisa Martin with Dave Nicholson, covering Dell Technologies World 2024, the AI edition. If you saw Michael Dell’s keynote the other day, you saw a lot of partners featured and we’ve got some great partners on the program next, and get this, show and tell! I’m pleased to welcome Jas Tremblay to the program, general manager of the Data Center Solutions Group at Broadcom. It’s great to see you.

Jas Tremblay: Thank you.

Lisa Martin: Robert Hormuth joins us as well, corporate vice president, Architecture and Strategy Data Center Solutions group at AMD. Great to have you, Robert. Thank you for joining us.

Robert Hormuth: Thank you. Absolutely.

Lisa Martin: So, Jas, great to see you. Talk a little bit about the history of the Broadcom-AMD relationship. I know it’s deep, it’s broad, it goes way back.

Jas Tremblay: Yes.

Lisa Martin: Give us that overview.

Jas Tremblay: So, it’s existed for decades and it’s based on this concept of an open ecosystem in the data center, but our collaboration started in a compute-centric way. AMD doing the CPU, us doing peripherals and connectivity. And now in an AI world, we’ve decided to take it to the next level, where we’re focused on the connectivity, AMD’s focused on the compute, the acceleration. And it’s extremely complementary.

Lisa Martin: Robert, bringing you into the conversation. Talk a little bit about, is it the MI300, the AMD MI300? Give us the lay of the land of what that is and why it’s significant.

Robert Hormuth: Yeah. The MI300X is a product that we announced last fall. It’s a general purpose GPU targeted at AI. So, the world of training, inference, LLMs, ChatGPT, OpenAI, all of that. All the world of AI that’s abuzz right now. So, it’s right at the center. And it’s significant in that it really brings choice to the world of AI. We came from a world where there was a limited choice and the MI300X really brings a performance choice to the market and it’s enabled us to really use our partners like Broadcom and Dell, to bring it to the market to provide choice in this open ecosystem, which is one of the real principles for AMD. We’re all about open source, open industry standards, open partnerships. So, we’re just continuing that journey with our great partners at Broadcom and Dell.

Jas Tremblay: Same set of values.

Lisa Martin: Perfect. That’s great.

Dave Nicholson: Yeah. In fact, right now, as we speak, we have our documentarian and her film crew walking the expo, the solutions expo here. And one of the things they’re doing is they’re looking at some reference implementations that were built using AMD and Broadcom technology. And so, it’s really interesting. It fits perfectly under this umbrella that Dell has created, this idea that there isn’t just one way to do things, and fit for function is the name of the game.

I’ve been hearing, really in 2023, it felt like we were coming out of the tail end of the fear of missing out era, where people were running in a single direction thinking that there was no alternative. Do you think we’re beyond that now? Do you think people are ready to be more rational about the decisions they make when they start looking at things like networking technology, CPU and GPU technology? Robert, start with you. Do you think we’re there?

Robert Hormuth: Yeah, no, I think we’re getting there. We’ve partnered with Broadcom on some of the most complex AI solutions to bring AI at scale using standard Ethernet. We launched a partnership with Broadcom to take that. We think Ethernet’s the … Never bet against Ethernet kind of mantra here.

So, we got together with Broadcom on, “Hey, can we improve it?” And so, we were co-founders of the Ultra Ethernet Consortium, which is really about addressing RDMA, congestion management, and packet spraying in a standards-based way, and making sure that we have a robust ecosystem, because we both have the same values of an open ecosystem to advance everyone in AI.

Lisa Martin: Those shared values are incredibly important for partnerships. In fact, they’re really business critical. Share a little bit about some customer successes that you’ve achieved bringing those shared values to an organization, whether it’s a healthcare organization, a financial services, who’s really leveraging the open architecture for AI to accelerate its business. Any customer come to mind?

Jas Tremblay: Well, I think one of the shared values that we have is who our customers are. And we have tremendous appreciation for the role of the OEM. For example, yesterday we launched products at Dell Technologies World. We know that for our end customers to be successful, we need to live in that platform, live in that ecosystem, and that’s where we come together.

So, it’s not just about us working together as semiconductor and software companies, but really working with the platform companies because these are complicated clusters to bring to market. And it’s not just about, “Okay, let’s come up with some cool chips.” You have to take these from the lab to deployment at scale, and that’s really the tough part.

Lisa Martin: Definitely. And you have some items, you both brought some show and tell. Can we dig into that?

Robert Hormuth: We brought a bunch of stuff.

Lisa Martin: Jas brought some stuff that was on main stage with Charlie this morning during the Dell Tech World keynote. What do you have for us and why is it significant?

Jas Tremblay: So, let’s start with a couple of things here. So, this is a Tomahawk 5 51-terabits per second chip, monolithic die, single chip. It’s the only 51-terabit Ethernet switch in the market. This is the device of choice for AI infrastructure for the top hyperscalers. And Dell announced that they have this now available in their portfolio. So, that’s the first thing that we have.

Dave Nicholson: So, before you go a little bit further-

Jas Tremblay: Yeah, please.

Dave Nicholson: … really quickly, so how does this … So, sort of put flesh on this bone in terms of the form factor? So, this is in a … Call it a pizza box.

Jas Tremblay: Pizza box.

Dave Nicholson: How many ports are available or what does this do for me?

Jas Tremblay: This has 128 ports of 400 gigabits per second. So, you could use, for example, half those ports to connect at 800 gigabits per second to other Ethernet switches and the rest to aggregate AI servers. Now, the tricky part is with a compute server, you’ll have one or two Ethernet NICs. In an AI server, you’ll have 8, 10, 12 of them. So, you’ve got front-end networks, back-end networks.

There’s a lot more connectivity required. So, you need a lot of bandwidth, low latency, and so forth. So, you’ve got this device and then you’ve got your AI servers where you put this guy. We launched this yesterday, full production, first five-nanometer, 400 gig Ethernet NIC. And these two guys go together. So, they actually share a lot of the same IP, the same Ethernet MAC in both of them. And if you put them together, you can actually go five meters over copper.
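As a rough check on the figures quoted above, the port arithmetic can be sketched in a few lines of Python. The port count and port speed come from the conversation, and the even uplink/downlink split follows Jas's example. The one-switch-port-per-NIC mapping is a simplifying assumption, and the function names are hypothetical.

```python
# Back-of-the-envelope port math for a 128 x 400G Ethernet switch.
# Port count and speed come from the transcript; everything else is
# an illustrative assumption.

PORTS = 128        # 400G ports on the switch (from the transcript)
PORT_GBPS = 400    # speed of each port in Gb/s (from the transcript)


def total_bandwidth_gbps(ports: int = PORTS, port_gbps: int = PORT_GBPS) -> int:
    """Aggregate switch bandwidth in Gb/s."""
    return ports * port_gbps


def servers_per_switch(nics_per_server: int, uplink_fraction: float = 0.5,
                       ports: int = PORTS) -> int:
    """AI servers one switch can attach when some ports are used as uplinks.

    Assumes one 400G switch port per server NIC (a simplification).
    """
    downlink_ports = ports - int(ports * uplink_fraction)
    return downlink_ports // nics_per_server


if __name__ == "__main__":
    print(total_bandwidth_gbps() / 1000, "Tb/s aggregate")
    for nics in (8, 10, 12):
        print(nics, "NICs per server ->", servers_per_switch(nics), "servers")
```

The aggregate works out to 128 × 400 Gb/s = 51.2 Tb/s, which is consistent with the "51-terabit" description of the chip. Under these assumptions, a half-and-half split leaves 64 ports for servers, so only a handful of AI servers fit behind one switch once each server carries 8 to 12 NICs.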

Dave Nicholson: Okay.

Jas Tremblay: Yes. So, we’re really proud of this one. It’s less than half the power of the competition’s solution with better performance, because it’s really optimized for AI infrastructure. And Dell launched this in their AI servers available in July, but the product’s available now.

Dave Nicholson: I had the opportunity to moderate a technical panel discussion yesterday morning with Samsung, talking to their DRAM, high-bandwidth memory, and SSD folks, and I asked them, “What’s your main concern in terms of bottlenecks?” And they said, “Networking. We’ve got to focus on networking.” And someone chimed in with also, “And power being an issue.” You mentioned some of the stats about power in the future, how important that’s going to be.

Jas Tremblay: Yeah. Jeff Clarke in his keynote this morning, he had an interesting metric. He said by 2030, the power consumed by data centers would be 8X what it is today. So just, what? Three years ago, we were in the pandemic, it was a chip shortage. Now we’re in the era of power shortage. And we need to be hyper-focused on doing everything we can to optimize around power. Yeah, it’s going to be a tough time to get power for all those data centers.

Dave Nicholson: So, AMD’s going to start building nuclear power plants?

Robert Hormuth: Yeah. No.

Dave Nicholson: And then we’re going to say, “Okay. Well, show me what you do around here.” Because I was going to say, because if that’s plutonium, we’re not-

Robert Hormuth: Yeah. So, speaking of power, this is actually a CPU. This is our EPYC Genoa CPU. So, 96 cores. That’s the highest-performing, most efficient CPU on the planet, about 2X over the competition. So, in terms of this notion of power and space constraints, backing into the math, there are about 100 million five-year-old servers out in the world today that consume about 45 gigawatts. If we were to replace those 100 million with 21 million of these, we free up 30 gigawatts of power for AI. Free up 30 gigawatts.
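The refresh math Robert describes can be backed into with a short Python sketch. The four input figures are the ones quoted in the conversation. The per-server averages and the remaining fleet power are implied values derived here, not numbers AMD stated, and the function names are hypothetical.

```python
# Rough reconstruction of the server-refresh power math from the transcript.
# Inputs are the quoted figures; derived values are implied, not quoted.

OLD_SERVERS = 100_000_000   # five-year-old servers in the field (quoted)
OLD_FLEET_GW = 45.0         # power drawn by that fleet in GW (quoted)
NEW_SERVERS = 21_000_000    # replacement servers (quoted)
FREED_GW = 30.0             # power freed by the refresh in GW (quoted)

WATTS_PER_GW = 1_000_000_000


def watts_per_server(fleet_gw: float, servers: int) -> float:
    """Average power per server implied by a fleet-level figure."""
    return fleet_gw * WATTS_PER_GW / servers


def refresh_summary() -> dict:
    """Derive the implied per-server and fleet numbers from the quoted ones."""
    new_fleet_gw = OLD_FLEET_GW - FREED_GW  # implied power of the new fleet
    return {
        "old_w_per_server": watts_per_server(OLD_FLEET_GW, OLD_SERVERS),
        "new_fleet_gw": new_fleet_gw,
        "new_w_per_server": watts_per_server(new_fleet_gw, NEW_SERVERS),
        "consolidation_ratio": OLD_SERVERS / NEW_SERVERS,
    }


if __name__ == "__main__":
    for name, value in refresh_summary().items():
        print(f"{name}: {value:.2f}")
```

Taken at face value, the quoted numbers imply about 450 W per old server, a new fleet drawing about 15 GW, and a consolidation ratio of a little under five old servers per new one.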

Dave Nicholson: That’s interesting.

Lisa Martin: Significant.

Robert Hormuth: Significant. And so, one of the best tools in the data center right now is that notion of we need to do an infrastructure refresh to make room and space for AI. And the weapon of choice is the AMD EPYC Genoa, because we can consolidate five to eight old servers down. And if you do the math, every 10 or so that I take out with one, I now have space and power for an AI server. And so, very significant tool. I’m going to have to ask Jas to help here, but the way all these go together. So, in your AI server-

Dave Nicholson: You need extra hands.

Robert Hormuth: I need some extra hands.

Dave Nicholson: This is great.

Robert Hormuth: So, in an AI server, you have an EPYC with a PCIe bus that connects to this NIC, and then out of this NIC goes a cable to Jas’ switch.

Dave Nicholson: Yeah. Right. Right.

Robert Hormuth: So, partnerships for life, right here.

Jas Tremblay: Yes. Yes.

Dave Nicholson: Hold that up for a second. How many of those, roughly, give me a range of how many of those can connect through just one of these?

Robert Hormuth: So, we have 128 lanes of PCI Express here. And then we can use … Jas has some other toys in his arsenal called PCIe switches, that allow us to do additional fan out, to get more of these to connect to more of those.
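To illustrate why PCIe switches come into the picture, here is a minimal lane-budget sketch in Python. Only the 128-lane figure comes from the conversation. The x16 link width per device and PCIe 5.0 signaling are assumptions made for illustration, and the function names are hypothetical.

```python
# Lane-budget sketch for a CPU with 128 PCIe lanes.
# Only the lane count is from the transcript; the rest are assumptions.

CPU_LANES = 128          # PCIe lanes on the CPU (from the transcript)
LANES_PER_DEVICE = 16    # assumption: each NIC or GPU uses an x16 link
GEN5_GT_PER_LANE = 32    # assumption: PCIe 5.0 signaling, 32 GT/s per lane
ENCODING = 128 / 130     # PCIe 5.0 uses 128b/130b line encoding


def direct_attach_devices(cpu_lanes: int = CPU_LANES,
                          lanes_per_device: int = LANES_PER_DEVICE) -> int:
    """How many devices fit on the CPU's lanes without a PCIe switch."""
    return cpu_lanes // lanes_per_device


def lanes_needed(devices: int, lanes_per_device: int = LANES_PER_DEVICE) -> int:
    """Lanes required if every device gets its own full-width link."""
    return devices * lanes_per_device


def link_gbps(lanes: int = LANES_PER_DEVICE) -> float:
    """Approximate one-direction link bandwidth in Gb/s, before protocol overhead."""
    return lanes * GEN5_GT_PER_LANE * ENCODING


if __name__ == "__main__":
    print("x16 devices on 128 lanes:", direct_attach_devices())
    print("lanes for 8 GPUs + 8 NICs:", lanes_needed(16))
    print(f"x16 link bandwidth: {link_gbps():.0f} Gb/s per direction")
```

Under these assumptions, 128 lanes cover eight x16 devices, while a hypothetical server with eight GPUs and eight NICs would want 256 lanes. That gap is the kind of fan-out problem the PCIe switches mentioned above are meant to address. Whether a real system is laid out this way is not stated in the conversation.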

Dave Nicholson: Okay.

Robert Hormuth: Now, the other part that we don’t have here is the GPUs that would connect to the EPYC processor as well, or through the PCIe switches. So, they’re all indirectly or directly … All of our stuff directly connects together to build systems and solutions.

Dave Nicholson: Yes. Amazing.

Lisa Martin: Wouldn’t you know? Wow.

Dave Nicholson: It’s amazing.

Robert Hormuth: Yeah.

Dave Nicholson: And you achieved that by fighting over standards and disagreeing on everything. Is that how you achieve that?

Robert Hormuth: No.

Dave Nicholson: No.

Robert Hormuth: No. We work together on PCI Express, in the CXL Consortium, in the Ultra Ethernet Consortium. I’ll let Jas talk more about this, but in the December AI event, the way we interconnect our GPUs is through what we call our Infinity Fabric. And we announced in December as part of the MI300X that we would be opening that up to strategic partners and collaborators. And here’s one of our strategic partners and collaborators.

Jas Tremblay: Yeah. We’re building a switch around that. But you joked about fighting and standards.

Dave Nicholson: Yes.

Jas Tremblay: What AI has done is said, “Okay, we need to do this amount of work in this amount of time.” So, it’s really cut down the noise. We get down to business, we’re like, “Okay, guys, here’s the best architects from five different companies. This is the problem statement. Let’s go figure it out.” And we’re accelerating the time from concept to getting aligned on the standards to building stuff. It’s just everything is shortening, everything is going faster, and the amount of collaboration in the industry is much bigger than before.

And I think one of the reasons is because the prize is so big, we don’t need to fight over it. Let’s just all go build it together, get things done. The challenge is actually building it, building it at scale, building it so that it’s power efficient, building it so people have choices. So, we just need to get it done. And it’s instilling a different mindset with folks. It’s a lot of fun.

Dave Nicholson: That’s it. And this isn’t unprecedented. I think we’ve had a chance to chat about this concept of what happened with compute servers moving forward in AI.

Jas Tremblay: Yes.

Dave Nicholson: It’s just you’re saying that this is accelerated because the perception is that the prize is so big, we don’t necessarily have to be as territorial as maybe we even were then.

Jas Tremblay: So, we know how to do open. We’ve defined open in the data center around compute servers. That’s worked extremely well. We are doing it in AI. We’re just doing it faster.

Dave Nicholson: Interesting.

Lisa Martin: One of the things though, that that brings up in my mind is really cultural shifts within organizations to be more collaborative versus competitive.

Jas Tremblay: Yes.

Lisa Martin: How have you seen that … Obviously, AMD and Broadcom work very well together collaboratively, and the prize is so big, there’s enough room for everyone to share a piece of the pie, but how have you seen cultures and organizations, storied organizations, shift at the speed at which they need to, to make this acceleration a reality?

Robert Hormuth: I mean, I think for us, I think Jas hit it right, the size of the prize is large, but I think the advancement for mankind, especially in the world of AI, as we advance from the early stage of AI, let’s be honest, it was about human influence. How do I click, buy, watch … Get you to watch a movie or take an ad? But as we’ve been moving up through AI, it’s moving into productivity like Microsoft Office Copilot, right? Is helping productivity.

I mean, the end prize that we’re moving towards is really a tool that is helpful for mankind, to really be a tool for good to help humans live longer, faster, better lives, not just which movie to watch. And so, that’s really the end prize that we’re all aiming for. And so, that drives a lot of cultures together to go … Especially when we have a common goal in the industry, helping humankind is a pretty good goal for the industry.

Lisa Martin: It’s a great goal.

Robert Hormuth: So, it’s easy to get people to rally around that cry.

Lisa Martin: That’s great to hear that.

Dave Nicholson: Did I miss this? Did we talk about this?

Jas Tremblay: Oh, Charlie showed this also, this morning.

Dave Nicholson: Yeah, yeah, yeah, yeah.

Lisa Martin: Yes.

Dave Nicholson: But tell us about this.

Jas Tremblay: Okay. So, picture you have this chip and you want to attach to 64 servers at 400 gig with optics.

Dave Nicholson: As one does.

Jas Tremblay: Yes. So, you have this chip with up to 128 optical transceivers around it.

Dave Nicholson: Okay.

Jas Tremblay: Lots of power, lots of cost, lots of space. Very challenging to cool. Optics need to stay cold. Replace it with this. So, this is co-packaged optics. So, the same Tomahawk switch-

Dave Nicholson: Oh, I see.

Jas Tremblay: … but with optics at the semiconductor layer. So, you eliminate a tremendous number of components. You have 128 integrated co-packaged optics, and it cuts down the power by 70%.

Lisa Martin: Wow.

Dave Nicholson: Interesting.

Jas Tremblay: And we’re sampling this to customers, and one of the big challenges is actually manufacturing this with high quality. And we’ve figured out how to do this. We’ve got a dedicated division just on this solution.

Dave Nicholson: This is fascinating. One of the discussions we’ve had today is this idea that if you limit the amount of distance that data has to traverse within a memory subsystem, you decrease the amount of power.

Jas Tremblay: Yes.

Dave Nicholson: You have effectively done something to get a massive amount of efficiency out of this, but there’s a balance because one could argue that, “Well, wouldn’t this be more open? Just let everybody hook their wires up.” So, how do you balance between “it’s proprietary because it’s better” and having everything be completely open to everyone?

Jas Tremblay: So, that’s one of the tricky points. And Robert and I talk quite a bit about this. If you want to be open, you want to get a lot of people contributing, get everybody’s ideas. If you’re closed, you’re very focused and you can fully optimize the solution. So, you need to get the best of both worlds. And the way to do that is you take the right people, give them the right problem to go solve, and give them really tight lanes to stay within.

And then, they start simplifying things. So, this quest for simplification is key to power and cost in the data center. So, it’s really tough to do that. And you talked about what’s different with the mindset? Well, this is a small industry. The people that we’re working with together, we’ve worked together for 5, 10, 15 years, and it’s like, “Okay, guys, we’ve known each other for 10 years. This is the time to define the standards, to go at it and just to be really focused.” Yeah. So, that’s a tricky part, but I think we’re on the right path.

Lisa Martin: Take us out … Talking about that path. What are some of the next steps? Because obviously, the collaboration is there, the cultures, the mindsets are there, the technology is there. What are some of the next steps that we might be looking out for on the horizon from Broadcom and AMD?

Robert Hormuth: I mean, I think from AMD’s perspective, we’re very focused on the compute side, both the CPU and the GPU. And what we’ve concluded is to continue to drive that efficiency, deeper levels of integration of our compute engines together, deeper integrations of more memory capacity, more bandwidth between that memory and the CPU and the GPU, drives levels of efficiency.

And so, for us, we’re going to keep driving more compute, more memory bandwidth, more memory capacity, higher integration, more cores. It’s the best lever that we have to go continue to drive that energy efficiency. And then we have to look at, well, okay, now how do we drive scale-out networking or scale-up networking? And this is where the partnership with Broadcom is just so natural, because we’re focused on … We have enough on our hands with the compute side.

Lisa Martin: Yes, you do.

Robert Hormuth: And Broadcom’s got a lot on their … Those are complex things that Jas has shown. And so, it makes us very good natural partners because our swim lanes are really pretty clean.

Lisa Martin: Defined.

Robert Hormuth: And are there going to be disagreements between partners once in a while? Sure.

Lisa Martin: You’re human.

Dave Nicholson: But you can pick up the phone and call each other.

Robert Hormuth: But we can pick up the phone and call.

Lisa Martin: Exactly.

Jas Tremblay: Yes. Yes.

Lisa Martin: Jas, what’s next on the horizon from Broadcom’s lens?

Jas Tremblay: So, front-end networks, Ethernet. Scale-out networks, Ethernet. Just continuing to double performance every two years. Internal networks within AI servers, PCIe. We’ve got the plans there. I think the next thing we’re working on, and we’ll have more information over the next few months, is making the scale-up open and accessible to the industry.

Lisa Martin: Awesome. Well, guys, thank you so much for joining us on the program.

Jas Tremblay: Thank you.

Lisa Martin: Bringing this amazing technology to show and tell. This is definitely a space we’re going to keep our eyes on. So, you’re going to have to come back and share with us more innovations in the next few months.

Jas Tremblay: Absolutely.

Lisa Martin: Robert, Jas, thank you for your time.

Robert Hormuth: Thank you.

Jas Tremblay: Thank you.

Lisa Martin: All right. Our pleasure. For our guests and for Dave Nicholson, I’m Lisa Martin. You’re watching Six Five On the Road from Las Vegas, covering Dell Technologies World 2024. Stick around. More guests join us in just a moment.

Patrick Moorhead

Patrick founded the firm based on his real-world technology experiences and his understanding of what he wasn’t getting from analysts and consultants. Ten years later, Patrick is ranked #1 among technology industry analysts in terms of “power” (ARInsights) and in “press citations” (Apollo Research). Moorhead is a contributor at Forbes and frequently appears on CNBC. He is a broad-based analyst covering a wide variety of topics including the cloud, enterprise SaaS, collaboration, client computing, and semiconductors. He has 30 years of experience, including 15 years of executive experience at high tech companies (NCR, AT&T, Compaq (now HP), and AMD) leading strategy, product management, product marketing, and corporate marketing, including three industry board appointments.