The Six Five team discusses their tour inside AWS’s chip lab.
If you are interested in watching the full episode you can check it out here.
Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.
Daniel Newman: We got our eyes on some pretty cool AI technology yesterday. Pat, what were we doing with AWS yesterday?
Patrick Moorhead: Yeah, so we got the grand tour. We talked to developers, we talked to architects about the entire AWS custom silicon portfolio. There are two key locations for this development work: one of them is in Austin, Texas, which we visited, and the other is in Cupertino. And I literally got every single one of my questions answered about Graviton, about Trainium, about Inferentia. And I got to take pictures. We got to take fricking pictures of boards, of the backs of chips, that will tell you almost everything out there. Some really cool stuff. Well, first of all, the Inferentia2 and Trainium1 are the same chip. Now, the blocks are programmed differently for training versus inference, and it’s packaged a little bit differently on the board, but it’s the same chip, and I think that’s really cool. Ironically, when it comes to generative AI, even with NVIDIA, their H100 does both inference and training.
I thought that was pretty cool. And gosh, one big takeaway was they design for 10 years. They designed their connectors, they designed their chips to last 10 years. And I thought that was just a huge takeaway for me. They showed us racks too, which was crazy. And the most impressive to me was this foundational model training beast. It’s two 4U chassis stacked on top of each other with a 3U that acts as the CPU module, interconnected by copper fricking cables over PCI Express. I would’ve thought for sure it would’ve been optical. That’s how you connect high-bandwidth stuff. But they worked the signaling to the point where they almost hit, well, not the full bandwidth of optical, but at an incredibly lower cost, with the same reliability and nearly the speed of NVLink. And I just thought that was a big thing.
The other fun fact here is they actually do system wafer testing. So that’s taking a 12-inch wafer, putting it on a fixture at a foundry in Taiwan that you might know, and running neural nets on it. And I’ve never heard of that in my life. I’ve heard of some basic testing, right? Making sure power is good, making sure your pins are lighting up, but they actually do functional testing. The reason they do that is interposers and HBM and the packaging cost so much, why wait to test the fully packaged processor? Now that makes sense, but the fact they do it on a complete 12-inch wafer blew my mind. And by the way, it’s not punching out the chips and testing those. It’s actually putting the entire wafer on a fixture and doing a functional test.
Got to see the Inferentia2 tray, pretty cool. It’s 4U. You don’t rack and stack because the CPU’s on there. Boy, that looked like an AMD CPU with 12 Inferentia cards going over PCI Express. And again, I can’t believe they let us take pictures of the boards themselves. We saw some big Lattice Semiconductor parts, probably for security, potentially an FPGA for I/O. And then I saw a bunch of Marvell, which is likely networking.
So again, unprecedented access. I’ve been covering these guys for a decade, and that’s how long the company has been doing this. But big picture, and I’m going to get out of my geekdom real quick, is AWS has found a way to have their cake and eat it too. They have merchant silicon, and everybody wants to work with them, NVIDIA, AMD, Intel, for CPU, for GPU, for AI, because they are the largest IaaS provider hands down. So you want to go strong with AWS.
And then they have their own custom silicon: CPU, inference, GPU, and where it all started, which was on the networking plane, which is ironic. So they can provide customers the lowest cost, and they can provide customers the highest performance. And I’m just scratching my head wondering, how do the other providers compete on this level? You can’t just dive in and be world-class. It took Apple 10 years and it took AWS 10 years. And the question I have is how long will it take everybody else to get to this level, if they even want to go there, because AWS has to be spending a billion dollars a year on development. It has to be, right? Even if I figure $250 million per design per year that they have to rev, they have to be spending that.
So I’m going to end it there. I’m really excited about getting this access. Never thought we would get it. Nobody lets you take pictures of boards, really. But they did. And they don’t even let you take pictures of their racks, because their racks are non-standard on a power, width, and height basis. So again, okay, I’m pretty excited. I’m going to calm down. Dan, you got the ball?
Daniel Newman: Yeah, you did. You spilled my beans. You shared all my secrets. Yeah, when you and I asked a couple of people, can we share these? They’re like, yeah, go ahead. And I was like, sure. Then I tweeted, and like you said, some people were surprised. I actually had 15,000 people look at that tweet, and they’re like, “I can’t believe they let you share that.” People were commenting on that. And it was really cool, and I guess their point is, well, it’s not like other people can’t do this and take these out of a rack and take a picture potentially, but generally speaking it’s not a known practice to give that away. Pat, my take on it, though, is they’re kind of saying, catch us if you can. Catch us if you can. There’s a quote, I believe it was Herb Kelleher, the CEO of Southwest Airlines early on. He said, “I could leave my entire strategic playbook on the seat of one of my planes and I’m not worried about anybody. I don’t believe anybody else can execute like we execute.”
And I think that was a little bit of what we were hearing. They absolutely believe in their execution. They believe in their ability to deliver price performance. And to your comment about the get-it-all-here approach, merchant silicon all the way to custom homegrown silicon, they’re like, we’ve got it all. I mean, look, it’s not a small margin by which this company is the world’s largest infrastructure player. It’s not small, as much as we might want to say it is. It’s actually pretty substantial, especially given the way numbers are represented by all the different cloud companies. When it comes to pure enterprises running their infrastructure on AWS, it’s a fairly significant gulf. Having said that, of course everybody’s competing. But Pat, the theme is vertical integration. The theme is, we understand that if we can be more and more vertically integrated, we can be higher growth, and we can deliver better experiences across the application stack.
And of course there’s a model for this. The company is called Apple, and it’s a different business, but it’s the same idea of understanding how, as you vertically integrate, you control your destiny more. But they’re approaching it in all ways, and I think they’re doing a good job from a balance standpoint. You hit a lot of the geekiest stuff here, Pat, but from a story and an evolution standpoint, they’re taking things like PCIe and using it to create networking competitive with NVLink. I’m not going to say it outperforms NVLink, but the point they’re kind of making is, yeah, but for 1/10th the cost we can give a whole lot of performance. And things like that are pretty impressive, as we know the scale of AI is eventually going to need to address price points from lower to upper. By the way, your background is so cool. I’m just telling you, that building behind you.
Patrick Moorhead: Coming up folks.
Daniel Newman: Yeah. God, that’s so cool.