On this episode of The Six Five Insider, hosts Daniel Newman and Patrick Moorhead welcome Micron’s Girish Cherussery, Senior Director of Product Management, for a conversation on High Bandwidth Memory (HBM), along with insight into the memory ecosystem.
Their discussion covers:
- A deep dive into what High Bandwidth Memory (HBM) is and how it’s being used in today’s applications
- How Micron is responding to the needs of the market and its customers
- What is currently happening in the “memory ecosystem”
- Girish’s insights on what viewers should keep an eye on in the market
Watch the video here:
Or listen to the full audio here:
Disclaimer: The Six Five webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
Patrick Moorhead: Hi, this is Pat Moorhead and we are back for another Six Five Insider. I am here with my incredible co-host, Daniel Newman, and we are here to talk chips. We love chips, and we love what High Bandwidth Memory, otherwise known as HBM, can do for everybody. Daniel, how are you doing, buddy?
Daniel Newman: Hey, Pat. I just love when you call me incredible. I have to be honest, we could cut the show there and we could call it a day, but I think our audience, our subscribers, our fans, everybody that’s out there that wants to learn a little something here, would appreciate it if the show went on. I wanted to say the show must go on, Pat.
Patrick Moorhead: Yes, it will go on. But hey, without further ado, let’s introduce our guest from Micron Technology. Girish, how are you doing?
Girish Cherussery: I’m doing good, Patrick and Dan, very excited to be here and honored to be here. Been a big fan of your show.
Patrick Moorhead: Oh man, can you just keep talking about that? The show? No, I’m just kidding. Sometimes we like to talk about ourselves, but it’s even better when other people are talking about us. All jokes aside, how the pod works, we have fun and we really are big believers that entertaining and educating is the way to go. And let’s roll in. Let’s talk High Bandwidth Memory, HBM3?
Daniel Newman: Yeah, let’s do that, Pat. And by the way, I think you can do both. You can entertain and inspire and educate, all at the same time. That’s great. This is what it’s all about. So Girish, HBM: Pat’s throwing it out there like everybody knows it. I don’t think everybody out there necessarily does. So give us the quick version: what is high-bandwidth memory, and what are the applications?
Girish Cherussery: Sure. Yeah. So if you step back, the world today is excited about ChatGPT; pretty much everybody’s eyes pop up when you say ChatGPT. And if you look at AI applications, large language models have led to state-of-the-art accuracy on various tasks today. However, training these models has not been easy; it’s been very challenging because GPU memory capacity and bandwidth are limited. To take an example, model sizes have been growing at about 1,000x every three years, but memory capacity is not scaling at that same rate to keep up with that model growth. That makes it extremely hard to train a model efficiently, or in a reasonable timeframe.
So something like GPT-3, which has about 175 billion parameters: if you translate that into the capacity needed to store that model, that’s about 800 gigabytes of memory. All of this points to an incremental need in memory capacity. Now, you can have all the memory capacity in the world right next to wherever you want it, but if you don’t have enough bandwidth to meet the growing need for data coming out of that memory, you’re still going to be choked for performance. If you look at memory consumption when training a model, for example, it typically includes the optimizer state, the gradients, and the parameters, and on top of that, the activations and buffer capacity. All of that consumes a lot of the available memory for training, which means you cannot get enough memory capacity, and with it, the memory bandwidth.
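For readers following along, that capacity figure is easy to sanity-check. This is a back-of-the-envelope sketch, assuming 4 bytes per parameter for the weights alone; the optimizer state, gradients, and activations Girish mentions push the total well past this:

```python
# Rough memory footprint for a 175B-parameter model.
# Assumption (not from the conversation): 4 bytes (fp32) per parameter,
# counting only the weights; training state adds substantially more.
params = 175e9          # 175 billion parameters
bytes_per_param = 4     # fp32 weight storage

weights_gb = params * bytes_per_param / 1e9  # decimal gigabytes
print(f"Weights alone: {weights_gb:.0f} GB")  # Weights alone: 700 GB
```

That 700 GB for bare weights is consistent with the "about 800 gigabytes" quoted once overhead is included.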
And High Bandwidth Memory, if you think about it, is the industry standard memory that, as the name suggests, provides the highest bandwidth for a given capacity. The way it does that: think of it as having eight to 16 independent highway lanes on which you can move data back and forth, which in the semiconductor industry we call memory channels. So it has 16 independent channels that run at a pretty high frequency, and that’s one of the ways you meet that bandwidth need. Now, if you were to build, call it, an exascale supercomputer or a large data center that requires the kind of memory bandwidth we’re talking about for these generative AI models, and you look at the different memory solutions out there, you could get the bandwidth by just adding a lot of memory components.
But the issue is, as you start putting them together, you’ll see that the power required to get the data in and out of that memory is so expensive that it’s not easy to manage. So you need to do all of this very energy-efficiently. The way the industry has come to define a solution, which we call HBM, is as an in-package memory, which means the distance the data has to travel between the host and the memory is very short, so it’s very energy efficient. And because it has those 16 wide highway lanes that you can run at speed, you get the bandwidth. The way we get the capacity is we take one layer of memory and start stacking on top of it. It’s like a little mini skyscraper you’re building, and you have an eight-high solution, meaning eight stacked layers of DRAM.
And the way the DRAM talks to the host, or between the layers themselves, is through what we call through-silicon vias, or TSVs. An analogy I typically give my kids when I try to explain my job at Micron: you know those burgers with the little toothpick that goes through them and holds them together? The TSVs are very similar. They take the data on the bottommost die and move it up through the memory layers via these tiny through-silicon vias. Which means you get a lot of capacity in a very small footprint. So effectively, to summarize HBM: it’s the industry standard memory that provides the highest bandwidth in the smallest footprint, as an in-package memory, in a very energy-efficient manner.
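A rough sketch of how those independent channels translate into bandwidth. The 64-bit channel width and the 6.4 Gb/s per-pin rate below are assumptions drawn from the standard HBM3 configuration, not figures from the conversation:

```python
# How HBM's wide interface turns into bandwidth.
# Assumptions (standard HBM3 geometry, not quoted in the conversation):
# 16 channels x 64 bits per channel, at 6.4 Gb/s per pin.
channels = 16
bits_per_channel = 64
pin_rate_gbps = 6.4  # gigabits per second, per pin

interface_width = channels * bits_per_channel        # 1024-bit interface
bandwidth_gbs = interface_width * pin_rate_gbps / 8  # gigabytes per second
print(f"{interface_width}-bit interface -> {bandwidth_gbs:.1f} GB/s per stack")
```

The point of the sketch: the bandwidth comes from width (the 16 "highway lanes") as much as from raw pin speed, which is exactly why HBM can stay energy efficient at a given bandwidth.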
Patrick Moorhead: By the way, I love the hamburger analogy. You literally took one of the most sophisticated techniques, which is TSV, to get power and data between layers. I love that. I like to think of HBM as what connects chips together for the highest-performance applications, and the definition of high performance is expanding due to the amount of data it takes to get from, let’s say, one compute engine to another, or to other parts of the subsystem. So Micron has had a product line delivering this for a long time. I can envision the cards right now with your memory on them. But I’m curious, given this insatiable need for compute and the latest things like generative AI and ChatGPT: how are you responding to your customer needs in the marketplace?
Girish Cherussery: Yeah, I think that’s a great question, Patrick. We believe in solving problems that benefit not just the technology industry, but humanity more broadly. If you look at the supercomputers being built today, which have these crazy memory capacity and bandwidth needs, HBM is one way to address some of those needs. I’ll give you an example. One of these supercomputers was a primary resource for running simulations during the COVID-19 pandemic, when pharmaceutical companies were trying to identify the right drugs or compounds to tackle the disease directly. So solutions like HBM really change the way we think about memory technology: from being just a component in a system to being a critical component in that system.
So if you scale that back to Micron’s history, Micron was one of the pioneers in developing TSV technology. Most of your audience, and I’m sure you folks, will remember we had come up with what we called the Hybrid Memory Cube, which again takes the concept of stacking memory layers and communicating with the host from there. We leveraged the technologies we developed for that solution as we came into the HBM2E world. HBM2E was where we first introduced our products, and we have been in production with that product for a while. And just this week we had an exciting announcement: we announced the planet’s very first memory solution with greater than one terabyte per second of bandwidth per placement. Our HBM3 is sometimes called HBM3 Generation 2, or Gen2; some folks in the industry call it HBM3E.
Now, the beauty of this announcement is that it’s not just the boost in bandwidth we spoke about a while back; it also provides increased capacity. We are using one of our latest process nodes, which you folks might be aware we had announced, our 1-beta process node, and we are leveraging that technology leadership to pack 24 gigabits of memory into a single die. By stacking eight of them, you get a 24 gigabyte solution, which is extremely power efficient. It’s expected to be best-in-class in power efficiency while delivering the kind of performance we’re talking about, which will help address some of the growing challenges of what I call the beast, the compute element that is craving this data. So this is going to be the next generation of HBM technology that feeds the beast.
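As a sanity check on the two headline numbers here, a 24 GB stack and more than a terabyte per second per placement. The 1024-bit interface and 9.2 Gb/s pin rate below are illustrative assumptions, not figures quoted in the conversation:

```python
# Sketch of the 8-high, 24 GB HBM3 Gen2 stack described above.
die_gigabits = 24        # assumed 24 Gb per DRAM die (1-beta node)
stack_height = 8         # eight-high stack

stack_gb = die_gigabits * stack_height / 8   # gigabits -> gigabytes
print(f"Stack capacity: {stack_gb:.0f} GB")  # Stack capacity: 24 GB

# Assumed for illustration: 1024-bit interface at 9.2 Gb/s per pin.
pins = 1024
pin_rate_gbps = 9.2
bandwidth_tbs = pins * pin_rate_gbps / 8 / 1000  # terabytes per second
print(f"Per-placement bandwidth: ~{bandwidth_tbs:.2f} TB/s")
```

Under those assumed pin speeds, the arithmetic lands just above 1 TB/s per placement, consistent with the "greater than one terabyte per second" claim.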
The running joke at Micron is that we are bigger, faster, and cooler. What I mean by that is: we have more capacity, so we’re the big guys. It’s the fastest memory that we think will be available on the planet for the next few years, by the sheer amount of performance you get. And the cool piece is that we get that 24 gigabytes of memory within eight layers of DRAM. One of the challenges in the memory industry, especially with HBM, is that when you stack memory, the heat generated at the bottom needs to be dissipated at the top, because the cooling element is typically on top. So the more layers you stack, the more heat you trap at the bottom.
So with an eight-high, 24 gigabyte solution, we think we can address some of the energy-efficiency and thermal-efficiency problems the industry faces in getting to higher-capacity HBM. With our solution, we expect to improve overall performance in a data center. We measure that as total cost of ownership, or TCO, and we expect a huge benefit there. Now, a lot of people ask me, “What does TCO mean?” You can look at it from two angles. One angle: if you say, “Hey, I have a fixed number of nodes I want to invest in,” then you have a specific number of GPUs and a specific amount of memory you can buy, and the time it takes to train a model goes down by using Micron’s HBM.
The other way to think about it: if you’re building a new data center and you say, “I want to train a ChatGPT-class model in 30 days,” and you plan your GPUs and memory based on that, then you don’t need as many memory components, or as much memory bandwidth, in your system, and as a result, sometimes fewer GPUs. So you’d actually be saving on the CapEx you invest to get that data center up and running for those workloads. It’s going to be a huge TCO benefit once this product is used in your generative AI or HPC applications.
Daniel Newman: So Pat, he did the hamburger thing.
Patrick Moorhead: Yes.
Daniel Newman: And he’s teaching his kids HBM, future engineers, and he gave us a motto worthy of the Olympics: “Bigger, faster, cooler.” I don’t know what that is, but something like that. And you’ve somehow taken HBM and made it the coolest kid in high school, the biggest, fastest, and coolest. But I like the analogy, and obviously there are probably some pretty valuable sustainability impacts too, from being a higher performer at lower power, which matters. So, with just a couple of minutes left here: I hear a lot about the opportunity in AI, and I think that with LLMs it’s going to be a big topic and a big opportunity for Micron. But I’d love to hear your perspective. What else are you seeing right now in the memory ecosystem, and what should Patrick and I, and all of our viewers, be keeping an eye on?
Girish Cherussery: Yeah, I think the memory and storage landscape in the market has changed. Industry 4.0, autonomous cars, generative AI, and even the phones we use have transformed the way memory is being used. You’re seeing more heterogeneous architectures, and solutions that address what I’d call the three major problems the industry wants the memory industry to solve: more capacity, more bandwidth, and more reliability. And if you look broadly at what’s happening in the industry today, DDR4 is transitioning to DDR5, which means they’re craving more bandwidth, and that transition is coupled with an increase in capacity. You’re also beginning to see LPDDR, low-power DRAM, being used in data centers for traditional CPU-related applications.
You’re also seeing solutions that were built for gaming devices, like Micron’s GDDR6X running at 24 gigabits per second pin speed. I mean, it’s crazy, the kind of speeds these folks push that industry toward. That memory solution is no longer just for gaming; it’s also going into data centers for inference applications, especially text-to-video or voice-to-text conversions. Then, from a CXL perspective, there’s CXL-attached memory, which expands capacity and bandwidth simultaneously; that’s another area where Micron is playing. And we already talked about HBM as the critical piece of generative AI solutions, a very important component that helps you enhance your AI systems.
And if you look at it from an HBM standpoint, the ecosystem has very ambitious targets: double the bandwidth every two to four years, usually coupled with a 50% bump in capacity every generation. HBM is a crucial piece of the puzzle needed to unleash the full capability of AI in the market. And if you look at Micron’s strong product portfolio, we pretty much service the entire gamut of the solutions we just talked about. With our current leadership in process nodes, we hope to lead the industry toward innovative solutions that influence how we think about various aspects of human life, and we want to make life better for all of us.
Patrick Moorhead: Girish, I could talk on and on about memory and the memory and storage hierarchy, because it pretty much drives everything. I know compute gets a lot of attention and GPUs get a lot of attention, but quite frankly, without the connective tissue of all the memory you ended with, whether that’s DDR5, HBM3, LPDDR, and one of my favorites, CXL, which I’m super excited about, the world does not turn. And with Moore’s Law changing as it has, we need newer technologies to feed the compute units, or what did you call it, the beast? I love that. I might use that from now on and not give you any credit whatsoever. But no, I think you really gave the audience more of an appreciation not only for High Bandwidth Memory, but also for Micron’s participation in it. We really want to thank you for coming on the show.
Girish Cherussery: Thank you, Pat. Thank you, Dan. Appreciate your time.
Daniel Newman: Yeah, Girish, it’s really great to have you here. I appreciate you being a supporter and keeping in touch, and now you’re an alumnus. So let’s have you back sometime soon.
Girish Cherussery: Sure.
Daniel Newman: Everyone out there, thanks so much for tuning into this Six Five Insider. It was great to talk about HBM and, of course, its huge implications for all this AI talk and what it’s going to mean for large language models, generative AI, and the traditional AI workloads that everybody is focusing on for their business. But for this episode, from Patrick and myself, it’s time to say goodbye. Hit that subscribe button and tune into all the Six Five weekly episodes and all the other episodes. We appreciate you. We’ll see you later.