On this episode of The Six Five – On The Road, hosts Daniel Newman and Patrick Moorhead welcome Lenovo’s Sergio Severo, President and GM, ISG, North America, and Simons Foundation’s Ian Fisk, Scientific Computing Core Co-Director, Flatiron Institute, for a conversation on how Lenovo’s partnership and technology have supported the Flatiron Institute’s vision for “doing new kinds of science.”
Their discussion covers:
- Lenovo’s experience supporting researchers across education and technology through supercomputing and the partnership with Flatiron Institute
- How Lenovo’s technologies have supported the Flatiron Institute’s vision “for doing new kinds of science”
- How Lenovo’s partnership with NVIDIA delivered the high-performance supercomputer Henri, operated by the Flatiron Institute, which topped the Green500 list of the most power-efficient supercomputers in the world
- How Lenovo’s achievements with the Henri supercomputer and collaboration with the Flatiron Institute fit into the company’s overall strategy
Be sure to subscribe to The Six Five Webcast, so you never miss an episode.
Disclaimer: The Six Five webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
Patrick Moorhead: The Six Five is on the road at Supercomputing 2023 here in Denver, Colorado. We are in Lenovo’s booth and it is rocking. You can hear all the excitement here and quite frankly when you combine supercomputing, whether it be flops-based compute or tops with AI-
Daniel Newman: Tops the flops, baby.
Patrick Moorhead: … I mean, how could you not get excited about this? And Daniel, it’s great to be here.
Daniel Newman: Yeah, I was thinking more maybe it’s the evolution of supercomputing from flops to tops.
Patrick Moorhead: Here we go.
Daniel Newman: In 2023, this is the ultimate transitional year and we’ve had conversations about this, the changing of the guard as workloads move to more and more AI. And I mean what made this place an absolute crazy destination in terms of people is all the interest in AI. We’re seeing it go from being something very academic, very research driven to the interest expanding like crazy, Pat. And you and I could barely get in here last night.
Patrick Moorhead: Yeah, it was crazy. We almost got stampeded, and I’m not kidding you, about 10,000 people waiting to get in. We were all on time. I love being on time and so were the other 10,000 people trying to get in here. But no, the evolution of the industry has been incredible. And it’s not just national labs, local labs, but in commerce, whether it’s fluid dynamics for designers, whether it’s pharmaceutical drug discovery, all the different ways we use high performance computing. And one vendor that keeps on super strong in high performance computing is Lenovo. And I’d like to thank Sergio for coming on the show again.
Sergio Severo: Pleasure.
Patrick Moorhead: And also to invite Ian from the Flatiron Institute onto the show. First time on The Six Five. Thanks for being here.
Ian Fisk: My pleasure.
Daniel Newman: Yeah, it’s great to have you both. Great to have you for the first time on the show, Ian, we talked a little bit about the pedigree of Lenovo and you’re sitting here probably because you have a partnership, but I would love to talk a little bit about that, a lot of provenance with Lenovo in terms of education and technology. But talk about the partnership that you developed with Lenovo and why you went down that path.
Ian Fisk: Sure, I’d be happy to. So a little bit of background, the Flatiron Institute is the in-house research group of the Simons Foundation. And in 2014 we had eight computers, we had about eight cores. And over the course of about 10 years, we have gone from eight cores to about 200,000 processor cores and about 800 GPUs. And our partnership with Lenovo has allowed us to sort of go on a ramp, which is sort of unexpected, that basically you can go at such, we were growing at 50% a year, 25% a year. And that allows you to go five orders of magnitude in the number of CPUs. And this particular machine that we talk about today, which is the Henri machine, that’s on top of the Green500, that’s based on the SR670 from Lenovo, which is a 3U air cooled case. And for a supercomputer, it’s incredibly accessible. This machine went from sitting on our loading dock to being on the top of the Green500 in less than four weeks. And it was a machine that basically we could assemble ourselves. It’s very tuneable in terms of the pieces of the components that you’re using and the energy. And so our partnership with Lenovo allowed us to take very cutting edge technology that was also accessible to a small organization, and that allowed us to become a relatively large organization.
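As a rough check on the growth figures quoted above, the arithmetic can be sketched in a few lines of Python. The core counts and the ten-year span are the approximate numbers as spoken; the constant yearly multiplier is an illustrative assumption, since real purchases arrive in uneven steps.

```python
import math

# Approximate figures quoted in the conversation.
start_cores = 8
end_cores = 200_000
years = 10

# Overall growth factor and its size in orders of magnitude (powers of ten).
growth_factor = end_cores / start_cores
orders_of_magnitude = math.log10(growth_factor)

# Constant yearly multiplier that would give the same overall growth
# (assumption for illustration only).
annual_multiplier = growth_factor ** (1 / years)

print(f"growth factor: {growth_factor:,.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.1f}")
print(f"implied average annual growth: {(annual_multiplier - 1) * 100:.0f}%")
```

On these numbers the core count grew by a factor of about 25,000, which is between four and five orders of magnitude.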
Patrick Moorhead: What kind of workloads are being, well we talked a little in the green room about the type of workloads, but can you tell the audience what’s running or what’s going to run on the systems?
Ian Fisk: Well, one of the things that makes us sort of unique is that we have a group in biology, a group in astrophysics, a group in quantum systems, which is sort of first principle material science, a group in neuroscience and mathematics. And the machines we have have to run those codes from all of those people every day. And so they tend to be a little bit, we take the superset of everyone’s requirement and that’s what we choose. So almost all of our systems have a terabyte of RAM in HPC, which is almost unheard of. It’s a lot of RAM and we have a lot of cores and we have a lot of GPUs. And one of the things that’s changed in the last several years has been the impact of AI. So we went for a while, we were doubling the GPU farm every six months and one GPU goes to two, goes to four, that’s pretty easy. But 256 goes to 512 becomes more of a technical challenge. These use a lot of power, they take a lot of cooling, they’re interesting machines to support. And as that’s gone, we now have people who are doing inference on systems biology, gene function, simulations for astronomy codes, so multi-body simulations, forward-looking simulations. And then in the last six months or so, we’ve had people looking at foundational models for science. So looking at things like large language models for how you might do large scale simulations of fluid dynamics. We thought it was hard before with sort of getting GPUs; now you have to have many GPUs tightly connected, working together, and that’s an even more challenging environment. And that’s another place where our partnership with Lenovo has been very beneficial to us. We are expecting a delivery in the middle of December of our first machine optimized for large language models, which is a water cooled H100 system with 1.6 terabit of networking per node. So the machine’s really designed for this particular challenging problem, which is also what everyone else wants to do. So they’re very hard, they’re hard to get, they’re hard to support.
Patrick Moorhead: So Sergio, I hear going from eight cores to hundreds of thousands of cores and GPUs, how on earth does Lenovo support something like that? And my guess is it happened at a time when these types of products were hard to get as well. I mean, how does Lenovo approach something like this, the scale?
Sergio Severo: That’s a fantastic question, but let me kind of present an analogy with a sport that we all love, that is Formula 1, right? We were there. So when you’re in Formula 1, you need the best machine and the best pilot. These guys are the best pilots in the world. They have the best scientists, they have the best knowledge on how to make the most of our machine. And we have the best machines in the world. We have the best infrastructure in the world. How we plan for this partnership to work is not only Lenovo, also we have NVIDIA very well involved in all of this. And together with Flatiron, the code name of the machine is Henri. So we build Henri together, they use Henri as much as they can. But I think everything starts from the top. Lenovo’s mission is to tackle the most compelling and the most difficult challenges in humanity. And the Flatiron Institute, the Simons Foundation mission is to use advanced technologies to advance science in specific areas like the universe, machine learning for protein analysis, all these complex things. So I think everything starts from the top. We were planning through very difficult times to build the best Formula 1 car in the world, but we have the best pilots in the world. That’s the secret sauce.
Patrick Moorhead: Anything have one? I don’t know, we just went up to you on flops to tops.
Daniel Newman: I’m sold. I don’t know. I don’t know if it gets better than flops to tops, but-
Sergio Severo: Another data point. So because in Formula 1 speed is key. So this machine, the performance is 65 gigaflops per watt. Why is the per watt important? Because it’s not only to be fast, it’s also to be efficient. And this is an efficient machine and this is a machine that you can get through the door of a data center, you don’t need to break any wall, anything to move it inside the data center. This is a standard rack.
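To make the per-watt figure concrete, here is a minimal sketch of what 65 gigaflops per watt means in power terms. The efficiency number is the one quoted in the conversation; the one-petaflop workload and the function name `power_draw_watts` are hypothetical, chosen only for illustration.

```python
# Efficiency figure quoted in the conversation.
GFLOPS_PER_WATT = 65.0


def power_draw_watts(sustained_gflops: float,
                     gflops_per_watt: float = GFLOPS_PER_WATT) -> float:
    """Power needed to sustain a given performance at a given efficiency."""
    return sustained_gflops / gflops_per_watt


# Hypothetical workload: one petaflop (1,000,000 gigaflops) sustained.
one_petaflop_in_gflops = 1_000_000.0
watts = power_draw_watts(one_petaflop_in_gflops)
print(f"{watts / 1000:.1f} kW at {GFLOPS_PER_WATT:.0f} gigaflops per watt")

# The same workload at half the efficiency needs twice the power.
watts_half_efficiency = power_draw_watts(one_petaflop_in_gflops, GFLOPS_PER_WATT / 2)
print(f"{watts_half_efficiency / 1000:.1f} kW at half the efficiency")
```

For a fixed amount of work, the power bill scales inversely with this number, so efficiency matters alongside raw speed.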
Daniel Newman: I was absolutely going to ask that question when you started kind of alluding to some of the sustainability. By the way, if you want to stay with the F1 analogy, same thing: they’re moving towards electrification, they’re doing more and more output, smaller and smaller engine, less fuel, more electricity. So this is a problem that everyone in the world is trying to solve. You don’t only want to have this infinite compute power, which is exciting, but you also want to do it sustainably. And what I understand about Henri is that you were able to accomplish that. Can you talk a little bit about, I’d like to hear both, Sergio, from you about how Lenovo’s approaching the sustainability challenge and then from the Flatiron standpoint, what is your sort of ethos as it relates to being sustainable? So Sergio, start with you.
Sergio Severo: For Lenovo, it’s simple in concept, it’s very difficult to achieve. So everything we do has a sustainable method behind it, such as Neptune, for example, water cooling. In peak performance, you can save 40% of energy using water cooling technology. And in the case of Henri, we are doing that. So it’s very simple, but it’s very difficult to achieve in a standard rack form factor with all the heat that you put in these big processors and big GPUs. But they are the users so they can explain to you how they do it.
Ian Fisk: For us, the foundation has always, this has been an area of sensitivity for them that they would like to have as low-carbon a program as possible, at the same time offering a leadership-class computing environment. And for us, the GPUs, they use a tremendous amount of power. The H100 is a marvel because it’s about twice the performance for a similar amount of electricity. And the other aspect of this is that the Lenovo box can be tuned so that elements of the system that would use electricity but would not give you a lot in the performance can be turned way down. So the CPUs can be turned down if they’re not contributing to the application at the time. And so from our perspective, this allows us to lower our footprint. We’re in a data center which is using wind energy credits intentionally so that we have a lower footprint there and then we have to buy offsets, the foundation buys offsets for all of our other computing environments. So this allows us to basically just lower our footprint and maintain a high performance.
Sergio Severo: And also the magic is you have standard racks and you connect them with the highest speed InfiniBand from NVIDIA, right? 200 gigabit, right?
Ian Fisk: This one is 200, 200 times two, it’s 400.
Sergio Severo: Wow, that’s very fast. But you need to do that in order to have parallel processing, right?
Patrick Moorhead: Yeah. Sometimes we forget the fact that you have to network these things together, have coherent memory systems. And what we’re seeing in all our research is that the next bottleneck could be the network. And it’s something that we’re looking at. By the way, kudos to you on embracing the water cooling, both of you. I mean it is funny, a lot of the hyperscalers weren’t doing it and now clearly, for their next generation of data centers, they realize they have to do it. Lenovo’s blazing the trail here. I’m curious about the work that you’re doing with Flatiron, Sergio. How does this fit into your overall strategic goals here? Whether it’s making an impact on the planet and doing these amazing things, whether it’s where you sit on sustainability, just overall, how does Flatiron fit into the big picture of Lenovo strategically?
Sergio Severo: It’s all of the above. So as I mentioned before, we are tackling the most challenging aspects of the most difficult challenges that we see in humanity. And one of the most difficult challenges now is environmental sustainability. So it’s in our mission, in everything we do we think about it. But also in this case with Flatiron, look at what they’re doing. They’re using machine learning simulations to improve the genomics, to investigate the universe, and quantum physics. So this is key for us and it’s a fantastic partnership for us.
Ian Fisk: And I wanted to mention just a little bit about water. We switched to Lenovo water cooled servers for our facility in San Diego about four years ago. And it was a decision we debated for a while because it’s a step to go to.
Patrick Moorhead: Is it chilled water?
Ian Fisk: Yeah.
Patrick Moorhead: Okay.
Ian Fisk: We use chilled water-
Sergio Severo: But the chill water in the water building is 60… 60 something to 100 degrees.
Ian Fisk: But from our perspective, it’s a decision that we don’t regret at all. It’s that the machines are more efficient because there’s no fans in them. We actually lose fewer components because the temperature in the systems is much more consistent. So we’ve replaced fewer DIMMs. And then this year when we did our most recent procurement, the density that you go to in a water cooled solution is much higher because you don’t have to have the airflow and the fans. So we have 72 nodes per rack, which allows us to put the next cluster in six racks. And what we discovered a bit surprisingly was that the networking cables are so expensive that the density becomes incredibly important. And if you can make it in six racks rather than 10, you can reduce the networking budget by about a half a million dollars. So suddenly the water cooled stuff is saving you a tremendous amount of money simply because the form factor is smaller, cables are shorter, it’s denser, and we’re going to run out of electricity before we run out of space. So the density was not so important from just a facility perspective, but it became incredibly important from a cost perspective in the networking itself. And so that was the solution. I think essentially we bought our last set of air cooled servers on a large scale because these things are only getting higher wattage on the CPUs, more challenging to support and the liquid just does a much better job. Right, for sure.
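The rack arithmetic in this answer can be sketched as follows. The 72 nodes per rack, the six and ten racks, and the half-million-dollar figure are as quoted. Two things are assumptions made only for illustration: that the ten-rack layout would hold the same number of nodes, and that the saving is spread evenly over the racks avoided.

```python
# Figures quoted in the conversation.
nodes_per_rack_water = 72
racks_water = 6
racks_alternative = 10        # the "rather than 10" layout
cabling_saving_usd = 500_000  # "about a half a million dollars"

# Total cluster size at the water cooled density.
total_nodes = nodes_per_rack_water * racks_water

# Assumption: the ten-rack layout holds the same number of nodes.
implied_density_alternative = total_nodes / racks_alternative

# Assumption: the saving is spread evenly over the racks avoided.
racks_avoided = racks_alternative - racks_water
saving_per_rack_avoided = cabling_saving_usd / racks_avoided

print(f"total nodes: {total_nodes}")
print(f"implied density in 10 racks: {implied_density_alternative:.1f} nodes per rack")
print(f"networking saving per rack avoided: ${saving_per_rack_avoided:,.0f}")
```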
Daniel Newman: I mean, Sergio, just a kind of final, more of a concluding question, but supercomputing is one of these exciting frontiers future areas. How do you see these types of partnerships shaping growth and helping build Lenovo on a global scale? How is this adding value to the bigger vision of Lenovo?
Sergio Severo: So this kind of… Thank you for the question. This kind of partnership, what they do is they challenge Lenovo, they put Lenovo at the boundaries of what our machines, our engineers can do. But also I think because of what Ian is sharing, we collaborate in developing new methods to practice science and it’s very important for us. So in the future, what I see is now they are very well involved in machine learning, AI, all these new technologies that everybody’s trying to use. We have had this for many years. Somebody, a company, an organization that has been using this from many years ago, we can take knowledge and put it in the public service. And I think this is a mission that we have at Lenovo to share these learnings with other customers.
Daniel Newman: And it also really helps break through societal needs.
Sergio Severo: Of course.
Daniel Newman: … Safety, wellness, climate, all these different things, these big problems that are very, very hard to solve, that accelerated and supercomputing can support. So Ian, Sergio, thank you so much for joining The Six Five.
Sergio Severo: Thank you, pleasure.
Ian Fisk: Thank you.
Daniel Newman: All right, everybody subscribe to The Six Five on the road here or subscribe to all of our shows. We’d appreciate all of the above. But for this episode here at Supercomputing 2023 in Denver, Patrick for us, for our guests, for everyone out there, we’ll see you all later.