The AI Craze Is Highlighting The Cooling Crunch

By Matt Kimball, Patrick Moorhead - March 22, 2024

There is not a conversation I have with IT and business leaders these days in which artificial intelligence is not discussed. Virtually every company sees AI as a critically important strategic initiative. Significant investments are being made and a lot of technology is being deployed.

As with the hype phase of previous technology trends, the deployment of AI has, for many organizations, been learn-as-you-go. While many people may read that and immediately think of software stacks and large language models, this particular science project begins with infrastructure—the storage, networking and servers that house, move and process the unprecedented amounts of data being used across the enterprise.

As this bespoke infrastructure continues to be deployed alongside a newer generation of servers that are replacing aging technology, many IT organizations are realizing that budgets for electrical power are being strained. Worse, they are strained at both the rack level and across the datacenter. Further, there is a realization that these new servers loaded with the latest CPUs, GPUs and other specialized silicon run really hot. So hot, in fact, that air cooling alone will not keep these servers from overheating.

As servers run hotter and the need for more specialized AI acceleration becomes the norm, what do datacenter architects do? In the following sections, I will dig into the power/cooling challenge a little deeper and explore some of the trends that have emerged.

Servers Are Running Hotter

The need for compute is insatiable. I know this sounds like a cliché, but it’s a fundamental truth in enterprise IT. Five years ago, GPUs were something that gamers and HPC folks talked about. Today, they are the most in-demand pieces of silicon in the world. Likewise, if I had mentioned ASICs five years ago, people would have assumed I was talking about running shoes.

In line with this dramatic shift, specialized technology is being deployed to meet the needs of the modern workloads that organizations require to compete in a market filled with smaller, more agile competitors born in the cloud. Companies are not only deploying more servers across the datacenter, but more servers that have more (and specialized) silicon to drive down the all-important time-to-value metric.


The implications of this for power consumption on a global level are startling. According to the International Energy Agency, datacenters accounted for between 240 and 340 terawatt-hours of electricity in 2022. That is roughly 2% of global consumption, equal to the amount of electricity consumed by Australia. By 2030, the IEA predicts, this figure will rise to about 8%.

Barring some unimaginable shift in IT priorities, there is no real way to avoid this rise in power consumption. The CPUs being deployed are more powerful—and power-hungry. Likewise, the GPUs and other accelerators being used to analyze data and train models faster require even more power. Consider the following: CPUs from Intel and AMD can consume up to 400 watts apiece, while GPUs from Nvidia and AMD can consume upwards of 700 watts apiece. And these power consumption numbers are going to rise in next-generation silicon.
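To give a rough sense of what those per-chip numbers add up to at the server level, here is a hedged Python sketch. The two-CPU, eight-GPU configuration and the 1,000 W allowance for memory, storage, NICs and fans are illustrative assumptions, not vendor specifications; only the 400 W and 700 W per-chip figures come from the discussion above.

```python
# Rough per-server peak power estimate using the per-chip figures
# cited above. The configuration and overhead are assumptions.
CPU_WATTS = 400  # upper-end draw cited for Intel/AMD server CPUs
GPU_WATTS = 700  # upper-end draw cited for Nvidia/AMD GPUs

def server_power(cpus=2, gpus=8, overhead_watts=1000):
    """Estimated peak draw in watts for one server.

    overhead_watts is a placeholder guess covering memory,
    storage, NICs and fans.
    """
    return cpus * CPU_WATTS + gpus * GPU_WATTS + overhead_watts

print(server_power())        # 2*400 + 8*700 + 1000 = 7400 W
print(server_power(gpus=0))  # CPU-only node: 1800 W
```

Under these assumptions, a single eight-GPU AI node draws more power than several racks of traditional servers did a decade ago, which is exactly why rack-level power budgets are straining.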

With most datacenters operating on a very tight budget, two big challenges present themselves to datacenter managers and operators:

  • Making better use of the available power to support the needs of modernized infrastructure.
  • Deploying a cooling technology to keep up with so much heat generated in a server form-factor.

To add one more crucial piece of context to these intertwined challenges, consider this: about 40% of the average datacenter budget goes toward—you guessed it—cooling.

It’s All About PUE

Power usage effectiveness measures datacenter power efficiency: the ratio of the total power a facility consumes to the power that actually reaches the IT equipment—the servers, storage and networking gear. The ideal PUE rating is 1.0, meaning every watt consumed goes to that IT equipment.

According to the National Renewable Energy Laboratory, the average datacenter PUE is about 1.8. Datacenters that are highly focused on sustainability sit at about 1.2. These numbers tell us that if an IT organization can drive down its PUE, the power budget becomes much more manageable.
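To make the arithmetic concrete, here is a minimal Python sketch of the PUE relationship. The 1,000 kW IT load is an illustrative assumption; the 1.8 and 1.2 ratings are the NREL figures above.

```python
# PUE = total facility power / IT equipment power.
def pue(total_facility_kw, it_equipment_kw):
    return total_facility_kw / it_equipment_kw

def overhead_kw(it_equipment_kw, pue_rating):
    """Power spent on cooling, power delivery and other non-IT
    loads at a given PUE, for a given IT load."""
    return it_equipment_kw * (pue_rating - 1.0)

# For an assumed 1,000 kW IT load:
print(round(overhead_kw(1000, 1.8)))  # 800 kW lost to overhead
print(round(overhead_kw(1000, 1.2)))  # 200 kW lost to overhead
```

In other words, at the 1.8 average, a facility burns 800 kW of overhead for every 1,000 kW of useful IT work; at 1.2, that overhead drops to 200 kW.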

The question becomes, how can an organization drive down the PUE and still cool all these servers that are running hotter than ever? The obvious answer is to do something different. And that something different is liquid cooling.

Distinguishing Between Cooling Methodologies

Liquid cooling is another technology that was reserved for high-performance computing and other niche use cases just a few short years ago. However, the rush to deploy AI-ready infrastructure, combined with the evolution of compute platforms, has pushed this cooling methodology into the mainstream.

There are two major types of liquid cooling—direct-to-chip and immersion. In direct-to-chip, cool liquids pass through a cold plate connected directly to the components that will heat up. The cold plates extract the heat and pass it along to the liquid. In immersion cooling, by contrast, whole servers are housed in a dielectric fluid that directly removes heat from the components.

Direct-to-chip and immersion cooling each come in two subtypes: single-phase and two-phase. In single-phase cooling, the liquid—typically a water-glycol mix for direct-to-chip, or a hydrocarbon-based fluid for immersion—absorbs heat and is cooled off through a heat exchanger without changing state. In two-phase cooling, a fluorocarbon-based liquid draws heat from the components and boils into a gas; a condenser then changes its state back to liquid.

While two-phase cooling delivers a better PUE than single-phase, there are significant environmental issues with the liquid used. These fluorocarbon-based liquids belong to a class known as PFAS, or per- and polyfluoroalkyl substances. In short, they are bad for the environment. Because of this, we’ve seen more and more governments restrict their manufacture and use.

When comparing direct-to-chip and immersion cooling, it’s also important to consider the disruption to IT operations. Whereas direct-to-chip cooling is minimally invasive in terms of deploying and maintaining servers, immersion cooling requires IT organizations to retool how they approach server maintenance. Immersion cooling also presents challenges regarding component warranties.

In either case, the PUE associated with liquid cooling—direct-to-chip or immersion—is far better than that of air-based cooling. Whereas the average datacenter has a PUE of 1.8, direct-to-chip cooling can reduce PUE to less than 1.2, while immersion cooling can reduce PUE to the 1.02 range. Getting PUE below 1.2 translates into significant power (and cost) savings.
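A back-of-the-envelope Python comparison shows what those PUE levels mean for an electricity bill. The 1 MW IT load and the $0.10/kWh rate are illustrative assumptions; the 1.8, 1.2 and 1.02 ratings are the figures discussed above.

```python
# Annual electricity cost at a given PUE, under assumed load and rate.
HOURS_PER_YEAR = 8760
RATE_PER_KWH = 0.10  # assumed average rate in USD, for illustration

def annual_cost(it_load_kw, pue):
    total_kw = it_load_kw * pue  # facility draw implied by the PUE
    return total_kw * HOURS_PER_YEAR * RATE_PER_KWH

air = annual_cost(1000, 1.8)   # typical air-cooled facility
dtc = annual_cost(1000, 1.2)   # direct-to-chip liquid cooling
imm = annual_cost(1000, 1.02)  # immersion cooling

print(f"air: ${air:,.0f}  direct-to-chip: ${dtc:,.0f}  immersion: ${imm:,.0f}")
print(f"savings, air -> immersion: ${air - imm:,.0f} per year")
```

Under these assumptions, moving a 1 MW IT load from the 1.8 average to immersion-class PUE saves on the order of $680,000 per year, which is why the economics, not just the thermals, are driving the shift.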

What Are The Server Vendors Doing?

Every server vendor acknowledges the challenges posed by rising power consumption and shrinking form factors. And each seems to have a response.

Lenovo’s fifth generation of the Neptune cooling system utilizes direct-to-chip and a rear door heat exchanger to deliver system-level cooling. The company has also invested a lot of resources in engineering an elegant tubing system.

HPE has leveraged cooling technologies (and knowledge) through its acquisitions of Cray and SGI. It has built on these technologies to deliver various cooling solutions, depending on workloads and customer needs.

Dell also offers a direct liquid-cooling solution and has been very active in developing ecosystem partnerships. The company has also actively published its research and participates in standards bodies such as ASHRAE and the Open Compute Project.

Supermicro has likewise been quite active in liquid cooling. Given its agile business model, it is perhaps the nimblest of all the server vendors.

The Cooling Ecosystem

While each server vendor has its own solution, the market is also full of best-of-breed solutions that span direct-to-chip, single-phase and two-phase immersion. Companies such as Zutacore, Liquidstack, CoolIT, GRC, Submer and Vertiv are all vying to establish themselves in a market that is still in flux. And there are many more vendors than these in the market.

Which is the best company to work with? That is a decision to be made on a company-by-company basis. My suggestion for IT or facilities organizations looking to deploy liquid cooling is to do the following:

  1. Understand your needs and capabilities fully. What is required to stand up a liquid cooling solution?
  2. Get (more) educated on liquid cooling types and what fits your needs best.
  3. Contact your server vendor, channel partners and the cooling standards bodies.
  4. Create a short list of vendors and perform a deep evaluation.
  5. Look for references. Have organizations that are like yours (e.g., in the same vertical and with the same datacenter type) adopted liquid cooling? Which technology and vendor was adopted?
  6. Start small and scale slowly.

Closing Thoughts

There is no doubt that liquid cooling will one day be the norm in the datacenter. Will it be direct-to-chip? Immersion cooling? Will the industry develop a two-phase liquid that is environmentally safe? Will air cooling still exist?

I think the answer to all of the above questions is yes. The future state of the datacenter will be one in which all cooling methodologies (and maybe a few new ones we haven’t dreamed up yet) will be deployed to meet the full range of requirements. This won’t be tomorrow, or the day after. But it’s coming.

Until then, all the AI and HPC workloads flying around require alternative cooling methods—now, today, already. So what are you going to do?

Matt Kimball is a Moor Insights & Strategy senior datacenter analyst covering servers and storage. Matt’s 25-plus years of real-world experience in high tech span hardware to software as a product manager, product marketer, engineer and enterprise IT practitioner. This experience has led to a firm conviction that the success of an offering lies, of course, in a profitable, unique and targeted offering, but most importantly in the ability to position and communicate it effectively to the target audience.
Patrick founded the firm based on his real-world technology experience and an understanding of what he wasn’t getting from analysts and consultants. Ten years later, Patrick is ranked #1 among technology industry analysts in terms of “power” (ARInsights) and in “press citations” (Apollo Research). Moorhead is a contributor at Forbes and frequently appears on CNBC. He is a broad-based analyst covering a wide variety of topics including the cloud, enterprise SaaS, collaboration, client computing, and semiconductors. He has 30 years of experience, including 15 years of executive experience at high tech companies (NCR, AT&T, Compaq, now HP, and AMD) leading strategy, product management, product marketing, and corporate marketing, including three industry board appointments.