IBM Goes All In On ML Inference With IC922 POWER9-Based Server

IBM is a company that I follow very closely, in particular its Systems division, Red Hat, and the corporate side. It has been particularly interesting to watch IBM Systems over the last several years as it has rolled out its POWER9 systems to great success (see my coverage of the initial launch in 2017 here, and a follow-up on the second phase of the rollout here for context).

POWER9 was built from the ground up for AI and machine learning workloads, and it excels thanks in part to its superior accelerator technologies, memory addressability, and threads per core. While it has found a lot of success, with customer wins from the Department of Energy, Google, Wall Street, NASA, and more, to date it has mostly been targeted at ML training workloads. IBM’s Power System AC922, equipped with up to 6 NVIDIA V100s, excels in that area.

This week, however, IBM announced it was expanding its scope to inference with the unveiling of its new IC922 Power Server. Let’s take a closer look at the new offering.

Rounding out the DTI (Data-Training-Inference) lineup

IBM introduced a concept last year that it calls “DTI”. The model consists of three components (data, training, and inference), and IBM says it is a framework to aid clients in their respective AI journeys. To be blunt, this “is” the way clients think of this process and is the reality among practitioners.

IBM stresses the fact, and I agree, that DTI is not a linear model; rather, the three components are constantly interacting with each other in a continuous loop. Data is, well, data. It’s the foundation that all AI is built on. Without a solid foundation of information to process and analyze, machine learning is useless. Training is the stage in which this data foundation gets converted into AI models. Lastly, IBM describes inference as “the sum of all parts.” This is why this new inference server is such a big deal.

The IC922 Power Server is a storage-dense, high-bandwidth server, the first from IBM that is purpose-built for AI inference. It can support as many as 6 NVIDIA T4 GPUs, and IBM plans to expand this number to 8 and provide additional accelerator options at an unspecified future date. IBM touts the solution’s flexibility, and I agree, saying that it can be tailored to specific needs and environments so it can perform inference both in the central datacenter and outwards in distributed datacenters closer to the edge.

The IC922 features 24 SAS/SATA storage bays, which I believe will enable its customers to construct a structurally sound data foundation upon which to run their AI models (as mentioned earlier, this is crucial for AI). Additionally, IBM says clients can configure the IC922 to function as either a data or an inference server. Looking towards the future, IBM says additional support for NVMe is planned, which I believe is very important for the highest-throughput workloads.

The server also features a maximum system memory of 2TB across 32 DDR4 RDIMM slots. Additionally, IBM says the server’s 8 DDR4 ports at 2666MHz translate to a peak memory bandwidth of 170 GB/s per CPU.
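As a quick sanity check on those memory figures, here is a minimal Python sketch of the arithmetic. It rests on assumptions that are mine rather than IBM’s: that “2666MHz” means 2666 mega-transfers per second, that each DDR4 port moves 8 bytes per transfer (the standard 64-bit DDR4 data bus), that 1TB is counted as 1024GB, and that the 2TB maximum is spread evenly over all 32 slots. The function and constant names are purely illustrative.

```python
# Back-of-the-envelope check of the quoted memory figures.
# Assumptions (not from IBM's materials): "2666MHz" is read as 2666 MT/s,
# each DDR4 port has a standard 64-bit (8-byte) data bus, 1 TB = 1024 GB,
# and the maximum capacity is spread evenly across every RDIMM slot.

DDR4_PORTS_PER_CPU = 8        # "8 DDR4 ports" from the article
TRANSFERS_PER_SECOND = 2666e6 # 2666 MT/s (assumed reading of "2666MHz")
BYTES_PER_TRANSFER = 8        # 64-bit data bus per port (assumption)

MAX_MEMORY_GB = 2 * 1024      # 2TB maximum system memory
RDIMM_SLOTS = 32              # 32 DDR4 RDIMM slots


def peak_bandwidth_gb_s(ports: int, transfers_per_s: float, bytes_per_transfer: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    return ports * transfers_per_s * bytes_per_transfer / 1e9


peak = peak_bandwidth_gb_s(DDR4_PORTS_PER_CPU, TRANSFERS_PER_SECOND, BYTES_PER_TRANSFER)
gb_per_slot = MAX_MEMORY_GB / RDIMM_SLOTS

print(f"Peak memory bandwidth per CPU: {peak:.1f} GB/s")
print(f"Implied DIMM size at maximum capacity: {gb_per_slot:.0f} GB per slot")
```

Under those assumptions the result lands right around the 170 GB/s that IBM quotes. Keep in mind this is a theoretical peak; sustained bandwidth in real workloads will be lower.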

It features an advanced I/O architecture with PCIe Gen4 and OpenCAPI support, which backs up my description of POWER9 as the “Swiss Army knife of acceleration”. To top it all off, it comes with IBM’s WMLA-inference, which is a complete AI software stack for inference.

Finally, IBM boasts that the IC922 features 33% more GPUs per 2U server than its Intel-based competitors, which should translate to a lower operational cost. Paired with the OpenShift technology from its recently acquired Red Hat, it could be the most economical private container cloud deployment infrastructure when you factor in IBM’s claim of 2.35x better price/performance than its competitors. I didn’t do the analysis myself, but when you look at how heavily threaded POWER9 is, this makes sense to me.

Wrapping up

With the launch of the IC922, IBM’s POWER9 AI endgame is finally coming into focus. It now has the hardware to support every aspect of its DTI (Data-Training-Inference) workflow model, giving IBM a complete end-to-end AI-to-cloud solution.

The IC922 appears to be economical and flexible, making it an easy sell to customers looking to get into the AI inference game. The fact that IBM has an entire data, AI and cloud stack is very important if pre-integration and one throat to choke matter. Of course, I’ll have to dig deeper into the claims and benchmarks myself, but it passes my initial smell test based on architecture alone. I’ll be watching with interest and look forward to talking with clients running the full IBM DTI stack.