Today, IBM announced its much-anticipated POWER9 chip, its first POWER9-based server, and support from Google and the Department of Energy's CORAL. In this first wave of rolling thunder announcements, IBM is focusing the launch on AI and machine learning, a space where POWER9 excels. The POWER9 architecture can do a lot more well, such as databases, but IBM wants, deservedly, to get credit for bringing out an architecture built from the ground up for AI and ML workloads. With POWER9, IBM achieved that goal.
Industry-wide, there are three different “flavors” of server chips that speak different software languages. Intel and Advanced Micro Devices servers speak “X86”, Qualcomm and Cavium speak “ARM”, and IBM and Suzhou PowerCore speak “POWER”. Across the server chip industry, raw CPU performance is still increasing, but at a slower pace than it was in the previous decade. Since there is still the need for speed to power the latest and greatest server applications, the industry responded by adding “accelerators”, or processors that do much more specific and focused tasks than general purpose CPUs.
These server accelerators can be integrated into the CPU (e.g., Intel AVX-512, IBM AES) but are typically built onto specialized cards that plug into servers, and come in the form of GPUs (e.g., NVIDIA Tesla, AMD Instinct), FPGAs (e.g., Xilinx and Intel Altera), and ASICs (e.g., Google TPU and Intel Nervana).
With this long introduction, let’s jump into POWER9.
- Linux focused, scale-out “SMT4”: 2-socket, 24 cores (2x128bit super-slicing) with standard memory up to 170GB/sec bandwidth on 8xDDR4 ports
- PowerVM focused, scale-out “SMT8”: 2-socket, 12 cores (4x128bit super-slicing) with standard memory up to 170GB/sec bandwidth on 8xDDR4 ports
- PowerVM focused, scale-up “SMT8”: 4+ socket, 12 cores (4x128bit super-slicing) with buffered memory up to 230GB/sec bandwidth on 8xBuffered channels
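As a rough sanity check on the 170GB/sec figure in the list above, here is a minimal sketch of the peak-bandwidth arithmetic for eight DDR4 ports. The DDR4-2666 speed grade is an assumption (the list does not name one), and the function name is made up for illustration. The result is a theoretical peak, not a measured number.

```python
# Theoretical peak DDR4 bandwidth = transfers/sec x bytes per transfer x channels.
# ASSUMPTION: DDR4-2666 DIMMs; the source only says "8xDDR4 ports".
DDR4_BUS_BYTES = 8  # a DDR4 channel is 64 bits (8 bytes) wide


def ddr4_peak_gb_per_s(mega_transfers_per_s: int, channels: int) -> float:
    """Return the theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_s = mega_transfers_per_s * 1e6 * DDR4_BUS_BYTES * channels
    return bytes_per_s / 1e9


if __name__ == "__main__":
    # Eight ports of DDR4-2666 land very close to the quoted 170GB/sec.
    print(f"8 x DDR4-2666: {ddr4_peak_gb_per_s(2666, 8):.1f} GB/s")
    # A slower speed grade, shown for comparison.
    print(f"8 x DDR4-2400: {ddr4_peak_gb_per_s(2400, 8):.1f} GB/s")
```

Under that assumption the math comes out to roughly 170.6 GB/s, which lines up with the scale-out figure. The 230GB/sec buffered number depends on the buffer-chip links and is not modeled here.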
I really like this flexibility, as the Linux-focused systems don’t inherit the cost of the scale-up systems and the thread super-slicing better meets the needs of the targeted software architectures. Let me move on to IO.
Big-time IO capabilities
As I said earlier, POWER9 is the Swiss Army knife for ML and HPC accelerators like GPUs where IO matters a lot. IBM offers a dizzying array of options depending on the desired workload with different speeds, latency, coherency and price point:
- CAPI 2.0: 4x bandwidth of POWER8 using PCIe Gen 4 (192GB/sec)
- OpenCAPI 3.0: High bandwidth, low latency and open interface using 25G Link (300 GB/sec)
- NVLink 2.0: Next generation of GPU/CPU bandwidth and integration (TBD speeds)
- PCIe Gen 4 x 48 lanes: 192 GB/s duplex bandwidth. This is 2X the speed of PCIe Gen 3.
- 25G Link x 48 lanes: 300 GB/s duplex bandwidth. This is 7-10X the speed of PCIe Gen 3.
- On-P9 Acceleration: Gzip x1, 842 Compression x2, and AES/SHA x2.
Given the 7-10X speeds over PCIe Gen 3 and the memory coherency options, POWER9 puts PCIe Gen 3 to shame for attaching accelerators. If you are a business, CSP, or lab getting into heavy-duty acceleration, you need to check out POWER9.
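The duplex bandwidth figures in the list above can be reproduced with simple lane arithmetic. The sketch below uses raw signaling rates (8, 16 and 25 Gbit/s per lane) and ignores line-encoding and protocol overhead, so real-world throughput will be somewhat lower. Comparing against a PCIe Gen 3 x16 slot is an assumption about how the 7-10X figure is derived; the source does not say what the baseline is. The helper name is hypothetical.

```python
def duplex_gb_per_s(gbit_per_lane: float, lanes: int) -> float:
    """Raw duplex bandwidth in GB/s, ignoring encoding and protocol overhead."""
    per_direction = gbit_per_lane * lanes / 8  # bits -> bytes, one direction
    return 2 * per_direction                   # both directions at once


# Raw per-lane signaling rates in Gbit/s.
pcie3_x48 = duplex_gb_per_s(8, 48)    # PCIe Gen 3, same 48-lane count
pcie4_x48 = duplex_gb_per_s(16, 48)   # PCIe Gen 4 x 48 lanes
link25_x48 = duplex_gb_per_s(25, 48)  # 25G Link x 48 lanes

# ASSUMPTION: the 7-10X claim compares against a single Gen 3 x16 slot.
pcie3_x16 = duplex_gb_per_s(8, 16)

if __name__ == "__main__":
    print(f"PCIe Gen 4 x48: {pcie4_x48:.0f} GB/s "
          f"({pcie4_x48 / pcie3_x48:.0f}X Gen 3 at equal lanes)")
    print(f"25G Link x48:   {link25_x48:.0f} GB/s "
          f"({link25_x48 / pcie3_x16:.1f}X a Gen 3 x16 slot)")
```

With these inputs the Gen 4 and 25G Link totals match the 192 GB/s and 300 GB/s quoted in the list, and the ratio to a Gen 3 x16 slot falls inside the 7-10X range.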
In addition to the POWER9 chip, IBM is releasing its first POWER9 server, the AC922, in two configurations. The AC922 is currently deployed and running inside the Summit and Sierra supercomputers at CORAL, the Department of Energy collaboration between Oak Ridge, Argonne and Livermore Labs.
The AC922 comes in two flavors: one with four NVIDIA V100 GPUs, offered with air cooling (Q4 2017) or water cooling (Q2 2018), and a water-cooled version with six V100 GPUs, available in Q2 2018. Both configurations offer a ridiculous amount of CPU-to-GPU data throughput, 5.6X that of other high-end systems.
AC922 screams on ML and HPC
The AC922 server with POWER9 is a very fast machine learning and HPC platform. While IBM, not an independent third party, is running and publishing the initial benchmarks, they do pass my initial smell test based on the architectures and the workloads tested. I am expecting third-party benchmarks in January 2018.
IBM says CORAL is running AC922 systems today and seeing 3X AI exaflops (tensor ops), 10X the theoretical performance and 5-10X the application performance over their earlier supercomputer, called Titan.
In Caffe, IBM says the AC922 showed a 3.8X reduction in time versus an Intel Xeon E5-2640 v4 (aka Broadwell) system, both with four NVIDIA V100s, running 1,000 iterations to train 2240x2240 images.
In Chainer, with the same systems, IBM says the AC922 showed a 3.8X reduction in time running the same dataset. While I would rather have seen performance numbers against Intel’s Purley, not Broadwell, I do believe the numbers would be close, as this is about bandwidth and accelerator capabilities, not about CPU core performance or even core-to-core architecture.
Net-net, the AC922 looks like it screams on machine learning training.
PowerAI software a differentiator
This may be a hardware launch, but hardware without ML software is like a car without gasoline: it looks pretty but it isn’t going anywhere. I haven’t been shy about my admiration of PowerAI and how I think it speeds up the deployment of deep learning frameworks and libraries on the Power architecture.
I saw many of IBM’s competitors step up to the plate at SC17 with their AI toolkits, so while IBM had complete exclusivity on the idea early on, the competition is heating up. I still give IBM the lead.
Google voices support but doesn’t yet commit to deployment
I am impressed by the CORAL deployment with full acceptance in 2018 and what IBM Power Vice President Gupta characterized as a “broad set of enterprise customers using PowerAI who will be shipped this month.” This alone sounds like real customers buying real boxes with a lot of momentum behind it. But to get the big volumes going, I believe IBM needs Google to deploy POWER9, not just evaluate it. Google voiced their admiration for POWER8, did a ton of work by porting all their software to it and built numerous hardware designs, but never did a large-scale deployment. I’ll take the up-leveled Google press release quote from Bart Sano, VP of Google, as a higher vote of confidence on POWER9 than we saw with POWER8.
IBM is off to a good start with POWER9 and its first server, the AC922. IBM had the foresight to see the need for an AI- and ML-optimized chip and systems, and the company deserves a lot of credit for delivering it. I am impressed with DOE’s CORAL and the many enterprise customers IBM says it is shipping to this month, but I would have been more impressed with a list of five companies, including Google, that were committed to deploying POWER9 systems. I have no doubt that the POWER9 chip and the AC922 server crush machine learning and HPC workloads, one of the sweetest spots in the industry. I believe customer testimonials and independent third-party benchmarks (coming in January) will help tell this story. Beyond AI and ML, I am looking forward to when IBM outlines its database performance on SAP HANA, Oracle, DB2, and NoSQL, as this was an area of strength for POWER8.
POWER9 has been a five-year journey focused on the next-generation AI needs of real customers, and because of that, I believe POWER9 will see much greater customer adoption than POWER8. This is a big win for IBM.