Renewed Competition in HPC Will Enable Workload Optimization
Intel has enjoyed a dominant position in High Performance Computing (HPC) processors for nearly a decade now, with only IBM POWER offering a viable CPU alternative since Advanced Micro Devices effectively, albeit perhaps unintentionally, exited the market in 2010. Meanwhile, NVIDIA has kept things interesting by delivering GPUs as computational accelerators for those workloads that can be sufficiently parallelized and whose kernels have memory footprints small enough to fit in the relatively modest 16-32 GB available per GPU. More recently, Intel has reinforced its leadership in CPUs by bringing its many-core Xeon Phi (Knights Landing) to the forefront, especially for similarly highly threaded workloads that demand more memory. Now, things are about to get even more interesting as AMD, Intel, IBM and ARM Holdings’ partners Cavium and Qualcomm ready new server SOCs, touting more cores, more throughput and more memory bandwidth, presumably at lower prices, to attract HPC customers and cloud service providers to their camps.
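To see why that per-GPU memory ceiling matters, consider a quick back-of-envelope check of whether a dense simulation grid fits in a 16 GB accelerator. The grid shape and per-cell variable count below are purely illustrative assumptions, not drawn from any particular application:

```python
# Back-of-envelope check: does a dense N^3 simulation grid fit in a
# 16 GB GPU's memory? Variable count per cell is a hypothetical example.
BYTES_PER_VALUE = 8        # double precision
VARIABLES_PER_CELL = 5     # e.g., density, energy, three velocity components

def max_cubic_grid(gpu_gb):
    """Largest N such that an N^3 grid of cells fits in gpu_gb gigabytes."""
    budget = gpu_gb * 1024**3
    n = 1
    while (n + 1) ** 3 * VARIABLES_PER_CELL * BYTES_PER_VALUE <= budget:
        n += 1
    return n

print(max_cubic_grid(16))   # → 754, i.e. roughly a 750^3 grid per 16 GB GPU
```

Anything larger must be decomposed across multiple GPUs or staged through host memory, which is exactly the trade-off that keeps many big-memory HPC workloads on CPUs.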
AMD rolled out a new chip with a new brand & logo for its 32-core server SOC. (Source: Advanced Micro Devices)
Don’t think about this evolution as just another inning of the old faster-and-cheaper ball game. Rather, each of these architectures (Intel Xeon and Xeon Phi, NVIDIA and AMD GPUs, AMD EPYC CPUs, ARM SOCs and IBM POWER CPUs) will attract specific workloads that are inherently aligned with each vendor’s chips. We’ve seen this game played out before, as SPARC, MIPS, POWER and Itanium all vied for position and market share, until they all fell by the wayside, with the notable exception of IBM POWER. The industry is about to become more diverse once again, and HPC users will need to balance potential performance with the cost of application optimization and portability in making their buying decisions. This renewed diversity will likely lead to significantly more competition and innovation in the HPC market.
What’s New In Processor Land?
Let’s start with the current leaders: Intel, IBM and NVIDIA. Each is in the early innings of a major product refresh, with Intel bringing out Knights Mill later this year to add (sorely missing) Deep Learning features to its Xeon Phi line, and the new Purley-based Xeon Scalable Processor (Skylake) family expected soon. While Skylake is perhaps a bit later than initially planned, by the end of this year Intel’s portfolio will be stronger than ever. In fact, Xeon Skylake will more than double the HPC floating point performance of its worthy predecessor, Broadwell, thanks to the addition of AVX-512 vector processing, currently available only in the Knights Landing (KNL) Xeon Phi family. And some SKUs will sport Intel’s Omni-Path fabric, integrated on-package as a fast interconnect.
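The doubling claim follows directly from the vector width: AVX-512 registers hold eight doubles versus four for AVX2, so each fused multiply-add (FMA) instruction does twice the work. The clock speed and FMA unit count in this sketch are illustrative assumptions, not published Skylake specifications:

```python
# Rough per-core peak double-precision throughput, showing why AVX-512
# roughly doubles AVX2. Clock and FMA-unit count are assumed values.
def peak_gflops(simd_bits, fma_units=2, ghz=2.0):
    doubles_per_vector = simd_bits // 64          # 64-bit doubles per register
    flops_per_cycle = doubles_per_vector * 2 * fma_units  # FMA = 2 flops
    return flops_per_cycle * ghz

avx2 = peak_gflops(256)     # → 32.0 GFLOPS per core
avx512 = peak_gflops(512)   # → 64.0 GFLOPS per core
print(avx512 / avx2)        # → 2.0
```

Real sustained performance depends on memory bandwidth and AVX-512 clock throttling, so the 2x figure is a ceiling rather than a guarantee.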
Meanwhile, IBM is readying its highly anticipated POWER9 processor for deliveries later this year. This architecture has already been awarded two significant HPC projects by the US DOE: the Summit supercomputer at Oak Ridge and the Sierra supercomputer at Lawrence Livermore. Both of these mammoth systems also use NVIDIA’s new Volta GPU and the Mellanox InfiniBand interconnect, built into nodes consisting of a pair of POWER9 processors and six NVIDIA Volta GPUs. Scaling to 4600 nodes to hit a targeted 150 petaflops of peak performance, and perhaps reaching as high as 200 petaflops, these systems may vie for the coveted mantle of the fastest supercomputer in the world. Both IBM POWER9 and NVIDIA Volta are technology tours de force, being two of the largest and most complex silicon devices ever produced. The high level of GPU scaling also makes these systems ideal for conducting research in machine learning, with NVIDIA Volta delivering up to 120 teraflops of deep learning (tensor) performance per GPU.
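Those headline numbers check out on the back of an envelope. Assuming roughly 7.5 double-precision teraflops per Volta GPU (a figure based on NVIDIA's published V100 specifications, not on anything in the DOE announcements), the GPU contribution alone lands near the upper estimate:

```python
# Sanity check of the quoted system peak: 4600 nodes, six GPUs each,
# assuming ~7.5 double-precision teraflops per Volta GPU.
nodes = 4600
gpus_per_node = 6
tflops_per_gpu = 7.5

peak_petaflops = nodes * gpus_per_node * tflops_per_gpu / 1000
print(peak_petaflops)   # → 207.0, consistent with the ~200 petaflop estimate
```

The POWER9 CPUs add a few more petaflops on top, but the arithmetic makes clear that in these systems the GPUs carry the overwhelming share of the peak flops.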
With so many products just announced, but only about to begin initial production deliveries, it is impossible to pick a winner. Certainly, NVIDIA’s lead in GPUs, especially with Volta, will be a tough act for AMD to follow, while the AMD EPYC CPU shows a lot of promise. Note that the upcoming AMD Vega GPUs offer only limited double-precision floating point performance and lack error-correcting (ECC) memory support, which is a deal breaker for most (but not all) HPC workloads. (In full disclosure, I was VP of Marketing at AMD while EPYC was being designed.) While ARM has a lot to prove to make up for years of over-promising and under-delivering, the new batch of CPUs from Cavium and Qualcomm shows promise. Meanwhile, Intel looks very strong, but it has not faced a credible x86 competitor in many years. IBM POWER9 has some very impressive specs and heritage, not to mention that it is the only CPU to offer native NVLink as well as OpenCAPI for tightly integrating CPUs with GPUs and FPGAs such as those from Xilinx.
Suffice it to say, picking a processor for your HPC workload just got a lot more difficult, but the opportunity to fine-tune your choice of HPC CPU, GPU or FPGA to meet the needs of specific workloads and installation requirements can help lower costs and power consumption while increasing performance. And in HPC, that’s the name of the game!