Advanced Micro Devices (AMD) Radeon GPUs have been gaining popularity in gaming and virtual reality, thanks to the performance afforded by 14nm FinFET manufacturing technology and High Bandwidth Memory, coupled with aggressive pricing. However, the company’s High Performance Computing (HPC) products, previously branded as AMD FirePro, have not gained much traction despite their strong double precision floating point performance and large GPU memory capacities. This is largely because AMD’s GPU software (drivers, libraries, etc.) was designed for Windows workstations and gaming, not for Linux and the server applications that dominate the datacenter acceleration market. But this may be about to change.
Recently, AMD began repositioning its newer workstation GPUs, and presumably its future server GPUs, as “Radeon Pro”, and has launched a completely revamped Linux software stack to let it compete with NVIDIA in the fast-growing datacenter markets for HPC and Deep Learning. For context, NVIDIA generated $151M in these segments in its latest quarter, more than twice the level of a year earlier. The new AMD software could pave the way for AMD to gain a foothold in this market, although the company must address significant challenges to realize this potential.
What has AMD announced?
AMD is delivering a new software stack called the Radeon Open Compute Platform, or “ROCm”, to address the HPC and Deep Learning market. Before this release, a prospective AMD customer typically had to port their code to OpenCL, or use C++ with an entirely different approach to parallelizing their application, to run it on an AMD GPU. Now, the programmer can run his/her CUDA code through an AMD “HIPify” tool to create Heterogeneous-compute Interface for Portability (HIP) source code, which can then be compiled with either AMD’s new HCC (Heterogeneous Compute Compiler) or NVIDIA’s NVCC compiler. On AMD hardware, the compiled code then executes on a brand new Linux driver, called ROCk, which supports a handful of AMD’s more recent GPUs. This is a smart approach: it does not place undue burden on the programmer, and it lets him/her maintain a single source code base for both AMD and NVIDIA execution. In addition, AMD is providing a slew of libraries, applications, benchmarks, tools and HSA runtime extensions to ease the transition to its hardware for HPC and Deep Learning.
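To make the “HIPify” step more concrete, here is a toy Python sketch of the kind of source-to-source renaming involved. It is not AMD’s tool: the mapping table, the `toy_hipify` function and the sample snippet are hypothetical, and the real HIPify tools cover far more of the CUDA API (some code still needs manual attention). It only illustrates why much CUDA host code ports mechanically: many CUDA runtime calls have a HIP counterpart that takes the same arguments and simply uses a `hip` prefix.

```python
import re

# Hypothetical, heavily simplified CUDA -> HIP mapping for illustration only.
# The real HIPify tools translate hundreds of CUDA API names.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Match whole identifiers only (\b on both sides), trying longer names first,
# so "cudaMemcpyHostToDevice" is renamed as a unit and "mycudaMalloc" is untouched.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(cuda_source: str) -> str:
    """Rename known CUDA runtime identifiers to their HIP equivalents."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], cuda_source)


if __name__ == "__main__":
    cuda_snippet = (
        "#include <cuda_runtime.h>\n"
        "float *d_x;\n"
        "cudaMalloc(&d_x, n * sizeof(float));\n"
        "cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);\n"
        "cudaFree(d_x);\n"
    )
    print(toy_hipify(cuda_snippet))
```

Kernel bodies written with `__global__`, `threadIdx` and `blockIdx` generally carry over as-is, since HIP exposes the same names; that is a large part of why a single HIP source can target both vendors.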
How Well Does ROCm Perform?
AMD has provided data comparing the performance of the new ROCk driver with the prior Catalyst (Windows) driver. As you can see, the new driver delivers dramatically lower latencies for dispatching compute kernels, a key metric for GPU-parallelized codes. AMD has also provided data showing that using the HIP abstraction layer for CUDA codes does not significantly impact performance on either AMD or NVIDIA hardware.
AMD says ROCm’s new open source driver dramatically reduces latencies compared to the prior Catalyst driver, which was optimized for Windows, not for computational Linux workloads. (Source: AMD)
Will this help AMD get (back) into the HPC game?
Well, let’s say this software is necessary but not sufficient. ROCm provides the software that datacenter applications need, but AMD’s hardware in this space now lags behind NVIDIA’s. Specifically, while the FirePro S series compared favorably with NVIDIA Kepler GPUs in double precision performance and memory size when it was introduced in 2014, those older NVIDIA GPUs are being replaced by the new, more powerful Pascal generation, which also offers High Bandwidth Memory at the high end of the product line (the P100 with NVLink). And the newer Radeon Pro products, based on Polaris chips, are designed for professional graphics, not servers, at the time of this writing. If AMD introduces a Radeon Pro for servers based on Polaris, it would have solid single precision performance but would likely lack the double precision throughput and ECC memory that many scientific HPC applications require. That might be acceptable if AMD wants to focus on workloads such as seismic analysis or Machine Learning. For the latter, fast-growing segment, AMD would need to augment its current GPUs with devices that natively support half-precision (16-bit) floating point and 8-bit integer arithmetic to compete with NVIDIA Pascal.
So, while AMD has taken two steps forward with its software, it has taken one step back on HPC GPUs from a competitive perspective. We look forward to hearing more about its roadmap beyond Polaris, especially regarding support for the specific math operations mentioned above. Even then, the company would need to deploy customer-facing experts and invest in the ecosystem, where NVIDIA already has a strong position.
For more detailed information regarding ROCm, please see the recently published Moor Insights & Strategy research paper on this topic.