Xilinx launched its next-generation 7nm Versal product, based on the Adaptive Compute Acceleration Platform (ACAP), at the Xilinx Developer Forum in San Jose this week. It also announced its first production PCIe board, called Alveo, around which it is already building a thriving ecosystem. Let’s dive in to understand what was announced and what it might mean for Xilinx and for buyers of acceleration platforms for AI and other high-performance applications.
ACAP is essentially a hybrid acceleration platform that purportedly melds the programmability of CPUs, the performance of ASICs (Application Specific Integrated Circuits), and the adaptability of FPGAs on a single chip. At a high level, this is all about making it easier to develop applications quickly without the need for hardware development skills. In an address to the participants of the Xilinx Developer Forum, Xilinx CEO Victor Peng was confident that the new Versal family will muscle Xilinx into the datacenter and 5G acceleration infrastructure markets.
The concept of domain-specific hardware is not new; it has always been the foundation of ASICs. However, developing an ASIC is an expensive undertaking, requiring tens of millions of dollars and perhaps years of design work. FPGAs avoid that cost, but programming one is a challenging process, requiring deep hardware and software development skills to get the most performance out of the device with custom logic. With Versal, Xilinx hopes to offer an acceleration platform for a wide range of applications: programmed in software and hardware, more efficient than alternative platforms, and much easier to program than traditional FPGAs.
What is “Versal” and what is it good for?
As I wrote last month after Mr. Peng revealed select details at the annual Hot Chips conference, the Versal (derived from “Versatile” and “Universal”) chip adds significant features to a traditional FPGA. These features allow one to build an acceleration platform with standard software interfaces to acceleration engines, I/O, and memory. It consists of two dual-core ARM CPUs, an array of adaptable hardware, a block of Digital Signal Processors (DSPs), an optional block of “AI Engines” (vector processors and memory), an on-die, low-latency network-on-chip interconnect, and an impressive array of next-gen I/O, network, and memory controllers. Versal is a veritable Swiss Army knife of processing accelerators for a wide range of applications, and it can be programmed in languages such as C, C++, and Python.
Figure 1: The Xilinx Versal “ACAP” chip provides a veritable Swiss Army knife for datacenter acceleration. XILINX
There will initially be two versions: the Versal Prime, for wireless 5G and other applications that can be accelerated with DSPs, and the Versal AI Core, for accelerating inference processing of deep neural networks with an array of vector processors for SIMD (Single Instruction, Multiple Data) computation. Xilinx also laid out its ACAP roadmap, with a portfolio of six product series targeting different workloads.
Figure 2: Xilinx announced that there will ultimately be six different Versal series, each with many product feature options for fine-tuning applications. XILINX
The real power of the ACAP approach is its potential to accelerate an entire application workflow, not just the 5G and DNN portions of an application. This vision is quite powerful but may be challenging to realize, as developers will need to think through how to use the programmable hardware, the DSPs, and the vector engines that accelerate DNNs for AI. To facilitate this process, Xilinx says it has developed a new unified software development platform that provides tools to map traditional software onto the right execution engine. Sounds like magic to me; we will need to await real user experiences to validate the company’s claims. However, I spoke with several developers at XDF who have successfully used this software-centric approach with current FPGA products. It holds a lot of promise.
Figure 3: Xilinx is preparing a software development environment that is designed to simplify the deployment of standard software to specific execution engines in the Versal platform. XILINX
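To make “mapping software to the right execution engine” a little more concrete, here is a toy sketch of the underlying idea, assuming we already have a cost estimate for each stage of an application on each engine type. It is not Xilinx’s tool or algorithm: the stage names, engine labels, and cost numbers below are all hypothetical and invented purely for illustration.

```python
from typing import Dict, Optional

# Hypothetical cost estimates (ms per invocation) for each pipeline stage on
# each engine type; None means the stage cannot run on that engine.
# All names and numbers are invented for illustration only.
CostTable = Dict[str, Dict[str, Optional[float]]]

COSTS: CostTable = {
    "packet_parse":  {"arm_cpu": 0.80, "adaptable_hw": 0.10, "dsp": None, "ai_engine": None},
    "beamforming":   {"arm_cpu": 5.00, "adaptable_hw": 0.90, "dsp": 0.40, "ai_engine": 0.50},
    "dnn_inference": {"arm_cpu": 9.00, "adaptable_hw": 1.20, "dsp": 2.00, "ai_engine": 0.30},
    "control_logic": {"arm_cpu": 0.20, "adaptable_hw": 0.50, "dsp": None, "ai_engine": None},
}


def map_stages(costs: CostTable) -> Dict[str, str]:
    """Greedily assign each stage to its cheapest supported engine.

    This ignores data-movement cost between engines and contention for
    shared engines, both of which a real partitioning tool would have to model.
    """
    plan: Dict[str, str] = {}
    for stage, options in costs.items():
        supported = {eng: c for eng, c in options.items() if c is not None}
        if not supported:
            raise ValueError(f"no engine can run stage {stage!r}")
        plan[stage] = min(supported, key=lambda eng: supported[eng])
    return plan


if __name__ == "__main__":
    for stage, engine in map_stages(COSTS).items():
        print(f"{stage:>14} -> {engine}")
```

With these made-up numbers, each of the four engine types ends up handling one stage. The hard part a real tool must solve, and the reason the author reserves judgment, is everything this greedy sketch leaves out: the cost of moving data between engines and the sharing of a finite amount of silicon.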
While it may take some time for developers to take advantage of the heterogeneous hardware, the rewards could be substantial. Xilinx shared industry projections from Barclays that show inference processing eventually growing larger than the training market that currently dominates AI investment in the datacenter and at the edge (see Figure 4). Edge and datacenter inference applications are the markets that Xilinx is targeting. While I agree that inference will become the larger market, I quibble with the projection that the training market will flatten in just a couple of years. I believe we have just begun to scratch the surface of the universe of applications in which AI will disrupt the status quo.
Figure 4: Inference processing in the cloud and at the edge will outgrow the burgeoning AI training market, which is currently dominated by GPUs (NVIDIA). XILINX
So how good is Versal? Xilinx showed data indicating that it can outperform CPUs and GPUs for inference processing, especially in low-latency applications with small batch sizes. At sub-7ms latency, Versal outperforms a high-end GPU by 2.5X, and at sub-2ms latency it can outperform that GPU (an NVIDIA V100) by 8X with a batch size of one. This is typically a strong story for FPGAs; GPUs, by contrast, batch queries to amortize the overhead of CPU-GPU communication over PCIe and to more fully utilize the GPU silicon. I have to point out that the V100 is not primarily used as an inference platform, and that sub-2ms latency is beyond the needs of most datacenter applications. Still, in a fast-moving vehicle, such low latency and a batch size of one become mandatory. There, a more fitting comparison would be the NVIDIA Xavier SoC announced at Hot Chips. We will have to wait a while for that comparison and others to be made.
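The batching trade-off described above can be illustrated with a deliberately simple latency model, assuming each batch pays a fixed overhead (standing in for PCIe transfers, launch, and synchronization) plus a per-query compute cost. The `Accelerator` class and every parameter value here are hypothetical; they are not measurements of Versal, the V100, or any other product.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """Toy model: a batch costs a fixed overhead plus a per-query cost (ms)."""
    name: str
    fixed_overhead_ms: float  # hypothetical: transfers, launch, sync per batch
    per_query_ms: float       # hypothetical: incremental compute per query

    def batch_latency_ms(self, batch: int) -> float:
        # Every query in the batch waits until the whole batch completes.
        return self.fixed_overhead_ms + self.per_query_ms * batch

    def throughput_qps(self, batch: int) -> float:
        # Queries per second when batches of this size run back to back.
        return batch / self.batch_latency_ms(batch) * 1000.0

    def best_batch(self, budget_ms: float, max_batch: int = 256) -> int:
        # Largest batch that meets the latency budget; 0 if even one query misses it.
        best = 0
        for b in range(1, max_batch + 1):
            if self.batch_latency_ms(b) <= budget_ms:
                best = b
        return best


# Hypothetical devices: one amortizes a large overhead by batching,
# the other has little overhead but slower per-query compute.
high_overhead = Accelerator("high-overhead, batch-friendly", 1.5, 0.25)
low_overhead = Accelerator("low-overhead, streaming", 0.25, 0.5)

if __name__ == "__main__":
    for budget in (7.0, 2.0, 1.5):
        for dev in (high_overhead, low_overhead):
            b = dev.best_batch(budget)
            qps = dev.throughput_qps(b) if b else 0.0
            print(f"{budget:>4} ms  {dev.name:<30} batch={b:<3} {qps:7.0f} qps")
```

With these made-up numbers, the high-overhead device wins on throughput under a 7ms budget by packing 22 queries into each batch, but under a 2ms budget the low-overhead device delivers more throughput, and at 1.5ms only the low-overhead device can answer a query at all. That is the shape of the argument for low-latency, small-batch inference, not a reproduction of Xilinx’s benchmark.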
While these benchmarks are impressive, the real potential for Versal lies in accelerating an entire application, not just the inference portion. A common example would be accelerating 5G applications that also require inference processing, such as wireless 5G beamforming and Cloud RAN. In addition, the reconfigurability of Versal’s FPGA fabric has significant value in a fast-changing field like AI, where new network algorithms are constantly being invented.
I should point out that the first two Versal series, Prime and AI Core, each comprise a wide range of parts. The AI Core series, for example, ranges from a small 31×31 mm device with 128 VLIW SIMD vector cores to a 41×41 mm device with 400 cores.
Xilinx has been touting ACAP as a “new category” since Mr. Peng became CEO earlier this year. I was initially skeptical, thinking this was just a natural evolution of the earlier Xilinx Reconfigurable Acceleration and reVISION Stacks efforts. Now that I more fully comprehend the vision and potential use cases for Versal, I believe that this approach has tremendous potential, not only as a compute acceleration platform for cloud and edge applications, but also for storage, networking, and security acceleration.
In a couple of years, we may well look back at this announcement as a pivotal moment that transformed Xilinx from an FPGA company into a provider of innovative domain-specific acceleration platforms across a wide range of industries and applications.