While Intel has been slowly trickling out information about its discrete GPUs based on the Xe Architecture, at its Architecture Day it took the wraps off ARC, the consumer brand for those GPUs, and disclosed that the first ARC product, codenamed Alchemist, will be available in Q1 2022. Alchemist is built on Xe HPG (high-performance gaming), while another highly anticipated GPU, codenamed Ponte Vecchio, is built on Xe HPC and targets the high-performance computing (HPC) segment. As you would expect, GPUs aimed at such different markets also have different core configurations and thermal envelopes. Be sure to check out Patrick Moorhead's (CEO and Chief Analyst, Moor Insights & Strategy) analysis on other disclosures here.
Xe HPG – ARC Alchemist
Intel sees the vibrant PC gaming market as an opportunity to enter the discrete GPU segment with its ARC offerings. Xe HPG seeks to continue the generational graphics performance gains Intel achieved with Xe LP (low power) inside Tiger Lake, which doubled the performance of Gen 11 graphics. With the discrete Xe HPG part, Intel targets a much higher performance profile with a correspondingly higher power envelope.
One of the critical features of the new Alchemist SoC based on the Xe HPG architecture is its new supersampling feature, which will draw many comparisons to NVIDIA's DLSS. Intel calls it XeSS, and because it uses neural networks running on matrix-multiply hardware, which Intel calls XMX cores, it appears fairly similar to NVIDIA's approach, which is a good thing. However, there is also a version of XeSS that does not require XMX hardware and instead uses the DP4a instruction to maximize the software's compatibility with Intel's many integrated GPUs.
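DP4a computes a dot product of four packed signed 8-bit integers and adds the result to a 32-bit accumulator, which makes it useful for running quantized (INT8) neural-network inference on GPUs without dedicated matrix hardware. The Python sketch below emulates a signed DP4a to show what that fallback path computes. It is a generic illustration, not Intel's XeSS code; the helper names (`pack_int8x4`, `dp4a`, `int8_dot`) are hypothetical, and the sketch assumes results wrap on 32-bit overflow, which real hardware variants may handle differently.

```python
from __future__ import annotations

import struct


def pack_int8x4(values: list[int]) -> int:
    """Pack four signed 8-bit integers into one unsigned 32-bit word (lane 0 in the low byte)."""
    if len(values) != 4:
        raise ValueError("DP4a operates on exactly four int8 lanes")
    # "<4b" packs four signed bytes; struct raises struct.error if a value is outside [-128, 127].
    return struct.unpack("<I", struct.pack("<4b", *values))[0]


def unpack_int8x4(word: int) -> list[int]:
    """Split a 32-bit word back into its four signed 8-bit lanes."""
    return list(struct.unpack("<4b", struct.pack("<I", word & 0xFFFFFFFF)))


def wrap_int32(x: int) -> int:
    """Reduce an arbitrary Python int to signed 32-bit two's complement (assumed wrap-on-overflow)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def dp4a(a: int, b: int, acc: int) -> int:
    """Emulated signed DP4a: acc + sum(a[i] * b[i]) over four int8 lanes, wrapped to int32."""
    products = sum(x * y for x, y in zip(unpack_int8x4(a), unpack_int8x4(b)))
    return wrap_int32(acc + products)


def int8_dot(xs: list[int], ys: list[int]) -> int:
    """Dot product of two int8 vectors, consumed four lanes at a time as a DP4a loop would."""
    if len(xs) != len(ys) or len(xs) % 4:
        raise ValueError("vectors must have equal length that is a multiple of 4")
    acc = 0
    for i in range(0, len(xs), 4):
        acc = dp4a(pack_int8x4(xs[i:i + 4]), pack_int8x4(ys[i:i + 4]), acc)
    return acc


if __name__ == "__main__":
    print(int8_dot([1, -2, 3, 4, 5, 6, -7, 8], [2, 2, 2, 2, 1, 1, 1, 1]))  # 24
```

Each call to `dp4a` replaces four separate multiplies and three adds, which is where the instruction's speedup over plain 32-bit arithmetic comes from.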
The Xe Core – Xe HPG
The Xe Core varies across the Xe family of GPUs, with Xe LP, Xe HPG, and Xe HPC each using a different core configuration. Inside each core are smaller, purpose-built "engines," primarily vector and matrix units, which Intel calls Vector Engines and Matrix Engines. The Xe HPG core contains 16 Vector Engines and 16 Matrix Engines, with each Vector Engine 256 bits wide and each Matrix Engine 1024 bits wide. This design appears fairly similar to NVIDIA's on the vector side but appears to offer twice the matrix-multiplication capability. Four Xe Cores make up a render slice, which adds ray tracing units, texture samplers, and geometry/rasterization front ends. A full Xe HPG Alchemist SoC consists of 8 render slices, with an L2 cache shared among them. TSMC will build this part on its N6 process node, which is considered more advanced than Samsung's 8nm (used by NVIDIA) and TSMC's own 7nm (used by AMD). Intel also gave the ARC family a roadmap, suggesting that each future ARC generation will be faster than the last, with Battlemage (Xe2) and Celestial (Xe3) appearing to build on the same architecture, while Druid is described as an Xe Next architecture.
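The slice hierarchy described above multiplies out simply. This sketch tallies a full Alchemist configuration from the counts Intel disclosed (8 render slices, 4 Xe Cores per slice, 16 Vector and 16 Matrix Engines per core). The class name is hypothetical, and the FP32 lane count rests on an assumption: that each 256-bit Vector Engine processes 256 / 32 = 8 FP32 values per clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XeHpgConfig:
    """Hypothetical model of an Xe HPG (Alchemist) configuration using Intel's disclosed counts."""
    render_slices: int = 8            # full Alchemist SoC
    xe_cores_per_slice: int = 4       # four Xe Cores per render slice
    vector_engines_per_core: int = 16
    matrix_engines_per_core: int = 16
    vector_engine_bits: int = 256     # width of one Vector Engine

    @property
    def xe_cores(self) -> int:
        return self.render_slices * self.xe_cores_per_slice

    @property
    def vector_engines(self) -> int:
        return self.xe_cores * self.vector_engines_per_core

    @property
    def matrix_engines(self) -> int:
        return self.xe_cores * self.matrix_engines_per_core

    @property
    def fp32_lanes(self) -> int:
        # Assumption: a 256-bit Vector Engine holds 256 / 32 = 8 FP32 values per clock.
        return self.vector_engines * (self.vector_engine_bits // 32)


if __name__ == "__main__":
    full = XeHpgConfig()
    print(full.xe_cores, full.vector_engines, full.matrix_engines, full.fp32_lanes)
    # 32 512 512 4096
```

A smaller part can be modeled by changing `render_slices`; for example, `XeHpgConfig(render_slices=2)` yields 8 Xe Cores. Such configurations are illustrative only, not announced products.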
Xe HPC – For Hyperscale and more
Intel designed the Xe HPC (high-performance computing) GPU to address hyperscale computing and HPC workloads, such as government research and other large-scale, computationally intensive applications like AI training. Intel has teased the first Xe HPC GPU, codenamed Ponte Vecchio, for quite some time. We already knew that this multi-chip GPU draws as much as 600W per GPU and requires liquid cooling, and that it is an absolute monster with over 100 billion transistors, 47 active tiles, and five different process nodes. What we now know is that the compute tiles, which house the Xe HPC cores, will be produced on TSMC's N5 process node, with 8 Xe Cores and 4MB of cache per tile. The Base tile will be built on the Intel 7 process node and will have 144MB of cache and a PCIe 5.0 interface. The Xe Link tile, which carries eight Xe Links, will be built on TSMC's N7 process node and will allow up to 8 Xe HPC GPUs to be interconnected. Intel claims that Ponte Vecchio will have a peak 32-bit floating-point (FP32) throughput of 45 TFLOPS. This is a theoretical maximum rather than an accurate measure of delivered performance, but it gives a rough frame of reference for traditional GPU computing tasks before driver and other software optimizations, and without the lower-precision formats such as INT8 used in AI and ML. The Aurora exascale supercomputer at Argonne National Laboratory will use Ponte Vecchio. Each liquid-cooled blade pairs two of Intel's new Sapphire Rapids CPUs with six Ponte Vecchio GPUs; many blades fill each rack, and many racks combine to deliver over an exaflop of performance.
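A peak FLOPS figure is usually the product of unit count, FLOPs per clock per unit, and clock speed, which is why it says nothing about delivered performance. The sketch below inverts that formula to find the clock that 45 TFLOPS would imply. It takes 128 Xe Cores from Intel's stack figures in this article (two stacks of 64) and assumes 256 FP32 FLOPs per clock per Xe HPC core, counting a fused multiply-add as two FLOPs. That per-core figure is an assumption, not something stated here, so the result is an implied value, not a disclosed clock speed.

```python
def peak_tflops(xe_cores: int, flops_per_clock_per_core: int, clock_ghz: float) -> float:
    """Theoretical peak in TFLOPS: cores x FLOPs per clock per core x clock (GHz) / 1000."""
    return xe_cores * flops_per_clock_per_core * clock_ghz / 1000.0


def implied_clock_ghz(target_tflops: float, xe_cores: int, flops_per_clock_per_core: int) -> float:
    """Clock (GHz) at which the given configuration would reach target_tflops."""
    return target_tflops * 1000.0 / (xe_cores * flops_per_clock_per_core)


XE_CORES = 2 * 64                     # two stacks of 64 Xe HPC cores, per this article
FP32_FLOPS_PER_CLOCK_PER_CORE = 256   # ASSUMPTION: not disclosed in this article; FMA = 2 FLOPs

if __name__ == "__main__":
    clock = implied_clock_ghz(45.0, XE_CORES, FP32_FLOPS_PER_CLOCK_PER_CORE)
    print(f"implied clock: {clock:.2f} GHz")
```

Under these assumptions the implied clock is roughly 1.37 GHz. A different per-core figure would change this number proportionally, which is exactly why vendor peak numbers should be read as ceilings rather than benchmarks.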
Xe Core – HPC
The Xe Core for HPC has half as many Vector and Matrix Engines as the Xe HPG core, but each Vector Engine is twice as wide (512 bits) and each Matrix Engine four times as wide (4096 bits), doubling per-core matrix throughput. That should give you an idea of how much computing, especially AI computing, Intel plans to do with these cores. With this core design, Intel targets the AI and ML applications on which many of its customers are focused, and Ponte Vecchio will likely compete directly with NVIDIA's A100. As with Xe HPG, the Xe HPC design groups Xe Cores into slices, except that an Xe HPC slice contains sixteen cores instead of four, an indication of how much more compute is packed into this GPU. The Xe HPC slices also include ray tracing units, signaling an intent to use these GPUs for actual 3D rendering and not just for compute. While we don't have any details on ray tracing performance, it is good to know that even Intel's HPC parts support ray tracing, which could be used in high-performance cloud render farms. A single Xe HPC stack comprises up to four slices (64 Xe HPC cores and 64 ray tracing units), a large L2 cache, eight Xe Links, and four HBM2e controllers for the on-package HBM2e memory. A 2-stack configuration effectively doubles everything.
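Because a 2-stack configuration "effectively doubles everything," a Ponte Vecchio package can be modeled as a set of per-stack counts scaled linearly. The sketch below does this with the figures above: four slices of 16 cores, one ray tracing unit per core (implied by the 64/64 split), eight Xe Links, and four HBM2e controllers per stack. The class is hypothetical, and the L2 cache is left out because no size is given here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class XeHpcStack:
    """Hypothetical per-stack model of Xe HPC (Ponte Vecchio) using the counts Intel disclosed."""
    slices: int = 4
    cores_per_slice: int = 16
    rt_units_per_slice: int = 16      # 64 ray tracing units / 4 slices
    xe_links: int = 8
    hbm2e_controllers: int = 4

    def totals(self, stacks: int = 1) -> dict[str, int]:
        """Scale per-stack resources linearly; a 2-stack part 'effectively doubles everything'."""
        if stacks < 1:
            raise ValueError("need at least one stack")
        return {
            "xe_cores": stacks * self.slices * self.cores_per_slice,
            "ray_tracing_units": stacks * self.slices * self.rt_units_per_slice,
            "xe_links": stacks * self.xe_links,
            "hbm2e_controllers": stacks * self.hbm2e_controllers,
        }


if __name__ == "__main__":
    print(XeHpcStack().totals(stacks=2))
```

`XeHpcStack().totals(2)` yields 128 Xe Cores, 128 ray tracing units, 16 Xe Links, and 8 HBM2e controllers, which is the linear doubling the article describes.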
Overall, we've confirmed many details about Intel's upcoming GPUs and gotten a better idea of where the company is going with its graphics products. Intel made it quite clear that it sees matrix multiplication cores (XMX) and ray tracing as the future of gaming and HPC. This may explain why Intel didn't include XMX cores in its consumer CPUs (Alder Lake), which it also detailed at Architecture Day 2021. With Alchemist, Intel is poised to be a competitive player in the mid-range of the consumer GPU market. It remains to be seen how Ponte Vecchio will compete against NVIDIA in HPC and AI tasks, but the preliminary figures look promising for Intel. Ponte Vecchio looks expensive per GPU given its sheer number of cores and amount of memory, but NVIDIA's high-end products aren't cheap either. I think the future of Intel's GPU business will become clearer early next year, when Xe HPG starts to ship in volume and we get a better idea of Ponte Vecchio's availability and who else has adopted it. At this point, Intel has shown us that it has what many would consider a competitive architecture, and the industry is eagerly awaiting a third competitor in the traditionally duopolistic discrete GPU market.
Note: Moor Insights & Strategy writers and editors may have contributed to this article.