Intel and Cray recently announced they will build the Aurora exascale supercomputer at the DOE’s Argonne National Laboratory in 2021, with Intel as the prime contractor and Cray as a subcontractor. Exascale means the system can sustain one quintillion (10^18) floating-point operations per second, a milestone the Chinese government hopes to reach in 2020. The two companies issued coordinated press releases touting the massive scale of this project, which will be critical to the US DOE, the reputation of US computing, and thousands of scientists across the country. The deal for Argonne’s supercomputer was originally awarded to Intel in 2017; however, this announcement is the first reaffirmation that the deal will proceed since Intel canceled the Knights Hill chip that Aurora was originally planned to use. Aurora will be used in a wide variety of applications, including cancer research, extreme weather forecasting, mapping the human brain, developing new materials, and understanding how the universe began.
Where's the beef?
Intel has remained quiet on the chip architecture it will use for Aurora. We now know that the system will be built using 200 Cray Shasta chassis, connected by the recently announced Cray Slingshot interconnect, and will use next-generation Intel Optane persistent memory. Intel did give the new processor a name, calling it the Xeon Xe compute architecture (I assume the e is for Extreme?). Additionally, it said the processor will combine capabilities for High Performance Computing (HPC) and Artificial Intelligence (AI).
HPC and AI are becoming quite symbiotic, with AI being used to accelerate and augment HPC research. This trend benefits NVIDIA, as I previously explored in this blog. Still, it is difficult to design a single chip that is good at both HPC, which runs legacy x86 codes that typically require 64-bit double-precision arithmetic on CPUs, and AI, which needs less precision but demands the massive parallelism found in GPU accelerators.
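To make the precision point concrete, here is a minimal sketch of my own (not anything Intel or Cray has published). It adds a small step many times, once in ordinary 64-bit floating point and once with every intermediate result rounded to 16-bit half precision, one of the reduced formats commonly used in AI work. The helper names `to_half` and `accumulate` are hypothetical, and the rounding is emulated in software with Python's standard `struct` module rather than run on real accelerator hardware.

```python
import struct


def to_half(x: float) -> float:
    """Round a 64-bit Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def accumulate(step: float, count: int, rounder) -> float:
    """Add `step` to a running total `count` times, rounding after every addition."""
    total = 0.0
    for _ in range(count):
        total = rounder(total + rounder(step))
    return total


# Ten thousand steps of 0.001 should sum to roughly 10.
full = accumulate(0.001, 10_000, lambda x: x)  # plain 64-bit arithmetic
half = accumulate(0.001, 10_000, to_half)      # emulated 16-bit arithmetic

print(f"64-bit running sum: {full}")
print(f"16-bit running sum: {half}")
```

The 64-bit sum lands very close to 10, while the 16-bit sum stalls well short of it once the running total grows large enough that the step falls below the format's resolution. That kind of drift is often tolerable in neural-network training, but it is not acceptable in a long-running simulation, which is a big part of why traditional HPC codes insist on double precision.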
Possible scenarios for Intel's next chip
One possibility for this Xe chip is what Intel calls a Configurable Spatial Accelerator, or CSA: a dataflow engine that bears more of a resemblance to startup Graphcore’s Intelligence Processing Unit (IPU) than it does to a Xeon CPU or an NVIDIA GPU. CSA, however, is a fairly radical approach, as it would run neither standard x86 instructions nor the tens of thousands of HPC applications common across the industry. By calling it “configurable,” Intel implies that it can build or configure a myriad of processors tailored to specific workloads. It is not at all clear whether this new architecture would be a monolithic chip or built into some sort of hybrid. The only details I have seen on CSA are Intel’s patent filings and an excellent article by Timothy Prickett Morgan of The Next Platform.
A hybrid chip could be built with x86 CPU cores, AI ASICs based on Nervana or the CSA, and possibly even FPGAs (although I doubt that). Xilinx is taking this approach with its next-generation Versal chip on 7nm, which is sort of a silicon Swiss Army knife, combining ARM CPU cores, an AI engine, a DSP, and configurable FPGA logic to accelerate a wide variety of datacenter workloads. NVIDIA takes a somewhat similar approach with its Xavier SoC, a hybrid chip that has ARM CPU cores and multiple accelerators, and is at the heart of the company’s autonomous vehicle aspirations.
There is another possibility that would be less challenging: combining a CPU chip with an AI chip (possibly based on the Nervana design Intel acquired in 2016). Such a processor could be built on a multi-chip module, or on a single die as AMD has demonstrated with its laptop “APU” designs. Raja Koduri, whom Intel hired from AMD, certainly knows how to build such a package, so I wouldn’t rule this out. The advantage is that it could potentially be faster than a CPU plus a discrete GPU; still, it would have significant memory bandwidth challenges to overcome.
I should also point out that one of the unique features of the Cray Shasta design and its Slingshot interconnect is that it can support just about any type of processor and accelerator, be it x86, ARM, IBM POWER, GPUs, FPGAs, or ASICs. Therefore, one also has to wonder if Mr. Koduri plans to build a discrete GPU for Shasta. Still, I suspect that the Nervana engine is much more likely, since Nervana is intended to surpass GPUs in performance.
Intel has reaffirmed its intent to deliver the Argonne supercomputer, but we will all need to wait a while longer to learn what magic it has up its sleeve. Cray’s Shasta design gives Intel a strong, flexible platform on which it can build an integrated system out of virtually any processor design or designs it lands on. Clearly, Intel is taking a co-design approach, tailoring its next-gen chip for HPC and AI to meet the needs of the largest supercomputer site in the USA. Let’s hope we hear more details soon!