Intel’s Architecture Day Showed The Company’s Breadth, Depth, And Future

When providing guidance to aspiring tech industry analysts, one thing I always say is, “you don’t have to love to write, but you can’t hate to write.” Analysts write a lot. And when there’s a boatload of content, sometimes it’s hard to know where to start. 

This was my challenge with Intel’s recent Architecture Day. The company covered the technical details behind new CPU cores, accelerated computing blocks, the processors and graphics cards that include those cores and blocks, and the software that enables those processors and graphics cards across PCs, servers, storage, and networking for commercial and consumer applications. Like I said, a boatload of content.

In my columns, I typically share enough of the news to provide an opinion, which is the real analyst value. For this analysis, though, I want you to read the fact sheet here first, and I will provide my opinions and key takeaways. Otherwise, this report will be 10,000 words long.

A final caveat: Intel did not launch products at the event (hence “architecture day”), and therefore it’s hard for me to surmise how competitive the products will be. Product frequencies, core counts, die sizes, and power draw were not shared, but I will do my best to extrapolate competitiveness. Be sure to check out the deeper analysis by Anshel Sag, Moor Insights & Strategy principal analyst, on Intel’s GPU architectures and product updates.

Let’s start with the new CPU cores.

Heterogeneous X86 “P” and “E” CPU cores

The reason to talk about CPU cores separately from the processors is that most CPU designers reuse core designs and IP across a myriad of products.

For years, Intel espoused the notion of cores that spin up for performance and spin down to conserve energy, and the company was successful with this for over 25 years. Then Arm came along with its big.LITTLE architecture, used by companies like Apple and Qualcomm. Those companies used big.LITTLE designs for smartphones, and now PCs, where the “big” cores do the higher-performance chores while the “LITTLE” cores handle background tasks that require less performance at the lowest power. Arm and its partners had growing pains at first, because managing workloads between smaller and larger cores is tough and requires some help from the OS provider. I’d say it took Arm and its partners two to three cycles to get heterogeneity “right”.

At its Architecture Day, Intel introduced the idea of an “E”-core for efficiency and a “P”-core for performance, along with an algorithm called “Thread Director” to manage threads effectively across them. I asked tons of questions on Thread Director with Windows 11 and came away satisfied that Intel and Microsoft learned from what Arm and Android did together, and that it may even be a superior implementation. At least that was the claim. Don’t forget that Intel knows a lot about thread management, as it has had SMT in its processors for decades. I consider MIC (Many Integrated Core) a market failure, but it was all about thread management; unfortunately, it was aimed at the wrong kinds of workloads, trying to do what a GPU did.

The most interesting thing for me about the P-cores (for datacenter) was Advanced Matrix Extensions (AMX), which Intel claims accelerates INT8 machine learning by 8X. Also interesting is that AVX-512 is being removed from client processors, which means PC ML acceleration will be delivered by AVX2-VNNI on the CPU and DP4a on the GPU. There were no AVX2-VNNI performance claims, which I thought was a little odd. We’ll have to wait for the product launch.
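
For readers who want a feel for what this class of instruction accelerates, here is a minimal Python sketch of a DP4a-style operation: a dot product of four 8-bit integers accumulated into a 32-bit value. The function name `dp4a`, the choice of signed inputs, and the sample values are illustrative assumptions, not Intel’s API. Real hardware does this in a single instruction across many lanes at once.

```python
def dp4a(a, b, acc=0):
    """Illustrative DP4a-style op: 4-way int8 dot product added to an int32 accumulator.

    Assumes signed 8-bit inputs; actual instructions come in several
    signed/unsigned variants.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("DP4a works on exactly four 8-bit values per operand")
    for x in (*a, *b):
        if not -128 <= x <= 127:
            raise ValueError("inputs must fit in a signed 8-bit integer")
    total = acc + sum(x * y for x, y in zip(a, b))
    # Wrap to the signed 32-bit range, the way a hardware accumulator would.
    return (total + 2**31) % 2**32 - 2**31


# Hypothetical sample values, e.g. four quantized weights and activations.
result = dp4a([1, -2, 3, 4], [5, 6, -7, 8], acc=10)
```

ML inference on quantized INT8 models is largely made of long runs of exactly this multiply-accumulate pattern, which is why dedicated instructions for it matter.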

Overall, the performance claims for the P-cores and E-cores were impressive, but there are so many caveats around any vendor’s claims at this juncture that it’s not worth taking a shot at how they compare to the competition. I can say that the P-cores and E-cores are big architectural leaps over their predecessors.

Alder Lake PC processor

I talked about the CPU cores above that will be used across client and datacenter product lines. Those P and E cores are then integrated into the Alder Lake processor with other IP blocks for things like I/O, memory, graphics, etc. and then packaged for the appropriate notebook or desktop.  

Alder Lake takes up to 8 P-cores and 8 E-cores to deliver a maximum of 24 threads, 16 from the P-cores and 8 from the E-cores. If this is where the product line starts and ends, AMD could likely still have more threads, which could be important for higher-end corner-case workloads, primarily on the desktop. Intel will combat that with its “true performance” push, saying that most people and apps can’t take advantage of all those cores and that Intel’s ML inference capabilities trump a lot of cores that are idling. This is why I want to know the ML performance of AVX2-VNNI and which ISVs and workloads support it. Intel had ML acceleration before, but I don’t think it got much credit in the market, as few applications supported it.
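
The thread math is simple enough to sketch. The snippet below assumes each P-core runs two threads (SMT) and each E-core runs one, which is what the 16 + 8 split implies. The function name and defaults are mine, for illustration only.

```python
def total_threads(p_cores, e_cores, threads_per_p=2, threads_per_e=1):
    """Hardware thread count for a hybrid P-core/E-core configuration.

    Assumes two threads per P-core and one per E-core.
    """
    return p_cores * threads_per_p + e_cores * threads_per_e


top_config = total_threads(p_cores=8, e_cores=8)    # 16 + 8 = 24
e_only_config = total_threads(p_cores=0, e_cores=8) # hypothetical all-E-core part
```

The same helper makes it easy to reason about other P-core and E-core mixes, none of which Intel had announced at the event.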

I am also interested to see other configurations of P-cores and E-cores and how those would perform and be marketed. Certain geographies care more about the number of cores than the performance of those cores. An 8 E-core configuration in a low-priced, low-power notebook would certainly be interesting. You want cores? How about a $399 thin and light notebook with 8 Pentium cores?

Off-the-record discussions lead me to believe Intel will take back the integer performance lead on a per-P-core basis, which would mean Intel will continue to have a robust commercial, notebook, and lower-core-count desktop presence. It’s early though, folks, so we have to wait for an Alder Lake launch, for systems to be benchmarked, and for AMD’s response.

Also remember that the game isn’t all about performance per workload. AMD has brought the heat the past three years yet can’t muster over 25% market share. That says a lot. The game is also about getting enough of the right platforms created with OEMs and ODMs, getting ISVs to optimize for special features, and co-marketing those through the channels of distribution. This is where Intel has always held a substantial advantage, and I don’t expect that to change any time soon.

Xe HPG architecture, Alchemist GPUs, Arc brand for gaming

Intel has expanded its definition of compute over the years from CPU to GPU, NPU, IPU, fixed-function accelerator, and even FPGA. Intel had done integrated graphics for decades and tried to get into discrete graphics, but it never panned out. Four to five years ago, the company took a radically different (for Intel) software-first approach and set about architecting a “real” GPU (versus MIC).

Intel introduced three things I thought were most interesting. First, the Alchemist product under the Arc brand is fabbed on TSMC’s N6 node. This gives me manufacturing and deliverability confidence. Sure, it was one of the industry’s worst-kept secrets, but it was good to see it made official. Second, it introduced a new, ML-enabled upsampling technology called “XeSS.” Third, it has special blocks for ray tracing. Both XeSS and the ray-tracing blocks follow a methodology closer to NVIDIA’s than to AMD’s. I like this approach because NVIDIA is seen as the highest-performance PC gaming graphics vendor, so no one will take pot shots at the implementation, only the results. Like NVIDIA, though, I believe Intel will have some heavy lifting to do to get ISVs to adopt it. I am hopeful Intel takes more of a oneAPI approach here, one that supports AMD, NVIDIA, and Intel graphics, to make it easier for game developers.

Net-net, I don’t have a scenario where Intel doesn’t gain discrete gaming graphics market share. Intel will gain share. In the worst case, Intel slips a quarter and AMD and NVIDIA refresh, and Intel still has a solid high-performance midrange product.

Sapphire Rapids datacenter processor

Alder Lake is for PCs, while Sapphire Rapids is for the datacenter and will be branded as a Xeon Scalable processor. Unlike Alder Lake, Sapphire Rapids is a modular design, more like AMD’s EPYC, built with Intel’s EMIB packaging. If Intel can minimize latency between subsystems, an early issue AMD had, I love this decision. I love it because it should reduce fab risk, and design variations should be cheaper and faster to engineer. Instead of having to manufacture a larger monolithic die on Intel 7, Intel only has to fab smaller tiles and vary the design in the package. Smaller dies should also increase good die per wafer versus a monolithic die.
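
To illustrate why smaller dies help, here is a rough sketch using the textbook Poisson yield model, where the fraction of defect-free dies is exp(-area × defect density). The die areas and defect density below are hypothetical numbers I picked for illustration. They are not Intel data, and real yield models are more involved.

```python
import math


def poisson_yield(die_area_cm2, defect_density_per_cm2):
    """Fraction of dies expected to be defect-free under a simple Poisson model."""
    return math.exp(-die_area_cm2 * defect_density_per_cm2)


# Hypothetical values for illustration only, not Intel data.
DEFECT_DENSITY = 0.1    # defects per square centimeter
MONOLITHIC_AREA = 6.0   # one large die, in square centimeters
TILE_AREA = 1.5         # one of four smaller tiles covering the same total area

monolithic_yield = poisson_yield(MONOLITHIC_AREA, DEFECT_DENSITY)
tile_yield = poisson_yield(TILE_AREA, DEFECT_DENSITY)
```

With these assumed inputs, roughly 55% of the monolithic dies come out clean versus roughly 86% of the tiles. The practical win is that a single defect scraps only one small tile rather than the whole large die, so far less silicon is thrown away.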

As with Ice Lake, Intel is really investing in acceleration of key workloads and algorithms, including machine learning (AMX), cryptography, data streaming, compression, decompression, and even microservices. This acceleration is enabled by a combination of fixed-function accelerators provided in the architecture and optimized algorithms.

Honestly, I can’t say with a high degree of confidence how well Sapphire Rapids will perform or how much power it will draw. From what I know right now, and based on background conversations, I think Intel will be very performance-competitive in the accelerated workloads I outlined above. The microservices and streaming performance claims are very compelling, and these are not corner cases but very mainstream workloads. As soon as I get more information, I will ratchet that opinion up or down.

Don’t forget, performance and power are only part of the datacenter game, as Intel still has 85% server market share. As in client, in the datacenter Intel funds designs across OEMs and ODMs, reducing their R&D spend, invests colossal resources in software with ISVs and open source, and finally, funds most enterprise marketing programs. At this juncture, I have not seen AMD prepared to make that investment.

Mount Evans Infrastructure Processing Unit (IPU)

Most of you know what a CPU and a GPU are. But let’s talk about what an IPU is, which NVIDIA and Marvell call a “DPU.” As I said here, an IPU is designed to offload the main CPU in the datacenter and at the edge, which gives more predictable and efficient application performance and enables the improved virtualization capabilities that CSPs and carriers are seeking. IPUs have become a big industry discussion point as the datacenter and the edge evolve.

Intel announced its first ASIC-based (versus FPGA) IPU called Mount Evans for CSPs. Intel already has FPGA-based IPUs. The company said it co-developed it with a “major CSP” and I’m guessing it’s Microsoft’s Azure given all the FPGA work Intel did at the edge with Azure.

One big surprise in the disclosure was that the Compute Engine was Arm Neoverse N1-based. Surprise!

To be honest, it’s hard to gauge competitiveness versus Marvell DPUs at this point, and so far NVIDIA is really focused on the enterprise datacenter, not CSPs, with its DPUs.

Xe HPC architecture, Ponte Vecchio GPUs, for HPC and AI

The Xe HP“G” architecture, Alchemist GPUs, and the Arc brand are for high-performance “g”aming, while the Xe HP“C” architecture and the Ponte Vecchio GPUs are for “c”ompute, meaning HPC and AI. Intel calls Ponte Vecchio an “SoC”, but it’s hard for me to call anything that can’t boot an “SoC”.

Intel is going straight after NVIDIA’s A100 with Ponte Vecchio and the Xe HPC architecture. The company’s claims of 45 TFLOPS FP32 and its ResNet training and inference numbers put it on top, but you can’t do a full assessment based on two benchmarks on A0 silicon versus a shipping product. Plus, NVIDIA keeps updating its software and performance.

I am intrigued about the Xe HPC’s scalable architecture and was glad to see ray tracing units for post visualization. Ponte Vecchio is a real beast with over 100B transistors, 3D Foveros stack PLUS EMIB combining an Intel 7 base tile and a TSMC N5 compute and link tile. Intel says the GPU will be released in 2022. 

The Ponte Vecchio GPU can do ML training and inference, and so can Habana Gaudi, but Gaudi is ASIC-based, not GPU-based. I like to see Intel hedging its bets here, but I’m wondering how long that makes sense. My guess, after looking at all of the other ASIC-based ML training plays and gigantic model sizes, is that it’ll be a long time.

Wrapping up

When Intel CEO Pat Gelsinger arrived back on the scene, he said he wanted to provide more and better technical information earlier. While I know some Architecture Day attendees wanted even more, I was satisfied with it. For me, architecture days are not an end state, but an interim point until the actual products are launched. And that is exactly what this one was for me in many areas. Intel did move the ball down the field, but before the actual product launches, it’s hard to give a proper assessment. I will wait patiently for those launches to fill in the rest of the blanks.

Be sure to check out the deeper analysis by Anshel Sag, Moor Insights & Strategy principal analyst, on Intel’s GPUs.