Intel Architecture Day 2020 Gives A Glimpse Into A Brighter Future

Intel’s Raja M. Koduri, SVP, chief architect, and GM of Architecture, Graphics, and Software 

Intel is in the midst of a substantial company-wide transition that is changing nearly every way the company does things. Those changes really started to show themselves last year when Intel made a significant series of disclosures at its 2019 Architecture Day. Intel has not had an easy last year or two, especially in the desktop, mobile, and server CPU spaces, where it has lost some share to the competition, but it has still beaten earnings expectations.

This year’s Architecture Day was 100% virtual due to COVID-19, but the sheer volume of disclosures was greater than ever before and shows that even though Intel may be down on 7nm, it is far from out. At its 2020 Architecture Day, the company continued on the path of the six pillars it introduced last year, focusing on critical areas of strength for the company and on how it can and will continue to grow as a powerhouse in the semiconductor industry.

The 10nm Process Node and SuperFins

Intel’s process nodes have been a struggle for the company, with delays to both 10nm and now 7nm, but the company has made many intranode performance improvements to both its 14nm and 10nm processes. The company is still fundamentally dependent on its own process node technology to manufacture its chips. While this isn’t the case for all of its chips, as Intel does outsource some to TSMC, Intel’s fabs and the process nodes those fabs deliver are still crucial to the company’s success right now. This is where Intel continues to innovate with its own rethinking of the FinFET, which the company initially introduced at 22nm.

Intel is both refining and redefining the FinFET with a multitude of gate innovations, including an improved gate pitch and a gate process that improves channel mobility and drive current. Intel is also creating what it calls SuperFins, which add Super MIM (metal-insulator-metal) capacitors that increase MIM capacitance by what Intel claims to be 5x, and Novel Thin Barriers that are designed to reduce via resistance by 30%. All these FinFET innovations, and the redesign of what a FinFET looks like, have resulted in some of the fastest, if not the fastest, transistors in the world.

In fact, Intel’s improvements in the 10nm process and architecture are so significant that the company is claiming a nearly 20% performance improvement over 14nm. With 14nm, Intel made small incremental performance improvements of roughly 4-5% per refresh (+++), which amounted to about 20% across four different CPU architectures. Intel is achieving this with a single step rather than four, which is what makes this transition to 10nm much more significant than many realize.
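As a quick sanity check on that arithmetic, the sketch below compounds a 4-5% gain over four refreshes. It assumes the per-refresh gains multiply, which is our simplification rather than anything Intel stated, and the function name is purely illustrative.

```python
# Rough check: do four refreshes at 4-5% each add up to "about 20%"?
# Assumption (ours, not Intel's): the gains compound multiplicatively.

def compound_gain(per_step_gain: float, steps: int) -> float:
    """Return the cumulative fractional gain after `steps` equal improvements."""
    return (1.0 + per_step_gain) ** steps - 1.0

low = compound_gain(0.04, 4)   # four refreshes at 4% each
high = compound_gain(0.05, 4)  # four refreshes at 5% each

print(f"Cumulative gain: {low:.1%} to {high:.1%}")
```

Under that assumption the range works out to roughly 17% to 22%, which brackets the "about 20%" figure quoted above.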

Largest Intranode Performance Delta In Our History

The way that Intel is thinking about process nodes is also important to consider because the company is not focusing solely on the nomenclature that the rest of the industry has been using. Sure, this is absolutely to Intel’s benefit, but it is also because Intel is finding different ways to improve performance per watt without having to continually push the envelope of process size or creative measurements. In fact, Intel has disclosed that it is already planning to create Enhanced SuperFins, which improve performance further, bring new interconnect innovations, and are optimized for the data center.

While likely behind in density against so-called “5-6nm” processes, if you look at Intel’s entire new transistor “stack,” it is likely the highest performance in the industry. This was exemplified in the Willow Cove disclosures, which we talk about below.

Packaging and Interconnects

As we have mentioned before, Intel is also innovating on packaging and interconnects, which gives it the flexibility to build things like its Lakefield processors. In the future, Intel expects to use TSV (through-silicon via) hybrid bonding to shrink the bump pitch between dies by 5x (from 50 µm to 10 µm), which allows for smaller, simpler circuits as well as lower power and capacitance. Intel has also previously disclosed packaging technologies like Co-EMIB, which allows compute and memory dies to be combined both horizontally (2D) and vertically (3D) in the same package, enabling much larger chips than could be produced monolithically. In addition to Co-EMIB, Intel has previously talked about ODI, which it is clearly still pursuing and which allows for tighter 3D integration of dies than Foveros offers, with direct power delivery and higher-bandwidth interconnects.

Intel is also expected to bring a high-performance follow-on to Lakefield with Alder Lake, which is expected to combine Intel’s Golden Cove and Gracemont cores into a higher-performance hybrid architecture. However, this is not the limit of what Intel can do with packaging and interconnects, as the company is planning to use optical IO in the future with ultra-high bandwidth of 1 Tbps per fiber. Intel expects to deliver 6x better density than PCIe Gen 6 (which is not expected to be finalized until 2021). Intel’s optical IO is also expected to have 50% better power efficiency than PCIe Gen 6 and latency comparable to electrical IO.
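To put the 50 µm to 10 µm pitch reduction from the hybrid bonding discussion above in perspective, here is a minimal sketch of what it could mean for connection density. It assumes bumps sit on a simple square grid with one bump per pitch-by-pitch cell, which is our simplification; real bump layouts may differ.

```python
# Illustrative only: die-to-die connection density for a given bump pitch.
# Assumption (ours): a uniform square grid, one bump per pitch x pitch cell.

def bumps_per_mm2(pitch_um: float) -> float:
    """Return bumps per square millimetre for a square grid at `pitch_um`."""
    bumps_per_mm = 1000.0 / pitch_um  # bumps along one millimetre
    return bumps_per_mm ** 2

density_50um = bumps_per_mm2(50)  # today's pitch cited above
density_10um = bumps_per_mm2(10)  # hybrid bonding target cited above
ratio = density_10um / density_50um

print(f"50 um pitch: {density_50um:.0f} bumps/mm^2")
print(f"10 um pitch: {density_10um:.0f} bumps/mm^2")
print(f"Density increase: {ratio:.0f}x")
```

The point of the sketch is that a 5x reduction in linear pitch translates into a 25x increase in connections per unit area under this grid assumption, which is why the pitch number matters so much for bandwidth between stacked dies.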

Related to the future of design and packaging in the industry, which I see as chiplets from many companies coming together in a 3D package, I feel confident Intel is extremely competent, maybe even the current leader. We will have to hear more on what TSMC is doing at its event in a few weeks. 3D is a long-term Intel strategy and industry shift, and I don’t see today’s Lakefield as representative of what many future designs will be. The packaging commitments on power and bandwidth are compelling and were the most impressive part of the day.

Tiger Lake and Willow Cove

Tiger Lake and Willow Cove are Intel’s new 10nm architectures, with the Tiger Lake SoC leveraging the Willow Cove CPU architecture. The new Willow Cove architecture leverages the new high-performance SuperFin transistors, which add to the overall improved metal stack. The Willow Cove CPU cores build on the success of the Sunny Cove architecture, and the improved transistors and architectural improvements translate to a complete shift of the voltage and frequency curve, delivering a greater dynamic range in terms of core voltage and clock frequency. This greater dynamic range is what translates to the roughly 20% gain in CPU performance over the previous generation. While Willow Cove’s architectural details are somewhat limited, Intel has said that it has redesigned the caching architecture around a larger, non-inclusive 1.25MB MLC. Intel has also said that Control-Flow Enforcement Technology has been implemented to protect against return- and jump-oriented programming attacks. That said, there are still many more improvements, which others will cover in more detail, that show how Intel is getting an almost 20% gain in performance.

Tiger Lake appears to deliver on what Intel has been saying it wants to do by leveraging its 6 Pillar Strategy. It showed how it could get a 13-25% performance bump just in CPU core scaling with Willow Cove, which is significant as this core will scale across all product lines: desktop, notebook, and server. I think we will see even more significant scaling in its GPU and ML performance as it’s adding more gates and ASIC capabilities.

Xe-LP inside Tiger Lake

The Tiger Lake architecture is much more than just a CPU; it is also a combination of different fabrics, memory, co-processors, and now a GPU. On the Tiger Lake SoC, Intel is also introducing a new GPU based on the company’s Xe graphics architecture, the Intel Xe-LP, which is specifically designed to operate around a 15W TDP but can be dynamically adjusted between 10W and 28W depending on the system design. The Xe-LP GPU is designed to be a replacement for Intel’s 11th Gen graphics architecture. It is technically Intel’s 12th Gen GPU architecture, but Intel is expected to do away with the Gen naming scheme and follow the Xe name instead. The new Xe-LP inside Tiger Lake supports up to 96 EUs (execution units), has wider EUs than 11th Gen, and shares thread control between pairs of EUs for improved efficiency.

The new Xe-LP is designed to be the base-level of the Xe GPU architecture and can and will be scaled up to enthusiast, datacenter, and even exascale HPC. Like Willow Cove, Xe-LP is designed to have a much broader dynamic range compared to Gen 11 graphics in terms of both clock speeds and voltages, which means we could see faster GPUs from Intel at the same power levels or significantly faster GPUs at a slightly higher voltage. I believe that the Xe-LP GPU in the Tiger Lake SoC looks promising enough to give the competition’s entry-level offerings a run for their money in ways that Gen 11 came very close to doing. However, the GPU market is constantly moving, and we expect pretty significant updates from the competition on GPU architecture this year, especially with Nvidia teasing a GeForce announcement in September.

The Xe-LP GPU also has a new media engine that doubles encode/decode throughput, adds AV1 and HEVC screen content coding support, and can play back 8K60 content in HDR/Dolby Vision. The display engine within the Xe-LP GPU has four display pipelines, which allows for dual eDP, and supports DisplayPort 1.4, HDMI 2.0, Thunderbolt 4, and USB 4 Type-C as outputs. Intel says that it can display up to 8K and support HDR10 and Dolby Vision with up to 12-bit BT2020 color depth and a 360 Hz adaptive sync refresh rate. Intel has told us that Xe can support up to 4x 4K60 HDR, 2x 4K120 HDR, or 8K60 HDR with compression. This architecture is also what powers Intel’s first discrete GPU, the DG1, which is in production and on track to start shipping this year.
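To see why the 8K60 HDR mode is quoted "with compression," the rough sketch below estimates the uncompressed pixel data rate of each display mode mentioned above. The assumptions are ours: 10 bits per color channel (30 bits per pixel) for HDR, blanking intervals ignored, and a single DisplayPort 1.4 HBR3 link (four lanes at 8.1 Gbps with 8b/10b encoding) as the point of comparison.

```python
# Back-of-the-envelope display bandwidth estimate.
# Assumptions (ours): 30 bits per pixel for HDR, no blanking overhead,
# and one DisplayPort 1.4 HBR3 link: 4 lanes x 8.1 Gbps, 8b/10b encoded.

HBR3_PAYLOAD_GBPS = 4 * 8.1 * 8 / 10  # about 25.92 Gbps of usable payload

def raw_gbps(width: int, height: int, hz: int, bits_per_pixel: int = 30) -> float:
    """Return the uncompressed pixel data rate in Gbps."""
    return width * height * hz * bits_per_pixel / 1e9

modes = {
    "4K60 HDR": (3840, 2160, 60),
    "4K120 HDR": (3840, 2160, 120),
    "8K60 HDR": (7680, 4320, 60),
}

for name, (w, h, hz) in modes.items():
    rate = raw_gbps(w, h, hz)
    verdict = "fits" if rate <= HBR3_PAYLOAD_GBPS else "needs compression"
    print(f"{name}: {rate:.2f} Gbps uncompressed, {verdict} on one HBR3 link")
```

Under these assumptions, an 8K60 HDR stream needs more than twice the payload of a single HBR3 link, which is consistent with Intel qualifying that mode as requiring compression. Real links carry blanking and protocol overhead, so actual requirements would be somewhat higher.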

Tiger Lake’s platform improvements

In addition to offering 10nm and the fastest CPU and GPU cores in the company’s history, Tiger Lake also brings a multitude of additional and welcome platform improvements. One big improvement comes with the coherent fabric and last level cache (LLC), which gain a dual-ring microarchitecture and a 50% larger, non-inclusive LLC, as mentioned earlier. This results in a 2x increase in coherent fabric bandwidth, which is important to keep all the different CPU cores, memory, and GPU cores fed with data. In fact, Intel is increasing memory bandwidth significantly, with support for up to 86 GB/s of memory bandwidth and the inclusion of a dual memory controller subsystem. This is how Intel is able to add support not only for LPDDR4x-4267 and DDR4-3200 but also, in the future, for LPDDR5-5400, which means that Intel’s memory controllers are already prepared for LPDDR5.
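The 86 GB/s figure lines up with simple peak-bandwidth arithmetic for the fastest memory type listed. The sketch below assumes a 128-bit total memory bus, which is our assumption and not something stated in the disclosure, and it reports theoretical peaks rather than measured throughput.

```python
# Theoretical peak memory bandwidth = transfer rate x bus width.
# Assumption (ours): a 128-bit total memory bus across both controllers.

def peak_gb_per_s(mega_transfers_per_s: int, bus_bits: int = 128) -> float:
    """Return peak bandwidth in GB/s for a given transfer rate (MT/s)."""
    bytes_per_transfer = bus_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

memory_types = {
    "DDR4-3200": 3200,
    "LPDDR4x-4267": 4267,
    "LPDDR5-5400": 5400,
}

for name, rate in memory_types.items():
    print(f"{name}: {peak_gb_per_s(rate):.1f} GB/s peak")
```

With that bus width, LPDDR5-5400 works out to about 86.4 GB/s, while the memory types available at launch land at roughly 51 GB/s and 68 GB/s, so the headline number appears to describe the future LPDDR5 configuration.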

Intel also includes an updated GNA 2.0 (Gaussian and Neural Accelerator), which is designed for low-power neural inferencing. One of the common applications that this core is going to be used for initially is neural noise cancellation, which can be used for images or sound and results in roughly 20% lower CPU utilization when the work runs on the GNA. On the display IO side, Intel is trying to add support for more displays at a higher resolution and quality, which is clearly illustrated by the Xe-LP’s media and display engine capabilities. To accomplish this, Intel has a dedicated fabric path to memory to maintain quality of service. This connection provides up to 64 GB/s of isochronous bandwidth to memory. The 6th Gen IPU (Image Processing Unit) is designed to allow the Tiger Lake platform to support video at up to 4K90, with initial support for 4K30, and still images of up to 42MP, with initial support for 27MP.

In addition to GNA 2.0 and display IO, Intel is making major improvements to overall system IO with integrated Thunderbolt 4 and USB 4 support, enabling up to 40 Gbps of bandwidth on each port. While Intel has already integrated Thunderbolt 3 into a previous architecture, this will be the first with support for USB 4 and should be one of the first USB 4 capable platforms. Because USB4 and DisplayPort are collaborating to improve compatibility and interoperability, Intel can support display through Type-C, including DP Alt mode over Type-C as well as DP tunneling over Thunderbolt. Future PCs may no longer need any ports other than Type-C connectors; in fact, we’re starting to see that with some high-end laptops like Dell’s XPS 17. In addition to USB4 support, Intel is also adding PCIe Gen 4 support, which means that Intel is finally catching up to AMD on PCIe Gen 4 and will be able to take advantage of some of the superfast NVMe drives that exist in the market today. This is a welcome improvement, and while the first SKUs will offer only four lanes, we expect that Intel will offer more PCIe Gen 4 lanes for higher-performance, higher-wattage products that might use a discrete GPU, like the H-Series.
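For a sense of scale on those IO numbers, the short sketch below computes theoretical peak throughput for a four-lane PCIe link, using the published per-lane rates (8 GT/s for Gen 3, 16 GT/s for Gen 4) and 128b/130b encoding. The helper name is illustrative, and real-world NVMe throughput will be lower once protocol overhead is accounted for.

```python
# Theoretical peak PCIe link throughput.
# Gen 3 runs at 8 GT/s per lane and Gen 4 at 16 GT/s per lane,
# both with 128b/130b line encoding. Protocol overhead is ignored.

def pcie_gb_per_s(gt_per_s: float, lanes: int) -> float:
    """Return peak one-direction throughput in GB/s."""
    usable_gbps = gt_per_s * lanes * 128 / 130  # strip encoding overhead
    return usable_gbps / 8                      # bits to bytes

gen3_x4 = pcie_gb_per_s(8, 4)
gen4_x4 = pcie_gb_per_s(16, 4)
thunderbolt_gb_per_s = 40 / 8  # 40 Gbps link rate expressed in GB/s

print(f"PCIe Gen 3 x4: {gen3_x4:.2f} GB/s")
print(f"PCIe Gen 4 x4: {gen4_x4:.2f} GB/s")
print(f"40 Gbps Type-C port: {thunderbolt_gb_per_s:.1f} GB/s raw link rate")
```

Four Gen 4 lanes come out to just under 8 GB/s, double what the same lane count delivers on Gen 3, which is the headroom that today's fastest NVMe drives are designed to use.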

Software will matter a lot for ML in particular if the hardware is to matter, and I’m looking forward to seeing more details on what software can actually take advantage of the new capabilities. While it’s unclear how Tiger Lake competes with AMD’s Ryzen 4000 at this point, as this wasn’t the product launch, the company did seem to scale Tiger Lake pretty well and add new technologies where it needed them.

Xe Architecture Updates

In addition to the new Xe-LP designed for server, embedded, and mobile applications, Intel also disclosed that there would be an Xe-HPG variant. Xe-HPG is a new gaming-optimized variant that takes the performance-per-watt architecture from Xe-LP and combines it with the scale of Xe-HP, for bigger configurations, and the compute frequency optimization of Xe-HPC. It will have a new memory subsystem based on GDDR6 and hardware-accelerated ray tracing support, and it is expected to ship in 2021. Interestingly enough, if you look at the slides from Intel’s Architecture Day, you’ll also notice that Xe-HPG will be manufactured at an outside foundry, which many of us assume is most likely going to be TSMC. In addition to that, Intel also talked about how its GPU driver architecture is going to change with Instant Game Tuning, which lets users get game optimizations faster through automatic pushes from Intel’s own driver management suite, so that only game-specific fixes and optimizations need to be delivered.

Intel also detailed how Xe-HP performance can be scaled through a tiled approach, using anywhere from 1 to 4 tiles per GPU based on need, and said that Xe-HP silicon is already back in its labs and powered on. Intel also detailed how Xe-HPC (Ponte Vecchio) will be manufactured, as it leverages Foveros and Co-EMIB to combine different dies manufactured on different processes. The base tiles and ‘Rambo Cache’ tiles will be manufactured using Intel’s 10nm, while the compute tile will be manufactured using Intel’s ‘Next Gen’ process as well as an external fab. The Xe Link I/O tiles will also be manufactured externally, just like the Xe-HPG GPU for gamers. Intel’s SG1, DG1, and Tiger Lake products will all be manufactured with Intel’s new 10nm SuperFin process, while Xe-HP will use EMIB for die linking and Intel’s 10nm Enhanced SuperFin, which should make it a 2021 product as well.

While I liked what I saw on paper, I’m reserving judgment on all Intel’s discrete end products until it ships an end product into the market.

Products, Packaging, and Process Overview

If you thought that was a lot, let’s just say we only covered about half of what Intel talked about at the Architecture Day.

Wrapping up…

All these major architectural developments across the board come against the backdrop of Intel currently losing some CPU unit market share, along with a 6-month 7nm delay, while still exceeding Wall Street’s expectations on multiple earnings reports. After Intel’s Architecture Day, I feel better about its future, as it didn’t ignore the obvious fab issues and, at the same time, gave reasons to believe that architecturally it could be back on top again.

Tiger Lake appears to deliver on what Intel has been saying it wants to do by leveraging its 6 Pillar Strategy. It showed how it could deliver a 13-25% performance bump in CPU core scaling with Willow Cove, and I think we will see even more significant scaling in its GPU and ML performance. Software will matter a lot when it comes to ML, and I am looking forward to seeing more details on what software can leverage the new capabilities. While it is unclear how Tiger Lake competes directly with AMD at this point as this was not its actual product launch, the company did seem to scale Tiger Lake’s performance quite well and has added new complementary IO where it was needed. 

While Intel is likely behind in density against so-called “5-6nm” processes, if you look at Intel’s entire new transistor “stack,” it is likely the highest performance in the industry, as was exemplified by the Willow Cove disclosures. Related to the future of design and packaging in the industry, which I see as chiplets from many companies coming together in a 3D package, I feel confident Intel is extremely competent, maybe even the current leader. This is a long-term strategy and industry shift, and I do not see Lakefield’s performance per watt as representative of what many future designs will be, especially with Alder Lake coming down the pipe next year. Intel’s commitments to improving packaging and IO are among the most impressive ways that I see the company navigating this new competitive environment, and they should yield some very interesting products down the road, like Xe-HPC (Ponte Vecchio). As always, execution is key. This was not a product launch but rather a technology disclosure, without any discussion of volumes or ability to execute.

Note: Senior analyst Anshel Sag contributed significantly to this analysis.