Every year, like clockwork, ARM unveils its latest processing core designs at Computex. ARM makes a habit of increasing performance by double-digit percentages with almost every release (which its competitors can’t claim). I believe these double-digit gains have enabled the ARM ecosystem to grow to the point where ARM cores are found in nearly everything. Furthermore, I believe that ARM’s annual cadence is helping to fuel some of the astonishing performance numbers that we see from smartphones year after year. This year’s Computex saw new CPU, GPU, and ML cores from ARM, some with bigger potential impacts than others. Let’s take a closer look at what was announced.
Introducing the Cortex-A77
The first new core is the ARM Cortex-A77, which is architecturally very similar to last year’s A76 (which is now in phones). The CPU cores in the Kirin 980 and Snapdragon 855 are both based on the A76, with modifications from HiSilicon and Qualcomm, respectively. For the A77, ARM modified the A76 architecture to squeeze out a 20% improvement in IPC (instructions per clock). ARM did this, in part, because clock speeds in SoCs are mostly staying flat on current process nodes, and improving IPC translates to more performance without having to raise clock speeds. With these improvements, ARM says the A77 is 4 times faster than the A15 processor that was released only six years ago. The tweaks behind the IPC gain include a new 1.5K MOP (macro-op) cache, a larger out-of-order window (widening from 4 instructions to 6), a fourth ALU, and a second branch execution unit. ARM also says it has significantly improved ML performance in the A77, which is important because many AI workloads fall back to the CPU if there isn’t a compatible ML processor or GPU.
Additionally, many ML programs are still being built to target the CPU because that’s the lowest common denominator across most Android smartphones. GPUs can vary, but CPUs all have effectively the same cores and instructions. In addition to ML performance, ARM also improved cryptography performance in the Cortex-A77 (in some cases doubling the CPU’s cryptography performance). This is huge because most of the Android OS and many applications are encrypted.
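The IPC argument in this section can be made concrete with a little arithmetic. The snippet below is a minimal sketch that assumes the simplest possible model, performance = IPC x clock speed, which ignores memory, thermal, and workload effects. The function name is my own, and the 10% clock bump is purely hypothetical, not an ARM figure.

```python
def relative_performance(ipc_gain: float, clock_gain: float = 0.0) -> float:
    """Speedup over a baseline core, assuming performance = IPC x clock.

    Both gains are fractions (0.20 means +20%). This is a simplified
    model for illustration, not ARM's methodology.
    """
    return (1.0 + ipc_gain) * (1.0 + clock_gain)


# ARM's claimed 20% IPC uplift with the clock held flat.
flat_clock = relative_performance(ipc_gain=0.20)
print(f"A77 vs. A76 at the same clock: {flat_clock:.2f}x")

# Hypothetical: the same IPC uplift plus a 10% clock bump (not an ARM figure).
with_clock_bump = relative_performance(ipc_gain=0.20, clock_gain=0.10)
print(f"With a hypothetical 10% clock bump: {with_clock_bump:.2f}x")
```

Under this model, a 20% IPC gain at a flat clock is worth the same as a 20% clock increase at flat IPC, which is why IPC is the lever ARM pulls when process nodes stop delivering frequency headroom.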
Introducing the Mali G77
In addition to the Cortex-A77, ARM introduced the Mali G77. While the G77 may sound like a small step up from the G76, it is actually a complete rebuild of the Mali GPU architecture for the next generation. One could argue that this justifies naming it the Mali G80, but that would destroy the harmony between ARM’s GPU, CPU, and DPU naming schemes. There’s way too much to cover when it comes to the new Valhall architecture of the Mali G77, but ARM claims a 40% performance improvement over the G76. The Mali G77 also features a 30% improvement in both energy efficiency and performance density, and a 60% improvement for ML workloads (which should be very interesting as more and more applications leverage ML capabilities in hardware). In gaming, ARM claims a 20-40% improvement in frame rate over the G76, depending on the game. My understanding is that the G77 has been rearchitected to be optimized for low-level APIs like Vulkan, so I expect that games that use Vulkan are the ones that will see an uplift closer to 40%. Additionally, as more and more games transition to Vulkan to make the most of the hardware, the Mali G77’s new architecture will be better positioned to maximize performance. ARM designed the Valhall architecture for the future, and I expect that we’ll see continued improvements to it in subsequent generations, both in terms of raw performance and Vulkan performance.
In addition to the G77, a few weeks ago ARM also announced the D77, the company’s latest display processor, which is optimized for VR HMDs. The D77 is a big deal because it helps to offload the most common VR workloads from the GPU, which frees up approximately 15% of the GPU for other workloads or simply higher performance. I wrote about it in more detail in a separate piece, for those interested.
Last but certainly not least, ARM finally announced its new ML processor (unnamed as of yet), an extension of its Project Trillium program. ARM is still keeping any design wins close to its chest, but it says that an 8-core design of this processor is capable of 32 TOPS of ML performance. ARM also claims an ML efficiency figure of up to 5 TOPS per watt, but didn’t specify how many cores were used to obtain that metric. We have yet to hear which customers (SoC makers) have adopted ARM’s ML processor into their designs, but we do know that they are significant. It makes sense that ARM would release its ML processor now, since it’s been talking about ML for quite some time with Project Trillium. Having an ML processor rounds out the company’s ML architecture, allowing it to send any given ML workload to the most appropriate processor for the best performance per watt. This ability is ultimately what matters the most when performing ML inference in mobile.
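To put the ML processor numbers in perspective, here is a rough back-of-the-envelope sketch. The per-core figure follows directly from ARM's 8-core, 32 TOPS claim, assuming throughput scales linearly across cores. The power figures are my own assumption: ARM did not say which configuration reaches "up to 5 TOPS per watt", so applying that peak efficiency to any particular core count gives a best-case estimate only, not a published spec.

```python
CORES = 8                  # ARM's quoted configuration
TOTAL_TOPS = 32.0          # ARM's claim for the 8-core design
PEAK_TOPS_PER_WATT = 5.0   # ARM's "up to" efficiency figure

# Assumes linear scaling across cores.
tops_per_core = TOTAL_TOPS / CORES


def watts_at_peak_efficiency(tops: float,
                             tops_per_watt: float = PEAK_TOPS_PER_WATT) -> float:
    """Best-case power draw, ASSUMING the peak efficiency figure applies.

    ARM did not state the core count behind its efficiency claim, so
    treat the result as a lower bound rather than a spec.
    """
    return tops / tops_per_watt


print(f"Throughput per core: {tops_per_core:.1f} TOPS")
print(f"Best-case power for one core: {watts_at_peak_efficiency(tops_per_core):.1f} W")
print(f"Best-case power for all {CORES} cores: {watts_at_peak_efficiency(TOTAL_TOPS):.1f} W")
```

The takeaway is less about the exact wattage and more about why performance per watt is the metric to watch: at mobile power budgets, how efficiently those TOPS are delivered matters as much as the headline number.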
From ARM’s announcements at Computex this year, it’s clear the company’s foot is on the gas pedal. Its new processors have great potential to accelerate both today’s workloads and the workloads of the future for AI and immersive computing. I expect ARM to continue to propel the mobile industry forward with its innovative processor designs, which ultimately enable its customers to build SoCs for the devices of the future. I hope that application developers will be able to make the most of these new processors’ performance improvements; consumers will ultimately feel the benefits.