New Arm GPUs Bring Major Performance Improvements To High-End And Mid-Tier Devices

[Image: Arm’s new Mali-G78 and Mali-G68 GPUs. Credit: Arm]

Arm is one of the world’s leading suppliers of GPUs. Between smartphones and embedded devices, the company reportedly shipped over 1 billion GPUs in 2019. While impressive, this number is not surprising when you consider that Arm has an estimated 85% market share in smart TVs, mostly thanks to MediaTek, and a 50% share in smartphones. In fact, many of the smartphone applications we use today would not be possible without these GPUs (though Arm does face stiff competition from Qualcomm and Apple).

Arm’s GPU business mainly targets four key applications: AI, entertainment, gaming and XR. This week the company announced two new GPUs to improve performance for these applications—the Mali-G78 and the Mali-G68. Let’s take a closer look.

The Mali-G78

Arm’s two newest GPUs represent continued innovation from the company in the graphics space, with a distinct focus on maximizing performance while remaining extremely conscious of power consumption. That balance matters because many graphics-heavy applications place sustained demands on the GPU and tend to run for longer than a typical application. The Mali-G78 is the company’s new top-end GPU, designed to replace the Mali-G77, its previous top-of-the-line processor. (It’s worth noting that Arm partners can pick any design they want; for whatever reason, some may choose to stick with the Mali-G77.)

The Mali-G78 is the company’s highest-performing Mali GPU to date. Arm claims 25% better performance than the Mali-G77, thanks to architectural, process and other optimizations. Most SoCs built on the Mali-G78 next year are expected to use TSMC’s 5FF process node. That said, nothing prevents an SoC vendor from using 7nm, aside from the fact that the chip would likely be larger than it would be on 5FF. I believe Huawei, one of Arm’s biggest licensees, will likely have a hard time accessing 5FF next year because of its placement on the US Entity List, which restricts its access to US technologies and to TSMC. Nevertheless, the company will still likely opt for the G78 in order to compete with others on gaming performance. According to Arm’s own figures, the GPU has 15% better performance density, 10% better energy efficiency and 15% faster machine learning performance than the G77, which is in many of this year’s phones. Note that these figures are based on the same process node and similar conditions between chipsets, ensuring that they reflect only the new design’s improvements rather than other differences.

Architecture Details

Like many of Arm’s previous GPUs, the Mali-G78 features a scalable multi-core design, but it now supports up to 24 cores at its highest performance point. The Mali-G78 is still part of the Valhall architecture family. The architectural improvements Arm has made to Valhall since its initial release in the G77, together with support for 24-core designs, give the Mali-G78 a lot of top-end performance for high-end smartphone gaming and XR experiences. This is especially true when you consider that the Mali-G77 was configurable from 7 to 16 cores, which leaves the G78 a lot of room to scale up.

One of the G78’s big new features is what Arm calls Asynchronous Top Level, which enables different parts of the GPU to operate at different frequencies and power levels. The G78 uses two asynchronous clock domains: the shader cores make up one, while the job manager, tiler, MMU, control fabric and L2 cache make up the other. Different applications should benefit from this new capability; Arm claims an 8-17% performance improvement over the G77 with Asynchronous Top Level. Judging by Arm’s numbers, the higher core count (24) appears to benefit from Asynchronous Top Level more than the lower core count (18). Asynchronous Top Level is also designed to save power by reducing clocks based on the content. According to Arm, the G78 already consumes 10% less power than the G77 when tested under the same process and similar conditions. Adding Asynchronous Top Level gives it another 6-13% reduction.
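To get a rough sense of how those two power figures might stack, here is a minimal Python sketch. It rests on an assumption that Arm’s figures as quoted here do not confirm: that the 6-13% Asynchronous Top Level saving applies to whatever power remains after the G78’s initial 10% saving, so the two compound multiplicatively. The function and constant names are hypothetical and exist only for illustration.

```python
def combined_reduction(base_reduction, extra_reduction):
    """Return the total fractional power reduction when two savings compound.

    Assumption (not confirmed by Arm): the second saving applies to the
    power that remains after the first saving has been taken.
    """
    remaining = (1.0 - base_reduction) * (1.0 - extra_reduction)
    return 1.0 - remaining


# Figures quoted in the article, expressed as fractions.
G78_VS_G77 = 0.10            # G78 power reduction versus the G77
ASYNC_RANGE = (0.06, 0.13)   # additional reduction from Asynchronous Top Level

low = combined_reduction(G78_VS_G77, ASYNC_RANGE[0])
high = combined_reduction(G78_VS_G77, ASYNC_RANGE[1])
print(f"Combined reduction vs. G77: {low:.1%} to {high:.1%}")
```

Under that compounding assumption, the combined saving works out to roughly 15% to 22% relative to the G77. If Arm instead measured the Async figure against a different baseline, the total would differ.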

Arm also completely redesigned the FMA (fused multiply-add) unit, reducing its energy consumption by 30%. The unit has a new multiplier architecture, a new add/normal architecture and separate FP32 and FP16 paths, which take up more area but save energy. Arm also introduced a feature called Fragment Dependency Tracking, which helps improve gaming performance by reducing the fragmentation of game assets as they are processed. Across six different games, Arm measured performance improvements of 6-17% over the G77.

Mali-G68 – A New GPU Tier

Before this announcement, the Arm Mali family of GPUs had only three tiers: High Performance, Mainstream and Ultra-Efficient. The Mali-G68 represents a new fourth category, which Arm calls the Sub-Premium tier. Sub-Premium sits between the top-end High Performance line and the Mainstream line, making it the new mid-tier for smartphones and other mobile devices. What makes the Mali-G68 compelling is that it offers many of the benefits of the G78 while supporting only up to 6 cores (rather than the G78’s 7-24). Arm says it built this new tier in response to feedback from partners who want to scale the latest features and technology across their entire product portfolio to reduce costs. Frankly, I believe Arm really needed a tier between High Performance and Mainstream. I expect the company will continue to tweak that tier’s offerings as feedback rolls in from customers. The net-net of the G68 is that developers will be able to access many of the G78’s technology innovations across a much broader array of devices.
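The core-count split between the two new GPUs can be pictured as a simple lookup. The Python sketch below is illustrative only: the function name is hypothetical, the one-core lower bound for the G68 is an assumption (the figures above give only its maximum of 6), and real partners weigh far more than core count when choosing a design.

```python
# Core-count limits as stated in the article.
G68_MAX_CORES = 6
G78_MIN_CORES = 7
G78_MAX_CORES = 24


def gpu_for_core_count(cores):
    """Map a desired shader-core count to the new Mali GPU that covers it.

    Assumption: the G68 covers 1..6 cores; the article only states
    that it supports "up to 6".
    """
    if not isinstance(cores, int) or cores < 1:
        raise ValueError("core count must be a positive integer")
    if cores <= G68_MAX_CORES:
        return "Mali-G68"
    if G78_MIN_CORES <= cores <= G78_MAX_CORES:
        return "Mali-G78"
    raise ValueError(f"no new Mali GPU covers {cores} cores")


for n in (4, 6, 7, 18, 24):
    print(n, "cores ->", gpu_for_core_count(n))
```

The point of the sketch is that the two ranges meet without overlapping: a partner who wants G78-class features at 6 cores or fewer lands on the G68.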

Final Thoughts

Arm has a long history of making promises on GPU and CPU performance, most of which end up being validated by its customers. After a product release, Arm typically expends a lot of resources on making sure it can deliver on the promises from the year before. Arm’s new GPUs and its partners’ SoCs should enable some significant performance improvements in the devices coming out in 2021. The G78 sets a new standard for Arm, while the G68 offers a broad-based, lower-priced alternative to the G78 with many of the same high-end features. It will be interesting to see next year how G78- and G68-based devices perform against the competition. I can’t wait to see what kinds of devices end up using the new GPUs.