As well as unveiling its latest CPU technology recently, ARM has also announced its next-generation graphics processor that we will likely see heading to smartphones in the future – the Mali-G72. As the name suggests, this is a successor to ARM’s current high-end Mali-G71 design and is based on the same Bifrost architecture.

Delving straight into the numbers, the Mali-G72 promises a 25 percent improvement in energy efficiency and a 20 percent improvement in performance density when built on the same process node as a G71 design. SoC designers could put this 25 percent efficiency gain straight towards additional performance while staying within previous power budgets. Other metrics vary depending on the use case: ARM claims the Mali-G72 sees a 17 percent improvement in GEMM (general matrix multiplication, a common machine learning workload), while other changes, such as tweaks to the tiler and new instructions, can lend a further boost in specific situations.
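
To make those percentages concrete, here is a rough back-of-the-envelope sketch. It assumes that "energy efficiency" means performance per watt and "performance density" means performance per unit of die area, which is the usual reading but is not spelled out in ARM's figures. The function names are purely illustrative.

```python
# Back-of-the-envelope arithmetic for the headline figures.
# Assumption: efficiency = performance per watt, density = performance per mm^2.

def perf_at_same_power(efficiency_gain):
    """Relative performance if the power budget is held constant."""
    return 1.0 + efficiency_gain

def power_at_same_perf(efficiency_gain):
    """Relative power draw if performance is held constant."""
    return 1.0 / (1.0 + efficiency_gain)

def area_at_same_perf(density_gain):
    """Relative die area needed to match the previous performance."""
    return 1.0 / (1.0 + density_gain)

if __name__ == "__main__":
    print(perf_at_same_power(0.25))           # 1.25x performance, same power
    print(power_at_same_perf(0.25))           # 0.8x power, same performance
    print(round(area_at_same_perf(0.20), 3))  # roughly 0.833x area
```

In other words, a 25 percent efficiency gain is worth either 25 percent more performance at the same power or a 20 percent power saving at the same performance, depending on how the SoC designer chooses to spend it.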

By combining a potential increase in core count, implementation on a more efficient process node, and various micro-architecture improvements, ARM suggests that future Mali-G72 devices could see a graphics improvement of up to 40 percent over typical 2017 devices, although actual implementations will probably vary from this value.

Unlike ARM’s latest CPU cores, the Mali-G72 is more of an incremental revision than a major shift in the way ARM is positioning its graphics technology. The GPU has seen hundreds of smaller micro-architectural refinements, which add up to some notable improvements to the design. For starters, the tile buffer memory size has been increased, which can lend up to a 40 percent performance boost in certain use cases. ARM has also rebalanced the execution pipeline to better fit the workloads that many apps actually run, including optimizations for FMA and ADD instructions.

The Mali-G72 also has larger L1 caches and doubles the throughput for complex operations. For example, the common inverse square root operation has been optimized so that it now completes in just a single cycle. ARM has also added some new internal GPU instructions to alleviate some of the most common bottlenecks that the company has found, and these will be handled by an upgraded set of drivers for the G72.
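
For readers wondering why the inverse square root matters so much, a typical example is vector normalization, which lighting and shading calculations perform constantly. The sketch below is plain Python for illustration only; it is not how a shader or the G72 hardware implements the operation, and the function names are our own.

```python
import math

def inverse_sqrt(x):
    """1 / sqrt(x): the operation the G72 is said to complete in one cycle."""
    return 1.0 / math.sqrt(x)

def normalize(vector):
    """Scale a vector to unit length using a single inverse square root."""
    length_squared = sum(component * component for component in vector)
    scale = inverse_sqrt(length_squared)
    return tuple(component * scale for component in vector)

if __name__ == "__main__":
    # A (3, 4, 0) vector has length 5, so it normalizes to about (0.6, 0.8, 0.0).
    print(normalize((3.0, 4.0, 0.0)))
```

Because one inverse square root followed by multiplications replaces a square root and several divisions, speeding up this single operation can pay off across a large share of per-pixel and per-vertex work.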

Going back to the tweaks to the tile buffer, this is an important change to the GPU that’s worth some additional explanation. With the Mali-G72, ARM has increased the size of the tile buffer memory, allowing for memory savings inside the individual cores. This change, along with other optimizations to the individual cores, has allowed ARM to shrink the Mali-G72 cores compared to the G71 on the same process node. So for a small increase in tile buffer footprint, SoC designers are now able to squeeze more individual cores into the same die area with the G72.

This means that manufacturers will be able to increase performance for the same silicon cost by increasing the core count, or bring previous high core count chips down to lower cost devices by saving on silicon. With the last-gen G71, ARM had targeted 16-20 cores as the optimum footprint for high performance and power efficiency, but now believes this will extend closer to the 32-core maximum supported by Bifrost. To clarify, both the Mali-G71 and G72 support up to 32 cores, but there are diminishing returns in terms of performance, power efficiency, and cost as the number of cores increases. The Mali-G72 has been designed partly to raise this bar, allowing manufacturers to ramp up performance without sacrificing energy or cost.

Complemented by Mali-Cetus display

Earlier in the month, ARM also announced its new Cetus display architecture, which can be paired up with ARM Mali or GPUs from other vendors to offload common display tasks. Although not a mandatory accompaniment to ARM’s Mali GPUs, Cetus does offer developers a number of useful co-features and even performance improvements that are worth mentioning in this context.

For starters, Cetus is ARM’s first HDR display solution, granting support for the latest mobile display technologies. The technology is capable of 12-bit internal precision and will support open high dynamic range standards, such as HDR10, with support for some proprietary formats also potentially in the works further down the line. Cetus can also be seamlessly integrated with ARM Assertive Display technology, which adjusts display brightness and colors depending on the lighting conditions, to make the most of HDR content even when viewing in less than ideal circumstances. HDR support pairs nicely with Cetus’ optimizations for 4Kx2K displays at 90Hz and 120Hz, a specification which is likely to become more common to meet the demands of virtual reality applications.

Combined with a Mali-G72, or any other GPU, Cetus can offer up high performance 2K and 4K content with HDR support in a low power mobile form factor.

Speaking of 4K optimizations, Cetus is able to process 4K images on a low power budget thanks to the use of side-by-side processing. A 4K image is split into two halves, with the left and right sides each undergoing their own parallel pass through the Layer Processing, Composition, and Display Output Units. By performing the two workloads in parallel, the DPU’s clock speeds, and therefore its power consumption, can be kept within the tight limits of a mobile processing package.
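
The idea can be sketched in a few lines of Python. This is only a software illustration of the split-and-recombine pattern, not ARM's hardware design: the `process_half` function is a hypothetical stand-in for the real pipeline stages, and the 60Hz refresh rate in the usage example is an assumed figure.

```python
from concurrent.futures import ThreadPoolExecutor

def split_halves(frame):
    """Split a frame (a list of pixel rows) into left and right halves."""
    middle = len(frame[0]) // 2
    left = [row[:middle] for row in frame]
    right = [row[middle:] for row in frame]
    return left, right

def process_half(half):
    """Hypothetical stand-in for the display pipeline: invert 8-bit pixels."""
    return [[255 - pixel for pixel in row] for row in half]

def side_by_side(frame):
    """Process both halves in parallel, then stitch the rows back together."""
    left, right = split_halves(frame)
    with ThreadPoolExecutor(max_workers=2) as pool:
        out_left, out_right = pool.map(process_half, (left, right))
    return [l_row + r_row for l_row, r_row in zip(out_left, out_right)]

def per_pipeline_pixel_rate(width, height, refresh_hz, pipelines=2):
    """Pixels per second each pipeline must handle when work is shared."""
    return width * height * refresh_hz / pipelines

if __name__ == "__main__":
    tiny_frame = [[0, 10, 20, 30], [40, 50, 60, 70]]
    print(side_by_side(tiny_frame))
    # A 3840x2160 frame at an assumed 60Hz, shared between two pipelines.
    print(per_pipeline_pixel_rate(3840, 2160, 60))
```

Each pipeline only has to sustain half of the total pixel rate, which is why the hardware can run at a lower clock than a single pipeline handling the full frame.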

On the performance side, the use of a dedicated DPU can offload some tasks from the GPU, such as multi-display composition. Cetus can also make use of ARM Frame Buffer Compression (AFBC), the company’s in-house lossless image compression format, which can reduce memory usage across the graphics pipeline. In other words, using Cetus in conjunction with a Mali GPU can boost performance by making use of this compression technique across multiple components, without the need for a conversion part way through the chain. This is especially useful as display resources can consume up to 60 percent of an SoC’s memory bandwidth, and higher resolution displays demand more and more of the system memory.
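
To get a feel for the bandwidth at stake, the following sketch estimates how much memory traffic it takes just to read one full-screen layer out to the display. The 4 bytes per pixel, the 60Hz refresh rate, and the 2:1 compression ratio are illustrative assumptions only; real AFBC savings depend on the image content.

```python
def scanout_bandwidth(width, height, bytes_per_pixel, refresh_hz,
                      compression_ratio=1.0):
    """Bytes per second read from memory to refresh one full-screen layer.

    compression_ratio is a hypothetical average (2.0 means half the bytes).
    """
    raw_bytes_per_second = width * height * bytes_per_pixel * refresh_hz
    return raw_bytes_per_second / compression_ratio

if __name__ == "__main__":
    uncompressed = scanout_bandwidth(3840, 2160, 4, 60)
    compressed = scanout_bandwidth(3840, 2160, 4, 60, compression_ratio=2.0)
    print(f"uncompressed: {uncompressed / 1e9:.2f} GB/s")  # about 1.99 GB/s
    print(f"assumed 2:1:  {compressed / 1e9:.2f} GB/s")    # about 1.00 GB/s
```

A single uncompressed 4K layer at these settings already needs close to 2GB/s, and that is before the GPU writes the frame in the first place or any additional layers are composed, so keeping the data compressed from end to end adds up quickly.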

Finally, Cetus can also be used as an embedded controller to talk to variable refresh rate panels. This technology has been available in larger TV and monitor panels for a few years now, and aims to eliminate screen tearing issues on mobile too. The technology stays at least one frame ahead of the panel to smooth out any dips in frame rate and can also be connected directly to the GPU frame rate to reduce the appearance of slow-down and blurring during gaming.

Wrap Up

In summary, the Mali-G72 is a refinement of ARM’s Bifrost architecture, which made its debut with last year’s Mali-G71. The GPU features hundreds of small tweaks that add up to some notable performance improvements, but perhaps most importantly the design is now smaller and more power efficient than before. This paves the way for SoC designers to increase the GPU core count without incurring any extra silicon costs or hits to mobile’s limited power budget. So we should almost certainly see more powerful GPUs inside next year’s SoCs.

Just like DynamIQ and ARM’s new Cortex-A processors, we likely won’t see the Mali-G72 appear in devices until sometime in early 2018.

Robert Triggs
Lead Technical Writer covering SoCs, displays, cameras, and everything in between. In his spare moments you'll find him building audio gadgets.