
ARM has announced a new mobile GPU, the Mali-G71, based on a whole new GPU architecture called Bifrost. ARM’s mobile GPU products have been through two previous major architectural revisions. First came Utgard, which you find in GPUs like the Mali-400 and Mali-470. Utgard supported OpenGL ES 2.0 and was found in devices like the Samsung Galaxy S2. Next came Midgard, a new architecture with support for the unified shader model and OpenGL ES 3.0. Midgard GPUs include the Mali-T604, found in the Nexus 10; the Mali-T760, found in the Samsung Galaxy S6 as well as other devices including some of Acer’s Liquid range; and the Mali-T880, which is found in the Exynos variants of the Samsung Galaxy S7 as well as the Huawei Mate 8, the Huawei P9 and so on.

The new Mali-G71, known until now only by its codename Mimir, is the first GPU to use Bifrost. If you are wondering about the names of these architectures, they are all taken from Norse mythology. Anyone who has seen the Thor movies will remember that Bifrost is the rainbow bridge that reaches between Midgard and Asgard.

ARM-Mali-archs-over-time-16x9-720p

Compared to the Mali-T880, the new G71 offers lots of improvements. It delivers 20% higher energy efficiency (on the same process node, tested under the same conditions). A 20% efficiency gain is very impressive, and when coupled with 40% better performance density, which basically means more performance per square millimeter of silicon, the G71 is clearly going to be ARM’s most advanced GPU yet.

The biggest of the Midgard GPUs, including the T880, could support up to 16 shader cores. The G71 (and all Bifrost GPUs) can be implemented with up to 32 shader cores, effectively doubling the potential shader performance. The G71 also supports 120Hz refresh rates (important for VR), 4x multisample anti-aliasing, and 4K screen resolutions.

The G71 is optimized for Vulkan and other industry-standard APIs (including OpenGL ES and OpenCL), and builds on innovations from the previous Utgard and Midgard architectures.

 

Bifrost

ARM-Mali-Bifrost-design-16x9-720p

The new Bifrost GPU architecture is a major redesign of the previous generations, and the result is ARM’s most efficient GPU architecture to date. It offers 1.5 times the performance of the previous generation while adding full GPU coherency (when used with interconnects like the CoreLink CCI-550).

This means that for the first time the GPU is a full partner to the CPU and not just a subordinate component. Full coherency means that the GPU gets access to the same cached data as the CPU, which reduces the number of times the GPU needs to access main memory to read or write data. Also, the combination of the Mali-G71 and the CoreLink CCI-550 allows the CPU and GPU to share the same memory, which removes the need to copy data between CPU and GPU buffers.

ARM-Bifrost-memory-subsystem-16x9-720p

One of the biggest architectural innovations in Bifrost is the use of “Quad Vectorization” to reduce the number of cycles needed to perform vector operations. GPUs frequently need to deal with X, Y and Z coordinates. For 3D graphics these X, Y and Z numbers need to be manipulated using addition, multiplication and so on. Midgard GPUs handled these numbers with a SIMD engine.

SIMD stands for Single Instruction Multiple Data, a system that allows all three numbers to be multiplied at the same time. Let’s say that X, Y and Z need to be multiplied by 2, 5 and 7 respectively. The traditional serial (scalar) way to do this is to multiply X by 2, then Y by 5 and then Z by 7. That takes 3 cycles. However, since the GPU does this so often, it makes sense to set up a multiply operation on several numbers at once. The GPU can be told to multiply X by 2 while it is multiplying Y by 5 and Z by 7. In other words, the GPU is told to multiply the three numbers in block 1 by the numbers in block 2. The SIMD engine is designed to do all that in one cycle. So now, rather than 3 cycles (using the serial approach), it can be done in one. Hooray.
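To make the cycle counting concrete, here is a minimal Python sketch of the two approaches. It is only a toy model: the function names, the sample coordinates and the "one instruction equals one cycle" accounting are illustrative assumptions, not a description of how the Mali hardware is actually programmed.

```python
def scalar_multiply(values, factors):
    """Multiply element by element, counting one cycle per multiply."""
    results, cycles = [], 0
    for value, factor in zip(values, factors):
        results.append(value * factor)
        cycles += 1
    return results, cycles


def simd_multiply(values, factors, width=4):
    """Model one SIMD instruction: every lane multiplies in the same cycle."""
    if len(values) != len(factors):
        raise ValueError("both blocks must be the same length")
    if len(values) > width:
        raise ValueError("more elements than SIMD lanes")
    results = [value * factor for value, factor in zip(values, factors)]
    return results, 1  # one instruction, one cycle in this toy model


xyz = [1.0, 2.0, 3.0]      # example X, Y, Z values (made up for illustration)
factors = [2.0, 5.0, 7.0]  # the multipliers used in the text

print(scalar_multiply(xyz, factors))  # ([2.0, 10.0, 21.0], 3)
print(simd_multiply(xyz, factors))    # ([2.0, 10.0, 21.0], 1)
```

Both functions produce the same numbers; only the cycle count differs, which is the whole point of SIMD.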

But you may have noticed that computers don’t handle groups of three very well; computers like things to come in groups of 1, 2, 4, 8 or 16. So the SIMD engine in Midgard was four wide, meaning it can handle four multiply operations in one cycle. For 3D graphics that means that one of the slots in the SIMD engine sits idle.
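The cost of that idle slot is easy to put a number on. The helper below is a hypothetical illustration (the name `lane_utilization` is not an ARM term) that computes the fraction of SIMD lanes doing useful work in a cycle.

```python
def lane_utilization(active_lanes, simd_width):
    """Fraction of SIMD lanes doing useful work in one cycle."""
    if not 0 <= active_lanes <= simd_width:
        raise ValueError("active_lanes must be between 0 and simd_width")
    return active_lanes / simd_width


# An X, Y, Z multiply on a four-wide engine leaves one lane idle.
print(lane_utilization(3, 4))  # 0.75
```

In other words, a three-component operation on a four-wide engine wastes a quarter of the available multiply capacity.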

Now imagine four SIMD instructions being executed by the GPU: four lots of multiplications of X, Y and Z. Let’s call them T0, T1, T2 and T3. Normally that would take four cycles, one for each multiply. What Quad Vectorization does is use that idle fourth slot on the SIMD engine to reduce that to three. It sets up the SIMD instructions in such a way that T0.x is performed not with T0.y and T0.z, as you might expect, but with T1.x, T2.x and, now filling the idle slot, T3.x. Then come the Y multiplications T0.y, T1.y, T2.y and T3.y, and finally the Z multiplications T0.z, T1.z, T2.z and T3.z. So now it only took 3 cycles. In short, Quad Vectorization gathers the SIMD operations into groups of four and executes each group in 3 cycles.
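The regrouping described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function names, the sample thread data and the cycle accounting are all illustrative, and real Bifrost scheduling involves far more than this. The first function processes one thread's X, Y and Z per cycle; the second processes one component across four threads per cycle.

```python
def vector_at_a_time(threads, factors, width=4):
    """One cycle per thread: that thread's x, y, z share the four lanes."""
    results, cycles, busy_lanes = [], 0, 0
    for vec in threads:
        results.append([c * f for c, f in zip(vec, factors)])
        cycles += 1
        busy_lanes += len(vec)  # only 3 of the 4 lanes are used
    return results, cycles, busy_lanes / (cycles * width)


def quad_vectorized(threads, factors, width=4):
    """One cycle per component: the same component of four threads fills the lanes."""
    if len(threads) != width:
        raise ValueError("this model expects exactly one thread per lane")
    results = [[None] * len(factors) for _ in threads]
    cycles, busy_lanes = 0, 0
    for component, factor in enumerate(factors):
        # A single cycle: T0, T1, T2 and T3 each occupy one lane.
        for lane, vec in enumerate(threads):
            results[lane][component] = vec[component] * factor
        cycles += 1
        busy_lanes += len(threads)
    return results, cycles, busy_lanes / (cycles * width)


# T0..T3, each with made-up X, Y, Z values; factors are those from the text.
threads = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
factors = [2, 5, 7]

old_results, old_cycles, old_util = vector_at_a_time(threads, factors)
new_results, new_cycles, new_util = quad_vectorized(threads, factors)

print(old_cycles, old_util)        # 4 0.75
print(new_cycles, new_util)        # 3 1.0
print(old_results == new_results)  # True
```

The same twelve multiplications are performed either way; the quad-vectorized ordering simply packs them into three fully occupied cycles instead of four partly empty ones.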

ARM-Bifrost-quad-vectorization

To handle all this, Bifrost uses a clever Quad Manager along with some execution engines to process the groups of 4 SIMD instructions. The G71 has three such execution engines. This method actually turns out to be very compiler-friendly, and if the shader code is compiled optimally then the quad execution engine is just fed a constant stream of quad vectors to process.

This also has power-saving implications, as the GPU only needs to fetch one scalar operation per quad execution engine every clock cycle. This means that there is a significant reduction in instruction cache bandwidth.

Bifrost also includes lots of other clever innovations like index-driven position shading, claused shaders and ARM TrustZone support, plus the tiler memory structures have been significantly redesigned to reduce the tiler memory footprint. As you can see, Bifrost is the next-generation GPU architecture that is destined to be used over the next several years for a range of different GPUs, of which the G71 is the first.

Wrap-up

ARM foresees the rise of VR and AR on mobile and Bifrost is ideally suited to power these immersive experiences. Some see the ability to deliver a compelling VR experience on mobile as critical for the gaming industry’s continued growth and advancement. As such ARM is positioning the Mali-G71 as the GPU needed to make virtual reality and augmented reality an everyday experience on a mobile device.

As is always the case in the semiconductor industry, there is a delay between when a design is announced and when we will see it in an actual device. ARM has now officially unveiled the G71 and Bifrost. ARM has surely been working with its partners in the background since long before this announcement was made, and the G71 is already being primed for inclusion in upcoming SoCs. We know that chip makers like HiSilicon, MediaTek and Samsung have already taken licenses. The exact date when we will see actual products using the G71 is uncertain. However, we will likely see processors with Mali-G71 GPUs towards the end of this year, and devices sometime during 2017.