Following our briefing at Arm Tech Day 2019 and coinciding with the kickoff of Computex 2019, Arm has unveiled two key new entries in its CPU and GPU lineup. The Arm Cortex-A77 takes high-end CPU performance to new heights. Meanwhile, the new flagship Mali-G77 GPU marks the dawn of a new graphics architecture as Valhall replaces Bifrost. No, that’s not a typo: the modern Scandinavian spelling doesn’t have an ‘a’ at the end. Who knew?
If you’re after all the nitty-gritty details, be sure to check out our deep dives on both the Cortex-A77 and Mali-G77. If you’re just after the key takeaways from Arm’s latest announcements, then you’re in the right place.
Expect 20-30 percent more performance next-gen
Next-generation processors always target better performance and, in Arm’s case, aim to deliver it without increasing power consumption. The new Cortex-A77 targets a roughly 20 percent performance improvement over the Cortex-A76 on the same process node and at the same clock speeds. It does so within the same power envelope and with only a marginally larger silicon area. We could see a few more percentage points of improvement when SoCs move to improved 7nm processes, but about 20 percent is the ballpark uplift for next year.
The Mali-G77 is a little more aggressive on the performance gains. The new GPU architecture boasts about 30 percent better energy efficiency and performance density than the Mali-G76. Manufacturers could even lay down more GPU silicon to boost performance further. Factoring in this and the process improvements heading our way, Arm expects that Mali-G77 performance can reach up to 40 percent higher than the G76. That’s a pretty big deal given Qualcomm Adreno’s perceived performance lead in mobile at the moment.
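To see how a roughly 30 percent performance density gain could stretch to a 40 percent performance gain, here is a rough back-of-the-envelope sketch. It assumes performance scales linearly with performance density and with whatever extra comes from added silicon area or process improvements. That linear model and the function name are our own simplification, not something Arm has published.

```python
def required_extra_factor(density_gain: float, target_gain: float) -> float:
    """Return the extra multiplier (more silicon area and/or process gains)
    needed on top of a performance density gain to hit a target gain.

    Assumption: performance = density factor * extra factor (linear model).
    """
    return (1.0 + target_gain) / (1.0 + density_gain)


# Figures quoted in the article: ~30% density gain, up to 40% more performance.
extra = required_extra_factor(density_gain=0.30, target_gain=0.40)
print(f"Extra needed from area and process: {(extra - 1.0) * 100:.1f}%")
```

Under those assumptions, the remaining gap works out to a little under 8 percent, which is a plausible amount to make up through a slightly larger GPU or a better process.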
The Cortex-A77 builds on the A76 design
The Arm Cortex-A77 is a direct successor to last year’s high-end Cortex-A76. We’ll almost certainly see four of these new CPUs inside 2020’s flagship smartphones, paired up with four energy-efficient Cortex-A55 cores.
The biggest changes to the microarchitecture are found in the branch prediction cache and a beefed-up ability to handle six instructions per cycle, up from four. There’s also a new ALU and branch unit inside the execution core. Ignoring the technobabble, the key thing to understand is that the Cortex-A77 aims to keep the CPU better fed with data for faster throughput. This is done by reducing bottlenecks in the earliest stages of the CPU’s hardware and then boosting the number of instructions that the core can execute at once.
Wide throughput was already the name of the game with the Cortex-A76, and the A77 improves on this formula further. A more thorough explanation of the technical changes is found in the deep dive.
Valhall is a major change to Arm’s GPUs
While the Cortex-A77 is an iterative CPU design, the Mali-G77 is a brand spanking new GPU design from Arm. Bifrost is out and Valhall is in, and performance can be up to 40 percent higher as a result.
The key to the Mali-G77’s improvements is found in the execution unit. Rather than running three (or two in the case of the Mali-G52) execution units in each core as with Bifrost, the Mali-G77 features just a single new execution core with two beefed-up processing units inside. There’s also a new Quad Texture Mapper, along with dedicated instructions for machine learning workloads that can boost machine learning performance by 60 percent.
The Mali-G77 will appear in core configurations ranging from 7 to 16 cores. Smartphone designs will likely fall somewhere in the middle, as each core is roughly the same size as a G76 core. However, owing to the new core design, it’s going to be tougher to compare performance between generations based on core count alone.
Mali-D77 solves some big VR problems
The Mali-D77 display processor was announced a couple of weeks ago, so be sure to check out our coverage for the nitty-gritty. The Mali-D77 is designed specifically for virtual reality headsets. It won’t be appearing in smartphones. Nevertheless, it’s an interesting piece of technology that should produce decent performance improvements in the VR market.
This display processor features hardware support for image re-projection and Asynchronous Timewarp to reduce movement update latency and combat motion sickness. The D77 also performs lens correction and fixes chromatic aberration without taking up GPU cycles, freeing up to 15 percent more GPU resources for higher frame rates.
Arm is hot on machine learning but is keeping quiet
We all know that Arm has its own machine learning processor, but the company is keeping much of its secret sauce under wraps. What we do know is that each machine learning core is capable of 4 TOPS of throughput, so two or three cores puts you in Apple A12 range. The core comprises a large fused multiply-accumulate (FMA) math unit and a second, more general-purpose core based on an Arm microcontroller, paired with 1MB of SRAM. However, the company wouldn’t say if this core is closer to a Cortex-M0 or M7 in terms of performance.
Scalable at up to 32 cores, Arm’s machine learning hardware is designed for everything from very low power applications and phones, right up to cloud processing. The company is working with a few partners, but we’ll just have to wait and see if any names are ever made public.
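For a rough sense of what that scaling range means, here is a quick sketch of peak theoretical throughput at different core counts. It assumes throughput scales linearly with core count, which Arm hasn’t confirmed and real workloads rarely achieve. The constant and function names are ours, purely for illustration.

```python
TOPS_PER_CORE = 4   # per-core figure quoted in the article
MAX_CORES = 32      # upper scaling limit quoted in the article


def peak_tops(cores: int) -> int:
    """Peak theoretical TOPS for a given core count.

    Assumption: perfectly linear scaling across cores.
    """
    if not 1 <= cores <= MAX_CORES:
        raise ValueError(f"core count must be between 1 and {MAX_CORES}")
    return cores * TOPS_PER_CORE


for cores in (1, 2, 3, MAX_CORES):
    print(f"{cores:>2} core(s): up to {peak_tops(cores)} TOPS")
```

On that simple model, a two- or three-core phone configuration lands at 8 to 12 TOPS, while a maxed-out 32-core design would top out at 128 TOPS.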
All in all, Arm continues to push the performance boundaries in the low-power compute space. With this drive for higher performance, the company is increasingly pushing into the laptop-class performance market, and those connected laptops are definitely part of the roadmap. Arm’s approach isn’t just about raw power though. The company continues to improve the heterogeneous compute capabilities of its processors, allowing neural network and other compute-hungry tasks to run efficiently across CPU, GPU, DPU, and its machine learning processors. Needless to say, next year’s smartphone SoCs will be better than ever before.