The iPhone SE is a compelling affordable smartphone not just for its price, but because it brings along flagship-tier performance too. Apple’s iPhone processors have long had an edge on their Android rivals in both sheer CPU and GPU grunt. In fact, Apple is so confident in the performance of its custom Arm chipsets that it’s preparing to drop Intel from its laptop lineup.
For a quick recap of the situation, the $399 iPhone SE bests the $1,200 Samsung Galaxy S20 Ultra in single-core CPU benchmarks. That’s pretty embarrassing on the face of it, although it doesn’t tell the whole story. The Samsung Galaxy S20 Ultra still outperforms the less expensive handset in multi-core, graphics, and memory benchmarks. Still, it’s an impressive showing from Apple’s custom Arm Lightning CPU and highlights a current performance deficit in the Android arena.
Android performance junkies long for a competitive CPU and SoC, and they might just have their answer in the Arm Cortex-X1. Arm has announced two new performance CPUs destined for 2021 mobile devices: the Cortex-A78 and Cortex-X1. The latter diverges from the usual roadmap in pursuit of greater performance gains, at the expense of Cortex-A’s usual area and energy efficiency. It remains to be seen, however, whether the X1 will topple or simply rival Apple’s single-core performance lead.
If you’re wondering how and why CPUs can be so different and what to expect from the Cortex-X1, read on.
What makes a CPU more powerful?
The high-level reason for Apple’s lead is that it dedicates more silicon area to its high-performance parts. CPU performance seldom boils down to brute clock speed. Instead, true performance depends on how much a CPU can get done with each clock cycle. Broadly speaking, bigger CPUs tend to do more per clock because they have more silicon dedicated to number-crunching components. But that comes at a cost in both silicon area and power consumption.
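To see why clock speed alone is misleading, throughput can be roughly modeled as instructions per cycle (IPC) multiplied by clock frequency. The sketch below is only an illustration: the two cores and their IPC and clock figures are made up, not measurements of any Apple or Arm chip.

```python
def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Rough throughput model: instructions per cycle times cycles per second."""
    return ipc * clock_ghz * 1e9


# Hypothetical cores with illustrative numbers only (not real measurements).
wide_core = instructions_per_second(ipc=4.0, clock_ghz=2.6)
narrow_core = instructions_per_second(ipc=3.0, clock_ghz=2.8)

print(f"wide core:   {wide_core / 1e9:.1f} billion instructions/s")
print(f"narrow core: {narrow_core / 1e9:.1f} billion instructions/s")
```

In this toy model the wider core comes out ahead despite its lower clock, which is the trade-off the paragraph above describes.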
Delving a little deeper, there are a few key things to know about how a CPU works to maximize performance. First is the execution core, which comprises the math and logic units that actually do the processing. Having more of these for specialized operations like floating-point or machine learning can greatly increase the speed and number of tasks done at once. Apple has a whopping nine of these in its A13 Lightning CPU, 50% more than the Cortex-A77.
Apple CPUs are built with a huge number of execution units and lots of cache memory to do lots with each clock cycle.
The next important factor is ensuring these execution capabilities have things to do. This is where the branch predictor and decode/dispatch units come into play. Dedicating more silicon to bigger, smarter predictors and large out-of-order execution windows that can dispatch multiple operations each cycle maximizes the performance of the execution units.
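As a rough idea of what a branch predictor does, here is a sketch of the textbook two-bit saturating counter. This is a classic teaching scheme, assumed here purely for illustration; the predictors in Apple's and Arm's cores are far more sophisticated and are not described in this article. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self) -> None:
        self.state = 2  # start at "weakly taken"

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes: list[bool]) -> float:
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch that is taken nine times, then falls through once, repeated ten times.
loop_branch = ([True] * 9 + [False]) * 10
print(f"accuracy: {accuracy(loop_branch):.0%}")
```

On this pattern the counter mispredicts only the loop exit, so the execution units are rarely left waiting on a wrong guess.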
Finally, more cache memory ties the two together. Cache memory is used to store data needed by the processor without having to reach out to slower RAM. Larger cache sizes allow more data to be stored close to the CPU, speeding up its execution and allowing it to swap in and out of tasks more efficiently. Again, Apple provides much more L1 and L2 cache memory than the CPUs used in current Android phones.
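One common way to reason about the benefit of a larger cache is average memory access time: the hit time plus the miss rate multiplied by the miss penalty. The sketch below uses hypothetical latencies and miss rates, chosen only to show the shape of the trade-off; none of the figures come from real Apple or Arm hardware.

```python
def average_access_time(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical figures: the bigger cache misses less often but is assumed
# to be slightly slower to hit. The miss penalty stands in for a trip to RAM.
small_cache = average_access_time(hit_time_ns=1.0, miss_rate=0.10, miss_penalty_ns=100.0)
large_cache = average_access_time(hit_time_ns=1.2, miss_rate=0.04, miss_penalty_ns=100.0)

print(f"small cache: {small_cache:.1f} ns per access")
print(f"large cache: {large_cache:.1f} ns per access")
```

Under these assumptions the larger cache wins comfortably even with a slower hit, because trips to RAM dominate the average.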
However, these units take up silicon space and consume power. It’s up to a chip designer to optimize their CPU for cost, power efficiency, and performance. Cache memory, for instance, eats up a lot more area than a basic ALU.
There’s also the topic of heavily optimized instructions and execution units that can speed things along further. Apple has a custom architecture license from Arm, allowing it to make a lot more of these optimizations than chip designers that build Android SoCs. But this is probably going a little too far down the rabbit hole.
Introducing the Cortex-X1: Android’s key to higher performance
In recent years, Apple has opted for much bigger CPU cores than its Android rivals, with wide execution pipelines and lots of cache memory. The Arm Cortex-X1, developed with SoC partners, is a beefed-up CPU core that’s larger than we’re used to in the Android space. Here’s a basic overview of the two compared with the current-gen Cortex-A77 found in the Snapdragon 865 and Arm’s other new Cortex-A78. Remember, this only highlights some of the key CPU features and certainly isn’t a full comparison.
|Feature||Apple A13 Lightning Core||Arm Cortex-X1||Arm Cortex-A78||Arm Cortex-A77|
|Logic Unit Count||6x Arithmetic Logic Unit (ALU), 3x Floating Point (FP) / Vector||4x ALU, 4x FP / SIMD||4x ALU, 2x FP / SIMD||4x ALU, 2x FP / SIMD|
|Front-end dispatch/decode||7-wide decode||8-wide decode||6-wide decode||6-wide decode|
|L1 cache||128KB||64KB||32KB / 64KB||64KB|
|L2 cache||8MB (shared)||1MB||512KB||512KB|
|L3 cache||N/A||8MB (shared)||4MB (shared)||4MB (shared)|
We’re not going to dive too deep here, but we can see the general direction of travel. The Cortex-X1 boasts four powerful floating-point math units, bulking up the execution core to eight units in total and closing the gap on Apple. The X1 has an even wider dispatch to keep these units fed with things to do. Cache hierarchy is difficult to compare directly, as there are latency and shared access times to consider. For example, Apple’s L2 is shared and the X1’s isn’t, but Arm’s CPU offers a shared L3. However, what’s clear is that Arm is also significantly bumping up the total available cache with the Cortex-X1.
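The execution-unit gap can be put into numbers with a few lines of arithmetic. The totals below are the ones quoted in this article (nine for Apple's Lightning core, eight for the Cortex-X1); the Cortex-A77 total of six is inferred from the earlier statement that Apple's nine units are 50% more. Unit counts alone say nothing about how capable each unit is, so treat this as a rough comparison only.

```python
# Execution-unit totals as quoted in the article text. The Cortex-A77 value
# is inferred from "nine units, 50% more than the Cortex-A77".
units = {"Apple A13 Lightning": 9, "Arm Cortex-X1": 8, "Arm Cortex-A77": 6}

baseline = units["Arm Cortex-A77"]
for core, count in units.items():
    extra = (count - baseline) / baseline
    print(f"{core}: {count} units ({extra:+.0%} vs Cortex-A77)")
```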
The Cortex-X1 bulks up its concurrent processing capabilities and memory footprint, reminiscent of Apple's approach.
Taking a guess at 2021 performance based on these metrics alone would be futile, and Apple still has its own next-gen processor to come anyway. The takeaway is the Cortex-X1 is a departure from Arm’s typical roadmap in order to build a bigger, more powerful processor that definitely shares design similarities with the Apple A13’s Lightning CPU. Next-gen Android SoCs that use the Cortex-X1 will certainly see a healthy boost to single-core CPU performance, although they’re unlikely to fly past their iPhone rivals.
What to expect from 2021 smartphones
There are still a lot of unknowns about how SoCs for 2021 smartphones will shape up. For starters, we don’t yet know which of Arm’s usual partners have access to the powerhouse Cortex-X1. That depends on which partners signed up to Arm’s CXC program this year. There’s also the question of how many X1 cores upcoming SoCs could use. Just a single X1 core would give a decent single-core performance uplift, and Arm explicitly used the example of one X1 paired with three of its other new Cortex-A78 cores. But we’d need two X1 cores to more closely rival Apple’s setup. Four powerhouse X1 cores in a phone seems unlikely given the area and power requirements.
Two Cortex-X1 cores would bring Android closer to Apple, but we'll have to wait for chip announcements.
Next-gen Android performance depends on SoC designers as much as it does on Arm’s technology, as they can tweak the memory, clock speed, and core layouts. Either way, single-core CPU performance looks set to see a major boost with the X1 compared to current-generation chips and even the new Cortex-A78. Given that the SoCs used by Android phones already offer superior multi-core and energy efficiency scores, Apple will have some serious competition on its hands. We can expect at least one Cortex-X1 based smartphone chipset next year, likely the next Snapdragon.
Of course, there’s much more to smartphone performance than just a single CPU. We’re also well past the point of obvious day-to-day performance gains from just the CPU alone. Graphics, image processing, machine learning, and more all contribute to the snappiness of your handset across various workloads, and we can certainly expect meaningful gains in 2021 here as well.