How big.LITTLE A53/A57 chips will put to shame the S4 Pro, Tegra 4, and Exynos 5 Octa

February 10, 2013

retired CPUs Credit: Wikipedia

We recently had a look at the upcoming Exynos 5 Octa, and dispelled some of the myths about octa-core processors. But some of you in the comments were peering even further into the future, and brought up the interesting topic of the big.LITTLE A53/A57 combination. So, let’s take a look at what’s in store in the next generation of ARM chips, and how they stack up against the current top of the line processors.

big.LITTLE

Time for a quick recap: ARM’s big.LITTLE processor design is an elegant solution to the irritating catch-22 that occurs when manufacturers want to increase processing power, but are limited by battery sizes.

big.LITTLE works by having two sets of processors, a low power group for general activities and a high performance set for gaming and other processor taxing applications. Tasks are then assigned to these processors depending on demand, allowing for large energy savings when only a small amount of processing power is required, without compromising on peak performance.
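As a rough illustration of that assignment, here is a minimal Python sketch of a cluster-switching policy. The thresholds, the function names `pick_cluster` and `run`, and the idea of switching back down at a lower load than switching up are all assumptions made for this example; they are not ARM's actual switching logic.

```python
# Hypothetical thresholds (not from ARM): move to the big cores when load is
# high, and move back only once load has dropped well below that point.
UP_THRESHOLD = 0.80
DOWN_THRESHOLD = 0.30


def pick_cluster(current: str, load: float) -> str:
    """Return "big" or "LITTLE" for a CPU load between 0.0 and 1.0."""
    if current == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    return current


def run(loads):
    """Replay a sequence of load samples, starting on the LITTLE cluster."""
    cluster = "LITTLE"
    history = []
    for load in loads:
        cluster = pick_cluster(cluster, load)
        history.append(cluster)
    return history


if __name__ == "__main__":
    # Light use, a burst of gaming, then back to light use.
    print(run([0.10, 0.20, 0.95, 0.60, 0.50, 0.10]))
```

The gap between the two thresholds is there so that a load hovering around one value doesn't cause the chip to bounce between clusters on every sample.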

ARM bigLITTLE Credit: ARM

Samsung’s Exynos 5 Octa is the first chip to sidestep the power issue using the big.LITTLE design, by pairing a quad-core set of low power ARM Cortex A7s with a set of more powerful A15s. But ARM has announced the next iteration of this idea, using an even more powerful Cortex A57 processor as the lead core, and a set of Cortex A53s to save on power consumption.

A53

The A53 is a beefed up version of the A7, offering similar performance to the Cortex A9 but using up to 40% less power. The next generation of ARM chips also includes support for 64-bit applications and will be produced on a 20nm process, eventually shrinking to as small as 14nm. So as well as performance and power consumption improvements, the new chips will also offer heat and code optimizations.

The impressive thing about the A53 is that the processor’s peak performance should be somewhere around that of the A9 quad-core processor in the Galaxy S3. No one would complain that the Galaxy S3 is sluggish, and with 40% less power draw your handset won’t need charging anywhere near as often.

The A7 and A53 consume less energy than their equivalent high end chips because they use in-order execution, so instructions can only be completed in the order they are received. This is power efficient, but reduces per-core performance compared with out-of-order execution, which allows processors to speed up processing by reordering instructions.
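To make the difference concrete, here is a toy Python model of a core that can start one instruction per cycle. It is a sketch only: the four-instruction program, the latencies, and the `simulate` function are invented for illustration, and real Cortex pipelines are far more complex than this.

```python
def simulate(program, in_order):
    """Return the cycle on which the last instruction finishes.

    program is a list of (latency, dependencies) pairs, where dependencies
    are indices of earlier instructions whose results are needed first.
    """
    finish = {}      # instruction index -> cycle its result becomes available
    issued = set()
    cycle = 0
    while len(issued) < len(program):
        waiting = [i for i in range(len(program)) if i not in issued]
        if in_order:
            # An in-order core may only consider the oldest waiting instruction.
            waiting = waiting[:1]
        for i in waiting:
            latency, deps = program[i]
            if all(d in finish and finish[d] <= cycle for d in deps):
                issued.add(i)
                finish[i] = cycle + latency
                break  # at most one instruction starts per cycle
        cycle += 1
    return max(finish.values())


# Hypothetical program: a slow load, an add that needs the loaded value,
# and two unrelated one-cycle instructions.
PROGRAM = [
    (4, []),   # 0: load from memory
    (1, [0]),  # 1: add, depends on the load
    (1, []),   # 2: independent work
    (1, []),   # 3: independent work
]

if __name__ == "__main__":
    print("in-order cycles:    ", simulate(PROGRAM, in_order=True))
    print("out-of-order cycles:", simulate(PROGRAM, in_order=False))
```

In this model the in-order core sits idle while it waits for the load, whereas the out-of-order core fills those cycles with the independent instructions and finishes sooner. The extra bookkeeping needed to do that reordering in hardware is what costs the additional power.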

A57

That’s where the new A57 comes in. The top of the line Cortex A57 is an out-of-order execution 64-bit processor, which offers a significant performance boost over the already powerful A15, but again manages to improve on energy efficiency. The improvements add up to a 20 to 30% increase in performance over the older chip, so right off the bat this new implementation of big.LITTLE will have a higher peak performance than the older generation.

Cortex A50 series Credit: ARM

32 vs 64 bit

As mentioned, both the A53 and A57 introduce 64-bit application support to Android devices. Although probably aimed at the high end server computer market, there could be some benefits for portable device users if Android were ever to switch over to a 64-bit operating system.

The number of bits, when talking about CPUs, refers to the width of the processor’s registers. In other words, it describes how many individual 1s and 0s of data the processor can hold in a register at once when it needs to do some work.

If you want to get into the technical aspects of this, the total memory limit on a 32-bit processor is calculated by 2^32 bytes, which works out at a maximum of 4GB worth of accessible memory. However, once you take out the address space reserved for system hardware and graphics memory, 32-bit systems are often left with less than 4GB available for applications.

64-bit processors, on the other hand, can in theory address a massive 2^64 bytes of memory, which works out to be 16 exabytes, or roughly 17 billion gigabytes.
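The arithmetic is easy to check. The short Python sketch below assumes byte-addressable memory and the binary definitions of a gigabyte (2^30 bytes) and an exabyte (2^60 bytes); the helper name `addressable_bytes` is just for illustration.

```python
GIGABYTE = 2 ** 30  # bytes, binary definition
EXABYTE = 2 ** 60   # bytes, binary definition


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses that fit in `bits` bits."""
    return 2 ** bits


if __name__ == "__main__":
    print(addressable_bytes(32) // GIGABYTE, "GB for 32-bit")
    print(addressable_bytes(64) // EXABYTE, "EB for 64-bit")
    print(addressable_bytes(64) // GIGABYTE, "GB for 64-bit")
```

With these definitions the 32-bit limit comes out at exactly 4GB, and the 64-bit limit at 16 exabytes, which is 17,179,869,184 gigabytes.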

But why is any of this important? It’s not like current smartphones and tablets are particularly slow. Well, the benefits come from the fact that RAM is much faster at reading and writing data than hard storage devices. So if you increase the maximum available amount of RAM you can spend less time waiting for data to transfer from slower storage devices, and improve your overall system performance.

Applications coded for 64-bit can also be faster to execute than 32-bit ones, as you can send more data to the processor in one go if you’re utilizing the wider CPU registers. Applications can be faster and more efficient with a 64-bit processor, and tablets will finally be able to push above the 4GB RAM mark.

Performance vs Battery

ARM has outlined two specific routes that it wants to take when it comes to mobile devices. For smartphones it is planning a dual and quad core combination, with two Cortex A57 cores providing the power when needed, and four Cortex A53s available for general processing. Tablet implementations will feature two sets of quad-cores, for some additional processing power.

Cortex A50 mobile configurations Credit: ARM

Going back to dual core smartphones from quad core chips like the Exynos 5 Octa, Tegra 4, and S4 Pro might seem like a backward step, but there are several clever design considerations which make this a smart choice.

Firstly, remember that the baseline minimum performance is drastically improved over the Exynos 5 Octa. Although there may be additional power consumption whilst idle, the idea is that the most powerful cores won’t be needed at all unless you’re gaming or doing something else really CPU intensive.

As there’s already plenty of processing power from the four A53s, it makes a lot of sense to only add an additional two high performance cores, to prevent unnecessary power drain from four cores which are never likely to be utilized fully.

Cortex A50 performance chart Credit: ARM

Secondly, big.LITTLE aims to strike the best balance between performance and power consumption. Although the A7 is a very low power chip, the A15 is a pretty large power drain. Considering the A7 is quite a weak processor, it’s likely that the A15s will be switching on fairly regularly, draining the juice faster. The new A53/A57 combination offers lower average power consumption, by not having to switch on the hungry A57s as often.

For tablets, where higher resolution displays are likely to become much more common, there is a need for additional power, hence the two extra A57 cores.

Stacking up against the competition

I’m sure many of you are wondering what the difference is between the energy efficiency techniques used in the likes of Nvidia’s Tegra 4 or Qualcomm’s S4 Pro. It all comes down to symmetrical or asymmetrical processors.

You see, there are two ways of organizing your multi-core CPUs; they can either work closely together, sharing memory and such, or they can be more autonomous and work from their own caches and be largely unaware of what the other processors are doing. There are pros and cons to each method, which I shall explain.

Asymmetrical Multi-Processing (AMP) allows for each core to be individually turned off and their voltages controlled depending on processing requirements. This is the most efficient method of saving battery power but can run into trouble when running multi-threaded applications, as external controls are needed to make sure the cores communicate properly.

Quad Core Krait shown running a separate video on each core Credit: Carrypad

Symmetrical Multi-Processing (SMP), on the other hand, transfers the assigning of tasks to the operating system, which is much more convenient. With SMP you can control the frequency of core groups but not individual cores, which is less energy efficient than AMP.

Performance-wise, AMP is generally faster and more energy efficient at handling lots of individual tasks, whereas SMP is better when you are running multiple processes which share the same memory pool (i.e. multi-threaded applications). So really it comes down to the applications being run.

Qualcomm’s Krait cores (S4 Pro) are asymmetrical, so each core can be turned on and off individually to save on power consumption. big.LITTLE, on the other hand, is a hybrid of both architectures; the sets of cores are SMP, but each group can be controlled asymmetrically.

Nvidia’s Tegra processors are the weirdest of the bunch. The companion core can be controlled individually, and is asymmetrical to the four main cores. The Tegra main cores can also be shut off individually by gating their power, but they can’t be clocked individually like a true AMP processor.

But which is the most efficient method of conserving energy?

Unfortunately we can’t compare the actual power draw from each chip yet, so we’ll have to try our best to infer performance from the way the architectures are designed.

The Tegra 4 uses ARM Cortex A15 cores, while the S4 Pro uses Qualcomm’s own Krait cores; both are high performance designs with a higher minimum power consumption than the Cortex A7 and A53. In the lowest power states the S4 Pro and Tegra 4 will be running a single high performance core at a low clock, whereas big.LITTLE will be running four A53s at a low clock.

Graph shows that even the Cortex A7’s highest performance state consumes less power than the A15’s lowest performance state. Credit: ARM

Overall, the minimum power consumed is likely to be very similar. The AMP nature of Tegra and the S4 Pro might give them a slight advantage over the A53, but not the A7. However, both Qualcomm’s and Nvidia’s designs will require the power hungry cores to be switched on if more than one core is required, instantly adding more power drain. big.LITTLE can run four cores in the lowest power state, only needing to up the core clock speeds and voltage slightly in order to increase performance.

The real benefit of big.LITTLE shines through when dealing with medium performance requirements. The two competing models from Qualcomm and Nvidia have to turn on their high performance cores if anything above minimum power is required, whereas big.LITTLE can stay on the most power efficient cores until totally necessary, resulting in lower average power consumption.
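The reasoning about average power can be sketched as a weighted sum. Every number in the Python example below is invented purely to illustrate the argument: the usage mix, the power figures (in arbitrary units), and the names `USAGE_MIX`, `BIG_CORES_ONLY`, `BIG_LITTLE` and `average_power` are assumptions, not measurements of any real chip.

```python
# Share of time spent at each demand level (hypothetical usage pattern).
USAGE_MIX = {"low": 0.70, "medium": 0.25, "high": 0.05}

# Power draw per demand level in arbitrary units. Invented for illustration:
# the only difference is that the big.LITTLE design handles medium demand on
# its efficient cores instead of waking the power hungry ones.
BIG_CORES_ONLY = {"low": 1.0, "medium": 4.0, "high": 8.0}
BIG_LITTLE = {"low": 1.0, "medium": 2.0, "high": 8.0}


def average_power(power_by_level, usage_mix):
    """Weight each level's power draw by the share of time spent at it."""
    return sum(power_by_level[level] * share for level, share in usage_mix.items())


if __name__ == "__main__":
    print(f"big cores only: {average_power(BIG_CORES_ONLY, USAGE_MIX):.2f}")
    print(f"big.LITTLE:     {average_power(BIG_LITTLE, USAGE_MIX):.2f}")
```

Even though idle and peak draw are identical in this made-up example, the design that stays on its efficient cores at medium demand ends up with a lower average, because medium demand accounts for a sizeable share of the time.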

At the top end, the introduction of dual A57s should also help the new chip reduce peak power consumption, which should see big.LITTLE beat out the competitors here as well.

Conclusion

Overall, the big.LITTLE architecture seems to provide the best balance of power consumption, multi-processing support, and peak performance. That’s compared with the competing architectures, which only provide power efficiency at the lowest levels of operation. The A53/A57 improves on the already impressive Exynos 5 Octa chip, and looks to be an outstanding processor when it finally hits the market.

Whilst a direct comparison between the Exynos 5 Octa and the A53/A57 isn’t exactly fair, as newer chips will have obvious speed advantages, it’s the design implementation differences which are really worth noting. The trade-off between peak performance, general performance, and idle power drain is a tough balance to strike, but the unique dual core/quad core design for smartphones could well be the sweet spot.
