How big.Little A53/A57 chips will put to shame the S4 Pro, Tegra 4, and Exynos 5 Octa

February 10, 2013

retired CPUs Credit: Wikipedia

We recently had a look at the upcoming Exynos 5 Octa, and dispelled some of the myths about octo-core processors. But some of you in the comments were peering even further ahead into the future, and brought up the interesting topic of the big.LITTLE A53/A57 combination. So, let’s take a look at what’s in store in the next generation of ARM chips, and how they stack up against the current top of the line processors.

big.LITTLE

Time for a quick recap: ARM’s big.LITTLE processor design is an elegant solution to the irritating catch-22 that occurs when manufacturers want to increase processing power, but are limited by battery sizes.

big.LITTLE works by having two sets of processors, a low power group for general activities and a high performance set for gaming and other processor taxing applications. Tasks are then assigned to these processors depending on demand, allowing for large energy savings when only a small amount of processing power is required, without compromising on peak performance.
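To make the idea concrete, here is a minimal sketch of the decision described above: stay on the low power cluster until demand gets close to what it can deliver, then hand over to the high performance cluster. The `Cluster` type, the `pick_cluster` function, the 0.8 threshold, and every number are assumptions made up for illustration; they are not ARM's actual switching logic or measurements of any real chip.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    capacity: float  # hypothetical relative peak throughput
    power: float     # hypothetical relative power cost when active


# Illustrative numbers only; not measurements of any real chip.
LITTLE = Cluster("LITTLE", capacity=1.0, power=1.0)
BIG = Cluster("big", capacity=3.0, power=6.0)


def pick_cluster(demand: float, threshold: float = 0.8) -> Cluster:
    """Stay on the low-power cluster until demand nears its capacity."""
    if demand <= LITTLE.capacity * threshold:
        return LITTLE
    return BIG


if __name__ == "__main__":
    for demand in (0.1, 0.5, 0.9, 2.5):
        print(demand, pick_cluster(demand).name)
```

In this toy model the light workloads (0.1 and 0.5) stay on the LITTLE cluster, and only the heavier ones wake the big cores.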

ARM bigLITTLE Credit: ARM

Samsung’s Exynos 5 Octa is the first chip to sidestep the power issue using the big.LITTLE design, by using a set of low power quad-core ARM A7s and a set of more powerful A15s. But AMD has announced that it will be producing the next iteration of this idea, using an even more powerful Cortex A57 processor as the lead core, and a set of Cortex A53s to save on power consumption.

A53

The A53 is a beefed-up version of the A7, offering similar performance to the Cortex A9 but using up to 40% less power. The next generation of ARM chips also includes support for 64-bit applications and will be produced on a 20nm process, eventually shrinking to as small as 14nm. So as well as performance and power consumption improvements, the new chips will also offer heat and code optimizations.

The impressive thing about the A53 is that the processor’s peak performance should be somewhere around that of the A9 quad-core processor in the Galaxy S3. No-one would complain that the Galaxy S3 is sluggish, and with 40% less power draw your handset won’t need charging anywhere near as often.

What makes the A7 and A53 consume less energy compared to their equivalent high end chips is due to the fact that they use in-order execution, so processes can only be completed in the order they are received. This is power efficient, but reduces performance for multi-threaded tasks compared with out of order execution – which allows processors to speed up processing by reordering instructions.
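As a rough illustration of what reordering buys, the toy model below counts cycles for a short instruction sequence on a single-issue core. Inside a real CPU this reordering applies to individual instructions rather than whole programs. The instruction names, latencies, and both functions are hypothetical, and real pipelines are far more involved, so treat this as a sketch of the principle only.

```python
# Each instruction: (name, names it depends on, latency in cycles).
# Hypothetical program: two independent load/add pairs.
PROGRAM = [
    ("load_a", [], 4),
    ("add_b", ["load_a"], 1),
    ("load_c", [], 4),
    ("add_d", ["load_c"], 1),
]


def in_order_cycles(program):
    """Issue strictly in program order, stalling until inputs are ready."""
    finish = {}
    cycle = 0
    for name, deps, latency in program:
        start = max([cycle] + [finish[d] for d in deps])
        finish[name] = start + latency
        cycle = start + 1  # at most one instruction issued per cycle
    return max(finish.values())


def out_of_order_cycles(program):
    """Each cycle, issue the oldest instruction whose inputs are ready."""
    finish = {}
    pending = list(program)
    cycle = 0
    while pending:
        for instr in pending:
            name, deps, latency = instr
            if all(d in finish and finish[d] <= cycle for d in deps):
                finish[name] = cycle + latency
                pending.remove(instr)
                break  # at most one instruction issued per cycle
        cycle += 1
    return max(finish.values())


if __name__ == "__main__":
    print("in-order:", in_order_cycles(PROGRAM))          # 10
    print("out-of-order:", out_of_order_cycles(PROGRAM))  # 6
```

The out-of-order version starts the second load while the first is still in flight, which is where the saved cycles come from. The price is the extra tracking hardware needed to find ready instructions, which is part of why the simpler in-order cores are smaller and draw less power.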

A57

That’s where the new A57 comes in. The top of the line Cortex A57 is an out-of-order execution 64-bit processor, which offers a significant performance boost over the already powerful A15, but again manages to improve on energy efficiency. The improvements add up to a 20 – 30% increase in performance over the older chip, so right off the bat this new implementation of big.LITTLE will have a higher peak performance than the older generation.

Cortex A50 series Credit: ARM

32 vs 64 bit

As mentioned, both the A53 and A57 introduce 64 bit application support to Android devices. Although probably aimed at the high end server computer market, there could be some benefits for portable device users if Android were ever to switch over to a 64-bit operating system.

The number of bits, when talking about CPUs, refers to the width of the processor’s registers. In other words, it is how many individual 1’s and 0’s worth of data the processor can pull from other sources and hold at once when it needs to do some work.

If you want to get into the technical aspects of this, the total memory limit on a 32 bit processor is calculated by 2^32, which works out at a maximum of 4GB worth of accessible memory. However, when you take out memory required by system hardware and graphics memory, 32 bit systems are often left with less than 4GB available for applications.

64 bit processors on the other hand can address a massive 2^64 bytes of memory, which works out to 16 exabytes, or roughly 17 billion gigabytes.
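The arithmetic behind these limits is easy to check. The short sketch below assumes byte-addressable memory and binary units (1GB = 2^30 bytes, 1 exabyte = 2^60 bytes), which is the convention that gives the 4GB figure; the constant names are just labels chosen for this example.

```python
GIB = 2**30  # bytes in one gigabyte (binary convention)
EIB = 2**60  # bytes in one exabyte (binary convention)

addr32 = 2**32  # bytes addressable with a 32-bit address
addr64 = 2**64  # bytes addressable with a 64-bit address

if __name__ == "__main__":
    print(addr32 // GIB)  # 4
    print(addr64 // EIB)  # 16
    print(addr64 // GIB)  # 17179869184
```

So the 64-bit limit is 2^34 gigabytes in binary units, a little over 17 billion; treating an exabyte as a round billion gigabytes gives the looser figure of 16 billion.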

But why is any of this important? It’s not like current smartphones and tablets are particularly slow. Well, the benefits come from the fact that RAM is much faster at reading and writing data than permanent storage devices. So if you increase the maximum available amount of RAM you can spend less time waiting for data to transfer from slower storage devices, and improve your overall system performance.

64 bit coded applications can also be faster to execute than 32 bit ones, as you can send more data to the processor in one go if you’re utilizing the wider CPU register. Applications can be faster and more efficient with a 64 bit processor, and tablets will finally be able to push above the 4GB RAM mark.

Performance vs Battery

ARM has outlined two specific routes that it wants to take when it comes to mobile devices. For smartphones it is planning a dual and quad core combination, with two Cortex A57 cores providing the power when needed, and four Cortex A53s available for general processing. Tablet implementations will feature two sets of quad-cores, for some additional processing power.

Cortex A50 mobile configurations Credit: ARM

Going back to dual core smartphones from quad core chips like the Exynos 5 Octa, Tegra 4, and S4 Pro might seem like a backward step, but there are several clever design considerations which make this a smart choice.

Firstly, remember that the baseline minimum performance is drastically improved over the Exynos 5 Octa. Although there may be additional power consumption whilst idle, the idea is that the most powerful cores won’t be needed at all unless you’re gaming or doing something else really CPU intensive.

As there’s already plenty of processing power from the four A53s, it makes a lot of sense to only add an additional two high performance cores, to prevent unnecessary power drain from four cores which are never likely to be utilized fully.

Cortex A50 performance chart Credit: ARM

Secondly, big.LITTLE aims to strike the best balance between performance and power consumption. Although the A7 is a very low power chip, the A15 is a pretty large power drain. Considering the A7 is quite a weak processor, it’s likely that the A15s will be switching on fairly regularly, draining the juice faster. The new A53/A57 combination offers lower average power consumption, by not having to switch on the hungry A57s as often.

For tablets, where higher resolution displays are likely to become much more common, there is a need for additional power, hence the two extra A57 cores.

Stacking up against the competition

I’m sure many of you are wondering what the difference is between the energy efficiency techniques used in the likes of Nvidia’s Tegra 4 or Qualcomm’s S4 Pro. It all comes down to symmetrical or asymmetrical processors.

You see, there are two ways of organizing your multi-core CPUs; they can either work closely together, sharing memory and such, or they can be more autonomous and work from their own caches and be largely unaware of what the other processors are doing. There are pros and cons to each method, which I shall explain.

Asymmetrical Multi-Processing (AMP) allows for each core to be individually turned off and their voltages controlled depending on processing requirements. This is the most efficient method of saving battery power but can run into trouble when running multi-threaded applications, as external controls are needed to make sure the cores communicate properly.

Quad Core Krait shown running a separate video on each core Credit: Carrypad

Symmetrical Multi-Processing (SMP), on the other hand, transfers the assigning of tasks to the operating system, which is much more convenient. With SMP you can control the frequency of core groups but not individual cores, which is less energy efficient than AMP.

Performance-wise, AMP is generally faster and more energy efficient at handling lots of individual tasks, whereas SMP is better when you are running multiple processes which share the same memory pool (i.e. multi-threaded applications). So really it comes down to the applications being run.
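The energy argument for per-core control can be sketched with a deliberately crude model. It assumes dynamic power grows with the cube of clock frequency (power scales with frequency times voltage squared, and voltage is taken to rise in step with frequency), that an idle core costs nothing, and that a core in a shared-clock group cannot finish early and sleep. The function names and load figures are hypothetical, and real chips behave less neatly than this.

```python
def dynamic_power(freq: float) -> float:
    # Simplifying assumption: P ~ f * V^2 with V proportional to f, so P ~ f^3.
    return freq ** 3


def per_core_power(loads):
    """Each core runs only as fast as its own workload requires."""
    return sum(dynamic_power(f) for f in loads)


def per_cluster_power(loads):
    """All cores in the group share the clock of the busiest core."""
    top = max(loads)
    return len(loads) * dynamic_power(top)


# Hypothetical required frequencies for four cores, as a fraction of maximum.
LOADS = [1.0, 0.5, 0.25, 0.0]

if __name__ == "__main__":
    print("per-core control:   ", per_core_power(LOADS))     # 1.140625
    print("per-cluster control:", per_cluster_power(LOADS))  # 4.0
```

With one busy core and three lightly loaded ones, the shared clock drags every core up to the highest speed, which is the inefficiency described above. When all cores are evenly loaded the two approaches cost the same in this model.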

Qualcomm’s Krait cores (S4 Pro) are asymmetrical, so each core can be turned on and off individually to save on power consumption. The big.LITTLE design, on the other hand, is a hybrid of both architectures; the sets of cores are SMP, but each group can be controlled asymmetrically.

Nvidia’s Tegra processors are the weirdest of the bunch. The companion core can be controlled individually, and is asymmetrical to the four main cores. However the Tegra main cores can also be shut off individually by gating their power, but they can’t be clocked individually like a true AMP processor.

But which is the most efficient method of conserving energy?

Unfortunately we can’t compare the actual power draw from each chip yet, so we’ll have to try our best to infer performance from the way the architectures are designed.

Both the S4 Pro and Tegra 4 use ARM Cortex A15 processors, which have a higher minimum power consumption than the Cortex A7 and A53. In the lowest power states the S4 Pro and Tegra 4 will be running a single A15 with a low clock, whereas the big.LITTLE will be running four A53s at a low clock.

Graph shows that even the Cortex A7’s highest performance state consumes less power than the A15’s lowest performance state. Credit: ARM

Overall the minimum power consumed is likely to be very similar. The AMP nature of Tegra and the S4 Pro might give them a slight advantage over the A53, but not the A7. However both Qualcomm’s and Nvidia’s designs will require the power hungry cores to be switched on if more than one core is required, instantly adding more power drain. Big.LITTLE can run 4 cores in the lowest power state, only needing to up the core clock speeds and voltage slightly in order to increase performance.

The real benefit of big.LITTLE shines through when dealing with medium performance requirements. The two competing models from Qualcomm and Nvidia have to turn on their high performance cores if anything above minimum power is required, whereas big.LITTLE can stay on the most power efficient cores until totally necessary, resulting in lower average power consumption.

At the top end, the introduction of dual A57s should also help the new chip reduce peak power consumption, which should see big.LITTLE beat out the competitors here as well.

Conclusion

Overall, the big.LITTLE architecture seems to provide the best balance of power consumption, multi-processing support, and peak performance. That’s compared with the competing architectures, which only provide power efficiency at the lowest levels of operation. The A53/A57 improves on the already impressive Exynos 5 Octa chip, and looks to be an outstanding processor when it finally hits the market.

Whilst a direct comparison between the Exynos 5 Octa and the A53/A57 isn’t exactly fair, as newer chips will have obvious speed advantages, it’s the design implementation differences which are really worth noting. The trade off between peak performance, general performance, and idle power drain is a tough balance to strike, but the unique dual core/quad core design for smartphones could well be the sweet spot.

Comments

  • MasterMuffin

    Me want 8 a57 cores with 5000mAh battery :D

  • Simon Belmont

    Great write up, guys. Looking forward to the next CPU generation.

    Though I think that the AMD application will be mostly for servers. This architecture will scale to phones and tablets too.

  • http://www.facebook.com/bella.pease.75 Bella Pease

    Mr Triggs wrote “Both the S4 Pro and Tegra 4 use ARM Cortex A15”

    Unfortunately, this shows how very little you know, the S4 Pro uses custom designed Krait cores, that use some similar elements found in the A15, but fundamentally different pipeline and design logic. It seems that apart from reading press releases, you don’t have a first clue on the technical side needed to make an informed technical analysis. Best stick to reviewing fart apps, there’s a good boy.

    • Craig

      +1. Each and every one of his articles is just a babbling, barely-coherent collection of press release snippets.

      • http://www.facebook.com/bella.pease.75 Bella Pease

        Very true, the article is full of crap, his assertion that in-order CPUs are always more efficient than Out-of-Order is donkey balls, and completely ignores the concept of race to sleep, enhanced by advanced power gating and sleep states, is just one of many.

    • Roberto Tomás

      Okay so the S4 Pro is a custom chip. but a lot of the customizations are Qualcomm just implementing in an old A7 design features that ARM implemented themselves in the newer designs. These equivalencies aren’t exact, you are right. But it’s not as different as you seem to think. Also, the other chip manufacturers also modify the design they purchase from ARM: They are expected to. Apple and Nvidia both back-ported (implemented) memory controllers from the next design tier into their latest designs, for example.

  • dan

    “What makes the A7 and A53 consume less energy compared to their
    equivalent high end chips is due to the fact that they use in-order
    execution, so processes can only be completed in the order they are
    received. This is power efficient, but reduces performance for
    multi-threaded tasks compared with out of order execution – which allows
    processors to speed up processing by reordering instructions.”

    In-order and out-of-order execution refers to the order in which individual instructions are executed by the CPU – not the order of execution of processes and threads, which is managed by the kernel. The amount of code that can be executed out of order depends on what the code’s doing and compiler optimizations.

    However, I am curious as to how does in-order execution consume less power than out-of-order, other than allowing for simpler/smaller CPU architecture.

  • john

    The thing is…
    These processors use very small amount of power. Yes, less power consumption is great, but I would rather see them making chips that use same amount of power but with greater processing power-not necessarily a higher clock speed, but more parallel implementation to get more work done in a same clock. Mobile platforms nowadays can be a great place to force developers to acquire knowledge in parallel processing and get used to idea of writing it, cause the only way to make processors with same sized transistors faster without too much power is parallel processing, not a higher clock speed. Oh and micro-watt FPGA modules in a phone wouldn’t hurt anybody- would use tiny amount of power for hardware acceleration, better connectivity, cheaper for device manufactures in long term, and some clever programmers can use it as an accelerator that has unmatched response speed and great operations/watt.

    I mean display is still the huge power user in any phones. So battery life increase due to better processor is going to diminish as the processor sips less power in terms of percentage.

    • Roberto Tomás

      Taking a cue from Intel, it seems like 8 parallel cores is not infeasible in mobile designs.

      • john

        Well I’m inclined to agree. Intel from the dawn of things tried all sort of gimmicks to make their CPUs faster: higher clock lower ops per clock, lower clock higher ops per clock, CISC, RISC, and so on. If Intel says it cannot be done, it cannot be done. I’ve yet to see Intel attempting the heteregenous system. However all these points are null as we are discussing the performance of CPUs and other microcontrollers based on the consumer-dumb- level of features.

  • hot_spare

    Nice. Good article. But you need to correct “Both the S4 Pro and Tegra 4 use ARM Cortex A15 processors…” S4 doesn’t use A15, but Qualcomm’s own Krait architecture.

  • FYI

    You have lot of technical details but the title “How big.Little A53/A57 chips will put to shame the S4 Pro…..” suggest that you are not a technical guy. FYI, ARM provides processor cores and Qualcom, nVedia, Samsung use them to make SOCs like Snapdragon, Tegra, Exynos. Less than 30% of total die size of an SOC comes from the ARM cores. Rest of the SOC area is because of the GPUs, DSPs and zillion of smaller interface blocks. So, in sort, there is no comparison of ARM cores with SOCs.

    Also, qualcom, nVedia and samsung will have A53/A57 based socs (with or without big.little) very soon. Remember, ARM does not produce any processors or SOCs and they never did and may be never will.

  • techpunch

    “Put to shame” – Really? What exact benchmarks have you run before deciding to print such major prophecies? I’m assuming none because A53/A57 are not yet available [and will not be any time this year in a shipping product]. Neither are there any devices currently shipping with Tegra 4 or Exynos 5 Octa. Also, if you look carefully at the Cortex A50 series graph, there’s no “put to shame” kind of performance improvement over the A15/A7 combo.

    Its a pity that you need to resort to such deceptive headlines to attract readers.

  • Rob C

    What will 64 Bits buy you; faster memory access, the ability to adress larger Memory (than the Phone can accommodate), and the ability to do 64 bit (or larger) math faster (than on the CPU, but not the GPU, in most cases).

    The Desktop really only transitioned to 64 Bits 3 to 6 years ago (depending on who you ask / are) and that was driven by large Computer Programs, large Memory, large Disk Drives, and a large Chassis plugged into the Wall Socket.

    It will be 2020 before some Phones might need 64 Bits but I AM glad they are making this Chip. It will be great for Tablets that are intended to replace some Desktops.

    It’s usefulness in a “Phone” is hardly debatable; so no need to upgrade next year (unless you are planning on running Linux, that could make a lot of sense, for Geeks).