One of the most popular smartphone processors at the moment is the Samsung Exynos 7420, mainly because it is the processor used by Samsung for its current range of high-end devices including the Samsung Galaxy S6, the Samsung Galaxy S6 Edge+, and the Samsung Galaxy Note 5. The Exynos 7420 is an octa-core processor, which means it has 8 CPU cores, each of which is capable of running a task in parallel with the tasks running on the other cores.
With 8 cores and the ability to run 8 tasks in parallel, it is important to understand what level of parallelization this high performance CPU actually delivers.
Earlier this year I wrote two in-depth articles about the nature of multiprocessing on Android and specifically on ARM based CPUs. The first article debunked the myth that Android apps only use one CPU core, while the second looked at how the Samsung Galaxy S6 uses its octa-core processor.
Both bits of research showed how Android utilizes the parallel (multi-core) nature of modern processors. Samsung’s Exynos 7420 is an ARM-based processor with built-in Heterogeneous Multi-Processing (HMP). In general, the quad-core processors found in everything from desktops to smartphones have a set of cores which are all equal in terms of their performance and power consumption. In an HMP CPU, not all the cores are equal (hence, heterogeneous). The Exynos 7420 has a cluster of Cortex-A57 cores and a cluster of Cortex-A53 cores. The A57 is a high performance core, while the A53 has greater energy efficiency. This arrangement is known as big.LITTLE, where “big” processor cores (Cortex-A57) are combined with “LITTLE” processor cores (Cortex-A53).
When tasks are run on the LITTLE cores they use less power and drain the battery less, but they may run a little slower. When tasks are run on the big cores, they finish sooner but they use more battery to do so.
Once we understand that not all cores are equal, it is interesting to see how Android uses those cores: what level of simultaneous processing occurs, and on which cores, big or LITTLE?
My previous tests used a tool, which I wrote myself, to determine how the CPU is being used. It relies on the information about the activity of the Linux kernel which is made available via the /proc/stat file. However, it has a shortcoming. Since the data about the CPU usage is generated by polling /proc/stat, some tasks can appear to be parallel when in fact they aren’t.
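To make the polling approach concrete, here is a minimal Python sketch of how per-core usage can be derived from two /proc/stat snapshots. It is not the tool used for these tests, just an illustration of the technique. The function names and the sample snapshots are made up for the example; the column order (user, nice, system, idle, iowait, irq, softirq, steal) follows the standard Linux /proc/stat layout.

```python
def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} for each per-core line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line
        # and unrelated lines such as "intr" or "ctxt".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Columns: user, nice, system, idle, iowait, irq, softirq, steal.
        # The later guest columns are already counted inside user/nice.
        values = [int(v) for v in fields[1:9]]
        total = sum(values)
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        cores[fields[0]] = (total - idle, total)
    return cores


def utilization(before, after):
    """Percentage of non-idle time per core between two snapshots."""
    result = {}
    for core, (busy1, total1) in after.items():
        if core not in before:
            continue  # the core was offline when the first snapshot was taken
        busy0, total0 = before[core]
        elapsed = total1 - total0
        result[core] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return result


# Hypothetical snapshots taken 16 ticks (160 ms at 100 ticks/second) apart.
SAMPLE_1 = """cpu  400 0 200 1400 0 0 0 0 0 0
cpu0 100 0 50 850 0 0 0 0 0 0
cpu1 300 0 150 550 0 0 0 0 0 0
intr 12345
"""
SAMPLE_2 = """cpu  412 0 204 1416 0 0 0 0 0 0
cpu0 104 0 50 862 0 0 0 0 0 0
cpu1 308 0 154 554 0 0 0 0 0 0
intr 12400
"""

print(utilization(parse_proc_stat(SAMPLE_1), parse_proc_stat(SAMPLE_2)))
# prints {'cpu0': 25.0, 'cpu1': 75.0}
```

On a device where /proc/stat is readable, the two sample strings would be replaced by the file's contents read at the start and end of each polling interval. Note that the result is only an average over the interval: it says how busy each core was, not when.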
The polling interval is around one sixth of a second (i.e. around 160 milliseconds). If one core reports 25% usage for that 160 milliseconds and another core also reports 25%, then the graphs will show both cores running simultaneously at 25%. However, it is possible that the first core did all of its work during the first 80 milliseconds and the second core did all of its work during the last 80 milliseconds, in which case the two cores never actually ran at the same time.
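The weakness is easy to reproduce with a few lines of Python. The sketch below uses invented busy intervals (in milliseconds) inside a single 160 millisecond polling window; the core names and numbers are purely illustrative. Two cores can each report 25% for the window while never being busy at the same moment.

```python
WINDOW_MS = 160  # one polling interval


def window_utilization(intervals, window_ms=WINDOW_MS):
    """Percentage of the window a core was busy (what polling reports)."""
    busy = sum(end - start for start, end in intervals)
    return 100.0 * busy / window_ms


def overlap_ms(a, b):
    """Milliseconds during which both cores were busy at the same time."""
    total = 0
    for start_a, end_a in a:
        for start_b, end_b in b:
            total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total


# Hypothetical busy intervals (start_ms, end_ms) within one window.
core_a = [(0, 40)]     # busy early in the window
core_b = [(80, 120)]   # busy later, after core_a has finished
core_c = [(20, 60)]    # busy at partly the same time as core_a

print(window_utilization(core_a), window_utilization(core_b))  # 25.0 25.0
print(overlap_ms(core_a, core_b))  # 0  -> looks parallel, but is not
print(overlap_ms(core_a, core_c))  # 20 -> genuinely parallel for 20 ms
```

A polled graph cannot tell the pair (core_a, core_b) from the pair (core_a, core_c), since all three cores report the same 25%.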
To delve deeper into the parallel nature of the Exynos 7420 I have switched from using my own tool to the open source Workload Automation tool. Written by ARM, it is designed for running tests that exercise the CPU on Android and Linux devices. The key thing is that it supports the Linux kernel's internal tracer, known as ftrace.
This means that information about the exact scheduling of the CPU cores can be extracted directly from deep within the Linux kernel itself. As a result, the polling interval weakness of my CPU usage tool is eliminated.
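Once exact scheduling data is available, working out how much of the time N cores were busy simultaneously is a simple sweep over the start and end times of each core's busy periods. The Python sketch below shows the idea. It does not parse ftrace output or use Workload Automation's own code; the per-core busy intervals, the core names, and the function name are assumptions made for illustration, standing in for data that could be derived from the kernel's scheduling events.

```python
from collections import defaultdict


def parallelism_histogram(busy_by_core, trace_start, trace_end):
    """Return {number_of_busy_cores: fraction_of_trace_time}.

    busy_by_core maps a core name to a list of (start, end) busy intervals,
    all assumed to lie within [trace_start, trace_end].
    """
    events = []
    for intervals in busy_by_core.values():
        for start, end in intervals:
            events.append((start, +1))  # a core becomes busy
            events.append((end, -1))    # a core goes idle
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so back-to-back intervals are not counted as overlapping.
    events.sort()

    time_at_level = defaultdict(float)
    active = 0
    previous = trace_start
    for timestamp, delta in events:
        time_at_level[active] += timestamp - previous
        previous = timestamp
        active += delta
    time_at_level[active] += trace_end - previous

    duration = trace_end - trace_start
    return {n: t / duration for n, t in sorted(time_at_level.items()) if t > 0}


# Hypothetical 100 ms trace with three cores doing some work.
example = {
    "cpu0": [(0, 50)],
    "cpu1": [(25, 75)],
    "cpu4": [(40, 50)],
}
print(parallelism_histogram(example, 0, 100))
# prints {0: 0.25, 1: 0.5, 2: 0.15, 3: 0.1}
```

Passing only the cores of one cluster to the same function gives a per-cluster breakdown, while passing all of them gives the overall level of parallelization.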
If I were to ask you what is the most arduous task that your smartphone’s CPU performs, you might think it would be a game like Modern Combat 5 or Asphalt 8, and you would be right to a certain degree. However, the thing about big 3D games is that they load the GPU as much as (or even more than) the CPU. Although the CPU is used quite heavily during 3D gaming, a big chunk of the workload is handled elsewhere. If we are looking for a job which makes the CPU sweat a bit, it is in fact web browsing!
Here is a set of graphs which show how the CPU is used when browsing the Android Authority website using Chrome:
There are three graphs. The first one on the top-left shows how the four Cortex-A53 cores are used during 90 seconds of web browsing. As you can see, for 18% of the time none of the cores are being used, which means the cluster of Cortex-A53 cores is effectively idle. For 19% of the time 1 core is being used, for 18% of the time 2 cores are being used in parallel, 3 cores for 19%, and 4 cores for 24% of the time.
The graph on the top-right shows the same data but now for the cluster of big Cortex-A57 cores. For nearly 60% of the time one big core is in use and for 14% of the time 2 cores are in use. In fact, for over 80% of the time 1 or more Cortex-A57 cores are being used.
The graph at the bottom shows the overall level of parallelization across all of the CPU cores. For less than 4% of the time the whole CPU is idle, for 15% of the time 1 core is being used, 2 cores for 16%, and so on. What is interesting is that for over 20% of the time 5 cores are being used in parallel.
If the Exynos 7420 were a quad-core processor then the scheduler at the heart of the Linux kernel would not have the option to use 5 cores simultaneously. More than that, there are moments when 6, 7 and all 8 cores of the CPU are being used in parallel.
The situation for Firefox is similar, but not the same:
As you can see, Firefox mainly uses 2 and 3 cores in parallel; however, for around 10% of the time it uses more than 4 cores. For Chrome, the big Cortex-A57 cores were used over 80% of the time; for Firefox that number jumps to over 90% of the time.
At this point you might be thinking, well if Chrome and Firefox are using the big cores heavily then why not just build a CPU with just four Cortex-A57 cores and leave the Cortex-A53 cores out altogether? The answer is that the big cores use more power, and the way big.LITTLE works is that they are only called upon when needed. The LITTLE cores are still being used for around 75% of the workload and, as we will see soon, some workloads don’t even use the big cores!
Although we talk about big cores and LITTLE cores, we shouldn’t underestimate the capabilities of the Cortex-A53 cores. They are full 64-bit processing units which can perform exactly the same operations as the bigger Cortex-A57 cores, but they have been designed to have greater power efficiency. In fact, for some tasks the Cortex-A53 is more than sufficient.
Here is the data captured when streaming a 720p YouTube video over Wi-Fi:
As you can see, all of the work is performed by the Cortex-A53 cores. Since the video decoding is actually performed by the GPU or a hardware video decoder, the CPU is only responsible for the Wi-Fi, for getting the streaming data from the Internet, and for loading the right bits of memory for the video decoder to tackle the next frame. The result of this “relatively easy” load is that the big cores basically sleep the whole time. In fact, the Cortex-A53 cores spend almost one quarter of their time idle as well!
So, if the YouTube app only uses the Cortex-A53 cores because a lot of the video work is done by dedicated hardware, what does that mean for games? Do they use the Cortex-A57 at all? Below are three sets of graphs for three gaming apps: Asphalt 8, Epic Citadel, and Crossy Road:
If you look at these graphs you will see that there is a general pattern. For the most part the games use 1 to 3 cores of the processor and occasionally peak at using 4 or 5 cores simultaneously. The Cortex-A53 cores are used for around 60% to 70% of the time, with the cores idling for around one quarter to one third of the time. However, the big cores aren’t sitting idle, as they were with YouTube. What we see is that Asphalt 8 and Epic Citadel are using 1 big core for at least half the time, and that even Crossy Road tends to lean on at least one big core. This is because gaming is a more complex activity than video streaming, with lots of game objects to create, manipulate and track. It is likely that the active Cortex-A57 core is being used for the most complex tasks performed by the CPU and the LITTLE cores for the rest.
I also tested Gmail, Amazon Shopping, and Flickr. However before we look at those, I want to bring your attention to the Microsoft Word app for Android:
As you can see, the Word app behaves like many other apps. It uses a mixture of the Cortex-A53 and Cortex-A57 cores and it spends a lot of the time idle, due to the nature of the app. However, what is interesting is that when the app has something to do, like creating a new document, it can use all 8 CPU cores. In fact it seems that when it is busy, it jumps straight from using a couple of cores right up to 8. The amount of time it is using 5, 6, or 7 cores is much less than the time it uses 8 cores.
As for the other apps, here are their graphs for your perusal:
The results of this testing are broadly in line with my previous tests and again underline the parallel nature of Android and Android apps. They also highlight the power of Heterogeneous Multi-Processing and how the LITTLE cores are being used for most jobs while the big cores are being called upon for the heavy lifting.
This data also shows just how powerful a processor the Exynos 7420 is. At no time is the Exynos 7420 being asked to work overly hard, and there are lots of idle moments (which are good as it means that minimal battery power is being used). That being the case, it would be interesting to see how HMP works in combinations other than just 4+4. For example, the LG G4 uses a hexa-core processor, the Snapdragon 808, rather than an octa-core processor. The 808 uses two Cortex-A57 cores and four Cortex-A53 cores. At the other extreme, it would be interesting to see how HMP works in the deca-core Helio X20 from MediaTek.
Finally, we must never underestimate the role of the GPU and other video hardware. Both the YouTube test and the gaming tests show the importance of the graphics part of the SoC.
So, what are your thoughts on Heterogeneous Multi-Processing, big.LITTLE, octa-core processors, hexa-core processors, deca-core processors, and the Exynos 7420? Please let me know in the comments below.