Custom cores versus ARM cores, what is it all about?
When reading about CPU core designs you have probably come across the term “custom core,” especially when reading about Apple or Qualcomm. So what is a custom core? Why do people make such a fuss about them? And, who designs them? Well, let’s find out!
The vast majority of Android smartphones (and all iPhones) use CPUs based on the ARM architecture. An “architecture” in this context means the instruction set and the design philosophy behind that instruction set. Other processor architectures for mobile include Intel’s x86 and MIPS.
The ARM architecture is known as a RISC (Reduced Instruction Set Computer) architecture. The idea is that a simplified instruction set allows each instruction to be executed quickly, but you might need to execute more than one instruction to achieve the same result as a single instruction on a CISC (Complex Instruction Set Computer) processor. There are also some other design decisions that are fundamental to RISC, including that all data processing operates only on register contents, not directly on memory.
Intel uses a CISC approach. Ironically, however, most CPU designs today take the instructions (complex or simple) and break them down further into micro-operations before they are executed.
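To make the load/store idea a little more concrete, here is a minimal sketch in Python. It is a toy model, not real ARM or x86 code: the function names, the register name and the memory address are all made up for illustration. It simply counts how many instructions each style needs to add 1 to a value held in memory.

```python
# Toy model only: these are not real ARM or x86 encodings.
# Task: add 1 to a value that lives in memory.

def run_load_store(memory, address):
    """RISC style: data processing only ever touches registers."""
    registers = {}
    executed = 0
    registers["r0"] = memory[address]      # LOAD  r0, [address]
    executed += 1
    registers["r0"] = registers["r0"] + 1  # ADD   r0, r0, #1
    executed += 1
    memory[address] = registers["r0"]      # STORE r0, [address]
    executed += 1
    return executed

def run_memory_operand(memory, address):
    """CISC style: one instruction reads, modifies and writes memory."""
    memory[address] = memory[address] + 1  # ADD [address], #1
    return 1

risc_memory = {0x100: 41}
cisc_memory = {0x100: 41}
risc_count = run_load_store(risc_memory, 0x100)
cisc_count = run_memory_operand(cisc_memory, 0x100)

print("load/store instructions:", risc_count)
print("memory-operand instructions:", cisc_count)
print("results match:", risc_memory[0x100] == cisc_memory[0x100])
```

Both versions end up with the same value in memory. The instruction count says nothing about which is faster, since a real CISC processor would typically break its single instruction into several internal steps anyway.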
ARM’s business model is different to Intel’s in that ARM licenses (sells) its CPU designs (i.e. Intellectual Property or IP) to its customers, who then in turn build their own chips. ARM gets a royalty fee for every chip sold, plus the licensees need to have their chips certified as being ARM compatible. Intel, on the other hand, designs, builds, manufactures and sells its own chips and only its own chips.
Among ARM’s customers are companies like Qualcomm, Apple, Samsung, MediaTek, Huawei, Rockchip and so on. Each of these companies has a business relationship with ARM that allows them to build processors that are compatible with the ARM architecture. There are two general levels of license: core licenses and architectural licenses. A core license allows ARM’s partners to take a full core design (like the Cortex-A72) and incorporate it into a System-on-a-Chip (SoC) along with a GPU, memory controller, etc. The company has the right to use the core design however it likes, in whatever configurations it wants, but it isn’t allowed to modify the core design. ARM currently licenses four individual 64-bit core designs: the Cortex-A35, the Cortex-A53, the Cortex-A57, and the Cortex-A72. There are more core designs in the pipeline, some of which will likely be announced during 2016.
An architectural licensee is allowed to design its own ARM architecture cores and then use those cores however it wants, in any configuration it desires, as long as the core design is compatible with the ARM instruction set. Architectural license holders include Qualcomm, Apple, Samsung, NVIDIA and Huawei.
And this is where the term custom core comes from: it is a CPU core design, made by an ARM architectural licensee, that is compatible with the ARM architecture but isn’t an ARM Cortex-A design.
Most (if not all) architectural licensees are also core licensees, which means that such a company will have SoCs in its product range that use both ARM Cortex-A core designs and cores designed by its own teams.
Qualcomm is a “classic” example of a top tier ARM licensee. It holds both architectural licenses and core licenses. If you look at its current range of 64-bit SoCs (I am writing this at the very beginning of 2016) you will see that they are all based on ARM Cortex-A53 and Cortex-A57 designs. For example, the Snapdragon 810 uses four Cortex-A53 cores and four Cortex-A57 cores in a Heterogeneous Multi-Processing (HMP) configuration using ARM’s big.LITTLE technology.
However Qualcomm also has its own “custom” core designs. The most famous, until now, was its Krait core design found in SoCs like the Snapdragon 801 and the Snapdragon 805. As I mentioned above, Qualcomm’s first generation of 64-bit processors all use ARM’s Cortex-A designs, however for its second generation of 64-bit processors Qualcomm will be using both Cortex-A designs and its own designs. The Snapdragon 650 and Snapdragon 652 will use Cortex-A53 and Cortex-A72 cores, however the Snapdragon 820 will use Qualcomm’s home grown Kryo core.
Qualcomm has already sent developers its first Snapdragon 820 powered devices under its MDP (Mobile Development Platform) program, and there are also some initial benchmark scores available. What we know about the 820 is that it has four Kryo cores, arranged in a heterogeneous multi-processing (HMP) configuration, with two high-performance cores clocked higher and paired with more L2 cache, and two low-power cores that have a lower clock speed and smaller L2 caches.
It seems that Qualcomm has also worked on improving the memory bandwidth of the 820. According to Geekbench 3 it has more than twice the bandwidth of the Snapdragon 810. The gains are likely due to improvements to the memory controller and the general architecture governing memory transfers, allowing the chip to make better use of the 28.8GB/s of theoretical bandwidth offered by its two LPDDR4 memory controllers. The fastest result across the tests was a peak bandwidth of 17.4GB/s on the 820, compared with 7.5GB/s for the Snapdragon 810.
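To put those bandwidth figures in perspective, here is a quick back-of-the-envelope calculation in Python. The three numbers are the ones quoted above; the variable names are just my own labels. It works out how much faster the 820's peak result is than the 810's, and how much of the theoretical LPDDR4 bandwidth the 820 actually reached.

```python
# Figures quoted in the article, all in GB/s.
THEORETICAL_820 = 28.8  # theoretical peak of the two LPDDR4 controllers
MEASURED_820 = 17.4     # best Geekbench 3 result on the Snapdragon 820
MEASURED_810 = 7.5      # best Geekbench 3 result on the Snapdragon 810

# How many times faster the 820's measured peak is than the 810's.
speedup = MEASURED_820 / MEASURED_810

# Fraction of the theoretical peak that the 820 reached in the test.
utilization = MEASURED_820 / THEORETICAL_820

print(f"820 vs 810: {speedup:.2f}x")
print(f"820 share of theoretical peak: {utilization:.0%}")
```

So the measured peak is a little over twice that of the 810, but still well short of the theoretical maximum, which is normal for a benchmark run on a real system.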
While Qualcomm may represent a typical ARM architectural licensee, Apple is an atypical example. Qualcomm (and others) make SoCs that are then sold to smartphone makers. Qualcomm doesn’t make any consumer devices itself. Apple on the other hand designs SoCs that are used exclusively in the iPhone and aren’t sold to anyone else.
All iPhones, from the original iPhone to the latest, use ARM based processors. Over the years Apple has used both ARM Cortex-A designs and its own designs. The iPhone 4S, for example, used a dual-core Cortex-A9 SoC (the Apple A5), while the iPhone 5 used Apple’s A6 SoC, which had two Swift cores. Swift was Apple’s first custom core design. It is a 32-bit ARMv7 compatible design that improves on the Cortex-A9 by adding support for features like Advanced SIMD v2 and VFPv4.
Apple’s decision to move from ARM supplied Cortex-A cores to its own in-house cores was a result of Apple’s 2008 purchase of P.A. Semi, a chip design company founded by Daniel W. Dobberpuhl, the lead designer for the DEC Alpha 21064 and StrongARM processors. It took a few years before the team was ready to release its first clean sheet SoC design, but since then Apple has never gone back to ARM core designs.
After Swift came Cyclone, a 64-bit core design that caught the rest of the ARM based chip designers sleeping on the job. The Apple A7 SoC was released in September 2013 for use in the iPhone 5S (and various iPad models). About 12 months beforehand, ARM had announced its Cortex-A53 and Cortex-A57 designs and listed among the initial licensees companies like Samsung and Huawei. However there was no mention of Qualcomm. At the launch of the iPhone 5S none of the other SoC makers had announced any plans for 64-bit processors.
Following the A7’s release a senior vice president from Qualcomm called the A7 chip a “marketing gimmick.” However, Qualcomm later issued a statement saying that the comments were inaccurate and that “the mobile hardware and software ecosystem is already moving in the direction of 64-bit; and, the evolution to 64-bit brings desktop class capabilities and user experiences to mobile, as well as enabling mobile processors and software to run new classes of computing devices.”
However Apple now had an 18 month head start over Qualcomm in terms of 64-bit computing, and a three year lead in terms of custom 64-bit cores. This is exemplified in Typhoon and Twister, Apple’s second and third generation 64-bit ARM core designs.
With each generation of its custom core, Apple has been able to tweak the designs to get more and more performance. It is thought that the latest performance gains in the Apple A9 SoC come from a bump in the clock frequency, changes to the way the Level 1, 2, and 3 caches are used, and from microarchitecture improvements.
Samsung was the last to join the custom core party. Until now all of its ARM based processors, both 32-bit and 64-bit, have used ARM Cortex-A designs. For example the Exynos 7420 uses four Cortex-A53 cores and four Cortex-A57 cores, much like the Qualcomm Snapdragon 810. However rumors started to emerge during 2013 that Samsung was designing its own ARM core.
Samsung confirmed recently that the next generation of Exynos processors would use its own custom CPU cores, codenamed Mongoose, based on the 64-bit ARMv8 architecture. In fact, the Exynos 8 Octa will use four custom cores and four ARM Cortex-A53 cores in a big.LITTLE configuration.
Little else is known about the design of the Mongoose core at the moment, however it is expected that the Exynos 8 SoC will include an ARM Mali-T880 GPU and will be used in some models of the Samsung Galaxy S7.
It is probably fair to say that NVIDIA’s SoC strategy is incohesive at best. After the relative success of the Tegra 3 (released in 2011), the company embarked on the design of the Tegra 4. However industry insiders say that the development process was dogged with problems and requirement changes, even during the late stages of the development life cycle. The resulting chips, the Tegra 4 (a Cortex-A15 based SoC) and the Tegra 4i (a Cortex-A9 based SoC), had little commercial success.
Next came the Tegra K1, a mishmash of a design which came in either a 32-bit quad core (plus one) Cortex-A15 based version or a 64-bit dual-core version based on NVIDIA’s custom Denver core design. Project Denver was initially envisioned as a general purpose CPU core that ran software for both Intel x86 and ARM processors, by using binary translation and code morphing. The ideas were similar to those used by Transmeta, the ill-fated semiconductor company which once employed Linus Torvalds. According to Charlie Demerjian of SemiAccurate, the Project Denver CPU was originally intended to support both ARM and x86 machine code, but support for x86 was dropped because NVIDIA could not obtain a license from Intel.
About the only device which used the Tegra K1 was the HTC Nexus 9 tablet. In the past NVIDIA included a project known as Parker on its product road maps, a SoC which was also meant to be based on Denver. However it is likely that Parker and Denver are dead. The main problem is that Transmeta tried and failed to commercialize code morphing processors and it was hubris of NVIDIA to think it could succeed where Transmeta failed.
After the Tegra K1 came the Tegra X1, a SoC based on ARM Cortex-A core designs, four Cortex-A53 and four Cortex-A57 cores to be precise. Having abandoned its custom core designs, it seems that NVIDIA will be sticking with ARM core designs for the moment.
Are custom cores better?
So here is the question, are custom cores better than ARM cores? Well it depends on what you mean by better. There are several ways to characterize a CPU core, some of which are not technical. As well as performance and efficiency (two technical characteristics) you also need to consider cost, marketing, diversity and purpose.
At the moment there are probably four, maybe five, teams of engineers around the world designing smartphone CPU cores based on the ARM architecture. One team belongs to ARM itself, the others to Apple, Qualcomm and Samsung. As in all industries (e.g. cars, textiles, bio-research, whatever), one team will be ahead of another in terms of one aspect or another.
In terms of who makes the highest performance cores, it could be Apple. I say could be for three reasons. First, smartphones with the Snapdragon 820 aren’t yet available to consumers (at the time of writing), nor are devices using a Samsung Exynos 8 series processor or any processors using the Cortex-A72. Second, benchmarking raw CPU performance across two different operating systems is hazardous. For example, is Geekbench 3 on iOS doing exactly the same things as Geekbench 3 on Android? Third, the apparent performance of Apple’s SoCs can be due to other external factors, including the fact that Apple “owns” the whole eco-system from the low level 1s and 0s running in the CPU, up through iOS and the compiler, and onto the handset itself. For Android it is a different story: the core design might come from ARM, the physical chip from Qualcomm, the OS from Google and the handset from Samsung. Such a system has advantages, but it also has disadvantages.
Because Apple controls everything from the SoC to the device including the OS and the compilers, there is a lot of speculation about what extra things Apple includes in its processors that are only available to Apple itself. For example, the ARMv8 instruction set includes special instructions for performing encryption in hardware. These instructions are available in all 64-bit ARM compatible cores including those from Apple. Now what if Apple found that iOS performed certain operations in software that could be improved by adding support in hardware? It could implement that support by adding new instructions to its CPU cores or by adding other bits of discrete hardware to the SoC, which are then used by the OS. Other SoC makers add image processors or DSPs to their chips and it is known that Apple adds parts like its motion co-processors, but Apple is very tight lipped about what else it adds. Companies like Qualcomm need to be more open about what is included on a SoC, as they want to sell their chips to handset makers. But Apple doesn’t need to be open at all. In fact Apple is famous for being very hush-hush about these things, unless there is a marketing advantage. MediaTek’s new Helio X20, which uses the Cortex-A72 core, also includes a Cortex-M4 microcontroller (with DSP). This is used by the chip to support diverse always-on applications, such as MP3 playback and voice activation. The question is: alongside its custom cores, what has Apple added to its SoCs?
This brings us to the concept of Heterogeneous Computing (HC). Not to be confused with Heterogeneous Multi-Processing (HMP), HC allows tasks to be allocated to the CPU, GPU, DSP, ISP or any other processor that might be able to handle the task most efficiently. You see, processors can be designed to perform certain tasks more efficiently, but a single design struggles to be great at everything. Your typical CPU may be good at serial processing, while a GPU can handle streams of parallel data and a DSP is better optimized for crunching numbers to high accuracy in real time.
One of the things that Qualcomm is promoting with the Snapdragon 820 is its HC capabilities and Qualcomm’s Symphony System Manager. With the Symphony System Manager, tasks can be shared between any of its Kryo CPU cores, its Adreno 530 GPU, Hexagon 680 DSP and the Spectra camera ISP. Of course the key is software support, a way for an app developer to tell the underlying OS that a certain activity is suited for a particular type of processor.
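To give a rough feel for what that kind of software support might look like, here is a small Python sketch of a task dispatcher. To be clear, this is not Qualcomm’s Symphony System Manager or any real API: the workload categories, the preference table and the `dispatch` function are all hypothetical, invented purely to illustrate the idea of a developer hint steering a task to the most suitable processor.

```python
from dataclasses import dataclass

# Hypothetical workload categories and processor preferences, most
# suitable first. Invented for illustration; not a real vendor API.
PREFERRED = {
    "serial": ["CPU"],
    "parallel": ["GPU", "CPU"],
    "signal": ["DSP", "CPU"],
    "imaging": ["ISP", "GPU", "CPU"],
}

@dataclass
class Task:
    name: str
    kind: str  # the hint supplied by the app developer

def dispatch(task, available):
    """Return the first preferred processor that exists on this SoC."""
    for processor in PREFERRED.get(task.kind, ["CPU"]):
        if processor in available:
            return processor
    return "CPU"  # fall back to the general-purpose cores

soc = {"CPU", "GPU", "DSP", "ISP"}
tasks = [
    Task("parse settings file", "serial"),
    Task("apply photo filter", "parallel"),
    Task("listen for voice trigger", "signal"),
]
for task in tasks:
    print(f"{task.name} -> {dispatch(task, soc)}")
```

On a SoC without a DSP the voice trigger task would simply fall back to the CPU, which is the point: the hint expresses a preference, and the system decides what is actually available.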
Going back to performance for a moment, initial benchmark data suggests that the Snapdragon 820 and the Exynos 8 will offer performance levels similar to the Apple A9 SoC, however they won’t surpass it. Since Apple is on its third generation 64-bit processor design, while Qualcomm and Samsung are on their first and ARM is on its second (the Cortex-A57 followed by the Cortex-A72), it seems logical that Apple will remain ahead of the field for the time being.
So, today it is probable that Apple’s team is ahead in terms of performance, but next year a different team could be. Having said that, each of these teams is made up of highly experienced CPU designers and while the difference in performance is measurable, it isn’t drastic.
When it comes to efficiency, ARM is the clear winner, for two reasons. First, ARM has built its entire business on designing power efficient CPU cores, and cores like the Cortex-A35 demonstrate the company’s abilities in this area. Second, it has the advantage that no one else is designing ARM cores targeted especially for power efficiency. You see, all the other “custom” core designers are aiming for performance first, efficiency second. ARM doesn’t just do one core like Apple: it has the Cortex-A35 at the high efficiency end of the scale, and the Cortex-A72 at the high performance end, with the Cortex-A53 and Cortex-A57 sandwiched in between them. None of Apple, Samsung or Qualcomm covers that kind of range.
Which brings us to the “why,” as the Merovingian might say. What is the point of designing a custom ARM core? Designing a custom core is expensive: you need to employ a team of highly skilled CPU engineers over a long period (several years) to build a CPU core that is at best a few percentage points faster than your rival’s, and that will become obsolete in a matter of months. Nobody is talking about the high performance of the Apple Swift core today; it has been surpassed by everyone, including Apple itself. The “why” is marketing. The smartphone market is highly, almost insanely, competitive. The smartphone market is also big business. Apple’s iPhone revenue is bigger than the incomes of Intel and Google combined!
If a company like Apple or Samsung can differentiate itself from the competition by using custom cores then it seems (at the moment) to make economic sense for it to design its own cores. Of course, there is a risk: if the core turns out to be a “dud” then it can severely damage the business. As for the other smartphone makers, they need to buy their SoCs from a chip maker like Qualcomm and they also demand high performance SoCs. What is interesting to see is that when Samsung and Qualcomm need core designs for non-flagship phones they both opt for ARM’s core designs. There is no kudos in having a mid-range phone with a mid-range SoC that features a custom core design. The only possible exception is if you have a proven design that was last year’s top core, but has been superseded. To get more return on your investment, that core design can be re-purposed for the mid-range. That is what Qualcomm did with its 32-bit Krait design.
So what does this all mean to the consumer? At the highest level it means that innovation and progress are very much alive and well in the ARM eco-system. This is fueled by competition and rivalry between companies like Apple, Samsung and Qualcomm. ARM is also caught up in this competition and is being pushed further to continue its innovation.
It also means that there is lots of choice, especially at the high-end. Which core design do you want? Kryo, Mongoose, Cortex-A72, or Twister? The choice is yours! For premium mid-range to low-end phones there is even more choice: Cortex-A57, Cortex-A53, Krait and soon Cortex-A35. On top of that, all these cores come in a multitude of different configurations: dual-core, quad-core, hexa-core, and octa-core, all with or without HMP.
Such competition and variety have also proven that there isn’t just one “right way.” Apple’s A9 SoC is dual-core. The Snapdragon 820 is quad-core, with HMP. The Exynos 8 is octa-core, also with HMP. MediaTek’s Helio X20 is a deca-core chip with two Cortex-A72 cores and eight Cortex-A53 cores, plus HMP. Each configuration has its advantages and disadvantages.
Finally it shows us that in fact ARM cores are themselves “custom” cores, ones that just happen to be built by the same people that designed the architecture. Just like their counterparts at Apple, Qualcomm or Samsung, ARM’s engineers work to build cores for different purposes: not just for the high-end, but also for the ultra power efficient market and for the low-end smartphone market.