Since the release of the first Galaxy S devices, Samsung has relied heavily on its in-house System-on-a-Chip (SoC), the Exynos. The CPUs in the Exynos chips are based on the Cortex series, licensed directly from ARM. Samsung frequently includes Mali GPUs in Exynos chips, which are also licensed directly from ARM. In contrast, Samsung’s primary SoC competition comes from Qualcomm and Apple – both of whom merely license the ARM instruction sets for compatibility and then design their own CPU architecture. While Samsung’s relatively boilerplate designs have fared well – Exynos chips are consistently among the top industry performers – there are increasing signs that the Exynos 5 Octa may not be all it’s cracked up to be.

A bit of history

Samsung faces two major obstacles in the SoC market. First, it’s simply unable to compete with Qualcomm’s LTE baseband products, which severely limits Samsung’s ability to deliver SoCs with integrated mobile connectivity. Second, Samsung is largely at the mercy of others for real improvements to its chips – either by ARM in design or by industry limits on fabrication. Arguably, GPU development is another issue for Samsung, but the inclusion of the beefy PowerVR SGX544MP3 with the Exynos 5 Octa (5410) proves that the company is willing and able to simply buy its way around that issue.

Fortunately for Samsung, its LTE connectivity issues can be resolved in the same manner. To bring LTE connectivity to the Exynos 4 Quad (4412) in the Note 2 or to the Exynos 5 Octa in the Korean Galaxy S4, Samsung paired the Exynos chip with a Qualcomm baseband chip. Apple does the same thing with the iPhone 5. The ongoing problem for Samsung in this space is one of timing. Qualcomm tends to release its latest baseband chip embedded in the latest Snapdragon SoCs and only releases the baseband as a discrete (standalone) unit months later. This works for Apple because it releases new iPhones in the fall, after Qualcomm’s discrete units are available. By releasing the flagship Galaxy S phones in the late spring, Samsung is left to choose between using the latest baseband technology – and therefore the rest of the Snapdragon SoC – or pairing last year’s Qualcomm baseband with this year’s Exynos. For both the S3 and the S4, Samsung chose the Snapdragon in major LTE markets around the world, and those chips have been cannibalizing Exynos sales along the way.

The recent history of the CPUs in Samsung’s Exynos processors suggests that LTE connectivity is not the only reason the company might prefer not to use its own SoCs. Looking back, it’s possible that the Exynos 4 Dual (4210) in the Galaxy S2 was the high-water mark for the SoC line. That chip included a dual-core ARM Cortex-A9 CPU built on 45nm. A year later, Qualcomm was shipping the Snapdragon S4, which had a dual-core Krait CPU built on 28nm. Krait is Qualcomm’s self-designed architecture built to compete against ARM’s Cortex-A15. The expectation was for Samsung to deliver something similarly advanced in the Galaxy S3. Instead, it released the Exynos 4 Quad (4412). With a quad-core ARM Cortex-A9 CPU built on 32nm, it amounted to only a core-count increase and a process shrink over the previous generation. With double the number of cores and a severely overclocked GPU, the Exynos 4 Quad was strong enough to regularly outperform the Snapdragon S4. It was reasonable at this point to attribute the decision to release the Galaxy S3 in the U.S. with the Snapdragon SoC entirely to Samsung’s LTE problem, but the lack of innovation in the Exynos 4 Quad portended larger problems on the horizon.

Later in 2012, after failing to produce the chip in time for the Galaxy S3 launch, the company finally released the Exynos 5 Dual (5250). It includes a dual-core ARM Cortex-A15 CPU built on 32nm. The chip delivers decent performance, but Samsung’s inability to control the power consumption of the Cortex-A15 design relegated the chip to use only in the Samsung Chromebook and the Samsung-built Google Nexus 10. This is the chip Samsung was supposed to deliver in the Galaxy S3; instead, it came extremely late and was not, in truth, particularly good.

Exynos 5 Octa and the Galaxy S4

All of this leads up to the Exynos 5 Octa and the Galaxy S4. The Octa is used in the GT-I9500 variant of the Galaxy S4 and is supposed to solve the problem of Cortex-A15 power consumption by using ARM’s big.LITTLE architecture. big.LITTLE allows the use of two core clusters, one for high-performance tasks and one for low-performance tasks. In the Octa, this is a quad-core Cortex-A15 cluster and a quad-core Cortex-A7 cluster, all built on 28nm. In big.LITTLE, there are supposed to be three modes for managing threads across all of the cores in both clusters. Evidence thus far suggests that the Octa really only supports one of these modes – the least efficient. Even worse, it appears that this limitation is due to crippled hardware in the SoC and not something that can be fixed in software.

The first unsupported mode in the Octa is called core-migration. In this mode, each of the four Cortex-A15 cores is ‘paired’ with a Cortex-A7 core. At idle, one A7 core in the first pair runs at minimum speed and all the other cores are deactivated. As load increases, either another pair comes online with its A7 core, or the first pair ramps up to its A15 core. Each pair is able to independently toggle between its A7 and A15 cores as needed depending on load and threading. This adds efficiency and greatly reduces power consumption. The second unsupported mode in the Octa is called heterogeneous multi-processing (HMP), which allows tasks to be scheduled across all 8 cores independently. This has obvious benefits for maximum performance, but might not always yield improvements in power consumption.
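The pairing logic of core-migration can be illustrated with a small toy model. This is only a sketch under stated assumptions, not ARM's or Samsung's actual switcher: the load scale (0.0 to 1.0, relative to an A15 core's peak), the `A7_CAPACITY` threshold and the function name `core_migration` are all hypothetical and chosen for illustration.

```python
from enum import Enum


class Core(Enum):
    OFF = "off"
    A7 = "A7"
    A15 = "A15"


# Hypothetical threshold: the share of A15 peak load that an A7 core can absorb.
A7_CAPACITY = 0.4


def core_migration(pair_loads):
    """Choose the active core in each of the four A7/A15 pairs independently."""
    if len(pair_loads) != 4:
        raise ValueError("toy model: exactly four A7/A15 pairs")
    state = []
    for load in pair_loads:
        if load <= 0:
            state.append(Core.OFF)   # nothing to run on this pair
        elif load <= A7_CAPACITY:
            state.append(Core.A7)    # light work stays on the small core
        else:
            state.append(Core.A15)   # heavy work moves to the big core
    # At idle, one A7 core in the first pair stays online.
    if all(core is Core.OFF for core in state):
        state[0] = Core.A7
    return state


print(core_migration([0.9, 0.1, 0.0, 0.0]))
```

In this model a heavy thread on the first pair and a light thread on the second pair end up on one A15 core and one A7 core respectively, which is the per-pair independence the mode is meant to provide.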

The Exynos 5 Octa supports only what is called cluster-migration. In this mode, only one cluster is active at any time. At idle, one A7 core is active and the other 7 remain offline. As single-threaded load increases, the system will switch to the A15 cluster. At that point, when threaded load increases – even very low load – the system will bring more A15 cores online. Limitations of ARM’s architecture exacerbate the efficiency problem of this mode. Each cluster operates on its own unified frequency plane. This means if one A15 core is online at maximum frequency, every additional bit of load will bring up another A15 core at maximum frequency – even if that additional load could have been handled by an A7 core at minimum frequency.
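The cost of the shared frequency plane is easier to see in a toy model of cluster-migration. This is a sketch under stated assumptions rather than the Octa's real governor: loads are fractions of an A15 core's peak, one thread is assumed per core, and `A7_CAPACITY`, `MIN_FREQ` and `cluster_migration` are hypothetical names and values.

```python
# Hypothetical values, chosen only for illustration.
A7_CAPACITY = 0.4   # share of A15 peak load that an A7 core can absorb
MIN_FREQ = 0.2      # lowest frequency step, as a fraction of the cluster maximum


def cluster_migration(thread_loads):
    """Return (cluster, online_cores, shared_frequency) for up to four threads."""
    busy = [load for load in thread_loads if load > 0]
    if len(busy) > 4:
        raise ValueError("toy model: one thread per core, four cores per cluster")
    # A single heavy thread is enough to move everything to the A15 cluster.
    cluster = "A15" if any(load > A7_CAPACITY for load in busy) else "A7"
    # At idle, one core stays online.
    online_cores = max(1, len(busy))
    # Every online core in the cluster runs at one frequency,
    # set by the heaviest thread.
    capacity = 1.0 if cluster == "A15" else A7_CAPACITY
    heaviest = max(busy, default=0.0)
    shared_frequency = max(MIN_FREQ, min(1.0, heaviest / capacity))
    return cluster, online_cores, shared_frequency


print(cluster_migration([1.0, 0.05]))  # ('A15', 2, 1.0)
```

Here the second thread needs only 5% of an A15 core, yet it still brings a second A15 core online at the cluster's full frequency, because no A7 core is available while the A15 cluster is active.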

It appears that the Octa is limited to cluster-migration because of a hardware deficiency. Core-migration requires the use of a part called the Cache Coherent Interconnect (CCI). As the name suggests, the CCI provides a coherent cache across both big.LITTLE core clusters, allowing a given process to seamlessly transition between them. HMP would ordinarily use the CCI as well, but an implementation could theoretically be designed around it. Unfortunately, those workarounds would almost certainly cost even more in power consumption. The Exynos 5 Octa includes a CCI, but it is disabled by default. XDA developer AndreiLux has found that it cannot be properly enabled either.

According to Samsung, there is no hardware problem at all and the company chose cluster-migration because it “show[s] increased performance/efficiency.” But the statement does not match up with most people’s understanding of ARM’s big.LITTLE architecture. Moreover, ARM demonstrated core-migration working on a pre-release version of the Octa, and Samsung’s released kernel source code for the Octa includes the drivers for core-migration. But that code does not work in the final release version of the Octa, and Samsung has been coy about giving a straight answer. Based on this, Linus Torvalds wrote: “quite frankly, the fact that the Exynos 5 currently only works in ‘either or’ configuration almost certainly means that there is something fundamentally wrong with the hardware design, to the point where no amount of ‘complex patches’ can fix it.” While this certainly makes it seem that Samsung has done something wrong in the Octa, the chip is still entirely based on designs from ARM. Torvalds goes on to point out that he has “very little reason to believe that ARM engineers got their cache handling right. They’ve never done that before. They’ve had some of the crappiest caches on the planet.” So it is quite possible that the problem is not even within Samsung’s control.

After Samsung released the mediocre Exynos 4 Quad and the disastrous Exynos 5 Dual, the Exynos 5 Octa was supposed to put the company back on track in SoC development. But the chip is shaping up to be another disappointment. It is included in a minority of Galaxy S4 devices globally, and rumors suggest that the Galaxy Note 3 will include a Snapdragon 800 SoC. Samsung had even been rumored to be producing a mid-range Exynos 5 Quad (5210) in a 2+2 ARM big.LITTLE configuration, but those plans appear to have been shelved. There is very little information at this point about what Samsung plans to do next with the Exynos line.

Wrap up

Moving forward, Samsung’s best hopes are two major advancements coming to ARM chips. The first is another manufacturing size drop. ARM chip fabricators are struggling to keep up with Intel. They are expected to skip the 20/22nm process in favor of the 16nm process in order to catch up to Intel’s recent push toward 14nm. As the process size decreases, SoC makers can squeeze more performance out of the same CPU architecture while using less power. The other advance is the ARMv8 instruction set and the Cortex-A50 series CPUs. This will bring ARM SoCs into the world of 64-bit processing, and the new core architectures are significantly more powerful and efficient than the A15 and A7. Fabricators do not expect to be ready to mass produce SoCs on 16nm until late 2014 at the earliest. This means that ARMv8 and Cortex-A50 will likely not hit mass production until early 2015.

Unfortunately for Samsung, the company has very few options for improving the Exynos line over the next 18 months. The company could follow Apple’s lead and start designing its own CPU architecture, but such a move would still require years of investment. There are even options for innovation beyond the designs from ARM, as NVIDIA shows with the Tegra series. But the most likely outcome is that Samsung devices with Snapdragon SoCs become more common.