Qualcomm Kryo and heterogeneous computing explained
Amid the frenzy of device releases yesterday, Qualcomm also began sharing the first details about its new Kryo CPU, which will debut in its upcoming Snapdragon 820. Although Qualcomm hasn’t mentioned much about Kryo’s architecture and the chip isn’t scheduled to arrive until 2016, we now have a pretty good idea about where Qualcomm is going with the 820.
For a quick recap, all we have been told about Kryo is that it will appear in a quad-core configuration in the 820 with a peak clock frequency of 2.2GHz, that it will be built on a 14nm FinFET manufacturing process, and that it offers twice the performance or twice the energy efficiency of the current Snapdragon 810.
Qualcomm is licensing ARM’s architecture again for Kryo, but is developing a clean-sheet CPU design, so no ARM Cortex-A72s, A57s or A53s this time around. It therefore seems unlikely that Qualcomm will opt for an asymmetrical (big.LITTLE) CPU setup with the Snapdragon 820. Instead, the chip is probably more reminiscent of its older quad-core Krait Snapdragons, albeit at a lower clock speed (2.2GHz vs 2.7GHz with the old 805) and with a new architecture.
Some of the performance and energy gains over the Snapdragon 810 are likely coming from this new CPU design, but a lot will also come from the jump down from 20nm to 14nm. Although not official, it’s possible that Samsung will be manufacturing the Snapdragon 820 on the same process that it used for its Exynos 7420.
Although we know that Android is pretty happy with large multi-core configurations, Qualcomm appears to be bucking this trend with a move back to a powerhouse quad-core design. But the company isn’t completely turning its back on the theory of going wide, as there’s a big focus on Heterogeneous Compute with the Snapdragon 820.
The big news alongside Kryo is Qualcomm’s renewed focus on Heterogeneous Computing. Heterogeneous Multiprocessing (HMP) is already big in the Android space, see chips like the Snapdragon 810, Exynos 7420 or Helio X20, but Heterogeneous Compute (HC) is the next evolution. Let me quickly explain the difference.
When we talk about HMP we’re solely in the realm of the CPU; think big.LITTLE, core clusters, and task allocation. This generation of SoCs from all of the mobile players has been making use of ARM’s big.LITTLE technology and various companies have come up with their own task schedulers to allocate loads to the most appropriate CPU core, based on conditions such as energy efficiency, heat and the processing power required.
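To make the idea of HMP task allocation a little more concrete, here is a toy sketch in Python. The core names, capacity figures and power numbers are all invented for illustration and don’t reflect any real scheduler or chip. It simply picks the lowest-power type of core that can cope with a task’s load, and falls back to the small core when the chip is running hot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreType:
    name: str
    capacity: int   # hypothetical max load the core can handle (arbitrary units)
    power_mw: int   # hypothetical power draw while active


# Invented figures for a two-cluster, big.LITTLE-style CPU.
LITTLE = CoreType("LITTLE", capacity=40, power_mw=150)
BIG = CoreType("big", capacity=100, power_mw=750)


def pick_core(load: int, thermal_throttled: bool = False) -> CoreType:
    """Choose the lowest-power core type able to handle the load.

    When the chip is thermally throttled, fall back to the LITTLE core
    even if the load would normally justify a big core.
    """
    if thermal_throttled:
        return LITTLE
    candidates = [c for c in (LITTLE, BIG) if c.capacity >= load]
    if not candidates:
        return BIG  # nothing fits, so use the most capable core
    return min(candidates, key=lambda c: c.power_mw)
```

Real schedulers weigh far more than a single load figure, but the basic trade-off between capability, power and heat is the same.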
Heterogeneous Computing brings additional processing components into the fold. With true HC, tasks can be allocated to the CPU, GPU, DSP, ISP or any other processor that might be able to handle the task most efficiently. You see, processors can be designed to perform certain tasks more efficiently, but a single design struggles to be great at everything. Your typical CPU may be good at serial processing, while a GPU can handle streams of parallel data and a DSP is better optimized for crunching numbers to high accuracy in real time.
With a wider range of options to choose from, the theory is that picking the best processor for any specific task will result in better performance and energy efficiency. The goal may sound similar to that of big.LITTLE, but the implementation is quite different. HMP could be compatible with an HC system too, but Qualcomm is likely keeping its CPU setup fairly simple with the Snapdragon 820.
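As a rough illustration of "picking the best processor for the task", the sketch below chooses between a CPU, GPU and DSP using a lookup table of energy costs. The task names and every number in the table are made up for the example, and real HC runtimes have to consider much more, such as data transfer costs and latency. Treat it as a thought experiment rather than how any actual chip decides.

```python
# Hypothetical energy cost (millijoules) of each task type on each processor.
# None means that processor cannot run the task at all.
ENERGY_MJ = {
    "ui_logic":     {"CPU": 5,  "GPU": 20,   "DSP": None},
    "image_filter": {"CPU": 60, "GPU": 25,   "DSP": 10},
    "audio_decode": {"CPU": 12, "GPU": None, "DSP": 3},
}


def dispatch(task: str, busy: frozenset = frozenset()) -> str:
    """Return the available processor with the lowest estimated energy cost."""
    costs = {
        proc: cost
        for proc, cost in ENERGY_MJ[task].items()
        if cost is not None and proc not in busy
    }
    if not costs:
        raise RuntimeError(f"no available processor can run {task!r}")
    return min(costs, key=costs.get)
```

With these invented figures, an image filter lands on the DSP, but moves to the GPU if the DSP is already busy.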
Qualcomm suggests that its Hexagon 680 DSP can be used for image processing while consuming less power than using the CPU or GPU, meaning that those components can be under-clocked or switched off. Qualcomm isn’t the only one working on this technology. Huawei, with resources from ARM, has developed its own method to offload image processing to its Mali GPU, using OpenCL, which allows for coding adjustments to be made even after release.
Looking specifically at the Snapdragon 820, HC could allow for tasks to be shared between any of its Kryo CPU cores, its Adreno 530 GPU, Hexagon 680 DSP and the Spectra camera ISP. However, managing the power draw and performance of all of these different processor parts becomes a more complicated task. Qualcomm does have a neat trick up its sleeve though: its Symphony System Manager.
Qualcomm hasn’t given out the full details about its Symphony System Manager just yet, but the company has itself compared it to other CPU core management systems. We can surmise that this system will be managing dynamic processor clock frequencies and gating across all of the chip’s processing components, while also monitoring system power draw and heat output.
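Since Qualcomm hasn’t published how Symphony actually works, the following is not its algorithm. It is only a minimal sketch of what chip-wide frequency scaling and gating could look like in principle. The frequency steps, the component list and the crude thermal cap are all assumptions made for the example.

```python
# Hypothetical frequency steps (MHz) per component; 0 means power-gated.
FREQ_STEPS = {
    "CPU": [0, 600, 1400, 2200],
    "GPU": [0, 300, 600],
    "DSP": [0, 400, 800],
}


def plan_frequencies(demand_mhz: dict, over_temp: bool = False) -> dict:
    """Pick the lowest frequency step that meets each component's demand.

    Components with no demand are gated (0 MHz). When over_temp is set,
    the top step of every component is made unavailable as a crude
    thermal cap.
    """
    plan = {}
    for comp, steps in FREQ_STEPS.items():
        allowed = steps[:-1] if over_temp else steps
        need = demand_mhz.get(comp, 0)
        # First step that satisfies the demand, else the highest allowed step.
        plan[comp] = next((f for f in allowed if f >= need), allowed[-1])
    return plan
```

For example, a request for 1000MHz of CPU and 400MHz of DSP would run the CPU at the 1400MHz step, the DSP at 400MHz, and gate the unused GPU entirely.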
It will be interesting to see how Qualcomm’s Symphony System Manager and Kryo CPU stack up against big.LITTLE processors when it comes to power management.
API support is the key
However, all of this wonderful stuff doesn’t happen automatically. Something or someone has to decide which cores are most suitable and which are available to use, then manage the components appropriately. This is what makes HC very difficult to actually implement.
There are already a few HC APIs available for programmers to use to handle additional processing components, such as OpenCL and Renderscript. It’s almost certain that the Snapdragon 820’s HC tricks will remain dependent on manufacturer and developer implementations, unless the company has made some major engineering breakthroughs.
Qualcomm also has its own API, which taps into its CPU, Hexagon DSP and Adreno GPU components. There’s also its MARE parallel computing SDK, and some specific SDKs for tasks such as facial recognition. I would imagine that new builds are on the way to make use of specific Snapdragon 820 features, which are also probably tied into the Symphony System Manager.
Qualcomm will be providing driver and programming support to bring its touted benefits to consumers, which is a considerable investment. However, broad API support makes it more likely that third-party developers will implement HC, which in turn should encourage wider hardware support from other companies.
“When a user is taking a picture, Symphony responds to the system demand making sure that the right components are powered up running at the needed frequency and only as long as needed. These components include CPU, Spectra ISP, Snapdragon Display Engine, GPU, GPS, and memory system.”
In summary, Qualcomm should be able to use HC to improve the energy efficiency and performance of certain tasks, and the Snapdragon 820 is an important step on the road towards wider adoption of Heterogeneous Compute.
The Snapdragon 820 is shaping up to be an important chip for Qualcomm, which may reseat the company at the top of the mobile SoC market. We will just have to wait until Q1 2016 to see if Qualcomm can fully realize its performance and power consumption gains.