New CPUs and the move to 64-bit
Mobile CPU technology has come a long way since the chips that powered the first smartphones. These days, users demand a diverse range of tasks from their devices. The enthusiast crowd is clamoring for ever higher levels of performance, while existing smartphone performance is already sufficient for the vast majority of day-to-day tasks. For that second group of consumers, improvements in energy efficiency and battery life are more important.
In pursuit of both improved performance and energy efficiency, ARM has already developed its next generation of processor designs for mobile devices. We’re already anticipating the first Android smartphones that use ARM’s most recent 64-bit ARMv8 architecture and its Cortex-A53 and Cortex-A57 CPU cores. As a recap, here’s what we can expect from ARM’s new processor cores, which are expected to arrive in devices towards the end of this year.
ARM’s new processor cores arrive with the company’s updated ARMv8-A architecture, which retains compatibility with the prevalent 32-bit ARMv7 architecture. ARMv8 comes with two execution states, which determine register widths, the available instruction sets, and so on. The first is AArch32, which provides the A32 and T32 instruction sets. These retain compatibility with the existing 32-bit ARMv7-A instruction set and with the mixed 16/32-bit Thumb-2 instruction set that is frequently used on ARMv7 to reduce memory footprint. Incidentally, Thumb-2 is also the instruction set used by the Cortex-M processors often found in wearable products, embedded microcontrollers, and IoT devices. However, because ARMv8 makes some changes to these instruction sets, software that relies on the new AArch32 features will not run on ARMv7-A implementations.
The second state is AArch64, with its A64 instruction set, which brings the much talked about addition of 64-bit execution and instructions. A64 is entirely separate from the AArch32 instruction sets, and uses a different instruction format and new decoding tables.
A64 also introduces a new exception and privilege model for developers.
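To make the two execution states a little more concrete, here is a minimal sketch of how a program might guess which state it is running in from the machine string the operating system reports. The function name `execution_state` and the exact strings it matches are assumptions for illustration: they reflect what Linux and macOS commonly report, not anything defined by ARM.

```python
import platform


def execution_state(machine: str) -> str:
    """Guess the ARM execution state from an OS machine string.

    The matched strings are assumptions based on common OS behaviour
    (for example "aarch64" or "arm64" for 64-bit, "armv7l" for 32-bit).
    """
    name = machine.lower()
    if name.startswith("aarch64") or name == "arm64":
        return "AArch64"
    if name.startswith("armv8"):
        # Typically a 32-bit userspace running on ARMv8 hardware.
        return "AArch32 on ARMv8"
    if name.startswith("arm"):
        return "32-bit ARM (ARMv7 or earlier)"
    return "not ARM"


# Report the state for the machine this script is running on.
print(platform.machine(), "->", execution_state(platform.machine()))
```

On a typical desktop PC this reports "not ARM"; on a 64-bit ARM device it should report "AArch64", assuming the operating system uses one of the strings above.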
Simply put, 64-bit support lends itself to improved efficiency in areas like multitasking, stress testing, and clustering, and gives software the option of addressing more than 4GB of RAM, the limit of a 32-bit address. The 64-bit register width also lets the processor handle large values in a single operation, which will provide performance improvements for specific pieces of software, as well as some general performance improvements. 64-bit computing will arrive in the Android space with the launch of new hardware and the recently announced Android L operating system, which is scheduled to appear this fall. For developers, the move to 64-bit will help with mathematically intensive software and game development.
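The 4GB figure and the efficiency claim both come down to simple arithmetic, which can be sketched as follows. The helper names `addressable_gib` and `add64_using_32bit_words` are made up for this illustration, and the second function only imitates in Python what a 32-bit core has to do in hardware.

```python
GIB = 2 ** 30
MASK32 = 0xFFFFFFFF


def addressable_gib(address_bits: int) -> int:
    """Number of GiB reachable with a flat address of the given width."""
    return (2 ** address_bits) // GIB


def add64_using_32bit_words(a: int, b: int) -> int:
    """Add two 64-bit values the way a 32-bit core must:
    add the low words first, then the high words plus the carry."""
    low = (a & MASK32) + (b & MASK32)
    carry = low >> 32
    high = (a >> 32) + (b >> 32) + carry
    # Combine the two 32-bit halves, wrapping at 64 bits.
    return ((high & MASK32) << 32) | (low & MASK32)


print(addressable_gib(32))  # 4
print(hex(add64_using_32bit_words(0xFFFFFFFF, 1)))  # 0x100000000
```

A 64-bit core performs the same addition in one instruction on one register, which is where the gains for software that works with large values come from.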
The ARMv8 architecture also adds a selection of new extensions, including Neon (ARM’s 128-bit SIMD engine), cryptographic extensions, and several others. To assist in the move to a new 64-bit architecture, ARM has already updated its own compiler to support ARMv8-A, and has also lent assistance to GCC with support for AArch64. ARM has also posted a set of patches that implement core Linux kernel support for AArch64, although we will have to wait and see what happens with Android. ARM has also been updating its development tools, including its DS-5 Development Studio, for developers looking to make use of its new cores.
OK, enough of the architecture business. What about the physical processors? What does all this mean for consumers?
Starting with the high-end Cortex-A57 core, ARM is targeting the more demanding smartphone consumer: someone who is interested in editing their media content, the productive multitasker, and those who are after a richer, smoother gaming experience. As well as introducing the new ARMv8 architecture, ARM is upping peak performance once again with the Cortex-A57. ARM expects that its new chip could offer anywhere from 20 to 55 percent more performance than its existing high-end Cortex-A15 processor built on the same process node. The largest gains will be seen when running the core in 64-bit mode, but 32-bit operation should still see gains in the region of 20 to 30 percent. Once its chip partners start moving towards smaller production nodes, however, ARM expects that performance could be double that of its existing processor line-up.
The other CPU core in the range is the Cortex-A53, a more energy efficient design intended to meet the demands of the more conservative smartphone user. In other words, someone who prefers extra battery life to lightning fast photo editing.
The Cortex-A53 makes use of the same ARMv8-A architecture, but is targeted as a successor to the energy efficient Cortex-A7. ARM suggests that the A53 will consume less energy while offering performance slightly above its existing Cortex-A9 processor, which was the basis for somewhat older flagships like the quad-core Samsung Galaxy S3. ARM is offering plenty of choice to its SoC manufacturing partners: the company’s latest processors can still take advantage of big.LITTLE setups, and can scale all the way up to 16-core configurations for server designs. Qualcomm has already announced that its Snapdragon 410, 610 and 615 SoCs will use ARM’s Cortex-A53, while its high-end 808 and 810 will use big.LITTLE combinations of Cortex-A53s and A57s for higher peak performance. Not only that, but ARM has also designed its processors with the ability to share a coherent memory cache with ARM’s Mali graphics processing units, which opens up the world of GPU compute on mobile devices.
Compare the performance of the latest HTC device relative to the first Android device, (we’re talking) 40 times the CPU performance. To carry on this growth over the future is a very tough task indeed.
Mali converges on Midgard
This brings us to one of the next big focus areas for ARM: graphics processing. With more pixels being pushed into our displays and users demanding higher quality video and gaming content on their mobile devices, GPUs are set to become increasingly important parts of our mobile devices. We’re certainly going to need a lot more graphics power to support 2K and 4K displays and content, let alone any potential for playing games at those resolutions. ARM is on top of this trend too, having announced its latest range of Mali GPUs at the end of last year. The upcoming Mali-T760, ARM’s future flagship graphics component, boasts a 400% increase in energy efficiency when compared to the Mali-T604, as well as increased performance over the previous generation and scalability up to 16 cores. The mid-range Mali-T720, meanwhile, offers a 150% efficiency boost over the popular Mali-400. However, these GPUs won’t be appearing in smartphones until 2014.
With its latest generation GPUs, ARM will unify its high performance and mid-range designs onto the single Midgard architecture. Midgard is designed around several emerging concepts for mobile, including GPU compute, a native 64-bit architecture, reduced memory latency, power efficiency, and job management, as well as the usual boost in peak performance. ARM’s Mali-T700 range of GPUs will also support a range of graphics APIs, including OpenGL ES 3.1, 3.0, 2.0 and 1.1, Microsoft Windows Direct3D 11.1, full profile OpenCL 1.1, and RenderScript/FilterScript.
GPUs aren’t just being used for game rendering these days. GPU compute is becoming more common for certain tasks, and can yield up to four times the performance compared with using the CPU for the same task. Examples include image processing, facial and speech recognition, cryptography, physics engines, and augmented reality. ARM’s Mali-T600 series of GPUs was the first to bring GPU compute to ARM’s mobile platform through support for OpenCL, and ARM seems to firmly believe that an even closer union between all the components of a completed SoC will play an increasingly important role in smartphone development in the future.
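To give a feel for the kind of work that suits GPU compute, here is a small sketch of an image-processing step written in plain Python. It runs on the CPU and is not OpenCL code. The function names and the integer luma weights are choices made for this illustration. The point is only that each output pixel depends on a single input pixel, so the per-pixel function could in principle be run for every pixel in parallel.

```python
def gray_pixel(pixel: tuple[int, int, int]) -> int:
    """Convert one RGB pixel (0-255 per channel) to a gray level.

    Uses integer weights that sum to 256, an approximation of the
    usual luma weighting chosen here for simplicity.
    """
    r, g, b = pixel
    return (77 * r + 150 * g + 29 * b) >> 8


def to_grayscale(pixels: list[tuple[int, int, int]]) -> list[int]:
    """Apply the per-pixel function to every pixel independently.

    No pixel needs the result of any other, which is the property
    that makes this kind of task a candidate for GPU compute.
    """
    return [gray_pixel(p) for p in pixels]


print(to_grayscale([(255, 255, 255), (0, 0, 0), (255, 0, 0)]))  # [255, 0, 76]
```

In an OpenCL program, the body of `gray_pixel` would roughly correspond to the kernel, with the GPU running one instance per pixel.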
ARM & Coherent Systems (Heterogeneous Computing)
While it is easy to get carried away talking about CPU and GPU clock speeds and improvements, there’s a lot more that goes into a complete system-on-chip (SoC). Although ARM doesn’t manufacture any SoCs itself, the company is dedicated to helping manufacturers build more efficient SoCs.
Heterogeneous computing seems to be the end game for ARM’s next generation SoC components. By heterogeneous computing we mean a complete system that can use multiple types of processor at once, giving each task to the one that can perform it most efficiently. The video below demonstrates this better than I can explain. ARM, along with other big names like AMD and Qualcomm, is a leading member of the Heterogeneous System Architecture Foundation. Current SoC designs already have all the components needed for specialized tasks. We are already using CPUs, DSPs, and GPUs for their individual strengths, but closer cooperation between processors can lead to even greater efficiencies. It’s becoming apparent that developers can’t keep throwing higher clock speeds at consumer demands for ever higher performance. We are currently severely limited by the amount of battery power our smart devices can contain. Instead, hardware developers will have to find new ways to squeeze out maximum performance while making batteries last as long as possible.
Over the next 2-6 months we are getting to that transition point where we have enough compute performance for day to day tasks. (In the future) it’s not so much the CPU performance that will increase, but overall improvements in the SoC itself.
big.LITTLE technology is probably the most demonstrable example of a heterogeneous system. Energy-efficient cores, such as the Cortex-A7 or A53, are used for less demanding tasks and help keep power consumption down, while the heavy-lifting Cortex-A15 or A57 cores are switched on when extra processing grunt is needed. Similarly, GPU compute aims to take the strain away from the CPU if the GPU can perform the task more efficiently. The Midgard Job Manager, which manages GPU loading and power balancing, is also designed to keep the GPU running as efficiently as possible.
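The basic idea behind big.LITTLE can be sketched as a toy placement rule. This is only an illustration: the `Task` type, the `load` metric, and the 0.6 threshold are all hypothetical, and real big.LITTLE scheduling happens inside the operating system kernel and is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    load: float  # hypothetical estimate of CPU demand, from 0.0 to 1.0


def pick_cluster(task: Task, threshold: float = 0.6) -> str:
    """Send demanding tasks to the big cores, everything else to LITTLE."""
    return "big" if task.load > threshold else "LITTLE"


def schedule(tasks: list[Task], threshold: float = 0.6) -> dict[str, list[str]]:
    """Group task names by the cluster they would be placed on."""
    placement: dict[str, list[str]] = {"big": [], "LITTLE": []}
    for task in tasks:
        placement[pick_cluster(task, threshold)].append(task.name)
    return placement


tasks = [Task("email sync", 0.1), Task("3D game", 0.9), Task("music", 0.2)]
print(schedule(tasks))
```

With these made-up loads, only the 3D game lands on the big cluster, while the background tasks stay on the energy efficient cores. If nothing needs the big cores, they can stay powered down.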
Memory is another major issue for portable devices, though it is often overlooked. Although consumers might be demanding smartphones with 3GB or 4GB of RAM and larger internal storage, this all has a big impact on a device’s battery life. Reading from and writing to memory takes up a fair bit of battery power, so hardware manufacturers don’t want to include more than they need. The key to efficient heterogeneous computing is high-bandwidth shared memory that still meets the low power requirements of mobile devices. ARM’s shared cache between ARMv8-A CPUs and Midgard GPUs is one potential solution to this problem. Another is ARM’s investment in compression, including Adaptive Scalable Texture Compression (ASTC), a lossy texture format that lets developers use a lot less memory when moving graphics data from one location to another.
Although you’re not going to hear about it on any smartphone spec sheet, ARM’s CoreLink Interconnect technology is likely to play a crucial role in optimizing completed SoCs. It provides full cache coherency between two clusters of multi-core CPUs, including support for big.LITTLE technology, the Mali-T600 GPU range, and a range of I/O devices, including the all-important modem and WiFi components. The key concept is to maximize the efficiency of data movement and storage, delivering the performance needed at the lowest possible power.
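To put rough numbers on the memory savings from the texture compression mentioned above, here is a small calculation. It relies on one property of ASTC: every compressed block occupies 128 bits, whatever block footprint is chosen. The function names and the 1024x1024 example texture are assumptions for illustration, and the uncompressed case assumes 4 bytes per pixel (8-bit RGBA).

```python
ASTC_BLOCK_BYTES = 16  # every ASTC block is 128 bits, whatever its footprint


def uncompressed_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of a raw texture, assuming 8-bit RGBA by default."""
    return width * height * bytes_per_pixel


def astc_bytes(width: int, height: int, block_w: int, block_h: int) -> int:
    """Size of the same texture in ASTC with the given block footprint."""
    blocks_x = (width + block_w - 1) // block_w  # round up to whole blocks
    blocks_y = (height + block_h - 1) // block_h
    return blocks_x * blocks_y * ASTC_BLOCK_BYTES


print(uncompressed_bytes(1024, 1024))  # 4194304
print(astc_bytes(1024, 1024, 4, 4))    # 1048576
print(astc_bytes(1024, 1024, 8, 8))    # 262144
```

With 4x4 blocks the example texture needs a quarter of the memory, and with 8x8 blocks a sixteenth, at the cost of lower image quality. Less data to read and write also means less power spent on memory traffic.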
Powering a new range of product ideas
The important thing to remember is that it is not ARM but its silicon partners like ST, Freescale, Qualcomm and MediaTek that are going to be implementing these ideas, or perhaps even their own visions. Part of the move towards more efficient designs will be down to the foundries, as well as the processor designers. Production capabilities are also set to improve over this year and the next, with foundries finally pushing out 20nm ARM mobile SoCs come 2015. This should yield strong performance and energy consumption improvements over the existing processor line-up. At the high end of the market, we’re likely to see more of ARM’s own technologies make their way into our devices, including more big.LITTLE configurations and eventually heterogeneous solutions that can make better use of the ever improving GPU power found in our smart devices. Although squeezing out more performance is becoming a tougher task these days, energy and performance efficiencies are still there to be made.
While Qualcomm is using ARM’s designs in the pursuit of superior specifications, MediaTek and other mass market Chinese SoC developers are pushing prices way down at the low end of the market. We have already seen our first $60 smartphones, and Mozilla’s Firefox OS is pushing the price boundary down to the $25 mark. The industry is also seeing a large number of product developers pick up ARM’s lower power Cortex-M range of processors to build the next generation of smartwatches and wearable products, from the Samsung Gear 2 to the Fitbit fitness tracker. We have already covered ARM and wearables rather extensively, but it’s important to note that the company isn’t just focused on the smartphone market these days.
There are plenty of development opportunities to be had, both for ARM and for developers, in the growing wearables and Internet of Things markets. ARM’s huge range of processors is also finding uses in an array of new smart applications, from the automotive industry to web connected televisions and set top boxes. There is a good reason why ARM has become such a leader in the mobile market. The company’s combination of cutting edge technologies and a sensible business model that allows manufacturers to meet consumer demands has ensured a huge range of practical processor implementations over the past decade. Thankfully, ARM and its partners don’t look to be slowing down either, as we appear to be in for a new wave of innovation over the next couple of years.