Tegra K1: an in-depth look
This year’s CES has seen Nvidia unveil the latest member of its Tegra SoC family. Formerly known as Project Logan and Tegra 5, Nvidia’s new Tegra K1 will unite the company’s struggling mobile processor business with its far more successful desktop graphics division.
It’s exciting stuff when Nvidia claims that it can bring “next-gen console gaming graphics” to mobile devices with a power budget of just two watts. But there are caveats. Let’s take a closer look.
CUDA cores galore
The big talking point about the Tegra K1 is the seemingly huge number of graphics cores squeezed into the new GPU, dwarfing the 72 cores offered by the old Tegra 4. GPU pedigree matters even more than raw core count, though, and the Tegra K1 does not disappoint. Unlike the Tegra 4, the K1’s architecture comes straight from Nvidia’s high-end Kepler line, the same technology that powers the mighty GTX 680, Titan, and 780 Ti desktop graphics cards.
Although Kepler makes the transition to mobile pretty much untouched, the comparison to these top-tier GPUs is a little unfair. The Tegra K1 features only a single Kepler SMX, containing a respectable 192 CUDA cores, 8 texture units, and 4 ROPs, which is significantly cut down from Nvidia’s top-of-the-line cards. The GTX 680, for comparison, packs a far more impressive 1536 CUDA cores, 128 texture units, and 32 ROPs.
Nvidia hasn’t yet revealed the K1’s core clock speeds or memory bandwidth, but the company did quote a peak shader performance figure of 365 GFLOPS during its CES presentation. It’s difficult to gauge the chip’s exact performance at this stage, but a ballpark comparison with the low-end OEM GT 630 (192 cores and 336 GFLOPS) might not be too far off.
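We can put a rough figure on the unannounced clock speed ourselves. Assuming each CUDA core retires one fused multiply-add (counted as two floating-point operations) per cycle, as on desktop Kepler parts, Nvidia’s quoted 365 GFLOPS implies a GPU clock in the region of 950MHz:

```python
# Back-of-envelope Kepler shader throughput: peak FLOPS = cores x 2 x clock.
# Assumption: one FMA (2 FLOPs) per CUDA core per cycle, as on desktop Kepler.
CUDA_CORES = 192
FLOPS_PER_CORE_PER_CYCLE = 2  # one fused multiply-add counts as two ops

def peak_gflops(clock_mhz):
    """Theoretical peak shader throughput in GFLOPS at a given core clock."""
    return CUDA_CORES * FLOPS_PER_CORE_PER_CYCLE * clock_mhz * 1e6 / 1e9

# Working backwards from Nvidia's quoted 365 GFLOPS figure:
implied_clock_mhz = 365 / (CUDA_CORES * FLOPS_PER_CORE_PER_CYCLE) * 1e3
print(f"Implied GPU clock: roughly {implied_clock_mhz:.0f} MHz")
print(f"Peak at that clock: {peak_gflops(implied_clock_mhz):.0f} GFLOPS")
```

The same arithmetic on the GT 630’s 336 GFLOPS gives its known 875MHz clock, which is some reassurance that the comparison holds.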
But enough of the desktop comparisons: how has Nvidia squeezed so much into a chip that draws just two watts?
There are noticeable efficiencies to be had by moving from separate chips to a single SoC, and significantly cutting down the number of cores and ROPs puts the K1 well below Nvidia’s range of Kepler laptop chips, which already pull less than 20 watts. The larger 128KB L2 cache also reduces the energy expended on off-chip memory accesses.
Particular attention has also been paid to low-level power management. Power and clock gating identify idle GPU blocks and lower their clock frequencies, or gate them off entirely, to cut power consumption on the fly. Support for ASTC texture compression should also reduce the amount of work required for both UI and 3D rendering.
The K1 is a massive step up in terms of both graphical horsepower and energy efficiency, but not all of the improvements lie in the hardware department.
Perhaps the most significant new feature, in terms of offering a next-gen gaming experience on mobile devices, is support for the same graphics APIs as the K1’s bigger brothers. You may remember that the Tegra 4 lacked support for the common OpenGL, CUDA, and DirectX 11 APIs, and was instead optimized for certain games on a per-developer basis. The K1 departs from this less-than-ideal arrangement, offering full support for OpenGL 4.4, Microsoft’s DirectX 11.2, OpenGL ES 3.0, and Nvidia’s own CUDA 6.
With the new APIs come new graphical improvements, such as support for FXAA and TXAA anti-aliasing to help eliminate jagged edges, realistic physics simulations courtesy of Nvidia’s PhysX, and compute shaders enabling a full range of high-end effects, such as ambient occlusion. The Tegra K1 will also be the first mobile GPU on the market to support hardware-based tessellation, although Qualcomm has its own Adreno 420 in the works, which will do much the same thing.
The biggest news here is that developers working on PC and console games could now scale their creations down to run on the K1. Interestingly, the Tegra K1 appears to have a fair bit more grunt than both the PlayStation 3 and the Xbox 360, so scaled-down multi-platform ports aren’t outside the realm of possibility. Nvidia has already shown off ports of Unreal Engine 4, Serious Sam 3, and Trine 2 running flawlessly on the Tegra K1.
Two CPU designs
The K1 SoC will be released in two CPU flavours with fully pin-compatible designs, meaning that manufacturers can easily swap between the two. The first is a familiar quad-core-plus-one Cortex A15 layout, almost exactly like the arrangement found in the Tegra 4. The second will implement Nvidia’s own dual-core ARM-based CPU.
Just like the Tegra 4, the K1 CPU pairs four fully clocked A15 cores, designed to do the heavy lifting, with a low-power A15 “companion core” to handle the little tasks. Each core can also be gated for reduced power consumption, waking additional cores only when needed. But there is a subtle difference between the K1 and the Tegra 4: the K1’s CPU is based on the new third revision of ARM’s Cortex A15.
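The 4+1 idea can be sketched in a few lines. This is purely illustrative, not Nvidia’s actual governor, and the load thresholds are invented for the example:

```python
# Illustrative sketch of a 4+1 arrangement: light loads stay on the
# low-power companion core; the four main cores wake only as demand rises.
# The thresholds below are invented, not Nvidia's real scheduling policy.
def active_cores(load_pct: int):
    """Return (companion_active, main_cores_active) for a given load %."""
    if load_pct < 10:
        return True, 0          # little tasks: companion core alone
    # Beyond the companion core's reach: gate it off, wake main cores,
    # adding roughly one extra core per 30% of load, capped at four.
    return False, min(4, 1 + load_pct // 30)

print(active_cores(5))    # idle-ish: companion core only
print(active_cores(95))   # heavy lifting: all four main cores
```

The payoff is that the expensive cores spend most of their time power-gated, which is where the bulk of the idle savings comes from.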
The main improvement in R3 is increased power efficiency, thanks to improved clock gating. Further savings come courtesy of the move to 28nm HPM manufacturing, which Nvidia has chosen to spend on a roughly 20% clock speed boost, from 1.9GHz up to 2.3GHz.
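The quoted figures check out arithmetically:

```python
# Clock bump from Tegra 4 to Tegra K1, as quoted by Nvidia.
tegra4_ghz, k1_ghz = 1.9, 2.3
boost = (k1_ghz / tegra4_ghz - 1) * 100
print(f"Clock speed boost: ~{boost:.0f}%")  # ~21%, in line with the ~20% claim
```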
The Tegra K1’s A15 processor will be slightly faster than the Tegra 4’s, but it doesn’t offer the same leap as the Kepler graphics core. However, the tried-and-tested quad-core design means that Nvidia can start manufacturing the K1 quickly, with OEMs set to receive shipments this quarter.
It’s all change for Nvidia’s second CPU design, codenamed “Denver”, which drops the companion core and opts for a more traditional dual-core configuration. The cores will be based on the new ARMv8 architecture, which brings 64-bit as well as 32-bit support. Denver will clock in at a peak of 2.5GHz and sports a larger 128KB L1 instruction cache alongside a 64KB L1 data cache. Unfortunately, little more is known about Denver at this point, but it’s interesting to see Nvidia drop the popular quad-core design in favour of just two cores.
Media features aplenty
Nvidia is also packing the Tegra K1 with plenty of extra features. The chip’s Image Signal Processor (ISP), which handles various imaging tasks, has received an upgrade; in fact, Nvidia has stuck two of them on the SoC.
Each ISP is capable of processing 600 megapixels per second with 14-bit input, up from the 400Mp/s at 10 bits available with the Tegra 4. There are general improvements to noise reduction, higher-quality downscaling, and support for image sensors of up to 100 megapixels. The inclusion of two ISPs also opens the door to the dual-camera tricks we’ve seen on other devices. Just like the Tegra 4, 4K video output is supported via HDMI, although it’s doubtful that the GPU will handle 3D gaming at 4K.
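Those throughput figures translate into a sizeable jump in raw input bandwidth. Assuming the quoted bit depth applies per pixel sample (a simplification; real sensor interfaces add framing overhead), each K1 ISP ingests roughly twice the data of the Tegra 4’s:

```python
# Raw ISP input bandwidth implied by the quoted figures.
# Assumption: bit depth applies per pixel sample; interface overhead ignored.
def isp_gbits_per_sec(megapixels_per_sec, bits_per_pixel):
    return megapixels_per_sec * 1e6 * bits_per_pixel / 1e9

tegra4 = isp_gbits_per_sec(400, 10)     # Tegra 4: 4.0 Gbit/s
k1_single = isp_gbits_per_sec(600, 14)  # K1, per ISP: 8.4 Gbit/s
print(f"Per-ISP increase: {k1_single / tegra4:.1f}x")  # 2.1x
```

And with two ISPs on the die, the aggregate figure doubles again.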
The Tegra K1 finally sees Nvidia playing to its strengths, and for that reason alone it is a pretty exciting prospect. This could well be the chip we see in the next Nvidia Shield.
The biggest obstacle remaining for Nvidia is finding a large enough consumer base: the appetite for console-quality gaming on mobile devices just isn’t there yet. A few years down the line, initiatives like SteamOS may help push gaming down the Linux road, and a strong gaming chip on Android may be a much bigger deal then.
Technically the K1 is strong, but in a market orientated towards casual gaming it is perhaps not as groundbreaking as it first seems. We’ll have to wait and see what developers make of Kepler on Android, and whether Denver brings anything new.