Tegra K1: an in-depth look

by: Robert Triggs, January 9, 2014

Tegra K1

This year’s CES has seen Nvidia unveil the latest member of its Tegra SoC family. Formerly known as Project Logan and Tegra 5, Nvidia’s new Tegra K1 will unite the company’s struggling mobile processor business with its far more successful desktop graphics division.

It’s exciting stuff when Nvidia claims that it can bring “next-gen console gaming graphics” to mobile devices with a power budget of just two watts. But there are some caveats worth knowing. Let’s take a closer look.

CUDA cores galore

The big talking point about the Tegra K1 is the seemingly huge number of graphics cores squeezed into the new GPU: 192, dwarfing the 72 cores offered by the old Tegra 4. GPU pedigree is even more important than core count, and here the Tegra K1 does not disappoint. Unlike the Tegra 4’s, the K1’s graphics architecture comes straight from Nvidia’s high-end Kepler line, the same technology that powers the mighty GTX 680, Titan, and GTX 780 Ti desktop graphics cards.

Although Kepler makes the transition to mobile pretty much untouched, the comparison with these top-tier GPUs is a little unfair. The Tegra K1 features only a single Nvidia SMX, containing a decent 192 CUDA cores, 8 texture units, and 4 ROPs, which is significantly cut down from Nvidia’s top-of-the-line cards. The GTX 680, for comparison, contains a far more impressive 1536 CUDA cores, 128 texture units, and 32 ROPs.
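To see exactly how much was cut, divide the GTX 680’s unit counts by the K1’s, using only the figures quoted above. This is a simple arithmetic sketch; the dictionary and function names are illustrative.

```python
# Unit counts quoted in the article for the Tegra K1's single SMX and the GTX 680.
K1 = {"CUDA cores": 192, "texture units": 8, "ROPs": 4}
GTX_680 = {"CUDA cores": 1536, "texture units": 128, "ROPs": 32}


def unit_ratios(small: dict[str, int], big: dict[str, int]) -> dict[str, float]:
    """Return how many times more of each unit the bigger GPU has."""
    return {unit: big[unit] / small[unit] for unit in small}


if __name__ == "__main__":
    for unit, ratio in unit_ratios(K1, GTX_680).items():
        print(f"{unit}: GTX 680 has {ratio:g}x as many")
```

The GTX 680 comes out at eight times the CUDA cores and ROPs but sixteen times the texture units. Assuming its cores are also grouped 192 per SMX, that suggests the K1’s SMX carries half the texture units of a desktop Kepler SMX.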

Tegra K1 Graphics Core (image source: AnandTech)

Nvidia hasn’t yet mentioned core clock speeds or memory bandwidth for the K1, but the company did list a peak shader performance figure of 365 GFLOPS during its CES presentation. It’s difficult to gauge the chip’s exact performance at this stage, but a ballpark comparison with the low-end OEM GT630 (192 cores and 336 GFLOPS) might not be too far off.
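With no official clock speed, one rough way to relate the 365 GFLOPS figure to a clock is the usual peak-throughput formula: cores × FLOPs per core per clock × clock. The sketch below assumes each CUDA core completes one fused multiply-add (counted as two FLOPs) per clock; that assumption, and the function names, are illustrative rather than anything Nvidia has published.

```python
# Assumption: one fused multiply-add per CUDA core per clock, counted as 2 FLOPs.
FLOPS_PER_CORE_PER_CLOCK = 2


def peak_gflops(cores: int, clock_mhz: float) -> float:
    """Peak GFLOPS = cores x FLOPs per core per clock x clock in GHz."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz / 1000.0


def implied_clock_mhz(cores: int, gflops: float) -> float:
    """Invert the formula: the clock needed to reach a quoted GFLOPS figure."""
    return gflops * 1000.0 / (cores * FLOPS_PER_CORE_PER_CLOCK)


if __name__ == "__main__":
    # Figures quoted in the article.
    for name, cores, gflops in [("Tegra K1", 192, 365.0), ("GT630 OEM", 192, 336.0)]:
        print(f"{name}: ~{implied_clock_mhz(cores, gflops):.0f} MHz implied")
```

Under that assumption the K1’s 365 GFLOPS would imply a GPU clock of roughly 950 MHz, against about 875 MHz for the GT630’s 336 GFLOPS. Since both chips have 192 cores, the gap between them comes down to clock speed alone.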

But enough of the desktop comparisons: how did Nvidia squeeze so much into a chip that draws just 2 watts?

There are noticeable efficiencies to be made by moving from separate chips to a single SoC, and significantly cutting down the number of cores and ROPs would put the K1 well below Nvidia’s range of Kepler laptop chips, which already pull less than 20 watts. The larger 128KB L2 cache will also reduce the energy expended on off-chip memory access.

Particular attention has also been paid to low-level optimizations that manage power consumption efficiently. Power and clock gating identifies idle GPU blocks and either lowers their clock frequencies or gates them off completely, reducing power consumption on the fly. Support for ASTC texture compression should also help by reducing the memory bandwidth needed for both UI and 3D rendering.
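To put rough numbers on the ASTC point: ASTC encodes every block of texels in exactly 128 bits, and the encoder chooses the block footprint (from 4x4 up to 12x12 for 2D textures), so the bit rate falls as the block grows. The sketch below compares a few footprints against uncompressed 32-bit RGBA; the 1024x1024 texture size is an arbitrary example, and the function names are illustrative.

```python
ASTC_BLOCK_BITS = 128      # every ASTC block is 128 bits, whatever its footprint
RGBA8_BITS_PER_PIXEL = 32  # uncompressed 8 bits per channel RGBA


def astc_bits_per_pixel(block_w: int, block_h: int) -> float:
    return ASTC_BLOCK_BITS / (block_w * block_h)


def astc_texture_bytes(width: int, height: int, block_w: int, block_h: int) -> int:
    """Storage for one mip level; partial blocks at the edges still cost a full block."""
    blocks_x = -(-width // block_w)  # ceiling division
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BITS // 8


if __name__ == "__main__":
    w, h = 1024, 1024  # example texture size
    raw = w * h * RGBA8_BITS_PER_PIXEL // 8
    for bw, bh in [(4, 4), (6, 6), (8, 8), (12, 12)]:
        size = astc_texture_bytes(w, h, bw, bh)
        print(f"{bw}x{bh}: {astc_bits_per_pixel(bw, bh):.2f} bpp, "
              f"{size} bytes ({raw / size:.1f}x smaller than RGBA8)")
```

Even the highest-quality 4x4 mode cuts texture traffic to a quarter of uncompressed RGBA, and fewer bytes fetched from off-chip memory translates directly into lower power.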

The K1 is a massive step up in terms of both graphical horsepower and energy efficiency, but not all of the improvements lie in the hardware department.

Next-Gen APIs

Perhaps the most significant new feature, in terms of offering a next-gen gaming experience on mobile devices, is support for the same graphics APIs as the K1’s bigger brothers. You may remember that the Tegra 4 lacked support for the full OpenGL, CUDA, and DirectX 11 APIs, relying instead on individual developers optimizing their games for it. The K1 departs from this less-than-ideal arrangement, offering full support for OpenGL 4.4, Microsoft’s DirectX 11.2, OpenGL ES 3.0, and Nvidia’s own CUDA 6.

With new APIs come new graphical improvements, such as support for FXAA and TXAA anti-aliasing to help eliminate jagged edges, realistic physics simulations courtesy of Nvidia’s PhysX, and compute shaders enabling a full range of high-end effects, such as ambient occlusion. The Tegra K1 will also be the first mobile GPU on the market to support hardware-based tessellation, although Qualcomm has its own Adreno 420 in the works, which will do much the same thing.

The biggest news here is that developers working on PC and console games could now scale their creations down to run on the K1. Interestingly, the Tegra K1 seems to have a fair bit more grunt than both the PlayStation 3 and the Xbox 360, so scaled-down multi-platform ports aren’t outside the realm of possibility. Nvidia has already shown off ports of Unreal Engine 4, Serious Sam 3, and Trine 2 running flawlessly on the Tegra K1.

Two CPU designs

The K1 SoC will be released in two CPU flavours with pin-compatible packages, meaning that manufacturers can easily swap between the two. The first is a familiar quad-core plus one Cortex A15 layout, almost exactly like the architecture found in the Tegra 4. The second will implement Nvidia’s own dual-core ARM-based CPU.

Tegra K1 CPU Versions

Just like the Tegra 4, the K1 CPU comes with four fully clocked A15 cores designed to do the heavy lifting, plus a fifth low-power A15 “companion core” to handle light tasks. Each core can also be power-gated, so additional cores are brought online only when needed. But there is a subtle difference between the K1 and the Tegra 4: the K1 CPU is based on the new third revision (r3) of ARM’s Cortex A15.
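The idea of bringing cores online only when needed can be illustrated with a toy selection policy. This is a hypothetical sketch, not Nvidia’s actual governor: the load metric (demand measured in units of one fast core), the 0.25 companion threshold, and the simplification that the companion core and the main cores never run at the same time are all assumptions.

```python
import math
from dataclasses import dataclass

MAIN_CORES = 4
# Assumed threshold: demand at or below a quarter of one fast core's
# capacity is left to the low-power companion core.
COMPANION_MAX_LOAD = 0.25


@dataclass(frozen=True)
class CoreConfig:
    companion: bool  # True when only the companion core is running
    main_cores: int  # number of fast cores powered on (the rest are gated)


def select_cores(load: float) -> CoreConfig:
    """Pick a core configuration for a demand measured in fast-core units.

    Assumes, as a simplification, that the companion core and the main
    cores never run at the same time.
    """
    if load <= COMPANION_MAX_LOAD:
        return CoreConfig(companion=True, main_cores=0)
    # Power up just enough fast cores to cover the demand, capped at four.
    return CoreConfig(companion=False, main_cores=min(MAIN_CORES, math.ceil(load)))


if __name__ == "__main__":
    for demand in (0.1, 0.8, 2.5, 6.0):
        print(demand, select_cores(demand))
```

A real governor would also weigh clock frequency, thermal headroom, and the cost of switching, but the core trade-off is the same: idle fast cores stay gated until the workload justifies them.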

The main improvement in r3 is better power efficiency, thanks to improved clock gating. Further power savings come courtesy of the move to 28nm HPM manufacturing, which Nvidia has chosen to spend on a ~20% clock speed boost, to 2.3GHz from 1.9GHz.

Tegra K1 vs Tegra 4

The Tegra K1’s A15 processor will be slightly faster than the Tegra 4’s, but it doesn’t provide the same leap as the Kepler graphics chip. However, the tried-and-tested quad-core design means that Nvidia can start manufacturing the K1 quickly, with OEMs set to receive shipments this quarter.

It’s all change for Nvidia’s second CPU design, codenamed “Denver”, which drops the companion core and opts for a more traditional dual-core configuration. The cores will be based on the new ARMv8 architecture, which includes 64-bit as well as 32-bit support. Denver will clock in at a peak of 2.5GHz and has a larger 128KB L1 instruction cache and 64KB L1 data cache. Unfortunately, little more is known about Denver at this point, but it’s interesting to see Nvidia drop the popular quad-core design in favour of just two cores.

Media features aplenty

Nvidia is also packing the Tegra K1 with plenty of extra features. The chip’s Image Signal Processor (ISP), which handles various imaging tasks, has received an upgrade; in fact, Nvidia has put two of them on the SoC.

Each ISP can process 600 megapixels per second with a 14-bit input, up from the 400MP/s at 10 bits available with the Tegra 4. There are also general improvements to noise reduction, higher-quality downscaling, and support for image sensors of up to 100 megapixels. The inclusion of two ISPs also opens the door to the dual-camera operation we’ve seen on other devices. Just like the Tegra 4, the K1 supports 4K video output over HDMI, although it’s doubtful that the GPU will handle 4K 3D gaming.
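A quick back-of-the-envelope calculation shows what those ISP figures mean in practice. The sketch below uses only the pixel rates and bit depths quoted above and assumes the ISP’s pixel rate is the only bottleneck (in reality sensor readout and memory bandwidth also matter); the function names are illustrative.

```python
# Figures quoted in the article.
K1_ISP = {"mpix_per_s": 600, "bits": 14}
TEGRA4_ISP = {"mpix_per_s": 400, "bits": 10}


def raw_gbit_per_s(mpix_per_s: float, bits_per_pixel: int) -> float:
    """Raw sensor data rate one ISP must ingest, in gigabits per second."""
    return mpix_per_s * bits_per_pixel / 1000.0


def max_fps(mpix_per_s: float, sensor_mpix: float) -> float:
    """Upper bound on frames per second if the ISP's pixel rate is the only limit."""
    return mpix_per_s / sensor_mpix


if __name__ == "__main__":
    for name, isp in [("Tegra K1", K1_ISP), ("Tegra 4", TEGRA4_ISP)]:
        print(f"{name}: {raw_gbit_per_s(isp['mpix_per_s'], isp['bits']):.1f} Gbit/s per ISP")
    print(f"100MP sensor: {max_fps(600, 100):.0f} fps per K1 ISP")
    uhd_mpix = 3840 * 2160 / 1e6  # one 4K UHD frame, about 8.3 megapixels
    print(f"4K UHD frames: {max_fps(600, uhd_mpix):.0f} fps per K1 ISP")
```

On these assumptions each K1 ISP ingests about 2.1 times as much raw data as the Tegra 4’s (8.4 vs 4.0 Gbit/s), and a 100-megapixel sensor would top out at around 6 frames per second per ISP.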

Final thoughts

With the Tegra K1, Nvidia finally looks to be playing to its strengths, and for that reason alone the K1 is a pretty exciting prospect. This could well be the chip that we see in the next Nvidia Shield.

The biggest obstacle still remaining for Nvidia is finding a large enough consumer base. Appetite for console-quality gaming on mobile devices just isn’t there yet. A few years down the line, initiatives like SteamOS may help push gaming down the Linux road, and a strong gaming chip on Android may be a much bigger deal then.

Technically the K1 is strong, but in a “casual gaming” orientated market perhaps it’s not as groundbreaking as it first seems. We’ll have to wait and see what developers make of Kepler on Android, and if Denver presents anything new.

  • Balraj

    what about heat? That’s the biggest drawback every phone suffers from,
    irrespective of platform…..

    • Shark Bait

      I don’t think it’s aimed at smartphones. No integrated LTE will scare OEMs away, same as Tegra 3 and same as Tegra 4…..

      • Balraj

        Ya, maybe just a gaming console like another user mentioned
        Still, I think certain tablets will use it
        So heat will be a big factor

        • IDBash

          Ah but you are forgetting that heat comes from power consumption. By increasing efficiency you will decrease heat-loss. Based on the figures above, it should actually run cooler than current Tegra technologies.

          That said, heat is a big issue in all environments, not just mobile.

    • Roberto Tomás

      I think it is clear looking at the design that the 64 bit version is targeted solidly at consoles, desktops, AIOs, televisions, and other things that can take a heat sink. There is no small core to offset low power tasks. But the other one could find its way into designs. One of the problems with the A15 before was that power gating was poor, leading to high thermals. Some of the A15 phones currently out peak at over 5W with GPU and CPU running together. Apparently, this will top off at about 3.5W, well within current design thresholds.

      • renz

        maybe the design of Denver itself makes a low power core no longer needed.

  • Shark Bait

    Perhaps Ouya should give it another go with this!

    • apianist16

      Don’t know how they’ll be able to afford this chip. I would guess that Ouya 2 will have Tegra 4.

      • renz

        I think it is possible for Ouya 2 to get Tegra 4. Huawei’s Android micro console, Tron, has almost the same spec as the Mad Catz Mojo, but the pricing is expected to be 120, although it is only for the Chinese market.

  • David

    The dual core looks much more promising than a super duper hyper 4+1. A high-power dual core and decent gfx is maybe perfect for battery life.

  • John Jackson

    I would have loved to see the 64 bit chip in quad core instead of dual core!!! That’s just ass backwards.

    • filaos

      Core count has NOTHING to do with future or past.
      This dual-core will be much more powerful than the old quad-A15. Denver IS the future.

      • John Jackson

        Maybe so. Quad or Octa just sounds a lot better!!!

        • filaos

          Apple uses their third dual core design in the current iPhone. They have no need to add more cores as long as they produce the most powerful and least power-hungry processor. “Quad” or “Octa” only sound better.

          • John Jackson

            Like how you put it!

    • IDBash

      Simply put, the power is not in the CPU cores, it is in the CUDA cores aka the GPU. Ask yourself this question: how many major tasks or applications are you going to do at one time on a mobile device? One or 2 is all most people are capable of, especially when you are working on a touchscreen. By having more than 2 cores, you are wasting power that could be better spent in CUDA cores. These cores take over all the simple number crunching and then let the CPU cores assemble it at the end, causing system speed to skyrocket.

      I will let you dig beyond that basic explanation. :)

      • John Jackson

        I see you. But you did leave me with a lil to think about. Thanks

  • Jayfeather787

    More interested in Snapdragon 805. This looks awesome for gaming, but the Snapdragon seems to be better for actual processing.

  • MasterMuffin

    “100 Megapixel image sensor” It’s coming, Lumia 2020. You heard it here first.

    Just can’t wait for people to start porting stuff like Blender to Android, put that power to use!

    • filas

      You can start with Dolphin emulator for Android… A nice way to burn your phone xD

      • MasterMuffin

        PPSSPP and My Boy! are good enough for me :)

  • Roberto Tomás

    what they did with the GPU is really awesome. It is like the PowerVR performance, but with DirectX 11 and probably half the power. But I am confused, aren’t we going to offload GPU tasks to the cloud soon? Why keep pushing the GPU so hard?

    • apianist16

      The problems with using the cloud for GPU tasks are mainly latency and bandwidth. Even if you have a gigabit connection, you are not going to be able to offload everything (RAM operates at gigabytes/second), and the latency really kills it, especially for twitch-reflex games like FPS and some RPGs.

      • Daniel

        Not to mention network dependency. Which not only precludes usage in certain situations – and customers don’t appreciate being limited like that at all – but also has its own power consumption issues. You can optimise and perfect the power usage of your chips, but there are very real physical restraints on saving power for network connectivity. On my own Android phone, having WiFi just enabled (not actively using it) kills battery life by 50%.

      • Roberto Tomás

        Latency isn’t really a problem, nvidia published results that showed that latency from the video card to the screen was higher than latency over the network, effectively making rendering across even pretty huge distances something that can be done without penalty. bandwidth isn’t a part of that. Bandwidth really only becomes a problem in 4k video, 1080p is only 360mbps maximum, without compression: about 0.036mbps with typical compression nowadays.

  • rmcrys

    This is very nice BUT doesn’t make much sense unless they are waiting for a Windows 9 for ARM soon. In the mobile market, integration matters much more than power, and Qualcomm chips have it all inside, and they are among the most powerful chips. Nvidia is trying to be the most powerful BUT fails on integration and power consumption. The chip could be a GPU beast, but manufacturers and customers look for the best power/integration/consumption/price, and Nvidia just offers power. No good IMHO. Moreover, game companies aren’t building for all chips, but for the most common and best-selling ones, which means Qualcomm/Samsung, not Nvidia. They are totally dumb thinking selling SoCs is the same as selling graphics cards.

    • Guru Tim

      How exactly do Nvidia fail on integration? It’s a SoC which means it’s all one module. They even went as far as making both the quad core, and dual core chips use the same pin connections lol. What are you on about?

    • renz

      AFAIK the power consumption is not entirely Nvidia’s fault. Take Tegra 4 for example: it is based on the ARM A15, which is known to be a power hog. Even ARM acknowledges this, hence the newer revision of the A15 (r3p3) focuses on reducing power consumption. If Nvidia could use an ARM core with much better power efficiency, they would already be using it. Since there is none, they have to take whatever ARM has to offer, though I can’t say anything about Denver since it is Nvidia’s first attempt at making their own custom ARM core. The same thing happened to Exynos 5: the early Exynos 5 (5410) was only clocked at around 1.6GHz, and only the later version (5420) was clocked at 1.8GHz. The only thing Nvidia was really lacking was integrated LTE.

  • MrMagoo

    From the first time I came on this site and saw someone else say it, I’ve been wanting a moment where it was justified for me!

  • bhendrajana

    pocket workstation…?