
Google's latest on-device AI model is custom-made for your laptop

It bridges the gap between the E4B and 26B MoE models.


Jun 4, 2026 — 4:47 AM ET


TL;DR

Google has released the Gemma 4 12B model aimed at consumer laptops with at least 16GB RAM.
Gemma 4 12B is the company’s first mid-sized model to support native audio input.
It uses an encoder-free architecture to offer multimodal performance without the latency introduced by encoders.
The new model performs close to the Gemma 4 26B MoE model in benchmarks.

Back in April, Google released its mobile-friendly Gemma E2B and E4B models, bringing on-device multimodal AI to Android and iOS devices. It also released the 26B Mixture of Experts (MoE) and 31B Dense models for higher-end devices with dedicated AI GPUs. Now, the company is launching another Gemma model that slots in between the E4B and the 26B MoE.

Google today announced the Gemma 4 12B model aimed at bringing on-device AI capabilities to laptops. It offers multimodal features and is the first mid-sized model from Google to support native audio input.


The company claims that its 12B model delivers performance similar to the 26B MoE model in benchmarks, while being small enough to run on normal consumer laptops with 16GB of RAM.
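For a rough sense of why 16GB matters, here is a back-of-the-envelope estimate of the memory needed for the weights of a 12-billion-parameter model. The bit widths below are illustrative assumptions; Google's announcement as quoted here does not say which precision the 16GB figure refers to, and real usage also includes activations, the context cache, and whatever else is running on the laptop.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed precisions, purely for illustration.
for bits in (16, 8, 4):
    print(f"12B parameters at {bits}-bit: ~{weight_memory_gb(12, bits):.0f} GB")
```

By this estimate, 16-bit weights alone (about 24 GB) would not fit in 16GB of RAM, while 8-bit (about 12 GB) or 4-bit (about 6 GB) weights would leave some headroom.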

To achieve this, Google had to support multimodal inputs without adding latency or memory usage. Gemma 4 12B uses an encoder-free architecture, avoiding the memory cost of the separate encoders that most multimodal AI models rely on.


For vision, it uses a lightweight module built on “single matrix multiplication, positional embedding, and normalizations,” allowing image data to be passed to the LLM without requiring an encoder in the middle.
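To make that description more concrete, here is a minimal NumPy sketch of what such a module could look like. It is not Google's implementation: the patch size, the dimensions, the random weights, the order of the steps, and the hypothetical function name `embed_image` are all assumptions chosen for illustration.

```python
import numpy as np


def embed_image(image, patch, w_proj, pos_emb, eps=1e-6):
    """Turn an (H, W, C) image into (num_patches, d_model) token-like vectors."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    # Cut the image into non-overlapping patches and flatten each one.
    patches = (
        image[: rows * patch, : cols * patch]
        .reshape(rows, patch, cols, patch, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(rows * cols, patch * patch * c)
    )
    x = patches @ w_proj  # the single matrix multiplication
    x = x + pos_emb       # one positional embedding per patch
    # Normalization: zero mean and unit variance for each vector.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


# Assumed sizes: a 64x64 RGB image, 16x16 patches, 32-dimensional tokens.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
patch, d_model = 16, 32
w_proj = rng.normal(size=(patch * patch * 3, d_model))
pos_emb = rng.normal(size=((64 // patch) ** 2, d_model))

tokens = embed_image(image, patch, w_proj, pos_emb)
print(tokens.shape)  # (16, 32)
```

The point of the sketch is that there is no stack of extra layers between the pixels and the language model, only one projection plus some cheap bookkeeping.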

It also does away with a separate encoder for audio inputs entirely. Google was able to project the raw audio signal directly into the same dimensional space as text tokens.
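As a rough illustration of what "projecting raw audio into the same space as text tokens" can mean, the sketch below slices a waveform into fixed-length frames and applies one linear projection. The 16 kHz sample rate, the frame length, the embedding width, the random weights, and the hypothetical function name `embed_audio` are assumptions; the article does not describe how Gemma 4 12B actually frames or projects audio.

```python
import numpy as np


def embed_audio(waveform, frame_len, w_proj):
    """Slice a 1-D waveform into frames and project each to d_model dimensions."""
    n_frames = len(waveform) // frame_len
    frames = waveform[: n_frames * frame_len].reshape(n_frames, frame_len)
    return frames @ w_proj  # direct linear projection, no separate encoder


rng = np.random.default_rng(0)
waveform = rng.uniform(-1.0, 1.0, size=16000)  # one second at an assumed 16 kHz
frame_len, d_model = 400, 32                   # assumed frame size and width
w_proj = rng.normal(size=(frame_len, d_model))

audio_tokens = embed_audio(waveform, frame_len, w_proj)
text_tokens = rng.normal(size=(5, d_model))    # stand-in text embeddings

# Because both share the same width, they can sit in one input sequence.
sequence = np.concatenate([text_tokens, audio_tokens])
print(audio_tokens.shape, sequence.shape)  # (40, 32) (45, 32)
```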

In practice, Gemma 4 12B can handle multimodal inputs just like the other Gemma models, but without the added overhead of encoding them first. This should result in much better performance on laptops that lack dedicated AI hardware.

You can try the new model right now in LM Studio, Ollama, Google AI Edge Gallery, and more. If you would rather set it up yourself on your laptop, the weights are available to download from Hugging Face and Kaggle.
