Google DeepMind has released Gemma 4, the fourth generation of its open model family. The release spans four model sizes — Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense — and ships under an Apache 2.0 license. According to the announcement, the 31B model ranks third among open models on the Arena AI text leaderboard and the 26B MoE ranks sixth. The post states that Gemma 4 “outcompetes models 20x its size” at those positions.

Since the first Gemma generation, developers have downloaded Gemma over 400 million times, producing more than 100,000 variants. The post describes Gemma 4 as built from the same research and technology as Gemini 3, positioning it as the open counterpart to the proprietary Gemini family.

Model sizes and what each is optimized for

The four sizes serve distinct deployment contexts. The 26B MoE prioritizes latency: it activates only 3.8 billion of its total parameters during inference, delivering high tokens-per-second throughput while keeping resource use in check. The 31B Dense maximizes raw output quality and provides a strong foundation for fine-tuning. Both larger models fit in unquantized bfloat16 form on a single 80GB NVIDIA H100 GPU; quantized versions run on consumer GPUs for local IDEs, coding assistants, and agentic workflows.
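A quick back-of-the-envelope check makes the single-H100 claim concrete. The sketch below counts bfloat16 weights only (2 bytes per parameter) and ignores activation and KV-cache overhead, which a real deployment must also budget for:

```python
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB for a model stored in bfloat16 (2 bytes/param)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# 31B Dense in bf16: 62 GB of weights -- fits on one 80 GB H100 with headroom.
print(weight_memory_gb(31))            # 62.0
# 26B MoE must store all experts (52 GB of weights) ...
print(weight_memory_gb(26))            # 52.0
# ... but only 3.8B parameters are active per token, which is what drives latency.
print(round(weight_memory_gb(3.8), 1)) # 7.6
```

The MoE numbers illustrate the trade-off the post describes: total weights set the memory bill, while active parameters set the per-token compute.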

The E2B and E4B models are engineered for edge and mobile deployment. The post describes them as built “from the ground up for maximum compute and memory efficiency,” activating an effective 2 or 4 billion parameter footprint during inference to preserve RAM and battery life. According to the announcement, these models run completely offline with near-zero latency on hardware including phones, Raspberry Pi, and NVIDIA Jetson Orin Nano. DeepMind worked with the Google Pixel team, Qualcomm Technologies, and MediaTek on these models. Android developers can prototype agentic flows using E2B and E4B in the AICore Developer Preview, which the post says is forward-compatible with Gemini Nano 4.

Capabilities across the family

The post lists a set of capabilities shared across all four sizes. Native function-calling, structured JSON output, and native system instructions are present throughout, enabling autonomous agents to interact with external tools and APIs. All models natively process video and images at variable resolutions, with the announcement highlighting OCR and chart understanding as specific strengths. The E2B and E4B models add native audio input for speech recognition and understanding.
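The announcement doesn't specify Gemma 4's function-calling wire format, so the sketch below uses a generic JSON-schema-style tool declaration and a hand-written model reply to illustrate the round trip an agent framework performs around any such model: declare a tool, parse the model's structured JSON call, and dispatch it locally. The tool name and reply format here are illustrative assumptions, not Gemma's actual schema:

```python
import json

# Hypothetical tool declaration in a generic JSON-schema style; the real wire
# format depends on the serving stack (e.g. a Transformers chat template).
get_weather_tool = {
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    # Stand-in for a real external API call.
    return f"Sunny in {city}"

# A structured JSON function call, hand-written here in place of model output.
model_reply = '{"tool": "get_weather", "arguments": {"city": "Zurich"}}'

call = json.loads(model_reply)
if call["tool"] == get_weather_tool["name"]:
    result = get_weather(**call["arguments"])
    print(result)  # Sunny in Zurich
```

Structured JSON output is what makes this dispatch step reliable: the agent can `json.loads` the reply and route it without brittle text parsing.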

Context window lengths vary by tier. Edge models support 128K context; larger models support up to 256K, which the post says allows passing “repositories or long documents in a single prompt.” All models were natively trained on more than 140 languages.
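Before passing a whole repository in one prompt, it is worth estimating whether it fits in the 256K window. The post names no tokenizer, so this sketch uses the rough heuristic of ~4 characters per token for English text and code; a real check should count tokens with the model's own tokenizer:

```python
# Rough feasibility check: will these files fit in a 256K-token context?
CONTEXT_TOKENS = 256_000
CHARS_PER_TOKEN = 4  # rough average for English text and code

def estimated_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(files: dict[str, str], budget: int = CONTEXT_TOKENS) -> bool:
    # Concatenate files with path headers, as one might when prompting over a repo.
    prompt = "\n".join(f"# {path}\n{body}" for path, body in files.items())
    return estimated_tokens(prompt) <= budget

repo = {"main.py": "print('hi')\n" * 200, "README.md": "docs " * 500}
print(fits_in_context(repo))              # True
print(fits_in_context(repo, budget=500))  # False
```

The same check with a 128K budget tells you whether the prompt could instead target the edge-tier models.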

For code generation, the post describes Gemma 4 as producing “high-quality offline code, turning your workstation into a local-first AI code assistant” — a framing aimed squarely at developers evaluating local inference setups.

Apache 2.0 and what it means for deployment

The license change is the announcement’s most commercially significant detail. Gemma 4 ships under Apache 2.0, which the post characterizes as providing “complete developer flexibility and digital sovereignty” — meaning unrestricted use, modification, and distribution without royalty obligations. Previous Gemma models used a custom license with use-case restrictions; Apache 2.0 removes those barriers.

The post frames this as a response to developer feedback: “You gave us feedback, and we listened.” For enterprises, sovereign governments, or research institutions that require full control over models and infrastructure, Apache 2.0 removes a key procurement hurdle.

Fine-tuning results and ecosystem support

The post cites two fine-tuning examples. INSAIT used Gemma to create BgGPT, described as a pioneering Bulgarian-first language model. Yale University worked with DeepMind on Cell2Sentence-Scale, focused on discovering new pathways for cancer therapy. Both are presented as demonstrations of what targeted fine-tuning can produce on top of the base models.

The ecosystem support list is extensive. Day-one integrations include Hugging Face (Transformers, TRL, Transformers.js, Candle), LiteRT-LM, vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM and NeMo, LM Studio, Unsloth, SGLang, Keras, and others. Model weights are available on Hugging Face, Kaggle, and Ollama. Google Cloud deployment paths include Vertex AI, Cloud Run, GKE, and TPU-accelerated serving.

The combination of competitive benchmark rankings, a permissive license, broad hardware support from mobile to H100, and deep ecosystem integrations makes Gemma 4 a materially different offering from its predecessors. The Apache 2.0 decision in particular removes the main friction point for organizations that needed open weights but couldn’t accept usage restrictions.