Introduction
Google has made significant strides with its Gemini and Gemma AI model families, culminating in the announcement of the Gemma 4 open AI models. These models are designed to give developers greater freedom and flexibility, particularly for local usage. Notably, Google has moved from its previous custom licence to the more permissive Apache 2.0 license, addressing long-standing developer concerns about restrictions.
Key Highlights
- Launch of Gemma 4 open AI models in four sizes.
- Transition to Apache 2.0 license for improved developer freedom.
- Optimised for local hardware, including consumer GPUs.
- Enhanced capabilities for mobile devices with E2B and E4B models.
- Support for native function calling and structured JSON output.
What’s New in Gemma 4
The Gemma 4 models come in four distinct sizes, optimised for various hardware configurations. The two larger variants, 26B Mixture of Experts and 31B Dense, are designed to run on high-performance local machines, specifically targeting the Nvidia H100 GPU. These models can run unquantized in bfloat16 format; while this requires expensive hardware, it enables efficient local processing.
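A quick back-of-the-envelope check shows why the H100 is the stated target for the unquantized variants. The sketch below uses the parameter counts from the article (26B and 31B), the standard 2 bytes per bfloat16 parameter, and the H100's 80 GB memory capacity; it counts only raw weights, ignoring activations and KV cache.

```python
# Rough memory-footprint check for the larger Gemma 4 variants in
# unquantized bfloat16 (2 bytes per parameter). Counts weights only;
# activations and KV cache add further overhead on top of this.

BYTES_PER_PARAM_BF16 = 2
H100_MEMORY_GB = 80

def weight_footprint_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the raw weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

for name, params in [("26B MoE", 26e9), ("31B Dense", 31e9)]:
    gb = weight_footprint_gb(params, BYTES_PER_PARAM_BF16)
    fits = "fits" if gb <= H100_MEMORY_GB else "does not fit"
    print(f"{name}: ~{gb:.0f} GB of bf16 weights ({fits} on one H100)")
```

At roughly 52 GB and 62 GB of weights respectively, both variants sit inside a single H100's 80 GB, but well beyond any consumer GPU.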
For developers looking to run models on more accessible hardware, quantised versions of these models can fit on consumer-grade GPUs. Google has also focused on reducing latency: the 26B Mixture of Experts model activates only a fraction of its parameters during inference, which significantly improves its tokens-per-second throughput.
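The two effects can be sketched with simple arithmetic. The 4-bit figure below is a common quantisation level rather than anything Google has confirmed, and the active-parameter fraction is a placeholder assumption; the article only says the MoE model activates "a fraction" of its parameters.

```python
# Sketch of why quantisation and Mixture-of-Experts routing both help on
# consumer hardware. The 4-bit level is a common choice, and the active
# fraction is a hypothetical placeholder, not a published figure.

TOTAL_PARAMS = 26e9
ASSUMED_ACTIVE_FRACTION = 0.25  # hypothetical assumption

def quantized_size_gb(num_params: float, bits_per_param: int) -> float:
    """Weight size in GB at a given quantisation level."""
    return num_params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits:2d}-bit: {quantized_size_gb(TOTAL_PARAMS, bits):.1f} GB")

# Per-token compute scales with *active* parameters, so MoE routing
# raises tokens/sec roughly in proportion to the inactive share.
active = TOTAL_PARAMS * ASSUMED_ACTIVE_FRACTION
print(f"Active parameters per token (assumed): {active / 1e9:.1f}B")
```

At 4 bits the 26B weights shrink to roughly 13 GB, which is within reach of a 16 GB or 24 GB consumer GPU.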
Mobile Optimisation
The other two models, Effective 2B (E2B) and Effective 4B (E4B), are tailored specifically for mobile devices, keeping memory usage and battery consumption low while offering near-zero latency. Working with Qualcomm and MediaTek, Google has optimised these models for smartphones, as well as for small devices such as the Raspberry Pi and Jetson Nano.
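The same weight arithmetic suggests why these sizes suit small devices. The "effective" parameter counts (2B and 4B) are read off the model names, and the 4-bit quantisation level and device RAM figures are illustrative assumptions.

```python
# Back-of-the-envelope check that 4-bit-quantised E2B and E4B weights fit
# in typical small-device RAM. Parameter counts are inferred from the
# model names; the RAM figures are common device configurations.

def int4_weights_gb(num_params: float) -> float:
    """Weight size in GB at 4 bits (0.5 bytes) per parameter."""
    return num_params * 0.5 / 1e9

for name, params, device_ram in [("E2B", 2e9, 4), ("E4B", 4e9, 8)]:
    gb = int4_weights_gb(params)
    print(f"{name}: ~{gb:.1f} GB of weights vs {device_ram} GB device RAM")
```

Roughly 1 GB and 2 GB of weights respectively leaves headroom for the OS and other apps on a 4 GB Raspberry Pi or an 8 GB phone.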
Performance and Capabilities
Google asserts that the new Gemma 4 models outperform their predecessors, with the 31B Dense variant expected to rank third on the Arena list of top open AI models. Despite their smaller size compared to competitors, these models offer enhanced reasoning, mathematical capabilities, and instruction-following abilities.
Gemma 4 also supports native function calling and structured JSON output, and is optimised for code generation. This positions it as a viable option for developers seeking to generate high-quality code in offline environments. Additionally, Gemma 4 improves visual input processing, enhancing tasks such as OCR and chart understanding.
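To illustrate how native function calling is typically consumed on the application side: the model emits structured JSON naming a tool and its arguments, which the application parses and dispatches. The tool name, the JSON shape, and the hard-coded model output below are illustrative assumptions, not Gemma 4's actual wire format.

```python
# Minimal sketch of consuming a function-calling response. The model is
# expected to emit structured JSON; the application parses it, looks up
# the named tool, and calls it with the model-supplied arguments.
import json

def get_weather(city: str) -> str:
    """A toy tool the model is allowed to call."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

# Stand-in for a structured-output response from the model.
model_output = '{"tool": "get_weather", "arguments": {"city": "Berlin"}}'

def dispatch(raw: str) -> str:
    call = json.loads(raw)
    tool = TOOLS[call["tool"]]        # KeyError if the tool is unknown
    return tool(**call["arguments"])  # unpack model-supplied arguments

print(dispatch(model_output))  # Sunny in Berlin
```

Because the output is constrained to JSON, the application can validate it before execution rather than scraping free-form text, which is what makes structured output valuable for offline tooling.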
Licensing Changes
The shift to the Apache 2.0 license marks a significant change for developers. Previous versions of Google’s open models were governed by a restrictive custom license that many found cumbersome. The new Apache license offers a more flexible framework, allowing developers to utilise Gemma models without the fear of unilateral changes to licensing terms.
Future Developments
The introduction of the E2B and E4B models also signals Google’s commitment to advancing smartphone AI capabilities. The upcoming Gemini Nano 4 will be based on these new models, further enhancing local AI functionalities on devices like Google Pixel phones.
Conclusion
With the launch of Gemma 4, Google is not only enhancing the capabilities of its AI models but also fostering a more developer-friendly environment through the adoption of the Apache 2.0 license. As the AI landscape continues to evolve, these advancements position Google as a key player in the local AI processing domain.