Gemma 4 12B launches with encoder-free multimodal architecture for edge AI inference
Google DeepMind has introduced Gemma 4 12B, a unified multimodal model that eliminates the separate vision encoder and delivers state-of-the-art performance on vision and language tasks within a 12B-parameter budget. The architecture is designed for efficient on-device and edge deployment of vision-language reasoning.
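To put the 12B-parameter budget in hardware terms, the sketch below estimates the memory needed for the model weights alone at several numeric precisions. The precision list is a generic assumption, not a statement of the formats the release ships in, and the helper names (`weight_memory_gb`, `footprint_table`) are hypothetical. Activations, the KV cache, and runtime overhead are not counted, so real requirements would be higher.

```python
# Rough weight-only memory estimate for a 12B-parameter model.
# Assumption: these precisions are generic examples, not formats
# confirmed for this release.
PARAMS = 12_000_000_000

BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def footprint_table(params: int = PARAMS) -> dict[str, float]:
    """Map each precision name to its weight-only footprint in GB."""
    return {name: weight_memory_gb(params, b) for name, b in BYTES_PER_PARAM.items()}


if __name__ == "__main__":
    for name, gb in footprint_table().items():
        print(f"{name}: {gb:.1f} GB")
```

Under these assumptions the weights occupy about 24 GB at 16-bit precision and about 6 GB at 4-bit, which is why quantization tends to decide whether a model of this size fits on constrained hardware.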
The release targets enterprise use cases where parameter efficiency and latency matter more than frontier-scale performance. It reflects a broader trend toward smaller, specialized models that run on constrained hardware while still delivering competitive capability.