Google DeepMind's DiffusionGemma offers 4x faster local text generation
Google DeepMind released DiffusionGemma, a new generative model that enables 4x faster text generation via a diffusion-based decoding approach compared to standard autoregressive models. The technique trades off latency for quality trade-offs, positioning diffusion as a potential avenue for low-latency inference in edge and consumer AI applications.