NVIDIA Accelerates Google DeepMind's DiffusionGemma for Local AI Inference
NVIDIA has optimized Google DeepMind's DiffusionGemma model for RTX GPUs through its AI Garage, enabling 4x faster text generation on consumer and enterprise hardware. The move targets growing demand for on-device, low-latency inference that avoids cloud round-trips.
The joint NVIDIA–DeepMind work on accelerating DiffusionGemma fits a broader push toward edge and confidential AI computing, which reduces reliance on cloud APIs for latency-sensitive applications.