NVIDIA Vera CPU launches for agentic AI; Databricks adds GPU support to free tier, NVIDIA Agent Toolkit integration
NVIDIA launched Vera, a new CPU purpose-built for agentic AI workloads, alongside a major expansion of its partnership with Databricks announced at the Data + AI Summit (June 15-18). Vera is an Arm-compatible CPU engineered for three use cases: agentic workloads (tool calling, agent orchestration, multi-step reasoning), reinforcement learning, and CPU-based data analytics. NVIDIA claims up to 3x faster SQL queries and 80% faster agentic performance than traditional CPUs. It attributes these gains to high single-thread performance and to memory bandwidth tuned for the latency-sensitive, bursty compute patterns that agents generate.
The core argument is that GPUs excel at model inference, but agent harnesses, tool calls, and the orchestration around multi-step reasoning run on CPUs, and those CPUs have become the bottleneck. A GPU can generate a response in seconds. The agent must then call tools, wait for results, manage context, and feed the results back into the next reasoning step, repeating this loop many times per task. Vera is designed to reduce the latency overhead between agent steps and, according to NVIDIA, to improve prediction for dynamic token allocation. Early deployments include Alibaba Cloud, CoreWeave, Meta, and Oracle Cloud Infrastructure (OCI), along with system makers Dell, HPE, Lenovo, and Supermicro.
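Before weighing a CPU migration, it helps to measure how an agent's wall-clock time actually splits between the model call and the CPU-side work (tool execution, context management, orchestration). The sketch below is a hypothetical, minimal profiler, not part of any NVIDIA or Databricks API. It assumes each step of your loop can be wrapped as a zero-argument callable and labeled by you as either "model" or "cpu". Note that a remote model call is measured as waiting time on the client, including network latency.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def profile_agent_loop(steps: List[Tuple[str, Callable[[], None]]]) -> Dict[str, float]:
    """Run each (category, callable) step and total its wall-clock time per category.

    Categories are caller-chosen labels (hypothetical convention: "model" for
    the LLM call, "cpu" for tool calls and orchestration work).
    """
    totals: Dict[str, float] = defaultdict(float)
    for category, fn in steps:
        start = time.perf_counter()
        fn()
        totals[category] += time.perf_counter() - start
    return dict(totals)


def cpu_fraction(totals: Dict[str, float], cpu_key: str = "cpu") -> float:
    """Share of total measured time spent in steps labeled `cpu_key`."""
    total = sum(totals.values())
    return totals.get(cpu_key, 0.0) / total if total > 0 else 0.0


if __name__ == "__main__":
    # Stand-ins for real work: sleeps simulate a model call and a slower tool call.
    fake_model_call = lambda: time.sleep(0.02)
    fake_tool_call = lambda: time.sleep(0.05)
    loop = [("model", fake_model_call), ("cpu", fake_tool_call)] * 3
    totals = profile_agent_loop(loop)
    print(totals, f"cpu share = {cpu_fraction(totals):.2f}")
```

With these stand-in sleeps the CPU share comes out at roughly 0.7. In a real agent you would wrap your own LLM client call and tool functions instead, and the resulting share is the number that determines how much a faster CPU can help.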
Databricks is integrating Vera into its platform alongside several new NVIDIA capabilities. Databricks AI Runtime now supports multinode training on NVIDIA Hopper GPUs connected over NVIDIA InfiniBand. Databricks Free Edition now includes GPU support, giving developers, startups, and students GPU access. Model Serving gains Triton Inference Server optimization, and NVIDIA Agent Toolkit is natively integrated into Databricks Apps. The vision is an end-to-end NVIDIA stack in which Hopper GPUs run model inference and Vera CPUs orchestrate agents, with each chip purpose-built for its workload.
For practitioners, this signals that agentic AI infrastructure is moving toward specialized silicon. Vera availability on the major cloud platforms (AWS, Azure, GCP) is critical, so verify it before committing to a placement for agent workloads. GPU support in the Databricks Free Edition lowers the barrier to prototyping agents on governed data. Watch for: (1) Vera availability timelines in your target cloud, (2) pricing versus standard CPUs at scale, and (3) whether availability at OCI and Alibaba Cloud extends to U.S. regions. Teams building agents should evaluate whether Vera's latency profile justifies migrating from existing CPU infrastructure, especially for tool-calling-heavy workflows such as research, code generation, and knowledge retrieval.
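One way to frame the migration question is Amdahl's law. If only the CPU-bound share of an agent loop gets faster and model inference time stays the same, the end-to-end speedup is capped by that share. The sketch below applies the standard formula. It makes two assumptions that are not taken from the announcement: that "80% faster agentic performance" can be read as a 1.8x speedup on the CPU-bound portion, and that the CPU share of your loop is something you have measured yourself.

```python
def end_to_end_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's-law estimate of overall speedup for an agent loop.

    cpu_fraction: share of wall-clock time spent in CPU-bound work (0..1).
    cpu_speedup:  assumed speedup factor for that CPU-bound work only;
                  the model-inference share is assumed unchanged.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be in [0, 1]")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


if __name__ == "__main__":
    # Assumption: "80% faster" interpreted as a 1.8x speedup on CPU-bound work.
    for share in (0.2, 0.5, 0.8):
        print(f"CPU share {share:.1f}: {end_to_end_speedup(share, 1.8):.2f}x overall")
```

Under these assumptions, a loop that spends 20%, 50%, or 80% of its time on CPU-side work would see roughly 1.10x, 1.29x, or 1.55x overall. This is why tool-calling-heavy workflows, where the CPU share is largest, are the most likely to justify a migration.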