Claude in Microsoft Foundry now runs on NVIDIA GB300 Blackwell Ultra in Azure
Anthropic's Claude models are now generally available in Microsoft Foundry, hosted on Azure and running on NVIDIA GB300 Blackwell Ultra GPUs. Microsoft has deployed what it describes as the first at-scale production cluster of its kind, with more than 4,600 Blackwell Ultra GPUs connected via NVIDIA Quantum-X800 InfiniBand. Each GB300 NVL72 rack integrates 72 Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs into a single unit optimized for reasoning models, agentic AI, and multimodal generative AI.
Each rack combines large memory capacity with high bandwidth: 37 TB of fast memory (20 TB of HBM3E on the GPUs plus 17 TB of LPDDR5X on the CPUs), 130 TB/s of NVLink bandwidth within the rack, and up to 1.44 exaflops of FP4 Tensor Core performance per rack. Across racks, Quantum-X800 InfiniBand provides 800 Gb/s of interconnect bandwidth per GPU in a non-blocking design intended to scale to tens of thousands of GPUs. Microsoft says this infrastructure reduces model training time from months to weeks and supports training models with more than 100 trillion parameters.
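The rack-level figures above translate into per-GPU numbers with simple arithmetic. The sketch below is illustrative only: the constants are the values quoted in this article, the function and key names are hypothetical, and it assumes that rack totals divide evenly across all 72 GPUs and that 1 TB = 1,000 GB.

```python
import math

# Figures quoted in the article for one GB300 NVL72 rack.
GPUS_PER_RACK = 72
HBM3E_TB = 20            # GPU memory per rack
LPDDR5X_TB = 17          # CPU memory per rack
NVLINK_TB_PER_S = 130    # NVLink bandwidth within a rack
FP4_PFLOPS = 1440        # 1.44 exaflops, assumed to cover the whole rack
IB_GBIT_PER_GPU = 800    # cross-rack InfiniBand bandwidth per GPU
CLUSTER_GPUS_MIN = 4600  # "more than 4,600" GPUs, so a lower bound


def rack_summary():
    """Derive per-GPU and per-rack figures (hypothetical helper)."""
    return {
        # 20 TB + 17 TB should reproduce the quoted 37 TB
        "total_memory_tb": HBM3E_TB + LPDDR5X_TB,
        # assumes decimal units (1 TB = 1,000 GB) and an even split
        "hbm_gb_per_gpu": HBM3E_TB * 1000 / GPUS_PER_RACK,
        "fp4_pflops_per_gpu": FP4_PFLOPS / GPUS_PER_RACK,
        "nvlink_tb_per_s_per_gpu": NVLINK_TB_PER_S / GPUS_PER_RACK,
        # aggregate scale-out bandwidth leaving one rack, in Tbit/s
        "ib_tbit_per_rack": IB_GBIT_PER_GPU * GPUS_PER_RACK / 1000,
        # smallest number of full racks that holds the quoted GPU count
        "min_racks": math.ceil(CLUSTER_GPUS_MIN / GPUS_PER_RACK),
    }


if __name__ == "__main__":
    for name, value in rack_summary().items():
        print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these assumptions the quoted totals are self-consistent: the two memory pools sum to 37 TB, the FP4 figure works out to 20 petaflops per GPU, and at least 64 racks (4,608 GPUs) are needed to exceed 4,600 GPUs.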
In recent MLPerf Inference v5.1 benchmarks, GB300 NVL72 systems running NVFP4 delivered up to 5x higher throughput per GPU on DeepSeek-R1 (671B parameters) than the NVIDIA Hopper architecture, along with leading results on Llama 3.1 405B and other newer benchmarks. The architecture is designed for test-time scaling and agentic reasoning, where longer chains of thought and tool calls make the compute required per request larger and more variable.
For architects deploying Anthropic models at scale, this marks a shift in the inference stack: Blackwell Ultra's memory and networking design targets reasoning workloads with long contexts and long-form outputs. Enterprises on Azure now get Claude backed by 72-GPU NVLink racks joined by Quantum-X800 InfiniBand, which makes it more practical to serve trillion-parameter reasoning models in production without depending on aggressive batching to reach acceptable throughput. This is the point at which large-scale reasoning starts to become competitive on cost per token.
Sources
- azure.microsoft.com (primary source)
“Microsoft delivers the first at-scale production cluster with more than 4,600 NVIDIA GB300 NVL72, featuring NVIDIA Blackwell Ultra GPUs connected through the next-generation NVIDIA InfiniBand network.”
- tomshardware.com
“In recent MLPerf Inference v5.1 benchmarks, NVIDIA GB300 NVL72 systems delivered record-setting performance using NVFP4. Results included up to 5x higher throughput per GPU on the 671-billion-parameter DeepSeek-R1 reasoning model compared with the NVIDIA Hopper architecture.”
- blogs.nvidia.com
“Microsoft Azure today announced the new NDv6 GB300 VM series, delivering the industry's first supercomputing-scale production cluster of NVIDIA GB300 NVL72 systems, purpose-built for OpenAI's most demanding AI inference workloads.”