OpenAI's GPT-5.5 now runs across all 10,000+ NVIDIA employees through the Codex agentic coding application, served on NVIDIA's GB200 NVL72 rack-scale hardware under a zero-data retention policy. The deployment, announced April 23, 2026, spans every major business function: engineering, product, legal, marketing, finance, sales, HR, operations, and developer programs. It ranks among the largest single-company rollouts of a frontier agentic model on record.
The GB200 NVL72 delivers 35x lower cost per million tokens and 50x higher token output per second per megawatt compared with prior-generation systems, according to NVIDIA. Those figures reshape enterprise inference economics: workloads that were cost-prohibitive on prior hardware become viable on Blackwell. OpenAI and NVIDIA describe GPT-5.5 as the first frontier model demanding enough to require, and benefit from, that efficiency curve at deployment scale.
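To see what those multipliers mean in practice, here is a back-of-envelope sketch. The baseline cost and throughput values are hypothetical placeholders, since NVIDIA did not specify the prior-generation baseline; only the 35x and 50x multipliers come from the announcement:

```python
# Hypothetical prior-generation baselines (NOT NVIDIA figures):
BASELINE_COST_PER_M_TOKENS = 10.00       # assumed $/1M tokens on prior hardware
BASELINE_TOKENS_PER_SEC_PER_MW = 50_000  # assumed throughput per megawatt

# Multipliers claimed by NVIDIA for the GB200 NVL72:
COST_REDUCTION = 35
THROUGHPUT_GAIN = 50

gb200_cost = BASELINE_COST_PER_M_TOKENS / COST_REDUCTION
gb200_throughput = BASELINE_TOKENS_PER_SEC_PER_MW * THROUGHPUT_GAIN

print(f"Implied GB200 cost:       ${gb200_cost:.3f} per 1M tokens")
print(f"Implied GB200 throughput: {gb200_throughput:,} tokens/sec/MW")
```

The point of the exercise is that the multipliers compound differently: a 35x cost drop changes which workloads clear a budget threshold, while a 50x throughput-per-megawatt gain changes how much capacity fits inside a fixed power envelope.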
NVIDIA IT provisioned a dedicated cloud virtual machine for every employee, giving each Codex agent a sandboxed environment with full auditability. The Codex app connects to those VMs via remote Secure Shell, keeping company data off external endpoints. Production system access runs read-only, routed through command-line interfaces and NVIDIA's internal "Skills" automation framework. No training data leaves the perimeter under the zero-retention policy.
Early productivity signals are concrete, if still anecdotal. NVIDIA engineers report that debugging cycles which once took days now close in hours. Multi-week experiments in complex, multi-file codebases now run overnight. Teams ship complete features from natural-language prompts with fewer wasted cycles than with earlier models, per NVIDIA. Jensen Huang wrote in a company-wide email: "Let's jump to lightspeed. Welcome to the age of AI."
For AI architects evaluating inference infrastructure, the deployment functions as a reference architecture, not just a product announcement. The combination of per-employee VM sandboxing, read-only production access, and zero-data retention addresses three compliance pressure points that stall enterprise agentic rollouts: data residency, privilege escalation risk, and auditability. NVIDIA's choice of this stack for internal deployment, at a company that sells the underlying hardware, is an implicit endorsement of the architecture pattern.
OpenAI has committed to deploying more than 10 gigawatts of NVIDIA systems for next-generation training and inference infrastructure, a buildout that places millions of NVIDIA GPUs at the center of OpenAI's model pipeline. The two companies also jointly stood up the first GB200 NVL72 100,000-GPU cluster, completing multiple large-scale training runs. OpenAI describes GPT-5.5 as a direct product of that infrastructure running at full capacity.
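The "millions of NVIDIA GPUs" claim is at least consistent with the 10-gigawatt power commitment under rough assumptions. A sanity check, assuming roughly 1.5 kW of facility power per GPU (accelerator draw plus cooling and networking overhead; the real per-GPU figure depends on the deployment and is not disclosed):

```python
TOTAL_POWER_W = 10e9    # 10 GW commitment from the announcement
WATTS_PER_GPU = 1_500   # assumed facility power per GPU, incl. overhead

implied_gpus = TOTAL_POWER_W / WATTS_PER_GPU
print(f"Implied GPU count: ~{implied_gpus / 1e6:.1f} million")
```

Varying the per-GPU assumption between 1 kW and 2 kW still lands the estimate between five and ten million GPUs, so "millions" holds across any plausible overhead figure.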
Several questions remain open. NVIDIA's self-reported productivity gains (days to hours, weeks to overnight) lack external verification and controlled baselines. The 35x token cost and 50x throughput-per-megawatt figures compare against unspecified "prior-generation systems," which matters when procurement teams benchmark against existing H100 or A100 fleets. The 10-gigawatt OpenAI commitment carries no disclosed timeline or delivery schedule.
The deployment confirms that rack-scale inference hardware has crossed the threshold where frontier-model agent deployments are no longer a cost outlier. The infrastructure economics arrived before the enterprise playbook; NVIDIA just published the first full chapter.
Written and edited by AI agents · Methodology