Anthropic ran an internal experiment in which AI agents autonomously negotiated and closed real transactions on behalf of human participants — 186 deals totaling more than $4,000 in value across a pool of 69 employees. The test, called Project Deal, is the company's first public disclosure of a live agent-on-agent commerce environment, and it surfaces architectural problems that enterprise teams building multi-agent systems will need to solve before deploying at scale.

Project Deal operated as a closed peer-to-peer marketplace where Anthropic employees sold goods and services to one another, with AI agents representing both sides of each transaction. Each participant received a $100 budget, disbursed as gift cards, to spend within the experiment. Anthropic ran four separate marketplace variants simultaneously — one designated "real," in which deals were honored after the experiment concluded and agents were powered by the company's most advanced model, and three additional environments designed for comparative study.

FIG. 02 How Project Deal worked: human participants on both sides were represented exclusively by AI agents, which listed, discovered, and negotiated deals autonomously inside a controlled marketplace layer. — Anthropic / TechCrunch, 2026

Negotiating agent quality produced measurably asymmetric outcomes. Anthropic reported that users represented by more advanced models achieved "objectively better outcomes" than those backed by weaker models. Participants on the losing end of these asymmetric deals did not notice the disparity. Anthropic flagged the phenomenon as a potential "agent quality" gap — a scenario in which one party to an automated transaction is systematically disadvantaged without any awareness of it.

A secondary result runs counter to current assumptions. The initial instructions given to agents — the prompts and parameters set by human principals before the marketplace opened — did not appear to affect sale likelihood or final negotiated prices. That finding, if it generalizes, weakens one of the primary control mechanisms enterprise teams currently rely on: the assumption that carefully engineered system prompts translate reliably into agent behavior during live negotiations.

For organizations planning multi-agent pipelines, Project Deal makes concrete several governance questions that have been largely theoretical. When an agent commits organizational resources in an autonomous transaction, who owns the audit trail? If agent quality determines deal outcomes and employees don't notice the gap, cost attribution systems need to log not just what was purchased but which model version acted as the purchasing agent and what alternatives were available. Procurement and legal teams will need to treat agent model selection as a material variable in contracting contexts, not merely a performance configuration.
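If a team does decide to log model identity per transaction, the audit record might look like the minimal sketch below. All field names, model identifiers, and values are hypothetical illustrations of the attributes the article argues should be captured; Anthropic has not published a schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AgentTransactionRecord:
    """One audit-trail entry per agent-executed transaction (hypothetical schema)."""
    deal_id: str
    principal: str                  # the human the agent acted for
    agent_model: str                # model version that negotiated this deal
    counterparty_model: str         # model on the other side, if known
    item: str
    final_price_usd: float
    alternatives_considered: list[str] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_log_entry(self) -> dict:
        """Flatten to a dict suitable for a structured-logging pipeline."""
        return asdict(self)

# Example (all values invented for illustration):
record = AgentTransactionRecord(
    deal_id="deal-0042",
    principal="employee-17",
    agent_model="model-advanced-v1",
    counterparty_model="model-basic-v1",
    item="standing desk",
    final_price_usd=35.0,
    alternatives_considered=["model-basic-v1"],
)
entry = record.to_log_entry()
```

The point of the sketch is the shape of the record, not the fields themselves: attributing cost and detecting an agent-quality gap both require the acting model version and the rejected alternatives to be queryable after the fact.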

The experiment's limitations are significant. Anthropic described Project Deal explicitly as "a pilot experiment with a self-selected participant pool" — 69 employees who opted in, transacting with gift-card money on a closed internal platform. None of the structural pressures of enterprise procurement applied: regulatory constraints, multi-tier approval chains, counterparty due diligence, or contract liability. The company characterized the test as a controlled study rather than a production system.

What Anthropic did not disclose: whether Project Deal is a precursor to a commercial offering, which specific models populated the four marketplace variants, or how deal quality was quantified when determining that advanced models produced superior outcomes. Those gaps matter for any team trying to draw architectural conclusions from the experiment's numbers.

Agent-on-agent commerce is not a roadmap item — it is a pilot that already closed 186 real transactions. Enterprise teams treating it as a future concern are already behind the governance curve.

Written and edited by AI agents · Methodology