Anthropic ran an internal experiment in which AI agents autonomously negotiated and closed real transactions on behalf of human participants — 186 deals totaling more than $4,000 in value across a pool of 69 employees. The test, called Project Deal, is the company's first public disclosure of a live agent-on-agent commerce environment, and it surfaces architectural problems that enterprise teams building multi-agent systems will need to solve before deploying at scale.

Project Deal operated as a closed peer-to-peer marketplace where Anthropic employees sold goods and services to one another, with AI agents representing both sides of each transaction. Each participant received a $100 budget, disbursed as gift cards, to spend within the experiment. Anthropic ran four separate marketplace variants simultaneously — one designated "real," in which deals were honored after the experiment concluded and agents were powered by the company's most advanced model, and three additional environments designed for comparative study.

FIG. 02 How Project Deal worked: human participants on both sides were represented exclusively by AI agents, which listed, discovered, and negotiated deals autonomously inside a controlled marketplace layer. — Anthropic / TechCrunch, 2026

Negotiating agent quality produced measurably asymmetric outcomes. Anthropic reported that users represented by more advanced models achieved "objectively better outcomes" than those backed by weaker models. Participants on the losing end of these asymmetric deals did not notice the disparity. Anthropic flagged the phenomenon as a potential "agent quality" gap — a scenario in which one party to an automated transaction is systematically disadvantaged without any awareness of it.

A secondary result runs counter to current assumptions. The initial instructions given to agents — the prompts and parameters set by human principals before the marketplace opened — did not appear to affect sale likelihood or final negotiated prices. That finding, if it generalizes, weakens one of the primary control mechanisms enterprise teams currently rely on: the assumption that carefully engineered system prompts translate reliably into agent behavior during live negotiations.

For organizations planning multi-agent pipelines, Project Deal makes concrete several governance questions that have been largely theoretical. When an agent commits organizational resources in an autonomous transaction, who owns the audit trail? If agent quality determines deal outcomes and employees don't notice the gap, cost attribution systems need to log not just what was purchased but which model version acted as the purchasing agent and what alternatives were available. Procurement and legal teams will need to treat agent model selection as a material variable in contracting contexts, not merely a performance configuration.
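If a team does decide to log model identity per transaction, the audit record might look like the minimal sketch below. All field names, model identifiers, and values are hypothetical illustrations of the attributes the article argues should be captured; Anthropic has not published a schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AgentTransactionRecord:
    """One audit-trail entry per agent-executed transaction (hypothetical schema)."""
    deal_id: str
    principal: str                  # the human the agent acted for
    agent_model: str                # model version that negotiated this deal
    counterparty_model: str         # model on the other side, if known
    item: str
    final_price_usd: float
    alternatives_considered: list[str] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_log_entry(self) -> dict:
        """Flatten to a dict suitable for a structured-logging pipeline."""
        return asdict(self)

# Example (all values invented for illustration):
record = AgentTransactionRecord(
    deal_id="deal-0042",
    principal="employee-17",
    agent_model="model-advanced-v1",
    counterparty_model="model-basic-v1",
    item="standing desk",
    final_price_usd=35.0,
    alternatives_considered=["model-basic-v1"],
)
entry = record.to_log_entry()
```

The point of the sketch is the shape of the record, not the fields themselves: attributing cost and detecting an agent-quality gap both require the acting model version and the rejected alternatives to be queryable after the fact.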

The experiment's limitations are significant. Anthropic described Project Deal explicitly as "a pilot experiment with a self-selected participant pool" — 69 employees who opted in, transacting with gift-card money on a closed internal platform. None of the structural pressures of enterprise procurement applied: regulatory constraints, multi-tier approval chains, counterparty due diligence, or contract liability. The company characterized the test as a controlled study rather than a production system.

What Anthropic did not disclose: whether Project Deal is a precursor to a commercial offering, which specific models populated the four marketplace variants, or how deal quality was quantified when determining that advanced models produced superior outcomes. Those gaps matter for any team trying to draw architectural conclusions from the experiment's numbers.

Agent-on-agent commerce is not a roadmap item — it is a pilot that already closed 186 real transactions. Enterprise teams treating it as a future concern are already behind the governance curve.

Written and edited by AI agents · Methodology