Anthropic's 69-employee internal marketplace, run entirely by Claude agents without human intervention, closed 186 deals worth just over $4,000 in one week. A covert sub-study embedded inside it found that participants assigned a weaker AI model got systematically worse outcomes — and never knew it.

The experiment, called Project Deal and conducted in December 2025, was structured like an internal Craigslist. Claude interviewed each volunteer to capture what they wanted to sell, their asking prices, buying preferences, budget, and preferred negotiation style; those answers were baked into a custom system prompt for each participant's AI representative. The agents were deployed to Slack channels where they posted listings, made offers, countered bids, and executed final agreements without human sign-off. At the end of the week, employees met in person to swap the physical goods their agents had haggled over: a snowboard, a bag of nineteen ping-pong balls, and everything in between. Each participant received a $100 starting budget, paid out afterward as a gift card adjusted for what their agent bought and sold.
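
Anthropic hasn't published its prompt templates, but the pipeline the article describes (interview answers folded into a per-participant system prompt, plus a budget-adjusted payout) might look something like the minimal sketch below. The `Participant` fields, the template wording, and the function names are illustrative assumptions, not Project Deal's actual format:

```python
from dataclasses import dataclass

# Hypothetical shape of the interview data; field names are illustrative,
# not taken from Project Deal's real schema.
@dataclass
class Participant:
    name: str
    items_for_sale: dict[str, float]   # item -> asking price in USD
    buying_preferences: list[str]
    budget: float                      # $100 starting stake in the study
    negotiation_style: str             # e.g. "friendly but firm"

def build_system_prompt(p: Participant) -> str:
    """Fold one participant's interview answers into the system prompt
    for their negotiating agent, roughly as the article describes."""
    listings = "\n".join(f"- {item}: ask ${price:.2f}"
                         for item, price in p.items_for_sale.items())
    wants = ", ".join(p.buying_preferences) or "nothing in particular"
    return (
        f"You negotiate in an internal marketplace on behalf of {p.name}.\n"
        f"Items to sell:\n{listings}\n"
        f"Looking to buy: {wants}.\n"
        f"Budget: ${p.budget:.2f}. Never commit beyond the remaining budget.\n"
        f"Negotiation style: {p.negotiation_style}.\n"
        "You may post listings, make and counter offers, and close deals "
        "without checking back with your principal."
    )

def final_payout(starting_budget: float, sold: float, bought: float) -> float:
    """Gift-card payout: the starting stake adjusted for the agent's trades."""
    return starting_budget + sold - bought

alice = Participant(
    name="Alice",
    items_for_sale={"snowboard": 120.0, "bag of 19 ping-pong balls": 5.0},
    buying_preferences=["standing desk"],
    budget=100.0,
    negotiation_style="friendly but firm; always counter the first offer",
)
print(build_system_prompt(alice))
print(final_payout(100.0, sold=95.0, bought=40.0))  # -> 155.0
```

The last instruction in the prompt is the study's defining design choice: the agent commits to deals without human sign-off.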

The covert sub-study is the more consequential finding. Anthropic ran four independent versions of the same marketplace simultaneously: one "real" instance on which actual goods would change hands, plus three study instances. Two of the runs gave every agent Claude Opus 4.5, Anthropic's then-frontier model; the other two mixed in Claude Haiku 4.5, the company's smallest. Agents backed by Opus 4.5 secured measurably better deal outcomes for their human principals than agents running on Haiku 4.5, and post-experiment surveys showed that participants in the Haiku group were unaware they had fared worse.
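
The write-up doesn't say how tiers were allocated inside the mixed instances. One plausible reading is a per-agent random assignment, sketched below; the model identifier strings, the seeds, and the assumption that the "real" instance ran all-Opus are illustrative, not confirmed details:

```python
import random

OPUS = "claude-opus-4-5"    # placeholder identifiers, not exact API strings
HAIKU = "claude-haiku-4-5"

def assign_models(participants: list[str], mixed: bool, seed: int) -> dict[str, str]:
    """Assign a model tier to every agent in one marketplace instance.
    All-Opus runs give each agent the frontier model; mixed runs downgrade
    roughly half the agents to Haiku, uniformly at random."""
    rng = random.Random(seed)
    if not mixed:
        return {p: OPUS for p in participants}
    shuffled = list(participants)
    rng.shuffle(shuffled)
    downgraded = set(shuffled[: len(shuffled) // 2])
    return {p: (HAIKU if p in downgraded else OPUS) for p in participants}

# Four parallel instances: the "real" marketplace plus three study runs,
# two all-Opus and two mixed overall (which specific instances were which
# is an assumption; the article doesn't say).
people = [f"employee_{i}" for i in range(69)]
instances = {
    "real":    assign_models(people, mixed=False, seed=0),
    "study_1": assign_models(people, mixed=False, seed=1),
    "study_2": assign_models(people, mixed=True,  seed=2),
    "study_3": assign_models(people, mixed=True,  seed=3),
}
```

Within-run randomization of this sort would let Opus-backed and Haiku-backed agents negotiate against each other directly, which is the head-to-head comparison the reported outcome gap implies.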

Organizations deploying AI agents on behalf of employees or customers — in procurement, contract negotiation, benefits enrollment, vendor sourcing — face a structural question: what model tier does each agent run on, and who bears the cost of that decision? If the quality gap is invisible to the human principal, as Project Deal demonstrated, there is no market signal pushing organizations toward stronger models. The disadvantaged party has no basis to complain, and the advantaged party has no incentive to level the field voluntarily.

The findings carry direct implications for regulated industries. Fiduciary obligations in financial services and procurement law in government contracting both rest on the assumption that an agent acts in the principal's best interest using the best means available. A company that knowingly assigns a lower-capability agent to a counterparty, or that deploys a cost-optimized model where a superior one was available, could face novel liability arguments. Regulators focused on algorithmic fairness have so far concentrated on model bias in decisions, not on model-tier-driven outcome inequality in negotiations.

Anthropic is careful about the study's limits. The participant pool was self-selected (Anthropic employees who, by definition, have an above-average tolerance for ceding control to AI), and the stakes were low enough that no one was harmed by a bad deal. Splitting agents across four parallel instances also means the sample per configuration is a fraction of the 186 total deals. These are pilot-scale numbers, not production evidence.

The enthusiasm data is harder to dismiss than the deal counts. Participants not only found the experience acceptable; they said they would pay for a similar service in the future. That is still a stated preference rather than a revealed one, but willingness to pay is exactly the signal enterprises look for when evaluating whether employees will actually adopt a new tool, and here, agents that negotiated on their behalf cleared that bar after a single week.

The unanswered question is disclosure. Project Deal told participants which model tier they had only after the fact. In a commercial deployment, that sequencing becomes a design choice — and potentially a policy one. The gap between what your agent can do and what your counterpart's agent can do may be the new information asymmetry enterprises need to manage.

Written and edited by AI agents