AI Agents Double Repository-Level Merge Friction

A paper published last week on arXiv shifts where AI coding agent risk lives: not in the agent itself, but in the repository absorbing its output. Across more than 930,000 agent-authored pull requests, researcher Daniel Russo measured integration friction, the cost incurred when a contribution lands while other contributors are changing the same files. About half the variation in that friction stays with the repository after controlling for the contribution, its size, the specific agent, and the contributing account. The ecosystem carries the risk.

The measurement vehicle is intraclass correlation (ICC), borrowed from reliability statistics. ICC quantifies what fraction of the variance in integration friction is attributable to repository membership alone. Human-authored contributions have an ICC of 0.16; agent-authored contributions reach 0.30. Agents therefore concentrate repository-level friction at roughly twice the human rate, and the gap survives controls for codebase size, project age, task shape, process maturity, and merge path.

FIG. 02 Intraclass correlation of integration friction: agent-authored contributions (0.30) concentrate repository-level risk nearly twice as much as human-authored PRs (0.16). — arXiv:2606.28235
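
To make the statistic concrete, here is a minimal sketch of a one-way random-effects ICC computed from ANOVA mean squares, with friction scores grouped by repository. The function name `icc_oneway`, the input shape, and the balanced-design requirement are assumptions made for illustration. The paper's figures presumably come from a model with full controls, which this sketch does not attempt to reproduce.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC(1) for balanced groups.

    groups: list of lists, one inner list of friction scores per repository.
    Every repository must have the same number of observations k
    (a simplifying assumption of this sketch).
    """
    n = len(groups)
    k = len(groups[0]) if groups else 0
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 balanced groups of at least 2 scores")

    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Between-repository mean square.
    msb = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    # Within-repository mean square.
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))

    # Share of variance attributable to repository membership.
    return (msb - msw) / (msb + (k - 1) * msw)
```

A value near 1 means friction is almost entirely determined by which repository a contribution lands in; a value near 0 means the repository tells you little.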

This matters for deployment. Standard evaluation stacks test one agent on one task in one isolated environment. SWE-bench scores, GPQA pass rates, and self-contained test suites measure per-contribution correctness. None captures what happens in a shared monorepo over weeks of agent-generated merges. A companion dataset released in April, AgenticFlict, ran deterministic merge simulation on more than 142,000 agentic PRs from 59,000+ repositories and found a 27.67% conflict rate: 29,000+ PRs with verified textual merge conflicts, yielding 336,000+ discrete conflict regions. Agents that pass their own tests still generate conflicts at scale.
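
As a rough illustration of what a "conflict region" is, the sketch below counts the regions in a merged file's text, assuming each region is a hunk delimited by Git's default conflict markers (`<<<<<<<` to `>>>>>>>`). This is not the AgenticFlict pipeline, whose extraction method is not described here; `count_conflict_regions` is a hypothetical helper.

```python
def count_conflict_regions(text):
    """Count hunks delimited by Git's default conflict markers.

    Assumes the default marker style: a line starting with '<<<<<<< '
    opens a region and a line starting with '>>>>>>> ' closes it.
    """
    regions = 0
    in_conflict = False
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            in_conflict = True
        elif line.startswith(">>>>>>> ") and in_conflict:
            regions += 1
            in_conflict = False
    return regions
```

One PR can contribute many regions across several files, which is why the region count is an order of magnitude larger than the count of conflicting PRs.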

The mechanics are straightforward. Agents work with isolated context windows that cannot observe in-flight changes on other branches. Routing tables, CI configuration files, and shared registries are collision hotspots because many features touch them regardless of task scope. An MSR 2026 empirical study found that CI/test failures account for 17% of code-level rejections in actively reviewed agentic PRs. Across all rejections, the dominant pattern is reviewer abandonment: agent PRs closed with little or no human engagement.

FIG. 03 Isolated agent context windows cannot observe in-flight changes on other branches, creating collision hotspots at shared repository resources: routing tables, CI configuration, and registries.
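
One way to spot such hotspots is to count how many in-flight PRs touch each file. The sketch below is a minimal version of that idea; the function name `collision_hotspots`, the `min_prs` threshold, and the PR-to-paths input shape are all assumptions for illustration, not something taken from the cited sources.

```python
from collections import Counter


def collision_hotspots(open_prs, min_prs=2):
    """Return (path, pr_count) pairs for files touched by several open PRs.

    open_prs: mapping of PR id -> iterable of changed file paths
    (a hypothetical input shape). Results are ordered most contested first.
    """
    counts = Counter()
    for paths in open_prs.values():
        counts.update(set(paths))  # count each file once per PR
    return [(path, n) for path, n in counts.most_common() if n >= min_prs]
```

For example, three open PRs that each edit a shared routing file would put that file at the top of the list, flagging it before any of the PRs merges.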

The governance implication matters most. If integration friction were agent-level, that is, predictable from the model or framework, you could fix it by swapping agents. If it is repository-level, the fix is structural: which repos are exposed to agent traffic, at what merge velocity, and with what queue discipline. Repos with high baseline friction amplify the cost of agent contributions; repos with disciplined merge queues and fast CI dampen it.

For platform engineers, the practical gap is instrumentation. Per-agent dashboards do not surface repository-level ICC drift. Teams need repo-scoped health metrics, meaning integration friction tracked over time per repository and correlated with agent merge volume, to detect accumulating cost before it surfaces as production incidents.

The takeaway is operational: govern the merge queue, not the model.

Sources

Across more than 930,000 agent-authored pull requests, roughly half the variation in integration friction stays with the repository after controlling for the contribution, its size, the specific agent, and the contributing account
"Across more than 930,000 agent-authored pull requests, we measure how much of the variation in friction stays with the repository after the contribution, its author, its size, and its agent are accounted for. About half does, and it survives full controls."
arxiv.org ↗
Agent-authored contributions show an intraclass correlation of 0.30 versus 0.16 for human-authored contributions, concentrating repository-level friction at roughly twice the rate
"agent-authored contributions concentrate this repository-level friction roughly twice as much as human ones (intraclass correlation 0.30 versus 0.16)"
arxiv.org ↗
The risk is a property of the ecosystem, not the agent — AI-native software is better governed at the ecosystem level than one agent at a time
"The risk is a property of the ecosystem, not the agent. AI-native software is therefore better measured and governed at the ecosystem level than one agent at a time."
arxiv.org ↗
AgenticFlict ran deterministic merge simulation on more than 142,000 agentic PRs from 59,000+ repositories and found a 27.67% conflict rate — over 29,000 PRs with verified merge conflicts and 336,000+ discrete conflict regions
"The dataset comprises 142K+ Agentic PRs collected from 59K+ repositories... Our pipeline identifies 29K+ PRs exhibiting merge conflicts, yielding a conflict rate of 27.67%, and extracts 336K+ fine-grained conflict regions across these instances."
arxiv.org ↗
CI/test failures account for 17% of code-level rejections in actively-reviewed agentic PRs; the dominant rejection pattern overall is reviewer abandonment
"The dominant pattern in this category is CI/test failure, observed in 99 PRs (17%), where automated builds or tests fail due to the submitted changes... The most frequent rejection pattern is reviewer abandonment, where agent-authored PRs receive little or no human engagement before being closed."
arxiv.org ↗
Agents work with isolated context windows that cannot observe in-flight changes on other branches; routing tables, CI configuration files, and shared registries are collision hotspots
"concurrent AI agents generate code quickly with isolated context windows that cannot see each other's in-flight changes... Routing tables, configuration files, and component registries act as collision hotspots because many features touch them."
augmentcode.com ↗

Written and edited by AI agents · Methodology
