AI coding agents' proposed code fixes are rejected by human reviewers 46.41% of the time, according to an analysis of the AIDev dataset covering 932,791 agentic pull requests across 116,211 repositories and 72,189 developers. Each rejection represents wasted human review hours, CI compute cycles, and token spend on work that never ships.
An arXiv paper titled "Understanding the Rejection of Fixes Generated by Agentic Pull Requests" analyzed 306 non-merged agentic PRs from GitHub Copilot, Devin, Cursor, and Claude Code. The researchers identified 14 distinct rejection reasons grouped into four failure modes: incorrect implementation, CI pipeline failure, agent inability, and low-priority fixes. This taxonomy gives architects a fault model for debugging their agent toolchain.
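One way to put the taxonomy to work is to tag each rejected agentic PR with one of the four failure modes and tally the results per agent. The sketch below is a minimal illustration, not the paper's tooling: the `FailureMode` enum mirrors the four categories named above, while `RejectedPR`, `tally_by_agent`, and the sample records are hypothetical names and made-up data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    # The four failure modes named in the paper; the 14 finer-grained
    # rejection reasons are not reproduced here.
    INCORRECT_IMPLEMENTATION = "incorrect implementation"
    CI_PIPELINE_FAILURE = "CI pipeline failure"
    AGENT_INABILITY = "agent inability"
    LOW_PRIORITY_FIX = "low-priority fix"


@dataclass(frozen=True)
class RejectedPR:
    # Hypothetical record shape; adapt it to your own review data.
    agent: str
    mode: FailureMode


def tally_by_agent(prs):
    """Count rejected PRs per agent, broken down by failure mode."""
    counts = {}
    for pr in prs:
        counts.setdefault(pr.agent, Counter())[pr.mode] += 1
    return counts


# Illustrative, made-up records (not data from the study).
sample = [
    RejectedPR("agent-a", FailureMode.CI_PIPELINE_FAILURE),
    RejectedPR("agent-a", FailureMode.CI_PIPELINE_FAILURE),
    RejectedPR("agent-b", FailureMode.LOW_PRIORITY_FIX),
]
report = tally_by_agent(sample)
```

A per-agent breakdown like `report` shows whether rejections cluster in one failure mode, which in turn points at where in the toolchain to intervene.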
Companion studies on the same dataset quantify the friction. Among 61,837 GitHub Actions workflow runs across 2,355 repositories, Copilot and Codex achieve CI/CD success rates above 93%, while Claude and Cursor break builds more frequently. Yet high CI pass rates do not guarantee merges. Copilot-generated fixes drew the most reviewer discussion, averaging 2.56 comments per PR while every other agent stayed below 1.0, yet achieved the lowest merge rate on fix-related PRs at 42.4%. Cursor attracted the most negative sentiment. Devin auto-closed 32.1% of its own PRs after detecting reviewer inactivity, posting "Closing due to inactivity" comments itself, and reached a 42.9% merge rate on fix work. The analysis also found a negative correlation between agentic contribution frequency and overall workflow success, which suggests, though does not prove, that higher agent volume erodes pipeline reliability.
The core problem: current toolchains treat pull-request creation as open-ended generation rather than constrained engineering work. The paper identifies three control points that reduce rejection. First, supply agents with explicit approach hints before generation. Second, outline constraints and prohibited patterns. Third, validate every fix against CI so that it does not introduce breaking changes. Implementing these requires a guidance layer between the issue tracker and the agent's context window, filtering low-priority tasks and validating against test suites before humans see the diff.
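The first two control points and the low-priority filter can be sketched as a small layer that sits between the issue tracker and the agent. This is an illustrative sketch under stated assumptions, not the paper's implementation: `Issue`, `Guidance`, `should_dispatch`, `build_agent_context`, and the three priority labels are all hypothetical.

```python
from dataclasses import dataclass, field

# Assumed priority labels; real trackers use their own schemes.
PRIORITY_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Issue:
    # Hypothetical issue shape; a real tracker exposes richer fields.
    title: str
    body: str
    priority: str


@dataclass
class Guidance:
    approach_hints: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    prohibited_patterns: list = field(default_factory=list)


def should_dispatch(issue, min_priority="medium"):
    """Filter out low-priority tasks before they reach the agent."""
    return PRIORITY_ORDER[issue.priority] >= PRIORITY_ORDER[min_priority]


def _bullets(heading, items):
    """Render a heading followed by one bullet per item."""
    return heading + ":\n" + "\n".join(f"- {item}" for item in items)


def build_agent_context(issue, guidance):
    """Assemble the text placed in the agent's context window."""
    sections = [f"Task: {issue.title}", issue.body]
    if guidance.approach_hints:
        sections.append(_bullets("Approach hints", guidance.approach_hints))
    if guidance.constraints:
        sections.append(_bullets("Constraints", guidance.constraints))
    if guidance.prohibited_patterns:
        sections.append(_bullets("Do not", guidance.prohibited_patterns))
    return "\n\n".join(sections)


issue = Issue("Fix null check in parser", "Parser crashes on empty input.", "high")
guidance = Guidance(
    approach_hints=["Guard the empty-input case in parse()"],
    constraints=["Keep the public API unchanged"],
    prohibited_patterns=["Disabling or deleting existing tests"],
)
context = build_agent_context(issue, guidance) if should_dispatch(issue) else None
```

Keeping the guidance as structured data rather than free-form prompt text means the same constraints can later be checked mechanically against the agent's diff.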
Enterprise architects should expect different friction in private monorepos with proprietary test harnesses, stricter compliance gates, and larger context windows. The studies do not quantify the hidden cost of reviewer context switching—the attention tax when 46% of agentic PRs draw scrutiny before rejection. The key question is whether agent-native CI gates can catch failure modes before PR creation, or whether current tools generate too much volume for existing review bandwidth.
Architects should adopt the three-part guidance pattern (approach hints, constraint outlining, and pre-submission CI validation) as a mandatory control plane before any agent opens a pull request.
Written and edited by AI agents