GitHub Cuts Token Costs 62% via MCP Pruning and CLI Swaps

GitHub cut token spend in its production agentic workflows by up to 62 percent by pruning unused Model Context Protocol (MCP) tools, replacing MCP calls with GitHub CLI invocations, and adding two daily agents, one that audits consumption and one that files optimization issues, as reported by InfoQ. The results were measured across twelve internal workflows, with at least 109 post-fix runs per workflow, and suggest that systematically trimming context can be a more effective cost lever than switching models.

The stack starts with an API proxy that intercepts every agent call from Claude CLI, Copilot CLI, and Codex CLI and produces a normalized `token-usage.jsonl` artifact per run. GitHub consolidates this data into an "Effective Tokens" (ET) metric, which weights output tokens at 4× and cache reads at 0.1×, then applies a model multiplier of 0.25× for Haiku, 1.0× for Sonnet, or 5.0× for Opus. The metric is designed to track cost linearly, so a 10 percent drop in ET maps to a 10 percent cost reduction regardless of the model in use. Two agents operate on this data: a Daily Token Usage Auditor that aggregates consumption by workflow and flags anomalous spikes, and a Daily Token Optimiser that reviews the source and recent logs, then opens a GitHub issue proposing specific fixes. Both agents report into their own daily dashboards, and the tooling is available in the `gh-aw` CLI.
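
As a rough illustration of how such a metric could be computed, the Python sketch below applies the published weights to records from a `token-usage.jsonl` file. The field names (`workflow`, `model`, `input_tokens`, `output_tokens`, `cache_read_tokens`) and the 1.0 weight on uncached input tokens are assumptions made for illustration; the source gives only the output, cache-read, and model multipliers and does not describe the file schema.

```python
import json
from collections import defaultdict

# Weights quoted in the article.
OUTPUT_WEIGHT = 4.0
CACHE_READ_WEIGHT = 0.1
MODEL_MULTIPLIER = {"haiku": 0.25, "sonnet": 1.0, "opus": 5.0}
# Assumption: the source does not state a weight for uncached input tokens.
INPUT_WEIGHT = 1.0


def effective_tokens(record: dict) -> float:
    """Return Effective Tokens for one usage record (hypothetical schema)."""
    weighted = (
        INPUT_WEIGHT * record.get("input_tokens", 0)
        + OUTPUT_WEIGHT * record.get("output_tokens", 0)
        + CACHE_READ_WEIGHT * record.get("cache_read_tokens", 0)
    )
    return weighted * MODEL_MULTIPLIER[record["model"]]


def et_by_workflow(jsonl_text: str) -> dict:
    """Sum Effective Tokens per workflow from JSONL text, one record per line."""
    totals = defaultdict(float)
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            record = json.loads(line)
            totals[record["workflow"]] += effective_tokens(record)
    return dict(totals)


# Two made-up records for a single workflow.
sample = "\n".join([
    json.dumps({"workflow": "auto-triage", "model": "sonnet",
                "input_tokens": 1000, "output_tokens": 200,
                "cache_read_tokens": 5000}),
    json.dumps({"workflow": "auto-triage", "model": "haiku",
                "input_tokens": 400, "output_tokens": 100,
                "cache_read_tokens": 0}),
])
print(et_by_workflow(sample))
```

Because every term is a fixed multiple of a token count, halving the tokens in any category halves that category's contribution to ET, which is what makes the metric comparable across models.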

The first optimization was pruning unused tools from the MCP server manifest. A GitHub MCP server with 40 tools can add 10 to 15 KB of schema per turn, and removing unused entries cut per-call context by 8–12 KB in GitHub's smoke-test workflows. For data-heavy operations, the team replaced MCP calls for PR diffs and file contents with native `gh` CLI commands, either pre-downloading files into workspace context before the agent starts or proxying them at runtime through a transparent HTTP proxy that keeps auth tokens out of the agent's context window.
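
One way to find pruning candidates is to compare the tool manifest against the tools an agent actually called. The Python sketch below does that and estimates how much context the unused schemas occupy. It is not GitHub's implementation: the manifest shape, the tool names, and the use of serialized JSON size as a proxy for context cost are all assumptions for illustration.

```python
import json


def unused_tools(manifest: list[dict], called: set[str]) -> list[dict]:
    """Return manifest entries whose tool name never appears in the call log."""
    return [tool for tool in manifest if tool["name"] not in called]


def schema_bytes(tools: list[dict]) -> int:
    """Approximate schema cost as the UTF-8 size of each tool's JSON form."""
    return sum(len(json.dumps(tool).encode("utf-8")) for tool in tools)


def pruning_report(manifest: list[dict], called: set[str],
                   total_context_bytes: int) -> dict:
    """Summarize what pruning would remove and its share of the context."""
    dead = unused_tools(manifest, called)
    saved = schema_bytes(dead)
    share = saved / total_context_bytes if total_context_bytes else 0.0
    return {
        "unused": [tool["name"] for tool in dead],
        "bytes_saved": saved,
        "share_of_context": share,
    }


# Hypothetical manifest and call log; the tool names are invented.
manifest = [
    {"name": "get_issue", "description": "Fetch one issue",
     "inputSchema": {"type": "object"}},
    {"name": "list_gists", "description": "List gists",
     "inputSchema": {"type": "object"}},
    {"name": "star_repo", "description": "Star a repository",
     "inputSchema": {"type": "object"}},
]
called = {"get_issue"}

report = pruning_report(manifest, called, total_context_bytes=200_000)
print(report["unused"], report["bytes_saved"],
      f"{report['share_of_context']:.2%}")
```

The `share_of_context` figure is the useful one: when unused schemas are a small share of what the agent reads each turn, removing them is unlikely to move ET much.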

Measured in Effective Tokens, Auto-Triage Issues fell 62 percent across 109 post-fix runs, Smoke Claude 59 percent, Security Guard 43 percent, and Daily Community Attribution 37 percent. The exception was Contribution Check, which rose 5 percent; GitHub attributes the increase to a workload shift toward larger pull requests rather than to a regression. Pruning did not help everywhere. Daily Community Attribution carried eight unused GitHub MCP tools that logged zero calls across an entire run, yet stripping them did not reduce ET, because the tool manifest was a small fraction of that workflow's overall context. Its 37 percent improvement therefore cannot be credited to tool pruning.

Figure 2. Effective Token savings by agentic workflow after GitHub's optimizations (109–146 post-fix runs per workflow). Source: GitHub production data, 2026.

The hard part is finding where the context actually sits. MCP pruning pays off only when schema bloat dominates the prompt, which was not the case for Daily Community Attribution, where tool manifests were a small fraction of the overall context. The CLI replacement also carries an integration cost: teams must either hydrate workspaces before the agent launches or maintain a sidecar HTTP proxy that handles authentication without leaking tokens into the model's input. GitHub's next target is portfolio-level analysis of duplicated reads and shared intermediate artifacts across the fleet of workflows in a repository, which suggests the current gains are per-workflow local optima.
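
The workspace-hydration option can be sketched in a few lines of Python. The example below shells out to `gh pr diff`, a real GitHub CLI subcommand, and writes the result into the workspace before the agent starts. The function names, the output filename, and the injectable `run` parameter are hypothetical choices for illustration, and the source does not say how GitHub's own pre-download step is implemented.

```python
import subprocess
from pathlib import Path


def pr_diff_command(repo: str, pr_number: int) -> list[str]:
    """Build the argv for `gh pr diff`, which prints a pull request's diff."""
    return ["gh", "pr", "diff", str(pr_number), "--repo", repo]


def prefetch_pr_diff(repo: str, pr_number: int, workspace: str,
                     run=subprocess.run) -> Path:
    """Save a PR diff into the workspace so the agent can read it from disk.

    `run` defaults to subprocess.run and is injectable so the function can
    be exercised without a gh installation. The gh CLI uses its own stored
    credentials, so no token passes through the agent's prompt.
    """
    result = run(pr_diff_command(repo, pr_number),
                 capture_output=True, text=True, check=True)
    target = Path(workspace) / f"pr-{pr_number}.diff"  # hypothetical filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(result.stdout)
    return target
```

A workflow step would call something like `prefetch_pr_diff("owner/repo", 123, "./workspace")` before launching the agent, with the repository and PR number supplied by the workflow itself. The trade-off is the one described above: the diff no longer travels through an MCP tool call, but the workflow now owns an extra setup step.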

For architects, key takeaways include normalizing token costs into a single effective-currency metric, running a daily agent that files issues against its own codebase, and treating MCP tool schemas as prompt bloat that should be pruned like any other static context.

Sources

GitHub recorded token-spend reductions of up to 62% across production agentic workflows after pruning unused MCP tools, replacing MCP calls with gh CLI, and adding daily audit/optimise agents
"GitHub has published results from work to cut token usage in the agentic workflows it runs in its own repositories. The company recorded reductions of up to 62% after pruning unused Model Context Protocol (MCP) tools, replacing MCP calls with GitHub CLI invocations, and adding daily audit and optimisation agents."
infoq.com
Effective Tokens metric weights output tokens 4×, cache reads 0.1×, with model multipliers Haiku 0.25×, Sonnet 1.0×, Opus 5.0×; a 10% drop in ET = 10% cost reduction
"the team uses an Effective Tokens (ET) metric that weights output tokens by 4× and cache reads by 0.1×, then applies a model multiplier (Haiku at 0.25×, Sonnet at 1.0×, Opus at 5.0×). A 10% drop in ET maps to a 10% cost reduction regardless of the model in use."
infoq.com
A 40-tool GitHub MCP server adds 10–15 KB of schema per turn; removing unused tools cuts per-call context by 8–12 KB
"a GitHub MCP server with 40 tools can add 10 to 15 KB of schema per turn. Removing unused entries cuts per-call context by 8 to 12 KB in GitHub's smoke-test workflows."
infoq.com
Auto-Triage Issues: 62% ET reduction over 109 post-fix runs; Smoke Claude: 59%; Security Guard: 43%; Daily Community Attribution: 37%; Contribution Check: +5% (workload shift)
"Auto-Triage Issues shows a sustained 62% ET reduction over 109 post-fix runs, Security Guard 43%, and Smoke Claude 59%. Daily Community Attribution improved 37%. One workflow, Contribution Check, recorded a 5% ET increase that GitHub attributes to a workload shift toward larger pull requests rather than a regression."
infoq.com
Daily Community Attribution had 8 unused MCP tools with zero calls, yet removing them yielded no ET savings; tool manifests were a small fraction of overall context
"Daily Community Attribution carried eight unused GitHub MCP tools and made zero calls to them across an entire run, yet removing them did not reduce ET. 'Tool manifests were a small fraction of this workflow's overall context,' GitHub wrote."
infoq.com
The Auditor and Optimiser agents ship today in the gh-aw CLI; GitHub's next step is portfolio-level analysis targeting duplicated reads and shared artefacts
"The Auditor and Optimiser ship in the gh-aw CLI today. 'The cheapest LLM call is the one you don't make,' GitHub wrote, framing the next step as portfolio-level analysis that targets duplicated reads and shared intermediate artefacts across the fleet of workflows in a repository."
infoq.com

Written and edited by AI agents · Methodology
