GitHub reduced agentic workflow token spend by up to 62 percent in production by eliminating unused Model Context Protocol (MCP) tools, replacing MCP calls with GitHub CLI invocations, and running two daily agentic loops that audit consumption and file optimization issues, as reported by InfoQ. The results were measured across twelve internal workflows over at least 109 post-fix runs, and they suggest that systematic context compression can be a more effective cost lever than model-switching.
The stack begins with an API proxy that intercepts every agent call from Claude CLI, Copilot CLI, and Codex CLI and produces a normalized `token-usage.jsonl` artifact per run. GitHub consolidates this data into an "Effective Tokens" (ET) metric, which weights output tokens at 4× and cache reads at 0.1×, then applies a model multiplier of 0.25× for Haiku, 1.0× for Sonnet, and 5.0× for Opus. The metric is linear with cost, so a 10 percent drop in ET translates directly into a 10 percent budget reduction, regardless of the model in use. Two agents operate on this data: a Daily Token Usage Auditor that aggregates consumption by workflow and flags anomalous spikes, and a Daily Token Optimiser that reviews workflow source and recent logs, opens a GitHub issue, and proposes specific fixes. Each agent reports into its own daily dashboard, and the tooling is available in the `gh-aw` CLI.
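The Effective Tokens weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not GitHub's implementation: the record field names (`workflow`, `model`, `input_tokens`, `output_tokens`, `cache_read_tokens`) are hypothetical, and the input-token weight of 1.0 is an assumption, since the article states only the output, cache-read, and model weights.

```python
import json
from collections import defaultdict

# Weights quoted in the article. INPUT_WEIGHT is an assumption: the article
# does not state how plain input tokens are weighted.
INPUT_WEIGHT = 1.0
OUTPUT_WEIGHT = 4.0
CACHE_READ_WEIGHT = 0.1
MODEL_MULTIPLIER = {"haiku": 0.25, "sonnet": 1.0, "opus": 5.0}


def effective_tokens(record: dict) -> float:
    """Weighted token count for one call record (field names are hypothetical)."""
    weighted = (
        INPUT_WEIGHT * record.get("input_tokens", 0)
        + OUTPUT_WEIGHT * record.get("output_tokens", 0)
        + CACHE_READ_WEIGHT * record.get("cache_read_tokens", 0)
    )
    return weighted * MODEL_MULTIPLIER[record["model"]]


def et_by_workflow(jsonl_text: str) -> dict:
    """Sum Effective Tokens per workflow from JSON Lines text."""
    totals = defaultdict(float)
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            record = json.loads(line)
            totals[record["workflow"]] += effective_tokens(record)
    return dict(totals)
```

Under these assumptions, a Sonnet call with 1,000 input, 100 output, and 2,000 cache-read tokens scores 1,000 + 400 + 200 = 1,600 ET, and the same call on Opus scores 8,000 ET. Aggregating by workflow is the kind of roll-up an auditor would need before it can flag spikes.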
The key optimizations started with removing unused tool entries from the MCP server, which reduced per-call context by 8–12 KB in smoke-test workflows. For data-heavy operations, the team replaced MCP calls for PR diffs and file contents with native `gh` CLI commands, either pre-downloading files into the workspace before the agent starts or proxying them at runtime through a transparent HTTP proxy that keeps auth tokens out of the agent's context window.
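The pre-download variant can be sketched as a small step that runs before the agent launches. This is an illustrative sketch, not GitHub's tooling: the function names, the `pr-<number>.diff` file layout, and the injectable `run` parameter are assumptions. It relies only on the `gh pr diff` command and its `--repo` flag, and `gh` picks up credentials from its own environment, so the agent sees only the resulting file.

```python
import subprocess
from pathlib import Path


def pr_diff_command(repo: str, pr_number: int) -> list[str]:
    """Build the gh invocation that prints a pull request's diff."""
    return ["gh", "pr", "diff", str(pr_number), "--repo", repo]


def hydrate_workspace(repo: str, pr_number: int, workspace: str,
                      run=subprocess.run) -> Path:
    """Fetch the PR diff with gh and write it into the workspace.

    `run` is injectable so the step can be exercised without network access.
    """
    result = run(
        pr_diff_command(repo, pr_number),
        capture_output=True,
        text=True,
        check=True,  # fail loudly if gh exits non-zero
    )
    target = Path(workspace) / f"pr-{pr_number}.diff"  # hypothetical layout
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(result.stdout)
    return target
```

The agent can then be pointed at the file on disk instead of calling an MCP tool, so neither the tool schema nor the auth token needs to enter its context.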
Measured in Effective Tokens, Auto-Triage Issues fell 62 percent across 109 post-fix runs, Smoke Claude fell 59 percent, Security Guard 43 percent, and Daily Community Attribution 37 percent. The exception was Contribution Check, which rose 5 percent; this was attributed to a shift toward larger PRs rather than a regression in the optimizer. The Daily Community Attribution workflow carried eight unused GitHub MCP tools that logged zero calls across an entire run, yet stripping them yielded no ET savings, because the tool manifest was a small fraction of the workflow's overall context.
The challenge lies in identifying where the context actually resides. MCP pruning is only beneficial when schema bloat dominates the prompt, which was not the case in attribution workflows already handling large text payloads. The proxy-and-CLI replacement also introduces an integration cost: teams must either hydrate workspaces before the agent launches or maintain a sidecar HTTP proxy that handles authentication without leaking tokens into the model's input. GitHub's next target is portfolio-level analysis to eliminate duplicated reads and shared intermediate artifacts across entire repository fleets, suggesting that current gains are single-workflow local optima.
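One way to check whether MCP pruning is worth attempting is to find the tools that were never called and estimate how much of the context the manifest occupies. The sketch below is a hypothetical helper, not part of `gh-aw`: the manifest shape (a list of dicts with a `name` key), the call-count mapping, and the 20 percent threshold are all assumptions chosen for illustration.

```python
import json


def unused_tools(manifest: list[dict], call_counts: dict[str, int]) -> list[str]:
    """Names of tools in the manifest that logged zero calls."""
    return [t["name"] for t in manifest if call_counts.get(t["name"], 0) == 0]


def manifest_share(manifest: list[dict], total_context_bytes: int) -> float:
    """Fraction of the context taken up by the serialized tool manifest."""
    manifest_bytes = len(json.dumps(manifest).encode("utf-8"))
    return manifest_bytes / total_context_bytes


def worth_pruning(manifest: list[dict], call_counts: dict[str, int],
                  total_context_bytes: int, min_share: float = 0.2) -> bool:
    """True when unused tools exist AND the manifest is a large share of context.

    min_share is an arbitrary illustrative threshold, not a published figure.
    """
    has_unused = bool(unused_tools(manifest, call_counts))
    return has_unused and manifest_share(manifest, total_context_bytes) >= min_share
```

A workflow dominated by large text payloads would have unused tools but fail the share check, which matches the attribution case described above.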
For architects, key takeaways include normalizing token costs into a single effective-currency metric, running a daily agent that files issues against its own codebase, and treating MCP tool schemas as prompt bloat that should be pruned like any other static context.
Written and edited by AI agents