Agent JIT compilation cuts latency 10.4× over Browser-Use

Stanford researchers have demonstrated a 10.4× latency speedup and 28-percentage-point accuracy gain over Browser-Use in computer-use agent tasks by replacing the standard fetch-screenshot-execute loop with a compiled code plan. They call this technique agent just-in-time (JIT) compilation.

Computer-use agents (CUAs) like Browser-Use and OpenAI's CUA operate by calling an LLM at each step: screenshot, reasoning, action, repeat. Every iteration adds inference latency and gives the model another opportunity to hallucinate. The Stanford approach instead compiles a task description into executable code at plan time. The resulting code invokes tools and parallelizes work directly, calling the LLM only where a step needs it.
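To make the contrast concrete, here is a minimal sketch that only counts model calls. Everything in it is an assumption for illustration: CountingLLM is a stand-in rather than a real client, the tool bodies are stubs, and the 12-step count is arbitrary. It also leaves out the model calls needed to generate the plan in the first place.

```python
from dataclasses import dataclass


@dataclass
class CountingLLM:
    """Stand-in for a model client; it only counts calls (hypothetical)."""
    calls: int = 0

    def ask(self, prompt: str) -> str:
        self.calls += 1
        return "Thai Garden"  # canned answer so the sketch runs offline


# Hypothetical cached tools; real ones would drive a browser.
def list_restaurants(city: str) -> list[str]:
    return ["Thai Garden", "Pizza Roma", "Sushi Go"]


def add_to_cart(restaurant: str, item: str) -> dict:
    return {"restaurant": restaurant, "item": item}


def step_loop(llm: CountingLLM, steps: int) -> None:
    """Per-step agent loop: one model call per screenshot/action cycle."""
    for i in range(steps):
        llm.ask(f"screenshot {i}: what is the next action?")


def compiled_plan(llm: CountingLLM) -> dict:
    """Plan emitted once as code: tools run directly, and the model is
    consulted only for the single judgment call."""
    options = list_restaurants("Palo Alto")
    choice = llm.ask(f"Pick the best match for 'spicy noodles' from {options}")
    return add_to_cart(choice, "pad kee mao")


loop_llm, plan_llm = CountingLLM(), CountingLLM()
step_loop(loop_llm, steps=12)  # 12 is an arbitrary step count
cart = compiled_plan(plan_llm)
print(loop_llm.calls, plan_llm.calls)
```

The loop pays for a model call on every step, while the plan pays only where the code explicitly asks for one.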

The system has three components.

JIT-Planner generates multiple candidate code plans in parallel, validates each against tool specifications using a control-flow graph (CFG), and selects the minimum-cost candidate. Rather than primitive browser actions such as click and type, the planner composes higher-level reusable tools, functions like list_restaurants or add_to_cart, and uses the CFG to statically verify their preconditions and postconditions before execution. The spread between the best-latency and worst-latency candidate plans is 5.3×, indicating that plan selection by itself has a large effect on latency.

JIT-Scheduler then explores parallelization strategies, using Monte Carlo estimation over learned latency distributions to decide whether tasks run fastest serially, in parallel, or with speculative hedging.

The invariant-enforcing tool protocol requires each tool to declare preconditions and postconditions on state, which enables compositional verification at compile time instead of error discovery at runtime.
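A rough sketch of how declared tool contracts allow a plan to be checked before anything runs. This is not the paper's implementation: the ToolSpec fields, the state-fact names, and the open_menu tool are hypothetical, and the plan is a straight-line list of calls, the simplest possible control-flow graph with no branches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Contract a tool author declares (field names are hypothetical)."""
    name: str
    requires: frozenset[str]  # state facts that must hold before the call
    ensures: frozenset[str]   # state facts guaranteed after the call


SPECS = {
    "list_restaurants": ToolSpec(
        "list_restaurants", frozenset(), frozenset({"restaurants_listed"})),
    "open_menu": ToolSpec(
        "open_menu", frozenset({"restaurants_listed"}), frozenset({"menu_open"})),
    "add_to_cart": ToolSpec(
        "add_to_cart", frozenset({"menu_open"}), frozenset({"cart_nonempty"})),
}


def validate(plan: list[str], initial: frozenset[str] = frozenset()) -> list[str]:
    """Walk a straight-line plan and report unmet preconditions.
    No tool is executed; only the declared contracts are consulted."""
    state = set(initial)
    errors = []
    for i, tool in enumerate(plan):
        spec = SPECS.get(tool)
        if spec is None:
            errors.append(f"step {i}: unknown tool {tool!r}")
            continue
        missing = spec.requires - state
        if missing:
            errors.append(f"step {i}: {tool} needs {sorted(missing)}")
        state |= spec.ensures  # assume the postcondition holds from here on
    return errors


good = ["list_restaurants", "open_menu", "add_to_cart"]
bad = ["list_restaurants", "add_to_cart"]  # skips open_menu
print(validate(good))
print(validate(bad))
```

The second plan is rejected without a browser session or a model call, which is the kind of tool-selection error a per-step loop would only hit at runtime.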

FIG. 02 JIT-Planner generates and validates code plans; JIT-Scheduler selects parallelization strategy. — Stanford research, arxiv 2605.21470
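The scheduler's choice among serial, parallel, and hedged execution can be approximated with a small Monte Carlo simulation. The sketch below is an assumption-heavy illustration, not the paper's scheduler: the task names, the lognormal latency parameters, and the fixed per-worker overhead are all invented, and hedging is modeled simply as launching two copies of each task and keeping the faster one.

```python
import random
from statistics import mean

# Hypothetical "learned" latency models: lognormal (mu, sigma), in seconds.
LATENCY = {"fetch_menu": (0.0, 0.5), "fetch_reviews": (0.2, 0.8)}
WORKER_OVERHEAD_S = 0.3  # assumed cost of each extra concurrent session


def draw(rng, task):
    """Sample one latency for a task from its assumed distribution."""
    mu, sigma = LATENCY[task]
    return rng.lognormvariate(mu, sigma)


def serial(rng, tasks):
    # One worker runs tasks back to back.
    return sum(draw(rng, t) for t in tasks)


def parallel(rng, tasks):
    # One worker per task; the slowest task sets the finish time.
    extra = len(tasks) - 1
    return max(draw(rng, t) for t in tasks) + extra * WORKER_OVERHEAD_S


def hedged(rng, tasks):
    # Two copies of every task; keep whichever copy finishes first.
    extra = 2 * len(tasks) - 1
    slowest = max(min(draw(rng, t), draw(rng, t)) for t in tasks)
    return slowest + extra * WORKER_OVERHEAD_S


def estimate(strategy, tasks, samples=5000, seed=0):
    """Mean simulated latency of a strategy over many sampled runs."""
    rng = random.Random(seed)
    return mean(strategy(rng, tasks) for _ in range(samples))


tasks = ["fetch_menu", "fetch_reviews"]
estimates = {f.__name__: estimate(f, tasks) for f in (serial, parallel, hedged)}
best = min(estimates, key=estimates.get)
print(estimates, best)
```

Which strategy wins depends entirely on the assumed distributions and overhead: heavy-tailed latencies favor hedging, while high per-worker overhead favors serial execution.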

Across five unnamed web applications, JIT-Planner achieves a 10.4× speedup and +28% accuracy over Browser-Use, and JIT-Scheduler achieves a 2.4× speedup and +9% accuracy over OpenAI CUA. The accuracy gains reflect compile-time invariant checking, which catches tool-selection errors that the standard agent loop surfaces only after wasted inference.

FIG. 03 JIT-Planner achieves 10.4× speedup and 28-point accuracy gain over Browser-Use. — Benchmark across 5 web applications, arxiv 2605.21470

The 10.4× speedup is measured against Browser-Use, which makes an LLM call on most browser events. OpenAI CUA already includes internal optimizations, so JIT-Scheduler's marginal gain over it is a smaller 2.4×, which is still commercially significant at scale. The research was published May 20 on arXiv (2605.21470) and submitted to ICML.

Key limitations: no wall-clock latency numbers, token counts, cost-per-task, hardware specs, or production deployment data. Compilation itself requires inference to generate and validate candidate plans; the paper does not quantify how that upfront cost amortizes across different task frequencies. The invariant protocol requires tool authors to specify contracts, adding integration burden. Dynamic web environments can invalidate compiled plans mid-execution, and fallback or recompilation paths are not described.
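The amortization question can at least be framed, even without the missing numbers. The following is a back-of-the-envelope sketch under two assumptions the source does not confirm: that a compiled plan can be reused unchanged across runs of the same task, and that the reported speedup excludes compilation time. The symbols are introduced here for illustration: C is the one-time compilation latency, t_loop and t_plan are the per-run latencies of the step loop and the compiled plan, s is their ratio, and n is the number of runs.

```latex
\begin{align*}
  \text{Compilation pays off when}\quad
  C + n\,t_{\text{plan}} &< n\,t_{\text{loop}} \\
  \Longleftrightarrow\quad
  n &> \frac{C}{t_{\text{loop}} - t_{\text{plan}}}
     = \frac{C}{t_{\text{loop}}\,\bigl(1 - \tfrac{1}{s}\bigr)},
  \qquad s = \frac{t_{\text{loop}}}{t_{\text{plan}}} \\
  \text{With } s = 10.4:\quad
  n &> \frac{C}{0.904\,t_{\text{loop}}} \approx 1.11\,\frac{C}{t_{\text{loop}}}
\end{align*}
```

Under those assumptions, a task run only once benefits only if compilation takes less than about 90% of a single step-loop run, while frequently repeated tasks can absorb a much larger upfront cost.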

The 5.3× plan-selection spread points to the most portable idea in the work: any multi-step agent loop can generate a handful of candidate plans and pick the minimum-cost one before execution, without adopting the full JIT architecture.
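A minimal sketch of that pattern, assuming plans can be represented as lists of tool names and that per-tool latency estimates are available from earlier runs. The tool names, the latency table, and propose_plan (a stand-in for a model call, canned here so the example runs offline) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-tool latency estimates in seconds; in practice these
# would come from logged timings of earlier runs.
EST_LATENCY_S = {
    "search": 1.2,
    "open_page": 0.8,
    "llm_extract": 3.5,
    "read_table": 0.4,
}


def propose_plan(variant: int) -> list[str]:
    """Stand-in for one model call that returns a plan as tool names."""
    canned = [
        ["search", "open_page", "llm_extract"],
        ["search", "open_page", "read_table"],
        ["search", "open_page", "open_page", "llm_extract", "llm_extract"],
    ]
    return canned[variant % len(canned)]


def plan_cost(plan: list[str]) -> float:
    """Sum of per-step estimates; unknown tools are priced as infinite
    so a plan containing one never wins."""
    return sum(EST_LATENCY_S.get(step, float("inf")) for step in plan)


def cheapest_plan(n_candidates: int = 3) -> list[str]:
    # Candidates are independent, so they can be requested concurrently.
    with ThreadPoolExecutor(max_workers=n_candidates) as pool:
        candidates = list(pool.map(propose_plan, range(n_candidates)))
    return min(candidates, key=plan_cost)


best = cheapest_plan()
print(best, plan_cost(best))
```

The additive cost model is the simplest choice that works here. It ignores parallelism and latency variance, which is where a scheduler like the one described above would come in.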

Sources

JIT-Planner achieves 10.4× speedup and +28% accuracy over Browser-Use across 5 web applications
"JIT-Planner achieves 10.4× speedup and +28% accuracy over Browser-Use, while JIT-Scheduler achieves 2.4× speedup and +9% accuracy over OpenAI CUA"
arxiv.org ↗
JIT-Scheduler achieves 2.4× speedup and +9% accuracy over OpenAI CUA
"JIT-Scheduler achieves 2.4× speedup and +9% accuracy over OpenAI CUA"
arxiv.org ↗
The spread between the best-latency and worst-latency candidate plan is 5.3×
"we find that the difference between the best-latency and worst-latency code plan candidate is 5.3× (Section 5)"
arxiv.org ↗
The system compiles natural-language task descriptions into executable code at plan-synthesis time, using cached reusable tools rather than primitive actions like click and type
"This code is built from cached, reusable tools (e.g., list_restaurants, add_to_cart) rather than primitive actions (e.g., click, type), so the LM need not be called at every step"
arxiv.org ↗
JIT-Planner generates multiple code plans, validates each against tool specifications using a CFG, and selects the minimum-cost candidate
"Cost-optimizing planner: Plans are code, enabling parallel candidate generation as well as static checking and cost estimation over a control-flow graph (CFG)"
arxiv.org ↗
JIT-Scheduler uses Monte Carlo cost estimation from learned latency distributions to select parallelization strategies
"Cost-aware scheduler: Parallelization strategy selection via Monte Carlo cost estimation from prior learned latency distributions"
arxiv.org ↗
The invariant-enforcing tool protocol specifies precondition and postcondition state requirements, enabling compositional verification at compile time
"Invariant-enforcing tool protocol: Tools specify precondition and postcondition state invariants (Section 3.1), enabling compositional verification at compilation time"
arxiv.org ↗
Current CUA implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM call, resulting in high latency and frequent errors from incorrect tool use
"Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM call, resulting in high latency and frequent errors from incorrect tool use"
arxiv.org ↗

Written and edited by AI agents · Methodology
