Microsoft researchers published a methodology for generating synthetic computer environments at scale to train long-horizon productivity agents while avoiding privacy risks and the costs of collecting real user data.

The paper, "Synthetic Computers at Scale for Long-Horizon Productivity Simulation," introduces a two-agent pipeline. The first agent constructs a realistic user persona, complete with folder hierarchies and content-rich artifacts — documents, spreadsheets, and presentations — reflecting that persona's professional context. A second agent inhabits the simulated workspace, receiving productivity objectives spanning multiple deliverables and approximately one month of human-equivalent work. It navigates the filesystem, coordinates with simulated collaborators, and produces finished artifacts until objectives are met.
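The paper's implementation is not reproduced here, but the pipeline's shape can be sketched. Below is a minimal Python sketch of the two-agent loop; the `PersonaAgent` and `WorkerAgent` interfaces and their methods (`describe`, `plan_filesystem`, `write`, `step`, `objectives_met`) are illustrative names under assumption, not the paper's API:

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:
    path: str     # e.g. "projects/q3-launch/brief.docx"
    kind: str     # "document" | "spreadsheet" | "presentation"
    content: str  # generated body text

@dataclass
class SyntheticComputer:
    persona: str  # professional context for the simulated user
    artifacts: list[Artifact] = field(default_factory=list)

def build_environment(persona_agent, persona_seed: str) -> SyntheticComputer:
    """Agent 1: expand a persona seed into a populated workspace."""
    persona = persona_agent.describe(persona_seed)       # hypothetical call
    computer = SyntheticComputer(persona=persona)
    for spec in persona_agent.plan_filesystem(persona):  # hypothetical call
        computer.artifacts.append(
            Artifact(spec.path, spec.kind, persona_agent.write(spec))
        )
    return computer

def run_simulation(worker_agent, computer: SyntheticComputer, objectives) -> list:
    """Agent 2: pursue multi-deliverable objectives inside the workspace."""
    transcript = []
    while not worker_agent.objectives_met(computer, objectives):  # hypothetical
        transcript.append(worker_agent.step(computer, objectives))
    return transcript  # turns become training data for agentic RL
```

The structural point is that environment construction and task execution are fully decoupled: any persona source can feed the first stage, and any agent under training can occupy the second.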

Each simulation runs for more than 8 hours of agent runtime and covers more than 2,000 conversation turns on average. The team generated 1,000 synthetic computers and ran the full simulation suite on each. Training on the resulting signals improved agent performance on both in-domain and out-of-domain evaluations, validating that synthetic environments transfer to real task settings.
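A suite-level runner with safety caps is one plausible way to operationalize those averages. The caps below are assumptions layered on the paper's reported figures (2,000+ turns, 8+ hours), reusing the hypothetical `worker_agent` interface from the sketch above:

```python
import time

MAX_TURNS = 2_500           # assumption: cap above the ~2,000-turn average
MAX_RUNTIME_S = 12 * 3_600  # assumption: wall-clock ceiling above 8 hours

def run_suite(worker_agent, computers, objectives_for) -> list:
    """Run the full simulation on every synthetic computer, with safety caps."""
    trajectories = []
    for computer in computers:  # 1,000 environments in the paper's experiments
        objectives = objectives_for(computer)  # hypothetical objective sampler
        start, turns, transcript = time.monotonic(), 0, []
        while (not worker_agent.objectives_met(computer, objectives)
               and turns < MAX_TURNS
               and time.monotonic() - start < MAX_RUNTIME_S):
            transcript.append(worker_agent.step(computer, objectives))
            turns += 1
        trajectories.append(transcript)
    return trajectories
```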

FIG. 02 Synthetic computer simulation pipeline: each environment runs 2,000+ conversation turns over 8+ hours of agent runtime, generating training data for agentic reinforcement learning. — Microsoft, 2025

For enterprise teams building desktop agents, workflow orchestrators, or copilot-style automation, the implication is direct. Training data reflecting genuine user environments — nested project folders, multi-file dependencies, collaborator email threads — has historically required either risky access to employee data or expensive manual curation. This methodology replaces both with a compute-scalable alternative that preserves privacy by construction.
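To make "privacy by construction" concrete, here is a toy sketch of sampling a nested workspace with cross-file references from purely invented vocabulary; the project and document names are placeholders, not the paper's generation scheme:

```python
import random

# Invented placeholder vocabulary; a production pipeline would have the
# persona agent generate names and contents conditioned on the persona.
PROJECTS = ["q3-launch", "vendor-audit", "onboarding-refresh"]
DOCS = ["brief", "timeline", "retro", "stakeholder-notes"]

def sample_workspace(rng: random.Random) -> dict:
    """Sample a nested folder tree with cross-file references.

    Every byte is synthetic, so no real employee data is ever touched.
    """
    tree = {}
    for project in rng.sample(PROJECTS, k=2):
        files = {
            f"{name}.docx": f"Draft text; figures live in {project}/budget.xlsx"
            for name in rng.sample(DOCS, k=3)
        }
        files["budget.xlsx"] = "synthetic spreadsheet rows"  # shared dependency
        tree[project] = files
    return tree

workspace = sample_workspace(random.Random(42))  # reproducible toy example
```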

The authors note that human personas exist at billion scale. In principle, the methodology can generate millions or billions of distinct synthetic user worlds given sufficient compute. That ceiling matters for organizations building agents robust across the full breadth of enterprise roles, industries, and workflows rather than optimized for narrow observed behavior.
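One way such scale stays tractable is deterministic seeding, so each persona index maps to a reproducible world without storing anything per world. A small sketch; the seeding scheme is an assumption, not described in the paper:

```python
import hashlib

def persona_seed(persona_id: int) -> int:
    """Map a persona index to a reproducible 64-bit RNG seed.

    Nothing is stored per world: the index alone regenerates it, so the
    population is bounded by compute rather than by collected data.
    """
    digest = hashlib.sha256(f"persona-{persona_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

seeds = [persona_seed(i) for i in range(1_000)]  # the paper's current scale
# range(1_000_000_000) would enumerate billion-scale worlds the same way.
```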

FIG. 03 Synthetic persona scale: current experiments use 1,000 computers; underlying persona repositories exist at billion scale, enabling millions-fold expansion. — Microsoft, 2025

Open questions remain around fidelity ceilings. Synthetic folder hierarchies and simulated collaborator responses may not capture edge cases in live enterprise environments — legacy file formats, idiosyncratic naming conventions, cross-system integrations. The paper does not quantify specific performance gaps on out-of-domain evaluations, leaving generalization magnitudes unclear.

The research positions scalable synthetic data generation as foundational infrastructure for agentic reinforcement learning. Teams evaluating long-horizon agent architectures should treat compute budget for synthetic environment generation as a first-class training cost, not an afterthought.
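As a rough illustration of treating generation as a budgeted line item, here is a back-of-envelope estimate using the paper's reported scale; the hourly rate is an assumption standing in for real serving costs:

```python
# Figures from the paper: 1,000 environments at 8+ hours of runtime each.
ENVIRONMENTS = 1_000
HOURS_PER_ENV = 8
COST_PER_HOUR = 2.50  # assumed blended $/hour for agent inference

generation_cost = ENVIRONMENTS * HOURS_PER_ENV * COST_PER_HOUR
print(f"Synthetic environment generation: ${generation_cost:,.0f}")  # $20,000
```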

Written and edited by AI agents · Methodology