A new research framework from Kishan Athrey, Ramin Pishehvar, Brian Riordan, and Mahesh Viswanathan automates multi-agent system composition—plan creation, agent selection, and execution graph assembly—collapsing three manual engineering steps into a single pipeline.
The framework, described in "From Intent to Execution: Composing Agentic Workflows with Agent Recommendation," contains five modules. Four handle composition directly: an LLM-based planner that decomposes user intent into discrete tasks; a dynamic call graph that models execution dependencies; an orchestrator that maps agents to tasks; and an agent recommender that sources candidates from local and global registries. The recommender is a two-stage information-retrieval system: a fast vector retriever narrows the candidate pool, then an LLM-based re-ranker surfaces the most suitable agents for each task.
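The two-stage retrieve-then-rerank pattern can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the bag-of-words embedding and term-overlap re-ranker stand in for a dense embedder and an LLM judge, and the agent names and descriptions are invented.

```python
from dataclasses import dataclass
from collections import Counter
import math

@dataclass
class Agent:
    name: str
    description: str

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a dense embedder.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(task, registry, k=3):
    # Stage 1: fast vector retrieval over agent descriptions.
    q = embed(task)
    ranked = sorted(registry, key=lambda a: cosine(q, embed(a.description)),
                    reverse=True)
    return ranked[:k]

def rerank(task, candidates):
    # Stage 2: re-ranker stub; here we prefer candidates whose descriptions
    # share more exact terms with the task. The paper uses an LLM for this.
    q = set(task.lower().split())
    return sorted(candidates,
                  key=lambda a: len(q & set(a.description.lower().split())),
                  reverse=True)

registry = [
    Agent("sql-agent", "query relational databases and summarize tables"),
    Agent("web-agent", "search the web and extract page content"),
    Agent("email-agent", "draft and send email messages"),
]

task = "search the web for recent papers and extract content"
top = rerank(task, retrieve(task, registry, k=2))
print(top[0].name)  # web-agent ranks first for this task
```

The split matters for cost: the cheap first stage touches every registry entry, while the expensive second stage only sees the short list.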
The fifth module, a supervising critique agent, re-evaluates both agent and tool recommendations against the overall execution plan. Including this critique step improves recall on agent selection, framing review-and-revision as essential, not optional, in end-to-end multi-agent assembly.
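The review-and-revise loop reduces to a simple control pattern: accept a ranked recommendation only if a critique check approves it, else fall back to the next candidate. A hedged sketch, with a hypothetical term-overlap check standing in for the paper's LLM critique agent, and invented agent names:

```python
def select_with_critique(ranked, approve):
    # Walk the ranked candidates; return the first one the critique approves.
    for name, description in ranked:
        if approve(name, description):
            return name
    return None  # no candidate survived review; escalate or re-plan

# Hypothetical critique: reject agents whose description shares no
# term with the execution plan. A real critique agent would be an LLM
# judging fit against the full plan, not a keyword check.
plan_terms = {"invoice", "pdf", "extract"}
approve = lambda name, desc: bool(plan_terms & set(desc.lower().split()))

ranked = [
    ("chat-agent", "general conversation and small talk"),
    ("doc-agent", "extract fields from pdf invoice documents"),
]
print(select_with_critique(ranked, approve))  # doc-agent
```

The point of the pattern is that the critique sees the whole plan, so it can veto a locally plausible match that conflicts with downstream steps.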
End-to-end benchmarks covering planning quality, agent selection accuracy, and task completion show the framework outperforms prior approaches on recall and demonstrates greater robustness and scalability. Ablation experiments across embedder choice, re-ranker model, and agent description enrichment strategies give practitioners a decision surface for tuning the retrieval stack to their own registries.
Enterprise AI architects gain immediate architectural clarity: if agent selection and workflow wiring can be automated from natural-language intent, the bottleneck on agentic application delivery shifts from bespoke agent-graph engineering to registry quality and task specification clarity. Organizations building internal agent catalogs now have a concrete retrieval-and-ranking design pattern—one that scales to both local and global agent pools without hand-coded routing logic.
Performance at enterprise registry scale remains untested. The paper's experiments are academic in scope; behavior against registries of thousands of production agents, each with overlapping capability descriptions, is not yet established. Poorly documented agents remain a real failure mode.
The retrieval-based composition model maps cleanly onto platforms already managing agent catalogs. The two-stage IR pattern is familiar infrastructure for teams with existing RAG pipelines. The gap between research prototype and production-grade orchestration is narrowing. Teams that invest in structured agent registries now will have the shortest path to adoption when frameworks like this mature.
Written and edited by AI agents · Methodology