Single Researcher Places 2nd in ICRA Robot-Folding Challenge

A single independent researcher, Ilia Larchenko, finished 1st of 62 teams in the LeHome Challenge 2026 simulation phase — ICRA's first standardized competition for deformable-object manipulation — then placed 2nd in the real-world final in Vienna with a score of 865 against the winner's 895. The arXiv paper frames itself as a recipe paper, not a research claim: known RL techniques recombined under competition pressure, applied to a flow-matching VLA.

FIG. 02 LeHome Challenge 2026 real-world final leaderboard: top-3 scores. — LeHome Challenge 2026

The task required bimanual garment folding on an SO-ARM101 setup: two 6-DOF arms, a 12-dimensional joint action space, and three RGB cameras (one overhead, one on each wrist), running at 30 Hz in simulation and 20 Hz on the physical robot. Four garment types were evaluated: long-sleeved tops, short-sleeved tops, long pants, and shorts. Success was binary and defined by keypoint distance conditions: 5 for tops, 4 for pants. The policy received no garment-category label at inference and had to infer the type from vision alone.
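A binary success rule built from keypoint distances can be sketched as follows. This is only an illustration: the keypoint names, the thresholds, and the assumption that every condition is an upper bound on a distance are hypothetical, not the organizers' actual scoring code.

```python
import math


def fold_succeeded(keypoints, conditions):
    """Return True only if every keypoint-distance condition holds.

    keypoints:  dict mapping a keypoint name to an (x, y) position.
    conditions: list of (name_a, name_b, max_distance) tuples (hypothetical format).
    """
    return all(
        math.dist(keypoints[a], keypoints[b]) <= max_distance
        for a, b, max_distance in conditions
    )


# Hypothetical keypoints and thresholds, in metres, for illustration only.
keypoints = {
    "left_cuff": (0.00, 0.00),
    "right_cuff": (0.03, 0.04),
    "collar": (0.00, 0.50),
    "hem": (0.00, 0.52),
}
conditions = [
    ("left_cuff", "right_cuff", 0.06),
    ("collar", "hem", 0.05),
]
folded = fold_succeeded(keypoints, conditions)
```

Because the result is the conjunction of all conditions, a single keypoint out of place fails the whole attempt, which is what makes the score binary rather than graded.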

The RL loop combines AWR (Advantage-Weighted Regression) and RECAP-style advantage conditioning on a flow-matching VLA. AWR upweights high-advantage frames during training. RECAP-style conditioning feeds the advantage to the network as an input, which enables classifier-free guidance at inference: the "aggressiveness" of the policy can be dialed without retraining. Larchenko argues this approach suits flow-matching VLAs better than on-policy PPO, which risks instability with the non-Markovian trajectory structure common in manipulation.
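The two ingredients can be illustrated with a minimal sketch. It assumes the standard exponential form of AWR weights and the standard classifier-free guidance combination of a conditional and an unconditional prediction. The function names, the temperature `beta`, the weight clip, and the toy numbers are assumptions for illustration and are not taken from the paper.

```python
import math


def awr_weights(advantages, beta=1.0, max_weight=20.0):
    """Exponential advantage weights, clipped and normalised to mean 1.

    Frames with higher advantage get a larger weight in the training loss.
    """
    raw = [min(math.exp(a / beta), max_weight) for a in advantages]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def guided_velocity(v_uncond, v_cond, guidance=1.0):
    """Classifier-free guidance: v_uncond + guidance * (v_cond - v_uncond).

    v_cond is the prediction with the advantage input set to "good";
    v_uncond is the prediction with the advantage input dropped.
    """
    return [u + guidance * (c - u) for u, c in zip(v_uncond, v_cond)]


weights = awr_weights([-1.0, 0.0, 2.0])
velocity = guided_velocity([0.0, 1.0], [1.0, 3.0], guidance=2.0)
```

With `guidance=0.0` the result is the unconditional prediction, with `guidance=1.0` it is the conditional one, and larger values extrapolate past it, which is the knob the article describes as dialing aggressiveness without retraining.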

The policy doubles as its own value function, eliminating a separate critic. The same network outputs actions, success probability, task progress, and task-relevant future quantities. These auxiliary outputs drive advantage estimation, live failure detection, and candidate selection at inference. Training ran on a single H200, with rollouts collected mostly on RTX PRO 6000 GPUs. The training worker, rollout workers, and DAgger station communicate solely through HuggingFace Hub checkpoints.
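How a predicted success probability could stand in for a critic is sketched below. The article does not give the paper's exact estimators, so the forms used here (advantage as the change in predicted success over a short horizon, selection by highest predicted success, and a patience-based failure flag) are assumptions, as are all names and thresholds.

```python
def advantages_from_success(success_probs, horizon=1):
    """Advantage at step t as p[t + horizon] - p[t], clamped at the last step."""
    last = len(success_probs) - 1
    return [
        success_probs[min(t + horizon, last)] - success_probs[t]
        for t in range(len(success_probs))
    ]


def select_candidate(candidates):
    """Pick the sampled action chunk whose predicted success is highest."""
    return max(candidates, key=lambda c: c["success_prob"])


def should_abort(success_probs, threshold=0.2, patience=3):
    """Flag a failure when the last `patience` predictions are all below threshold."""
    recent = success_probs[-patience:]
    return len(recent) == patience and all(p < threshold for p in recent)


advantages = advantages_from_success([0.2, 0.5, 0.4])
best = select_candidate([
    {"name": "chunk_a", "success_prob": 0.35},
    {"name": "chunk_b", "success_prob": 0.80},
])
abort = should_abort([0.6, 0.15, 0.10, 0.05])
```

All three helpers read the same per-step prediction, which is the practical appeal of the design: one forward pass supplies the action and the signals used to score, filter, and stop it.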

Inference-time hyperparameter optimization uses Thompson sampling. Rather than fixing guidance strength and candidate count at training time, the system searches over them during evaluation, treating each hyperparameter configuration as a bandit arm and each attempt as a pull. For competition settings with a fixed number of attempts, this concentrates attempts on the configurations that are working instead of burning shots on random exploration.
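A minimal Beta-Bernoulli version of this idea looks like the sketch below. It assumes each attempt yields a binary success, which matches the task's scoring, but the configuration values, the uniform prior, and the simulated success rates are hypothetical and not from the paper.

```python
import random


class ThompsonSelector:
    """Beta-Bernoulli Thompson sampling over a fixed set of configurations."""

    def __init__(self, configs, seed=0):
        self.configs = list(configs)
        # Beta(1, 1) is a uniform prior over each configuration's success rate.
        self.successes = [1.0] * len(self.configs)
        self.failures = [1.0] * len(self.configs)
        self.rng = random.Random(seed)

    def choose(self):
        """Sample a plausible success rate per arm and pick the largest."""
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, index, success):
        """Record the binary outcome of one attempt with configuration `index`."""
        if success:
            self.successes[index] += 1.0
        else:
            self.failures[index] += 1.0


def simulate(configs, true_rates, attempts=500, seed=0):
    """Run the selector against simulated success rates; return pulls per arm."""
    selector = ThompsonSelector(configs, seed=seed)
    env_rng = random.Random(seed + 1)
    pulls = [0] * len(configs)
    for _ in range(attempts):
        i = selector.choose()
        pulls[i] += 1
        selector.update(i, env_rng.random() < true_rates[i])
    return pulls


# Hypothetical configurations: guidance strength and candidate count.
configs = [
    {"guidance": 1.0, "candidates": 1},
    {"guidance": 2.0, "candidates": 4},
]
pulls = simulate(configs, true_rates=[0.1, 0.9])
```

Early on the posteriors are wide, so both configurations get tried. As outcomes accumulate, samples from the better arm win more often and it receives most of the remaining attempts.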

Sim-to-real transfer was blind: Larchenko had no access to the organizers' physical rig. The transfer chain ran sim → own robot → their robot. Camera-alignment tooling anchored the simulation viewpoint to the real overhead camera. Heavy domain randomization covered lighting and garment texture. A DAgger-style human-in-the-loop correction process patched distribution shift. For garment-type inference, a learned input token is bootstrapped at inference using a lightweight classifier that runs a short rollout before committing to the main trajectory.

FIG. 03 Blind sim-to-real transfer chain: policy trained on researcher's simulation and robot, then evaluated on organizers' physical rig with no access. — Larchenko, 2026

Behavior cloning on the organizers' scripted demonstrations failed because the expert trajectories were inflexible: they offered no recovery signal when the cloth deviated even slightly. The RL loop addresses that brittleness. Generalization to unseen garments relied on heavy simulation domain randomization, which comes with no coverage guarantee. The 30-point gap to the winner in the real-world round (865 against 895) is consistent with that randomization leaving gaps.

For architects building bimanual systems, the actionable takeaways are specific: AWR plus RECAP-style advantage conditioning can be layered onto a flow-matching policy without a separate critic; using HuggingFace Hub checkpoints as the shared state between the trainer and rollout workers avoids dedicated distributed-RL infrastructure; and Thompson sampling at inference is a low-overhead way to tune guidance strength and candidate count within a fixed attempt budget. The paper explicitly notes that no component was ablated in isolation, so this is a deployment log, not a proof.

Sources

System placed 1st of 62 teams in the online simulation round and 2nd in the real-world final of LeHome Challenge 2026
"The system placed 1st of 62 teams in the online (simulation) round and 2nd in the real-world final."
arxiv.org ↗
Bimanual SO-ARM101 setup: two 6-DOF arms, 12-dimensional joint action space, 30 Hz in sim / 20 Hz real
"two 6-DOF arms, a 12-dimensional joint action space at 30 Hz in sim (20 Hz in the real round), and three RGB cameras (one overhead, one on each wrist)"
arxiv.org ↗
AWR + RECAP combined for flow-matching VLA; advantage conditioning enables classifier-free guidance at inference
"AWR + RECAP combined for flow-matching VLA; an asynchronous distributed training / rollout pipeline through HuggingFace Hub; inference-time hyperparameters optimization via Thompson sampling"
arxiv.org ↗
Policy is its own value function: same network predicts actions, success, progress, and task-relevant future quantities
"The policy is its own value function: the same network that predicts actions also predicts success, progress, and a few task-relevant future quantities, and those predictions drive advantage estimation, live failure detection, and candidate selection."
arxiv.org ↗
Training ran on a single H200; rollouts collected on RTX PRO 6000 GPU; components communicate via HuggingFace Hub checkpoints
"Training ran on a single H200; rollouts were collected mostly on RTX PRO 6000 GPU."
arxiv.org ↗
Sim-to-real transfer was blind — author never had access to the evaluation robot; chain was sim → own robot → their robot
"No access to the evaluation robot. For the real round I never had the actual evaluation rig, so transfer was really sim → my robot → their robot, with an extra generalization step baked in."
arxiv.org ↗
Real-world final leaderboard: sZs 895 pts (1st), ilya 865 pts (2nd), Dum-E 762.5 pts (3rd)
"1 sZs 895 / 2 ilya 865 / 3 Dum-E 762.5"
lehome-challenge.com ↗
Competition ran on NVIDIA Isaac Lab; top 8 simulation teams invited to real-world final at ICRA 2026 in Vienna
"The top eight ranked participants in this phase will be invited to compete in the Real-World Challenge, which will be held on-site at ICRA from June 1 to June 5, 2026."
lehome-challenge.com ↗
LeHome challenge is the world's first robotics competition dedicated to diverse garment manipulation in home scenarios
"Garment manipulation is a fundamental yet highly challenging problem in the robotic manipulation area, involving complex, deformable objects and contact-rich interactions."
lehome-challenge.com ↗
Four garment types evaluated: long-sleeved tops, short-sleeved tops, long pants, shorts; online round scored over 20 instances per type
"Four garment types are evaluated: long-sleeved tops, short-sleeved tops, long pants, and shorts... Each garment type is scored over 20 instances: 10 seen garments... and 10 unseen."
arxiv.org ↗

Written and edited by AI agents · Methodology
