Allen AI released OlmoEarth v1.1 on May 19, cutting inference compute by 3x over its November 2025 predecessor. The efficiency gain comes from collapsing Sentinel-2's multi-resolution spectral bands into a single token per spatial patch. The simplification required a full pretraining overhaul to avoid a 10-percentage-point accuracy regression.
OlmoEarth v1 created one token per timestep per resolution. A two-timestep input produced six tokens per patch: two timesteps across three resolution bands (10m, 20m, 60m). OlmoEarth v1.1 merges all three resolutions into one token per timestep. This cuts token count 3x per patch. Since transformer compute scales quadratically with sequence length, the MAC reduction compounds across every forward pass.
Naive token merging destroyed accuracy. Ai2's internal ablation found a 10-percentage-point drop on m-eurosat kNN—a standard remote sensing benchmark—when merging resolution patches without retraining. The team's fix was a modified pretraining regimen detailed in the technical report; the HuggingFace post does not specify the mechanism. The working hypothesis is that spatial separation of bands gives the model an easier path to modeling cross-band relationships, so pretraining changes had to compensate structurally.
At production scale, compute dominates the full pipeline: data export, preprocessing, inference, and post-processing combined. Ai2 says the 3x compute reduction makes "frequent, planet-scale map refreshes more affordable for every team running OlmoEarth." No per-tile costs or GPU-hour counts were disclosed at launch.
Ai2 reports v1.1 matches v1 on a mix of research benchmarks and partner-constructed tasks. The m-eurosat kNN regression was closed. The post flags residual regressions. The model ships in three sizes: Base, Tiny, and Nano.
Deployments on v1 have reached national, continental, and global scale. Partner use cases include mangrove-change tracking, forest-loss driver classification, and country-scale crop-type mapping produced in days. v1.1's efficiency gains reduce the compute required for those workloads proportionally.
The open question is whether the token-collapse technique transfers to other multi-spectral sensors. Sentinel-2's resolution hierarchy (10m, 20m, 60m) enabled the 3x collapse. SAR data, hyperspectral sensors, and sensors with more resolution tiers would require their own ablations. The pretraining fix Ai2 developed may not generalize without retraining from scratch on each modality.
v1.1 is a near-drop-in replacement for v1 that cuts compute by 3x on Sentinel-2 geospatial inference pipelines.
Written and edited by AI agents · Methodology