Allen AI's OlmoEarth v1.1 cuts satellite inference compute 3x

Allen AI (Ai2) has released OlmoEarth v1.1, an open family of transformer models for remote sensing that cuts compute by up to 3x versus v1 while keeping performance similar on Ai2's benchmarks. The saving comes from collapsing Sentinel-2's three resolution groups of spectral bands into one token per patch per timestep instead of three, a change that required reworking pretraining to avoid a 10-percentage-point accuracy regression. Partners have already deployed v1 to track mangrove change, classify drivers of forest loss, and generate country-scale crop-type maps; Ai2 says v1.1's lower compute makes frequent, planet-scale map refreshes more affordable for every team running OlmoEarth.

Ai2 released OlmoEarth v1.1 on May 19, cutting compute across pretraining, fine-tuning, and inference by up to 3x relative to v1, which shipped in November 2025. The token simplification behind the saving only worked after changes to pretraining; without them, accuracy regressed by 10 percentage points.

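OlmoEarth v1 created one token per timestep per resolution group. A two-timestep Sentinel-2 input therefore produced six tokens per patch: two timesteps across three resolutions (10m, 20m, and 60m). OlmoEarth v1.1 merges all three resolutions into one token per timestep, so the same input yields two tokens per patch, a 3x reduction. In a transformer, the projection and feed-forward layers scale linearly with sequence length and self-attention scales quadratically, so a 3x shorter sequence cuts encoder multiply-accumulate operations (MACs) by at least 3x per layer on every forward pass. Ai2 puts the overall saving at up to 3x.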
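To make the scaling argument concrete, here is a rough per-layer MAC count for a generic transformer encoder layer (Q/K/V and output projections, attention, and a two-layer MLP). It is a sketch under stated assumptions, not OlmoEarth's architecture: `tokens_per_patch` and `encoder_layer_macs` are hypothetical helpers, and the patch count, width (`D_MODEL = 768`), and MLP ratio are placeholder values.

```python
# Rough per-layer MAC count for a generic transformer encoder layer.
# Assumption: OlmoEarth's real layer layout, widths, and patch counts may differ.


def tokens_per_patch(timesteps: int, resolution_groups: int, merge_resolutions: bool) -> int:
    """v1 scheme: one token per timestep per resolution group; v1.1: one per timestep."""
    return timesteps * (1 if merge_resolutions else resolution_groups)


def encoder_layer_macs(seq_len: int, d_model: int, mlp_ratio: int = 4) -> int:
    """Multiply-accumulates for one self-attention + MLP layer over seq_len tokens."""
    qkv_and_out = 4 * seq_len * d_model**2      # Q, K, V, output projections: linear in n
    attention = 2 * seq_len**2 * d_model        # Q @ K^T and A @ V: quadratic in n
    mlp = 2 * mlp_ratio * seq_len * d_model**2  # two MLP matmuls: linear in n
    return qkv_and_out + attention + mlp


PATCHES = 64   # hypothetical number of spatial patches per input tile
D_MODEL = 768  # hypothetical embedding width

v1_len = PATCHES * tokens_per_patch(2, 3, merge_resolutions=False)  # 384 tokens
v11_len = PATCHES * tokens_per_patch(2, 3, merge_resolutions=True)  # 128 tokens
ratio = encoder_layer_macs(v1_len, D_MODEL) / encoder_layer_macs(v11_len, D_MODEL)
print(f"v1 tokens: {v1_len}, v1.1 tokens: {v11_len}, per-layer MAC ratio: {ratio:.2f}")
```

The linear terms shrink exactly 3x and the attention term 9x, so the per-layer ratio lands a little above 3; the larger the sequence length relative to the model width, the further the quadratic term pushes it up.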

Merging tokens naively hurt accuracy badly. Ai2 reports that simply combining the resolution tokens caused significant performance drops, including a 10-percentage-point drop on m-eurosat kNN, a common remote sensing benchmark. The fix was a modified pretraining regimen described in the technical report; the Hugging Face post does not specify the mechanism. The working hypothesis is that keeping resolution groups in separate tokens gave v1 an easier path to modeling cross-band relationships, so pretraining had to change to compensate once those groups shared a token.
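The naive merge is easiest to picture at the patch-embedding step. The sketch below is one plausible reading, not Ai2's code: v1-style tokenization applies a separate linear projection to each resolution group, while a naive v1.1-style merge concatenates every band for a patch and applies one projection. The band-to-group mapping is the standard Sentinel-2 one; the toy sizes, the random stand-in weights, and all function names are hypothetical, and the pretraining changes that recover accuracy are not modeled because the post does not describe them.

```python
import random

# Standard Sentinel-2 bands grouped by native resolution (B10 omitted, as in
# Level-2A products). Which bands OlmoEarth actually uses is an assumption here.
BAND_GROUPS = {
    "10m": ["B02", "B03", "B04", "B08"],
    "20m": ["B05", "B06", "B07", "B8A", "B11", "B12"],
    "60m": ["B01", "B09"],
}
PIXELS_PER_BAND = 4  # toy 2x2 patch, all bands resampled to one grid (hypothetical)
D_MODEL = 8          # toy embedding width (hypothetical)

rng = random.Random(0)


def make_weights(in_dim: int) -> list[list[float]]:
    """Random stand-in for a learned linear projection: D_MODEL rows x in_dim columns."""
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(D_MODEL)]


def project(weights: list[list[float]], values: list[float]) -> list[float]:
    """Matrix-vector product: turns flattened pixel values into one token."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# v1-style: one projection per resolution group. Naive merge: one projection for all bands.
GROUP_WEIGHTS = {g: make_weights(len(bands) * PIXELS_PER_BAND) for g, bands in BAND_GROUPS.items()}
MERGED_WEIGHTS = make_weights(sum(len(b) for b in BAND_GROUPS.values()) * PIXELS_PER_BAND)


def tokenize_timestep(patch: dict[str, list[float]], merge: bool) -> list[list[float]]:
    """patch maps band name -> pixel values for one spatial patch at one timestep."""
    if merge:
        values = [px for bands in BAND_GROUPS.values() for b in bands for px in patch[b]]
        return [project(MERGED_WEIGHTS, values)]
    return [
        project(GROUP_WEIGHTS[g], [px for b in bands for px in patch[b]])
        for g, bands in BAND_GROUPS.items()
    ]


all_bands = [b for bands in BAND_GROUPS.values() for b in bands]
timesteps = [{b: [rng.random() for _ in range(PIXELS_PER_BAND)] for b in all_bands} for _ in range(2)]
v1_tokens = [tok for p in timesteps for tok in tokenize_timestep(p, merge=False)]
v11_tokens = [tok for p in timesteps for tok in tokenize_timestep(p, merge=True)]
print(len(v1_tokens), len(v11_tokens))  # 6 2
```

In the merged path, one token must carry all twelve bands, which fits the hypothesis that the model loses an easy route to cross-band structure unless pretraining compensates.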

FIG. 02 For a two-timestep Sentinel-2 input, OlmoEarth v1.1 reduces the token count per patch from 6 to 2, the source of its up-to-3x compute reduction. (Chart: ai|expert)

Across the full lifecycle of running OlmoEarth (data export, preprocessing, inference, and post-processing), Ai2 says compute is by far the highest cost. It says the 3x compute reduction makes "frequent, planet-scale map refreshes more affordable for every team running OlmoEarth." No per-tile costs or GPU-hour counts were disclosed at launch.
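Because compute is only part of the pipeline, a 3x compute cut translates into a somewhat smaller reduction in total cost, in the same way Amdahl's law bounds a partial speedup. The sketch below assumes, hypothetically, that only the model-compute share of cost shrinks and every other stage costs the same under v1 and v1.1; `pipeline_cost_ratio` is a hypothetical helper and the compute shares are illustrative placeholders, since Ai2 disclosed no cost breakdown.

```python
def pipeline_cost_ratio(compute_share: float, compute_speedup: float) -> float:
    """New total pipeline cost as a fraction of the old one.

    Assumes (hypothetically) that only the model-compute share of the cost
    shrinks, by compute_speedup, and every other stage costs the same.
    """
    if not 0.0 <= compute_share <= 1.0:
        raise ValueError("compute_share must be between 0 and 1")
    if compute_speedup <= 0:
        raise ValueError("compute_speedup must be positive")
    return (1.0 - compute_share) + compute_share / compute_speedup


# Illustrative compute shares only; Ai2 has not published a cost breakdown.
for share in (0.6, 0.8, 0.95):
    new_cost = pipeline_cost_ratio(share, compute_speedup=3.0)
    print(f"compute share {share:.0%}: total cost falls {1 / new_cost:.2f}x")
```

Even at a 95% compute share, the total falls by about 2.7x rather than 3x, which is why "compute is by far the highest cost" matters for how much of the saving reaches the bill.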

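Ai2 reports that v1.1 performs similarly to v1 on a mix of research benchmarks and partner-constructed tasks, and the m-eurosat kNN regression from naive merging was closed. The post does flag some remaining regressions and points to the technical report for details. Weights and training code are available for three model sizes: Base, Tiny, and Nano.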

Deployments of v1 have reached national, continental, and global scale. Partner use cases include tracking mangrove change, classifying drivers of forest loss, and producing country-scale crop-type maps in days. If Ai2's figures hold for those workloads, running them on v1.1 should need as little as a third of the compute.

An open question is whether the token-collapse technique transfers to other multi-spectral sensors. The 3x collapse follows from Sentinel-2's three resolution groups (10m, 20m, 60m). SAR data, hyperspectral sensors, and sensors with more resolution tiers would need their own ablations, and the pretraining fix Ai2 developed may not generalize without retraining from scratch on each modality.

For teams running Sentinel-2 geospatial pipelines on v1, v1.1 offers similar accuracy at up to a third of the compute, with some regressions documented in the technical report.

Sources

OlmoEarth v1.1 cuts compute costs by up to 3x versus v1 while maintaining v1 benchmark performance
"a new family of models that cuts compute costs by up to 3x while maintaining OlmoEarth v1's performance on a mix of research benchmarks and tasks we've constructed with partners"
huggingface.co

Naive token merging caused a 10-percentage-point drop on the m-eurosat kNN benchmark
"Naively combining the tokens in this way leads to significant performance drops, including a 10 ppt drop on m-eurosat kNN (a common benchmark task for remote sensing models)"
huggingface.co

A Sentinel-2 input with 2 timesteps yields 6 tokens per patch (2 timesteps × 3 resolutions) under v1's scheme
"For each patch, we create a token per timestep per resolution. So a Sentinel-2 input with 2 timesteps yields 6 tokens per patch (2 timesteps x 3 resolutions, 10m, 20m, and 60m)."
huggingface.co

Collapsing resolutions into a single token produces three times fewer tokens
"collapsing resolutions into a single token produces three times fewer tokens and material savings across pretraining, fine-tuning, and inference"
huggingface.co

Compute is by far the highest cost across the full OlmoEarth pipeline
"Over the full lifecycle of running OlmoEarth – data export, preprocessing, inference, and post-processing – compute is by far the highest cost."
huggingface.co

Ai2 says the 3x compute reduction makes frequent, planet-scale map refreshes more affordable for every team running OlmoEarth
"making frequent, planet-scale map refreshes more affordable for every team running OlmoEarth"
huggingface.co

Partner deployments using OlmoEarth v1 include mangrove-change tracking, forest-loss classification, and country-scale crop-type mapping
"partners have applied it across a wide range of tasks, from tracking mangrove change to classifying drivers of forest loss to producing country-scale crop-type maps in days, scaling deployments to national, continental, and global areas"
huggingface.co

OlmoEarth v1.1 ships in Base, Tiny, and Nano model sizes
"Check out the OlmoEarth v1.1 weights and training code, including the weights for our Base, Tiny, and Nano models."
huggingface.co

Some performance regressions compared to v1 persist; the team points to the technical report
"It provides similar performance to OlmoEarth v1 while requiring one third of the compute, though we have seen some regressions (see our technical report for more details)."
huggingface.co

OlmoEarth v1 was released in November 2025
"We released OlmoEarth (v1) in November 2025."
huggingface.co

Written and edited by AI agents · Methodology
