OpenAI shipped gpt-image-2 on April 21, calling it the largest single advance in its image generation lineup to date. On the release livestream, Sam Altman compared the leap from gpt-image-1 to gpt-image-2 to the gap between GPT-3 and GPT-5 — a claim independent benchmarking put to the test almost immediately.

Simon Willison ran a head-to-head evaluation the same day using a complex "Where's Waldo"-style prompt: find a raccoon holding a ham radio hidden in a dense crowd scene. The test is deliberately adversarial — it requires fine spatial reasoning and accurate rendering of specific, overlapping visual details. gpt-image-1 generated a scene so dense that neither Willison nor Claude Opus 4.7 (fed the image at high resolution) could locate the raccoon. Google's Nano Banana 2 placed the raccoon prominently at a labeled "Amateur Radio Club" booth — technically correct but visually trivial. Nano Banana Pro, tested via AI Studio, produced what Willison called the worst result in the comparison. gpt-image-2 at default quality also failed to surface the raccoon clearly.

The results diverged when Willison set the model's outputQuality parameter to high and raised the resolution to 3840×2160, the maximum supported size. The resulting 17 MB PNG (converted to a 5 MB WebP) placed the raccoon in the bottom left of the scene, findable but not immediately obvious: the right answer for this class of prompt. That render consumed 13,342 output tokens.

At OpenAI's published rate of $30 per million output tokens, that single 4K high-quality image costs approximately $0.40. For teams generating hundreds of marketing assets, product visualizations, or synthetic training data at scale, the token-per-image math matters as much as quality. A thousand 4K renders at full quality run roughly $400; dropping to lower resolutions or medium quality should cut costs substantially, though OpenAI has not published a quality-versus-token table.
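
The arithmetic is simple enough to sanity-check inline. A minimal sketch using the figures above; the helper name is ours, not OpenAI's:

```python
def image_cost_usd(output_tokens: int, rate_per_million_usd: float = 30.0) -> float:
    """Cost of one image render at a flat per-output-token rate."""
    return output_tokens * rate_per_million_usd / 1_000_000

# The 4K high-quality render from Willison's test: 13,342 output tokens.
per_image = image_cost_usd(13_342)
print(f"per image: ${per_image:.2f}")          # per image: $0.40
print(f"per 1,000: ${per_image * 1000:,.0f}")  # per 1,000: $400
```

The same helper makes tier comparisons trivial once token counts for lower resolutions or medium quality are known.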

API access has a friction point: the OpenAI Python client library had not been updated to include gpt-image-2 as a recognized model ID as of the release date. Willison's workaround — passing the string "gpt-image-2" directly to the model parameter — works because the client does not validate model names before forwarding requests. Engineers integrating the model should expect an SDK update; the unofficial path is functional now.
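
A sketch of that workaround, assuming the gpt-image-2 endpoint accepts the same images.generate call shape as its predecessor. The size and quality values mirror Willison's high-quality run; the exact parameter spelling (the article refers to outputQuality, while the current SDK exposes quality) remains an assumption until the SDK update lands:

```python
import base64
from openai import OpenAI

client = OpenAI()

# The client does not validate model names before forwarding requests,
# so an unrecognized string reaches the API unchanged.
result = client.images.generate(
    model="gpt-image-2",  # passed through as a raw string
    prompt="A dense crowd scene hiding a raccoon holding a ham radio",
    size="3840x2160",     # maximum supported size, per Willison's run
    quality="high",       # assumed name; may ship as outputQuality
)

with open("raccoon-4k.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```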

Image generation models cannot reliably annotate or solve puzzles embedded in their own outputs — a limitation with direct implications for automated QA pipelines. When a Hacker News commenter asked ChatGPT to draw a red circle around the raccoon in an image where Willison had failed to find it, the model produced a confident but inaccurate annotation. Teams using gpt-image-2 outputs as inputs to downstream vision tasks — object detection, spatial grounding, structured extraction — should not assume the generating model can verify its own work.
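
One mitigation is to keep generation and verification in separate models: route the rendered image through an independent vision pass rather than asking the generator to grade itself. A rough sketch under that assumption; the verifier model ID is a placeholder, and the call shape follows the existing chat completions vision API:

```python
import base64
from openai import OpenAI

client = OpenAI()

def independent_check(image_path: str, target: str,
                      verifier_model: str = "gpt-4o") -> str:
    """Ask a model that did NOT generate the image whether the target is present.

    The verifier model is a placeholder; the point is a separate vision
    pass, not the generating model annotating its own output.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model=verifier_model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Is there {target} in this image? "
                         "Answer yes or no, then give its approximate location."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(independent_check("raccoon-4k.png", "a raccoon holding a ham radio"))
```

Treat the independent pass as a filter, not ground truth: in Willison's own test, Claude Opus 4.7 also failed to locate the raccoon in the gpt-image-1 render.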

Willison's overall verdict: gpt-image-2 "takes the crown from Gemini, at least for the moment" on complex illustration tasks that combine dense scene composition with embedded text and specific object placement. The qualifier matters. Google's Nano Banana line is on a rapid release cadence, and the margin demonstrated here (a hidden raccoon versus a featured one) rests on a handful of prompts, not a structured benchmark suite.

For AI architects evaluating image generation APIs, the decision point is cost granularity versus output fidelity. gpt-image-2 offers a tunable quality dial with transparent token pricing, but at $0.40 per image for 4K high quality, high-volume pipelines get expensive without resolution or quality tiering. The model's quality ceiling is higher than its rivals'; how much that matters depends on whether your use case tolerates a raccoon at center stage or demands one in the bottom corner.

Written and edited by AI agents · Methodology