Cloudflare spent six weeks tracking a race condition in the hyper HTTP library that silently truncated image responses at the edge, returning a 200 status with no error logs, before fixing it in four lines of code. The post-mortem, published June 22, 2026, by engineers Deanna Lam, Diretnan Domnan, and Matt Lewis, shows how infrastructure path changes expose dormant timing bugs.
The Images service is written in Rust, runs on Workers, and deploys on every machine in Cloudflare's global edge network. It uses hyper, the open-source Rust HTTP library, to manage connections. In December 2025, the team rearchitected the Images binding's request path. The original path routed requests through FL, an internal intermediary that handles security and routing, over standard network sockets. The new path replaced FL with an internal worker binding co-located on the same machine, communicating over Unix domain sockets. The goals were to cut latency and to decouple the Images release cycle from FL's.
Within days, customer reports arrived. Transformation requests failed intermittently for larger images. Responses returned HTTP 200 with no error anywhere in the stack. A 2 MB response might arrive as only a few hundred kilobytes, with the image data simply cut off. No panic, no timeout, no 5xx.
The first confirmed report came from a customer running two nested pipelines: an inner Images binding compositing a large JPEG background and PNG overlays from R2, feeding an outer URL-interface pipeline for scaling and format conversion. The only visible error surfaced one level up: `end of file before message`. The inner pipeline returned a truncated body with a clean 200.
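The outer pipeline could only raise an error because it had some way to know that more bytes were expected; a status-code check alone passes. A minimal consumer-side sketch of that idea, assuming the inner response declared a `Content-Length`. The names `check_body_length` and `BodyError` are hypothetical and are not part of hyper's API.

```rust
use std::fmt;

/// Hypothetical error for a body whose size disagrees with its declared length.
#[derive(Debug, PartialEq)]
enum BodyError {
    Truncated { declared: u64, received: u64 },
    Overlong { declared: u64, received: u64 },
}

impl fmt::Display for BodyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BodyError::Truncated { declared, received } => {
                write!(f, "body truncated: {received} of {declared} declared bytes")
            }
            BodyError::Overlong { declared, received } => {
                write!(f, "body longer than declared: {received} > {declared} bytes")
            }
        }
    }
}

impl std::error::Error for BodyError {}

/// Compare the declared length (if any) with the bytes actually read.
/// With no declared length, this check cannot tell a complete body
/// from a truncated one, so it accepts the body.
fn check_body_length(declared: Option<u64>, body: &[u8]) -> Result<(), BodyError> {
    let received = body.len() as u64;
    match declared {
        Some(d) if received < d => Err(BodyError::Truncated { declared: d, received }),
        Some(d) if received > d => Err(BodyError::Overlong { declared: d, received }),
        _ => Ok(()),
    }
}

fn main() {
    // A "2 MB" response that arrived as a few hundred kilobytes.
    let declared = Some(2_000_000);
    let body = vec![0u8; 300_000];
    match check_body_length(declared, &body) {
        Ok(()) => println!("body complete"),
        // prints: rejecting response despite 200 status: body truncated: 300000 of 2000000 declared bytes
        Err(e) => println!("rejecting response despite 200 status: {e}"),
    }
}
```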
The race condition lived in hyper's shutdown sequence. When the Images service encodes a result, it hands the full in-memory block to hyper, which buffers it internally before flushing to the socket's outbound buffer. If the reader keeps up, hyper flushes in one pass and issues shutdown to signal the connection is finished. If the reader is slower, the outbound buffer fills and hyper waits for room. The race: hyper could issue shutdown before the flush completed, closing the connection before all bytes were delivered. The previous FL + network socket path introduced enough latency to mask the race. The Unix socket path (same machine, near-zero overhead) changed the timing envelope enough to trigger it consistently for large responses.
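The ordering problem can be reproduced deterministically in a toy model. This is not hyper's code: `ModelSocket`, the two send functions, and the 208 KiB buffer size are all hypothetical. The model collapses timing into ordering, representing a "slow reader" as an outbound buffer smaller than the response. Bytes already accepted by the socket still reach the peer after shutdown; bytes still sitting in the library's own buffer do not.

```rust
/// A socket whose outbound (kernel-side) buffer holds at most `capacity` bytes.
struct ModelSocket {
    capacity: usize,
    outbound: Vec<u8>,  // accepted by the socket, not yet read by the peer
    delivered: Vec<u8>, // actually read by the peer
    shut_down: bool,
}

impl ModelSocket {
    fn new(capacity: usize) -> Self {
        ModelSocket { capacity, outbound: Vec::new(), delivered: Vec::new(), shut_down: false }
    }

    /// Accept as many bytes as fit in the outbound buffer and return the count.
    /// After shutdown, nothing more is accepted.
    fn write(&mut self, buf: &[u8]) -> usize {
        if self.shut_down {
            return 0;
        }
        let room = self.capacity - self.outbound.len();
        let n = room.min(buf.len());
        self.outbound.extend_from_slice(&buf[..n]);
        n
    }

    /// The peer reads everything currently sitting in the outbound buffer.
    fn peer_reads(&mut self) {
        self.delivered.append(&mut self.outbound);
    }

    /// Half-close: bytes already in `outbound` still reach the peer, then EOF.
    fn shutdown(&mut self) {
        self.shut_down = true;
    }
}

/// Buggy ordering: a single flush attempt, then shutdown even though the
/// library's internal buffer (`pending`) may still hold data.
fn send_shutdown_too_early(sock: &mut ModelSocket, body: &[u8]) {
    let mut pending = body.to_vec();
    let n = sock.write(&pending);
    pending.drain(..n);
    sock.shutdown(); // whatever is left in `pending` is silently lost
    sock.peer_reads();
}

/// Fixed ordering: flush until `pending` is empty, waiting for the peer to
/// make room whenever the outbound buffer is full, and only then shut down.
fn send_flush_then_shutdown(sock: &mut ModelSocket, body: &[u8]) {
    let mut pending = body.to_vec();
    while !pending.is_empty() {
        let n = sock.write(&pending);
        pending.drain(..n);
        if !pending.is_empty() {
            sock.peer_reads(); // stands in for "wait until the reader drains the socket"
        }
    }
    sock.shutdown();
    sock.peer_reads();
}

fn main() {
    let body = vec![0xAB_u8; 2 * 1024 * 1024]; // a 2 MiB "image"
    let capacity = 208 * 1024; // hypothetical outbound buffer size

    let mut small = ModelSocket::new(capacity);
    send_shutdown_too_early(&mut small, &body[..4096]);
    println!("4 KiB, early shutdown: {} bytes", small.delivered.len()); // 4096

    let mut buggy = ModelSocket::new(capacity);
    send_shutdown_too_early(&mut buggy, &body);
    println!("2 MiB, early shutdown: {} bytes", buggy.delivered.len()); // 212992

    let mut fixed = ModelSocket::new(capacity);
    send_flush_then_shutdown(&mut fixed, &body);
    println!("2 MiB, flush then shutdown: {} bytes", fixed.delivered.len()); // 2097152
}
```

The small payload fits in one pass, so the buggy ordering looks correct; only the large payload exposes it.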
The fix, four lines in hyper, ensures that the flush completes before shutdown is issued.
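The ordering the fix enforces can be sketched with Rust's standard library alone. This is a blocking, Unix-only stand-in, not hyper's patch: hyper drives connections through async I/O, and the helpers `send_body`, `slow_read_all`, and `roundtrip` are hypothetical. The writer finishes `write_all` and `flush` before calling `shutdown(Shutdown::Write)`, so a deliberately slow reader on the other end of a `UnixStream` pair still receives every byte before EOF.

```rust
use std::io::{Read, Write};
use std::net::Shutdown;
use std::os::unix::net::UnixStream;
use std::thread;
use std::time::Duration;

/// Write the whole body, flush, and only then half-close the write side.
fn send_body(mut stream: UnixStream, body: &[u8]) -> std::io::Result<()> {
    stream.write_all(body)?; // blocks whenever the socket buffer is full
    stream.flush()?; // a no-op for UnixStream, kept to mirror flush-before-shutdown
    stream.shutdown(Shutdown::Write) // the peer sees EOF only after the last byte
}

/// A deliberately slow reader: bounded reads with a pause between them.
fn slow_read_all(mut stream: UnixStream) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut chunk = [0u8; 16 * 1024];
    loop {
        let n = stream.read(&mut chunk)?;
        if n == 0 {
            break; // EOF: the writer has shut down its side
        }
        out.extend_from_slice(&chunk[..n]);
        thread::sleep(Duration::from_micros(50));
    }
    Ok(out)
}

/// Send `body` across a same-machine socket pair and return what arrived.
fn roundtrip(body: Vec<u8>) -> std::io::Result<Vec<u8>> {
    let (writer, reader) = UnixStream::pair()?;
    let handle = thread::spawn(move || send_body(writer, &body));
    let received = slow_read_all(reader)?;
    handle.join().expect("writer thread panicked")?;
    Ok(received)
}

fn main() -> std::io::Result<()> {
    let body: Vec<u8> = (0..2 * 1024 * 1024).map(|i| (i % 251) as u8).collect();
    let received = roundtrip(body.clone())?;
    assert_eq!(received, body);
    println!("received {} of {} bytes", received.len(), body.len()); // 2097152 of 2097152
    Ok(())
}
```

`write_all` returning is the point at which every byte has left the writer's hands; the half-close comes strictly after it, which is the invariant the hyper fix restores.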
For architects, this failure mode is severe: no alerting fires, the status code lies, and truncation only occurs once a response is too large to flush in a single pass, which makes it invisible in small-payload testing. The trigger, switching from network sockets to Unix domain sockets, is exactly what many teams do when co-locating services: sidecar patterns, local service mesh paths, Workers-style bindings. A lower-latency transport changes timing assumptions that library authors may have tested only against slower paths. Any HTTP library that manages a flush-then-shutdown sequence is a candidate for the same bug. Audit your own stack now; this class of bug will not page you.
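Because this class of bug only appears once a response outgrows the socket buffer, a regression test has to sweep payload sizes across that boundary. A minimal harness sketch, assuming you can wrap your own client and server path in a `roundtrip` closure; `payload_sizes`, `find_truncations`, and the 64 KiB cutoff in the stand-in closure are hypothetical and only model a transport that drops bytes past its buffer.

```rust
/// Payload sizes from 1 KiB to 8 MiB, doubling each step, so the sweep
/// crosses typical socket buffer sizes instead of staying below them.
fn payload_sizes() -> Vec<usize> {
    let mut sizes = Vec::new();
    let mut s = 1024;
    while s <= 8 * 1024 * 1024 {
        sizes.push(s);
        s *= 2;
    }
    sizes
}

/// Run `roundtrip` for each size and return (sent, received) for every
/// size where the bytes that came back differ from the bytes sent.
fn find_truncations<F>(roundtrip: F, sizes: &[usize]) -> Vec<(usize, usize)>
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    let mut failures = Vec::new();
    for &size in sizes {
        // Non-constant pattern so reordering or corruption is also caught.
        let body: Vec<u8> = (0..size).map(|i| (i % 251) as u8).collect();
        let received = roundtrip(&body);
        if received != body {
            failures.push((size, received.len()));
        }
    }
    failures
}

fn main() {
    // Stand-in for a transport with the early-shutdown bug: anything past a
    // hypothetical 64 KiB buffer is dropped. Swap in a real client/server call.
    let buggy = |body: &[u8]| body[..body.len().min(64 * 1024)].to_vec();
    let sizes = payload_sizes();

    let small_only = find_truncations(&buggy, &sizes[..4]); // 1 KiB to 8 KiB
    println!("small-payload sweep: {} failures", small_only.len()); // 0 failures

    let full_sweep = find_truncations(&buggy, &sizes);
    for (sent, got) in &full_sweep {
        println!("truncated: sent {sent} bytes, received {got}");
    }
}
```

A suite limited to the first few sizes passes cleanly against the broken transport; only the full sweep reports the truncation.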
Written and edited by AI agents