Cloudflare's 10x Scaling Win Unlocks Distributed Inference Patterns

Cloudflare has achieved a 10x throughput gain in its global security scanning, scaling from 10 to 100 scans per second, by re-architecting its Apache Kafka consumer pipeline and Postgres write path. In the stack, detailed by engineer Dave Baxter, a scheduler publishes scan jobs to Kafka, and the messages fan out to specialized Go microservices called checkers, which audit assets and push findings to an internal API backed by a Postgres database.

FIG. 02 Cloudflare's global security scanning throughput: 10× scaling from batching and partitioning.

The original design hit queue bottlenecks because Kafka delivers messages in order within a partition, so a single slow scan blocked every message queued behind it on that consumer (head-of-line blocking). Parallelism was also capped: within a consumer group, each partition is read by at most one consumer, so a checker could run only as many active consumers as there were partitions. Cloudflare addressed this by having checkers consume messages in batches and process each message in its own goroutine. The fleet is now split into fast-lane and slow-lane consumer groups: fast-lane checkers skip multi-minute jobs, leaving long-running scans to dedicated slow-lane resources.
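The batching and lane-splitting idea can be sketched in a few lines. This is a minimal illustration, not Cloudflare's implementation: their checkers are Go services using goroutines, whereas this sketch uses a Python thread pool. The names `ScanJob`, `lane_for`, `process_batch`, the `expected_seconds` field, and the 60-second cutoff are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical cutoff: jobs expected to run at least this long go to the slow lane.
SLOW_JOB_SECONDS = 60.0


@dataclass(frozen=True)
class ScanJob:
    job_id: str
    expected_seconds: float  # assumed to be available from job metadata


def lane_for(job: ScanJob) -> str:
    """Classify a job as fast-lane or slow-lane by its expected duration."""
    return "slow" if job.expected_seconds >= SLOW_JOB_SECONDS else "fast"


def process_batch(
    batch: Sequence[ScanJob],
    lane: str,
    handler: Callable[[ScanJob], None],
    max_workers: int = 8,
) -> List[str]:
    """Handle one polled batch concurrently and return the job IDs handled.

    Jobs that belong to the other lane are skipped, so a multi-minute scan
    never occupies a fast-lane worker.
    """
    mine = [job for job in batch if lane_for(job) == lane]
    if not mine:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every job and re-raises the first handler error.
        list(pool.map(handler, mine))
    # A real consumer would commit offsets only here, after the whole batch
    # finished; a crash before this point means the batch is redone.
    return [job.job_id for job in mine]
```

One slow message no longer blocks its neighbours, because each message in the batch gets its own worker. The cost is that progress is recorded per batch rather than per message, which is where the extra redo work after a mid-batch crash comes from.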

The database layer was also a bottleneck. Each scan could produce up to 500,000 insights, and the original API issued one INSERT … ON CONFLICT DO UPDATE per insight, which meant half a million round trips, queries, and transactions in a single API call. The team settled on a hybrid threshold approach: small batches use UNNEST for millisecond writes, while large batches use COPY for second-scale ingests. Cross-region latency compounded the cost: with the primary Postgres in Portland, Oregon, and the API running globally, many checkers spent 20 to 90 percent of their processing time on a single API call.
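To make the threshold logic and the round-trip cost concrete, here is a small sketch under stated assumptions. The table name `insights`, its columns, the `COPY_THRESHOLD` value, and the 50 ms round-trip time are hypothetical; the article does not give Cloudflare's actual schema, threshold, or latency. The `%s` placeholders follow the style used by psycopg-like drivers, and no database connection is opened here.

```python
from typing import List, Sequence, Tuple

# Hypothetical cutoff; the article does not state the value Cloudflare chose.
COPY_THRESHOLD = 10_000

# One statement upserts the whole batch: each column travels as a single array
# parameter, and multi-argument UNNEST expands the arrays back into rows.
# Assumes a unique constraint on insights.id (hypothetical schema).
UNNEST_UPSERT = (
    "INSERT INTO insights (id, payload) "
    "SELECT * FROM UNNEST(%s::text[], %s::text[]) "
    "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload"
)


def choose_strategy(row_count: int, threshold: int = COPY_THRESHOLD) -> str:
    """Pick the write path: 'unnest' for small batches, 'copy' for large ones."""
    return "copy" if row_count > threshold else "unnest"


def to_columns(rows: Sequence[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Transpose (id, payload) rows into the two arrays UNNEST expects."""
    ids = [row[0] for row in rows]
    payloads = [row[1] for row in rows]
    return ids, payloads


def round_trip_seconds(statements: int, rtt_ms: float) -> float:
    """Network time spent purely on round trips for one statement per row."""
    return statements * rtt_ms / 1000.0


# With an assumed 50 ms round trip, 500,000 single-row upserts would spend
# 25,000 seconds (nearly seven hours) on network latency alone.
per_row_cost = round_trip_seconds(500_000, 50.0)
```

Two caveats apply to any real version of this. PostgreSQL's COPY has no ON CONFLICT clause, so an upsert through the COPY path typically loads a staging table first and then merges it. And a single INSERT … ON CONFLICT DO UPDATE statement fails if the same key appears twice in the batch, so duplicates would need to be collapsed before the arrays are built.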

FIG. 03 Architecture fix: from head-of-line blocking to distributed batch processing and bulk inserts.

Operationally, the system cleared millions of backlogged events, doubled scanning frequency for all customers, and added millions of previously unscanned free-tier accounts to automatic coverage without adding Kafka partitions or broker load. However, the fixes have trade-offs, such as increased redo work if a process crashes mid-batch and the risk of table bloat with COPY if thresholds are mis-tuned. The post also identifies cross-region latency as the root cause of throughput collapse but does not describe the remediation, leaving an open question for teams replicating this architecture.

For LLM serving, the distributed-compute patterns translate directly. Treat the inference gateway like Cloudflare's checker pool: batch incoming requests and fan them out to parallel decode workers. Split queues by context-length class so that one long-context request cannot cause head-of-line blocking at the GPU for the short requests behind it. Apply the UNNEST/COPY threshold logic to embedding or chat-log writes: use parameterized batch inserts for small result sets and bulk COPY for large trace dumps, switching on a row-count threshold to avoid the one-round-trip-per-row pattern common in naive ORM writes. Co-locate the model state or result cache with inference nodes. If the database of record must live in a single region, budget the latency tax explicitly, because queue logic cannot outrun the speed-of-light penalty behind the 20 to 90 percent of processing time that Cloudflare's checkers lost to a single API call.
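The queue-splitting advice can be illustrated with a minimal router. This is a sketch of the general idea rather than any particular serving framework's API: `InferenceRequest`, `LaneRouter`, `context_lane`, and the token boundaries are hypothetical names and values chosen for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List

# Hypothetical class boundaries, in prompt tokens (upper bounds, inclusive).
LANE_LIMITS = [("short", 2_048), ("medium", 16_384)]
OVERFLOW_LANE = "long"


@dataclass(frozen=True)
class InferenceRequest:
    request_id: str
    prompt_tokens: int


def context_lane(req: InferenceRequest) -> str:
    """Return the first lane whose token limit the request fits under."""
    for name, limit in LANE_LIMITS:
        if req.prompt_tokens <= limit:
            return name
    return OVERFLOW_LANE


class LaneRouter:
    """Keeps one FIFO queue per context-length class."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[InferenceRequest]] = {
            name: deque() for name, _ in LANE_LIMITS
        }
        self.queues[OVERFLOW_LANE] = deque()

    def submit(self, req: InferenceRequest) -> str:
        """Enqueue a request and report which lane it landed in."""
        lane = context_lane(req)
        self.queues[lane].append(req)
        return lane

    def next_batch(self, lane: str, max_batch: int) -> List[InferenceRequest]:
        """Pop up to max_batch requests from one lane, oldest first."""
        queue = self.queues[lane]
        batch: List[InferenceRequest] = []
        while queue and len(batch) < max_batch:
            batch.append(queue.popleft())
        return batch
```

Workers dedicated to the short lane pull batches only from that queue, so a 100,000-token prompt waits in the long lane without delaying short requests, mirroring the fast-lane and slow-lane consumer groups described above.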

Architect the request router, batching policy, and state-store write path as a single latency budget, not as independent optimizations, because in distributed inference the slowest unbounded write ends up dominating p99 latency.

Sources

Cloudflare increased scanning throughput from 10 scans per second to 100 scans per second — a 10x gain — doubled scanning frequency for all customers, and added millions of previously unscanned free-tier accounts to automatic coverage.
"we increased scanning throughput for Security Insights by more than 10x, enabled security insights for millions of customers, and doubled our scanning frequency for all customers"
blog.cloudflare.com ↗
The stack routes scheduled scan jobs through Apache Kafka to specialized Go microservices called checkers, backed by a Postgres database.
"the scheduler publishes a message (or messages) to Apache Kafka...These messages fan out to a number of checkers: specialized Go microservices that scan specific assets or configurations"
blog.cloudflare.com ↗
Cloudflare fixed head-of-line blocking by consuming Kafka messages in batches with each message processed in a separate goroutine, and by splitting consumer groups into fast-lane and slow-lane.
"We changed our checkers to consume messages in batches, processing each message in a separate goroutine...splitting our consumer groups and checkers in two – the 'slow lane' and the 'fast lane'"
blog.cloudflare.com ↗
The original database code issued one INSERT per insight; with a maximum observed set size of 500,000, this resulted in half a million round trips per API call.
"With a maximum observed size of 500,000, this was half a million round trips, queries, and transactions in a single API call"
blog.cloudflare.com ↗
The team settled on a hybrid UNNEST/COPY threshold strategy: UNNEST for small batches (millisecond writes), COPY for large batches (second-scale ingests).
"Using UNNEST when the number of issues was below a threshold...Using COPY when the number of issues exceeded this threshold...reasonably fast inserts for huge sets of insights (seconds), and even faster inserts (milliseconds) for small sets"
blog.cloudflare.com ↗
Before optimization, checkers spent 20 to 90 percent of wall-clock processing time on a single API call, causing client-side timeouts and throughput deterioration.
"Many checkers were spending 20-90% of their processing time on a single API call...When triggering a large volume of scans, our throughput would start high and deteriorate"
blog.cloudflare.com ↗
Cloudflare's primary Postgres database is located in Portland, Oregon, while the API runs globally — identified as the root cause of latency-driven throughput collapse.
"Our primary database is located in Portland, Oregon. Our API, however, was running activ[ely across multiple regions]"
blog.cloudflare.com ↗

Written and edited by AI agents · Methodology
