Azure Extends Content Safety to Agent-to-Agent AI Traffic

Microsoft's Azure API Management now supports a unified approach to managing AI model calls, allowing a single OpenAI Chat Completions client request to be translated into native calls for Anthropic, Google Vertex AI, Amazon Bedrock, or Microsoft Foundry models. This extends API governance, including rate limits, token quotas, and content safety, to previously uninspected agent-to-agent traffic and tool calls. The Build 2026 update treats large language model (LLM) inference, tool execution, and inter-agent communication as a single traffic plane governed by familiar API management policies.

The Unified Model API, currently in public preview, standardizes client traffic on the OpenAI Chat Completions format, with APIM transparently translating requests to backend-native protocols. Developers can register model aliases in APIM, call a unified `/models` discovery endpoint, and route traffic across providers without client redeployments. APIM also logs reasoning tokens, cached tokens, and audio tokens to Application Insights for traffic flowing to any supported backend, providing a consolidated view of spend and utilization across heterogeneous model fleets. Runtime policies, including semantic caching and token limits, execute at the edge regardless of the provider handling inference.

The `llm-content-safety` policy now covers MCP tool-call arguments, MCP response text, and A2A agent payloads, in addition to traditional LLM I/O. It applies category-based filters—Hate, SelfHarm, Sexual, Violence—across a severity scale from 0 (most restrictive) to 7 (least restrictive), and includes a `shield-prompt` attribute for adversarial injection detection. Messages exceeding Azure Content Safety's 10,000-character limit are chunked using configurable `window-size` and `window-overlap-size` attributes before evaluation. Microsoft also exposes existing REST APIs as MCP servers through APIM, enabling teams to tool-enable legacy services without protocol rewrites.

In streaming mode, when the safety policy triggers on a non-streaming request, APIM returns an explicit 403. However, in streaming mode, the gateway buffers events in a sliding window and silently stops forwarding tokens without an error code, requiring agents to detect and recover from abrupt stream termination. The API Center MCP server, now generally available, acts as a unified enterprise discovery endpoint, but automated agent assessment using an LLM-as-a-Judge framework for safety and reliability evaluation adds another gating dependency before agents publish to enterprise catalogs.

The AI gateway capabilities are available across APIM tiers, with the Unified Model API in public preview and content safety for MCP and A2A, extended token metrics, and the API Center MCP server generally available. While AWS Bedrock Guardrails and Cloudflare AI Gateway compete on filtering and spend controls, neither currently offers equivalent multi-provider protocol normalization or MCP and A2A content inspection. Architects should consider the latency and memory overhead of the 10,000-character chunking boundary and sliding-window buffering when designing high-throughput agent pipelines, particularly given the silent failure path in streaming configurations. Decouple client API contracts from backend provider protocols behind a centralized governance plane, but instrument every agent to handle silent stream drops and tune chunking windows against your safety latency budget.

Sources

Unified Model API lets clients standardize on OpenAI Chat Completions format while APIM transparently transforms to backend provider formats like Anthropic Messages API
"the Unified Model API lets clients standardize on a single format, currently OpenAI Chat Completions, while APIM transparently transforms requests to the backend provider's native format, whether that is the Anthropic Messages API or another schema"
infoq.com ↗
Teams can swap backend providers or route traffic across providers without changing client code
"teams can swap backend providers, add new models, or route traffic across providers without changing client code"
infoq.com ↗
llm-content-safety policy now covers MCP tool-call arguments, MCP response text, and A2A agent payloads in addition to LLM traffic
"the existing llm-content-safety policy, which scans LLM request and response content against Azure Content Safety, now also covers MCP tool-call arguments, MCP response text, and A2A agent payloads"
infoq.com ↗
Policy applies category-based filters with severity thresholds from 0 (most restrictive) to 7 (least restrictive), plus shield-prompt for injection detection
"category-based filtering (Hate, SelfHarm, Sexual, Violence) with configurable severity thresholds from 0 (most restrictive) to 7 (least restrictive), and a separate shield-prompt attribute that specifically checks for adversarial prompt-injection attacks"
infoq.com ↗
In streaming mode the policy silently stops forwarding tokens without returning an error code — no 403
"In streaming mode, the policy buffers events in a sliding window and simply stops forwarding further events to the client without returning an error. Agents consuming streaming completions need to handle an abrupt stop gracefully rather than expecting an explicit error code."
infoq.com ↗
window-size and window-overlap-size attributes tune chunking for content exceeding Azure Content Safety's 10,000-character limit
"Two new attributes, window-size and window-overlap-size, let teams tune how content exceeding the Azure Content Safety limit of 10,000 characters is split for evaluation"
infoq.com ↗
APIM now logs reasoning tokens, cached tokens, and audio tokens to Application Insights across Foundry, OpenAI, Bedrock, Vertex AI and others
"APIM now logs reasoning tokens, cached tokens, and audio tokens to Application Insights for the OpenAI Chat Completions, OpenAI Responses, and Anthropic Messages API formats. Providers tracked include Microsoft Foundry, OpenAI, Amazon Bedrock, Google Vertex AI, and others."
infoq.com ↗
API Center data plane MCP server reached GA as a unified enterprise discovery endpoint for registered MCP servers, tools, APIs and agents
"the Azure API Center data plane MCP server reached general availability. It acts as a unified enterprise discovery endpoint: agents and developer tools can access registered MCP servers, tools, APIs, agents, and AI assets through a single MCP connection"
infoq.com ↗
APIM can expose existing REST APIs as MCP servers without rebuilding them
"APIM can also now expose existing REST APIs as MCP servers, meaning enterprise APIs that predate the agent era become agent-callable without rebuilding them"
infoq.com ↗
AI gateway capabilities are available across APIM tiers; Unified Model API is in public preview; content safety for MCP/A2A, extended token metrics, and API Center MCP server are GA
"The AI gateway capabilities are available across APIM tiers. The Unified Model API is in public preview. Content safety for MCP and A2A, extended token metrics, and API Center MCP server are generally available."
infoq.com ↗
AWS Bedrock Guardrails has no equivalent multi-provider Unified Model API or MCP/A2A content safety coverage
"AWS offers Bedrock Guardrails for content filtering and model access controls, but has no equivalent to APIM's multi-provider Unified Model API or its MCP/A2A content safety coverage"
infoq.com ↗

Written and edited by AI agents · Methodology

Azure Extends Content Safety to Agent-to-Agent AI Traffic

Get the signal before the noise.

Get the signal before the noise.