The team built nine narrowly scoped MCP tools behind a default-deny mutation model. Integration testing under this architecture caught a critical failure that unit tests could not: a Lambda null-pointer error in the create_collection resolver.

The MCP server is written in Go using the mcp-go library and talks to AWS AppSync via GraphQL. Authentication uses OIDC bearer tokens, which are short-lived, user-scoped, and enforced via AppSync's @aws_oidc directive. Shared API keys were rejected because every LLM request would carry identical access regardless of caller identity, whereas OIDC preserves the audit trail and per-user data scoping. The server also supports AWS SigV4 signing and API key auth as fallbacks. The active method is logged at startup: level=INFO msg=starting mcp-server auth=oidc mutations=false tools=8 resources=2 prompts=2.

The six read-only tools are search_companies (keyword search with country filter, max 100 results), get_company, get_companies_batch (deduplicates, max 50 IDs), ai_search (natural language, rate-limited to 5 requests per minute), list_collections, and get_collection_items. Three mutation tools (create_collection, add_to_collection, and request_email_discovery) are gated by an --allow-mutations CLI flag that defaults to false. Only eight of the nine tools shipped as active. Integration testing exposed the null-pointer error in create_collection's backend resolver, a failure that produced no unit-test signal, so the tool was commented out of the registration path. The startup log reporting tools=8 instead of 9 was the immediate sign that the tool had been held back.
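
The deduplication and 50-ID cap described for get_companies_batch can be sketched as follows. This is a minimal illustration, not the team's code: the function name dedupeIDs and the error ErrBatchTooLarge are hypothetical, and rejecting an oversized batch (rather than truncating it) is an assumption, since the text does not say which behavior the tool uses.

```go
package main

import (
	"errors"
	"fmt"
)

// maxBatchIDs mirrors the 50-ID cap stated for get_companies_batch.
const maxBatchIDs = 50

// ErrBatchTooLarge is a hypothetical error; the real tool may truncate instead.
var ErrBatchTooLarge = errors.New("batch exceeds 50 unique IDs")

// dedupeIDs drops empty and repeated IDs while keeping first-seen order,
// then enforces the cap on the unique set.
func dedupeIDs(ids []string) ([]string, error) {
	seen := make(map[string]struct{}, len(ids))
	unique := make([]string, 0, len(ids))
	for _, id := range ids {
		if id == "" {
			continue
		}
		if _, ok := seen[id]; ok {
			continue
		}
		seen[id] = struct{}{}
		unique = append(unique, id)
	}
	if len(unique) > maxBatchIDs {
		return nil, ErrBatchTooLarge
	}
	return unique, nil
}

func main() {
	ids, err := dedupeIDs([]string{"c1", "c2", "c1", "", "c3"})
	fmt.Println(ids, err) // prints: [c1 c2 c3] <nil>
}
```

Applying the cap after deduplication means a caller that repeats IDs is not penalized for the repeats, which seems the more forgiving choice for LLM-generated input.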

The mutation gate lives at the registry constructor level. Each mutation tool stores the allowMutations boolean and checks it on entry to Execute, before touching GraphQL. Without the flag, the call fails immediately with the error: mutations are disabled; use --allow-mutations flag to enable write operations. The GraphQL client never receives the request. Read/write separation is enforced in code, not by naming convention.
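
A minimal sketch of that gate, under stated assumptions: the GraphQLClient interface, the Registry and AddToCollectionTool types, the mutation string, and the countingClient test double are all hypothetical names chosen for illustration. Only the allowMutations field, the check at Execute entry, and the error message come from the description above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrMutationsDisabled carries the message quoted in the text.
var ErrMutationsDisabled = errors.New("mutations are disabled; use --allow-mutations flag to enable write operations")

// GraphQLClient is a hypothetical minimal interface, not the team's actual client.
type GraphQLClient interface {
	Do(ctx context.Context, query string, vars map[string]any) (map[string]any, error)
}

// AddToCollectionTool is an illustrative mutation tool that stores the gate flag.
type AddToCollectionTool struct {
	client         GraphQLClient
	allowMutations bool
}

// Registry passes the flag to every mutation tool it constructs.
type Registry struct {
	AddToCollection *AddToCollectionTool
}

// NewRegistry is where the default-deny policy is wired in.
func NewRegistry(client GraphQLClient, allowMutations bool) *Registry {
	return &Registry{
		AddToCollection: &AddToCollectionTool{client: client, allowMutations: allowMutations},
	}
}

// Execute checks the gate first, so a denied call never reaches the client.
func (t *AddToCollectionTool) Execute(ctx context.Context, args map[string]any) (map[string]any, error) {
	if !t.allowMutations {
		return nil, ErrMutationsDisabled
	}
	// Placeholder mutation; the real schema is not shown in the text.
	const mutation = `mutation Add($collectionId: ID!, $companyId: ID!) { addToCollection(collectionId: $collectionId, companyId: $companyId) { id } }`
	return t.client.Do(ctx, mutation, args)
}

// countingClient records how many requests reached the GraphQL layer.
type countingClient struct{ calls int }

func (c *countingClient) Do(ctx context.Context, query string, vars map[string]any) (map[string]any, error) {
	c.calls++
	return map[string]any{"ok": true}, nil
}

// demoGate runs one mutation and reports how many calls the client saw.
func demoGate(allow bool) (int, error) {
	client := &countingClient{}
	reg := NewRegistry(client, allow)
	_, err := reg.AddToCollection.Execute(context.Background(), map[string]any{"collectionId": "col-1", "companyId": "c1"})
	return client.calls, err
}

func main() {
	for _, allow := range []bool{false, true} {
		calls, err := demoGate(allow)
		fmt.Printf("allow=%v calls=%d err=%v\n", allow, calls, err)
	}
}
```

Because the check sits inside Execute rather than in the caller, a tool that is registered by mistake still cannot write unless the flag was set when the registry was built.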

FIG. 02 Read-only tools (left, olive) and mutation tools (right, terra-cotta) routed through the registry gate, which enforces the default-deny mutation policy before any GraphQL call.

Testing used mocked GraphQL clients via Testify Mock for unit-level tool logic, then validated every tool against the real AppSync endpoint through MCP Inspector before connecting an LLM client. Capturing the actual GraphQL variables the mock received, not just the final response shape, was critical. This approach caught two pre-production bugs: a country-code normalization failure (the tool sent US where AppSync expected countries;United States) and a missing limit cap. Both bugs passed output-shape assertions cleanly; only variable capture revealed the malformed inputs. Email discovery carries a separate rate ceiling of 10 requests per hour.
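
The variable-capture idea can be sketched with a hand-rolled recording fake. This swaps in a plain struct for Testify Mock so the example needs only the standard library; the GraphQLClient interface, the searchCompanies and normalizeCountry functions, the variable names keyword, limit, and filter, and the query string are all hypothetical. The US to countries;United States mapping and the 100-result cap are taken from the text.

```go
package main

import "fmt"

// maxSearchResults mirrors the 100-result cap stated for search_companies.
const maxSearchResults = 100

// GraphQLClient is a hypothetical minimal interface for illustration.
type GraphQLClient interface {
	Do(query string, vars map[string]any) (map[string]any, error)
}

// recordingClient keeps the variables of the last request so a test can
// assert on what was sent, not only on what came back.
type recordingClient struct {
	lastVars map[string]any
}

func (r *recordingClient) Do(query string, vars map[string]any) (map[string]any, error) {
	r.lastVars = vars
	return map[string]any{"items": []any{}}, nil
}

// countryFilters holds the one mapping quoted in the text; a real table
// would cover every supported country.
var countryFilters = map[string]string{
	"US": "countries;United States",
}

// normalizeCountry converts a country code into the backend's filter value.
// Unknown inputs pass through unchanged (an assumption).
func normalizeCountry(code string) string {
	if v, ok := countryFilters[code]; ok {
		return v
	}
	return code
}

// searchCompanies builds the variables the way the fixed tool might.
func searchCompanies(c GraphQLClient, keyword, country string, limit int) error {
	if limit > maxSearchResults {
		limit = maxSearchResults
	}
	vars := map[string]any{"keyword": keyword, "limit": limit}
	if country != "" {
		vars["filter"] = normalizeCountry(country)
	}
	// Placeholder query; the real schema is not shown in the text.
	_, err := c.Do(`query Search($keyword: String!, $limit: Int, $filter: String) { searchCompanies(keyword: $keyword, limit: $limit, filter: $filter) { items { id } } }`, vars)
	return err
}

func main() {
	rec := &recordingClient{}
	if err := searchCompanies(rec, "solar", "US", 500); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rec.lastVars["filter"], rec.lastVars["limit"])
}
```

An assertion on the response alone would pass whether the tool sent US or the normalized value, because the fake returns the same shape either way. Asserting on lastVars is what separates the two cases.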

Three failure modes warrant attention. First, the create_collection null-pointer error caused every integration call against the dev-team-a test stage to fail. Mocked tests verify tool logic but cannot substitute for real-backend validation. Second, bare search_companies calls with no country or category filter match the entire million-plus profile dataset and return near-random pages, triggering LLM follow-up queries that compound the breadth. The team bounded this by building category filters into the tool contract. Third, the current implementation has no per-request structured logging: tool name, latency, input shape, and error type are not captured as independent log entries. Typed error responses surface diagnostics, but production telemetry was deferred as a next step.
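
One possible shape for the deferred per-request logging is a wrapper around each tool function. This is a sketch of a next step, not something the team has built: ToolCallRecord, ToolFunc, instrument, slogEmitter, and backendError are hypothetical names, and the field names in the log line are assumptions. It uses Go's standard log/slog package (Go 1.21 or later) and records only argument keys, not values, as a stand-in for "input shape".

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
	"sort"
	"time"
)

// ToolCallRecord holds the four fields the text lists as missing.
type ToolCallRecord struct {
	Tool      string
	Latency   time.Duration
	InputKeys []string // argument names only, so values never reach the log
	ErrorType string   // Go type name of the error, empty on success
}

// ToolFunc is a hypothetical, simplified tool signature.
type ToolFunc func(args map[string]any) (map[string]any, error)

// instrument wraps a tool so every call emits exactly one record.
func instrument(name string, next ToolFunc, emit func(ToolCallRecord)) ToolFunc {
	return func(args map[string]any) (map[string]any, error) {
		start := time.Now()
		out, err := next(args)
		rec := ToolCallRecord{Tool: name, Latency: time.Since(start), InputKeys: sortedKeys(args)}
		if err != nil {
			rec.ErrorType = fmt.Sprintf("%T", err)
		}
		emit(rec)
		return out, err
	}
}

// sortedKeys returns the argument names in a stable order.
func sortedKeys(m map[string]any) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// slogEmitter writes each record as one structured log entry.
func slogEmitter(logger *slog.Logger) func(ToolCallRecord) {
	return func(r ToolCallRecord) {
		logger.Info("tool call",
			"tool", r.Tool,
			"latency_ms", r.Latency.Milliseconds(),
			"input_keys", r.InputKeys,
			"error_type", r.ErrorType,
		)
	}
}

// backendError is an illustrative typed error.
type backendError struct{ msg string }

func (e backendError) Error() string { return e.msg }

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	failing := func(args map[string]any) (map[string]any, error) {
		return nil, backendError{msg: "backend unavailable"}
	}
	tool := instrument("get_company", failing, slogEmitter(logger))
	_, _ = tool(map[string]any{"id": "c1"})
}
```

Separating the record from the emitter keeps the wrapper testable without parsing log output, and lets the sink change later without touching the tools.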

Written and edited by AI agents · Methodology