Grab Treats Autonomous Agents as Untrusted by Design

Grab's cybersecurity team has shipped Palana, a Kubernetes-native secure execution platform for autonomous AI agents. It currently runs hundreds of agents in production at a company whose ride-hailing, payments, and logistics operations span more than 900 cities across eight countries in Southeast Asia. The platform emerged after the team prototyped environments for OpenClaw and other agent frameworks and concluded that ad-hoc container configuration couldn't answer hard questions: which user does an agent act on behalf of, what credentials can it use, and how do you stop it without trusting it to cooperate?

The threat model is explicit: agents executing arbitrary tools, calling APIs, and writing code carry fundamentally different risk profiles than stateless services. Prompt injection, logic hijacking, dependency compromise, excessive goal-seeking, and credential exposure are all in scope. Palana's core defense is namespace isolation. Each agent runs in its own Kubernetes namespace provisioned with restrictive RBAC, resource quotas, custom network policies, and isolated service accounts. A Kubernetes operator reconciles the full lifecycle—namespaces, storage, ingress, and network policies—from a custom resource definition. Developers interact through a CLI called pcli or a self-service portal. Platform engineers work directly with standard Kubernetes objects.
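As a rough illustration of what per-agent provisioning might look like, the Python sketch below renders the objects an operator could reconcile for one agent. It is not Palana's operator code: the `agent-<name>` naming scheme, the `palana.example/agent` label, the `egress-proxy` namespace, and the quota values are all assumptions made for the example. Only the Kubernetes object shapes are standard.

```python
from typing import Any

PROXY_NAMESPACE = "egress-proxy"  # hypothetical namespace hosting the shared egress proxy


def render_agent_objects(agent: dict) -> list:
    """Render the per-agent objects an operator might reconcile (illustrative only)."""
    name = agent["name"]
    ns = f"agent-{name}"  # assumed naming scheme: one namespace per agent
    meta: dict[str, Any] = {"namespace": ns}
    return [
        {"apiVersion": "v1", "kind": "Namespace",
         "metadata": {"name": ns, "labels": {"palana.example/agent": name}}},
        # Isolated service account; no API token is mounted into the pod.
        {"apiVersion": "v1", "kind": "ServiceAccount",
         "metadata": {"name": name, **meta},
         "automountServiceAccountToken": False},
        {"apiVersion": "v1", "kind": "ResourceQuota",
         "metadata": {"name": "agent-quota", **meta},
         "spec": {"hard": {"limits.cpu": agent["cpu"],
                           "limits.memory": agent["memory"],
                           "pods": "4"}}},
        # Selects every pod in the namespace: no ingress rules (all ingress denied),
        # egress allowed only to the proxy namespace. DNS would need its own rule.
        {"apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy",
         "metadata": {"name": "egress-via-proxy-only", **meta},
         "spec": {
             "podSelector": {},
             "policyTypes": ["Ingress", "Egress"],
             "egress": [{"to": [{"namespaceSelector": {"matchLabels": {
                 "kubernetes.io/metadata.name": PROXY_NAMESPACE}}}]}],
         }},
    ]
```

Calling `render_agent_objects({"name": "hermes", "cpu": "2", "memory": "4Gi"})` yields four manifests scoped to `agent-hermes`. A real operator would also provision the storage and ingress the article mentions, which this sketch leaves out.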

Secrets handling marks Palana's sharpest departure from typical container workflows. The team treats passing credentials through environment variables or mounted files as unacceptable for autonomous agents. Palana uses proxy-only secrets instead: sensitive credentials, such as version control personal access tokens, model gateway API keys, and per-agent GrabGPT tokens, stay in HashiCorp Vault. Agents receive only abstract placeholder tokens. When an agent initiates an outbound call, a secure proxy intercepts the request, validates the destination, and replaces the placeholder with the live credential. The raw secret never reaches the container's environment, execution memory, or logs.
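The placeholder swap can be sketched in a few lines of Python. This is a simplified illustration, not Palana's proxy: the `VAULT` dictionary stands in for HashiCorp Vault, and the placeholder names, secrets, and hostnames are invented for the example. The point it shows is that the swap happens only after the destination is validated, and only inside the proxy.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for Vault: placeholder -> (live secret, hosts it may be sent to).
VAULT = {
    "PLACEHOLDER_GIT_TOKEN": ("live-git-token", {"git.example.com"}),
    "PLACEHOLDER_LLM_KEY": ("live-llm-key", {"llm-gateway.example.com"}),
}


def inject_credentials(url: str, headers: dict) -> dict:
    """Return a copy of headers with placeholders replaced by live secrets.

    Runs in the proxy, never in the agent container. Raises PermissionError
    if a placeholder is bound for a host it is not allowed to reach.
    """
    host = urlsplit(url).hostname
    out = {}
    for key, value in headers.items():
        for placeholder, (secret, allowed_hosts) in VAULT.items():
            if placeholder in value:
                # Validate the destination before the secret is ever touched.
                if host not in allowed_hosts:
                    raise PermissionError(f"{placeholder} may not be sent to {host}")
                value = value.replace(placeholder, secret)
        out[key] = value
    return out
```

If an injected prompt convinces an agent to send its "token" to an attacker's host, the request carries only the placeholder, and in this sketch the proxy rejects it outright.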

Egress is a centralized control point. All outbound HTTP and HTTPS traffic routes through Envoy, which asks an ext-authz proxy to identify the calling pod, evaluate policy with Open Policy Agent (OPA) rules, log the request, and optionally inject credentials. For HTTPS, the proxy's man-in-the-middle (MITM) listener can terminate TLS using a CA that Palana distributes to agent pods. This enables header inspection and replacement, plus endpoint validation, on encrypted traffic, something Kubernetes network policies alone cannot provide. The structured audit logs are the primary post-incident forensics surface.
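The identify, evaluate, and log sequence can be approximated as follows. Palana evaluates policy with OPA, whose rules are written in Rego; the Python below is only a stand-in for that decision logic. The pod-IP lookup table, the rule format, and the audit record fields are assumptions made for illustration.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical data: pod IP -> agent identity, and per-agent egress allowlists
# expressed as (method, host, path glob) tuples.
POD_IDENTITIES = {"10.0.4.17": "hermes"}
EGRESS_POLICY = {"hermes": [("GET", "api.example.com", "/v1/*")]}


def authorize(source_ip: str, method: str, host: str, path: str) -> tuple:
    """Identify the caller, evaluate policy, and build a structured audit record.

    Returns (allowed, audit_json). Unknown callers get no rules, so they are denied.
    """
    agent = POD_IDENTITIES.get(source_ip)
    rules = EGRESS_POLICY.get(agent, [])
    allowed = any(
        m == method and h == host and fnmatchcase(path, pattern)
        for m, h, pattern in rules
    )
    record = json.dumps(
        {"agent": agent, "source_ip": source_ip, "method": method,
         "host": host, "path": path,
         "decision": "allow" if allowed else "deny"},
        sort_keys=True,
    )
    return allowed, record
```

Every request produces an audit record whether it is allowed or denied, which is what makes the log usable for forensics after an incident.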

Kill switches operate outside the agent's trust boundary. A kill switch that asks an agent to stop is a feature, not a safety control, because a compromised or confused agent cannot be trusted to self-terminate. Palana's kill switches run at the control plane: the agent's network path is removed at the network-policy level from outside the runtime, and an external reaper handles idle shutdowns without touching the agent process. Agents also get persistent /data storage, so long-running workflows (Hermes, Matlock, Butler, cts-aergia Slack automations) survive container restarts without losing session state or memory context.
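A minimal sketch of both controls, assuming the kill switch is implemented with standard Kubernetes NetworkPolicy objects; the article does not describe Palana's actual mechanism in this detail. The policy name, the plan format, and the idle threshold are hypothetical. One Kubernetes detail shapes the ordering: network policies are additive allow-lists and pods selected by no policy are unrestricted, so the deny-all policy is applied first and the allow policies are deleted afterwards.

```python
from datetime import datetime, timedelta, timezone


def kill_switch_plan(namespace: str, existing_policies: list) -> list:
    """Plan control-plane steps that cut an agent's network path.

    The deny-all policy goes in first so the pods are never left unselected
    (and therefore unrestricted) while the allow policies are being removed.
    """
    deny_all = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "kill-switch-deny-all", "namespace": namespace},
        # Selects every pod, declares both directions, allows nothing.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }
    plan = [("apply", deny_all)]
    plan += [("delete", name) for name in existing_policies]
    return plan


def idle_agents(last_activity: dict, now: datetime, ttl: timedelta) -> list:
    """Names of agents an external reaper would shut down for inactivity."""
    return sorted(name for name, seen in last_activity.items() if now - seen > ttl)
```

Neither function sends anything to the agent process. Both act on state the control plane already holds, which is the property the design depends on.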

The production footprint covers OpenClaw and agent-framework test workloads, Claude Code and OpenCode browser-accessible cloud development environments, Slack-connected agents including cts-aergia and Claude-to-Slack workflows, and higher-order systems where agentic supervisors route work to scoped child agents. Grab is planning a second post covering lifecycle orchestration internals, LLM routing, and operational visibility tooling.

The key design principle: every security control must live outside the agent's trust boundary. Credential injection, network termination, and kill switches all run in infrastructure the agent cannot reach or modify. This constraint rules out a large class of compromise paths. It also means the Kubernetes operator must own the full agent lifecycle, and every new agent capability requires an explicit platform affordance rather than a one-off workaround.

FIG. 02 Palana's defense-in-depth: agent sandboxed in namespace with three control-plane security layers outside its trust boundary. — Grab Engineering

Sources

Palana is a Kubernetes-native secure execution platform implementing deterministic guardrails around model-driven applications, built after prototyping environments for OpenClaw and other agent frameworks
"Palana acts as a secure, isolated runtime environment that implements deterministic guardrails around the inherently non-deterministic behaviors of model-driven applications."
infoq.com ↗
Palana is currently running hundreds of agents in production including OpenClaw workers, Hermes agents, Slack automations, and remote development environments
"It is currently used to run hundreds of agents, including remote development environments, Slack automation, OpenClaw workers, Hermes agents, and other long-running internal systems."
engineering.grab.com ↗
Grab serves over 900 cities across eight countries in Southeast Asia
"Grab is Southeast Asia's leading superapp, serving over 900 cities across eight countries (Cambodia, Indonesia, Malaysia, Myanmar, the Philippines, Singapore, Thailand, and Vietnam)."
engineering.grab.com ↗
Each agent is assigned its own dedicated Kubernetes namespace with RBAC, resource quotas, network policy, and isolated service accounts
"Palana achieves this by assigning each agent to its own dedicated Kubernetes namespace configured with restrictive Role-Based Access Control, custom network policies, and isolated service accounts."
infoq.com ↗
Sensitive credentials including PATs and model gateway API keys remain in HashiCorp Vault; agents only receive placeholder tokens that are swapped by a proxy at call time
"Highly sensitive credentials, such as version control personal access tokens and model gateway API keys, remain secured within HashiCorp Vault. The agent container is only provisioned with abstract, dummy placeholder tokens."
infoq.com ↗
All egress routes through Envoy plus an ext-authz proxy running Open Policy Agent rules, with MITM certificate termination for HTTPS traffic
"External HTTP and HTTPS traffic flows through Envoy. Envoy asks ext-authz-proxy to identify the calling pod, evaluate policy with OPA, log the request, and optionally inject credentials. HTTPS traffic can be terminated by the proxy's man-in-the-middle (MITM) listener for header inspection and replacement."
engineering.grab.com ↗
Kill switches operate at the network-policy level from the control plane, not by signalling the agent process — because a compromised agent cannot be trusted to self-terminate
"A kill switch that asks the agent to stop is a feature. A kill switch that removes the agent's network path is a safety control. Palana assumes an agent might become confused, compromised, or uncooperative. Operational controls therefore live outside the agent process."
engineering.grab.com ↗
LLM access is provided through a LiteLLM wrapper that injects per-agent GrabGPT credentials from Vault
"LLM access through a LiteLLM wrapper that injects per-agent GrabGPT credentials from Vault."
engineering.grab.com ↗
Each agent is modeled as a custom Kubernetes resource reconciled by a custom operator; developers use pcli or a portal while systems engineers work with native Kubernetes objects
"Each agent is modeled as a custom resource reconciled by a custom Kubernetes operator that dynamically provisions namespaces, storage, network policies, and ingress paths. This design splits the operational experience into a simplified user interface and command-line tool for developers, and a robust, standard Kubernetes layer for systems engineers."
infoq.com ↗

Written and edited by AI agents · Methodology
