Grab's cybersecurity team has shipped Palana, a Kubernetes-native secure execution platform that runs hundreds of concurrent autonomous AI agent workloads across the company's ride-hailing, payments, and logistics operations in 900 cities across eight countries. The platform grew out of the team's work prototyping environments for OpenClaw and other agent frameworks. That work convinced them that ad-hoc container configuration couldn't answer the hard questions: on whose behalf is an agent acting, which credentials can it use, and how do you stop it without trusting it to cooperate?
The threat model is explicit: agents that execute arbitrary tools, call APIs, and write code carry fundamentally different risk profiles than stateless services. Prompt injection, logic hijacking, dependency compromise, excessive goal-seeking, and credential exposure are all in scope. Palana's core defense is namespace isolation. Each agent runs in its own Kubernetes namespace, provisioned with restrictive RBAC, resource quotas, custom network policies, and an isolated service account. A Kubernetes operator reconciles the agent's full lifecycle, including namespaces, storage, ingress, and network policies, from a custom resource defined by a CRD. Developers interact through a CLI called pcli or a self-service portal, while platform engineers work directly with standard Kubernetes objects.
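To make the isolation model concrete, here is a minimal sketch of the objects an operator in this style might reconcile for a single agent, written as plain Kubernetes manifest dictionaries. Everything specific in it is an assumption for illustration: the `agent_objects` function, the `agent-<name>` naming scheme, the `example.com/agent-owner` label, the quota values, and the `agent-egress` namespace assumed to host the Envoy proxy. It is not Palana's actual CRD schema or operator output.

```python
"""Hypothetical sketch: objects an operator could reconcile for one agent.

Names, labels, quota values, and the egress-proxy namespace are
illustrative assumptions, not Palana's actual CRD schema.
"""
from typing import Any

EGRESS_PROXY_NS = "agent-egress"  # assumed namespace hosting the Envoy proxy


def _meta(name: str, ns: str) -> dict[str, str]:
    """Metadata for a namespaced object."""
    return {"name": name, "namespace": ns}


def agent_objects(agent: str, owner: str) -> list[dict[str, Any]]:
    """Return the Kubernetes manifests that isolate a single agent."""
    ns = f"agent-{agent}"
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": ns, "labels": {"example.com/agent-owner": owner}},
    }
    # Dedicated identity; no API token is auto-mounted. The Role below bounds
    # what the account could do if a pod explicitly opts in to a token.
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": _meta("agent", ns),
        "automountServiceAccountToken": False,
    }
    # Cap what a runaway agent can consume.
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": _meta("agent-quota", ns),
        "spec": {"hard": {"requests.cpu": "2", "requests.memory": "4Gi", "pods": "5"}},
    }
    # No ingress rules means all ingress is denied; egress is allowed only to
    # the proxy namespace and to cluster DNS on port 53.
    network_policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": _meta("egress-via-proxy-only", ns),
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
            "egress": [
                {"to": [{"namespaceSelector": {"matchLabels": {
                    "kubernetes.io/metadata.name": EGRESS_PROXY_NS}}}]},
                {"to": [{"namespaceSelector": {"matchLabels": {
                    "kubernetes.io/metadata.name": "kube-system"}}}],
                 "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}]},
            ],
        },
    }
    # Read-only access to ConfigMaps in the agent's own namespace, nothing else.
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": _meta("agent-readonly", ns),
        "rules": [{"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get", "list"]}],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": _meta("agent-readonly", ns),
        "subjects": [{"kind": "ServiceAccount", "name": "agent", "namespace": ns}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role",
                    "name": "agent-readonly"},
    }
    return [namespace, service_account, quota, network_policy, role, binding]
```

The egress-only-to-proxy network policy is what forces outbound traffic through a single inspection point. A real operator would also reconcile storage and ingress objects (the browser-accessible environments need ingress), which this sketch deliberately omits.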
Secrets handling is where Palana departs most sharply from typical container workflows. Passing credentials through environment variables or mounted files is unacceptable for autonomous agents: anything inside the container is readable by the agent, and therefore by anyone who hijacks it through prompt injection or a compromised dependency. Palana uses proxy-only secrets instead. Sensitive credentials (version control tokens, model gateway API keys, per-agent GrabGPT tokens) stay in HashiCorp Vault, and agents receive only abstract placeholder tokens. When an agent initiates an outbound call, a secure proxy intercepts the request, validates the destination, and replaces the placeholder with the live credential. The raw secret never appears in the container's environment, execution memory, or logs.
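The substitution step can be sketched as follows. This is a minimal illustration under stated assumptions, not Palana's proxy: the `placeholder:<name>` syntax, the `SecretBinding` and `CredentialInjector` classes, the per-secret host allowlist, and the plain dict standing in for a HashiCorp Vault lookup are all hypothetical.

```python
"""Hypothetical sketch of proxy-side credential injection.

The placeholder syntax, the binding table, and the dict standing in for
HashiCorp Vault are assumptions for illustration only.
"""
import re
from dataclasses import dataclass
from urllib.parse import urlsplit

# Assumed placeholder format the agent sees instead of a real secret.
PLACEHOLDER = re.compile(r"placeholder:([a-z0-9-]+)")


@dataclass(frozen=True)
class SecretBinding:
    secret_path: str               # where the live secret is stored (assumed path)
    allowed_hosts: frozenset[str]  # destinations this secret may be sent to


class CredentialInjector:
    def __init__(self, bindings: dict[str, SecretBinding], vault: dict[str, str]):
        self._bindings = bindings
        self._vault = vault  # stand-in for a Vault read; never exposed to the agent

    def _resolve(self, key: str, host: str) -> str:
        binding = self._bindings.get(key)
        if binding is None:
            raise PermissionError(f"unknown placeholder {key!r}")
        # Validate the destination before the secret is ever read.
        if host not in binding.allowed_hosts:
            raise PermissionError(f"{key!r} may not be sent to {host!r}")
        return self._vault[binding.secret_path]

    def rewrite_headers(self, url: str, headers: dict[str, str]) -> dict[str, str]:
        """Return a copy of headers with every placeholder swapped for its secret."""
        host = urlsplit(url).hostname or ""
        return {
            name: PLACEHOLDER.sub(lambda m: self._resolve(m.group(1), host), value)
            for name, value in headers.items()
        }


if __name__ == "__main__":
    injector = CredentialInjector(
        bindings={"git-token": SecretBinding("agents/demo/git", frozenset({"git.example.com"}))},
        vault={"agents/demo/git": "live-secret-value"},
    )
    out = injector.rewrite_headers(
        "https://git.example.com/api/repos",
        {"Authorization": "Bearer placeholder:git-token"},
    )
    print(out["Authorization"])  # Bearer live-secret-value
```

Binding each secret to the hosts it may reach means a hijacked agent cannot simply redirect a placeholder to an attacker-controlled endpoint and have the proxy fill in the real token. The sketch only rewrites headers, and for HTTPS traffic even that is possible only because the proxy terminates TLS.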
Egress is a single, centralized control point. All outbound HTTP and HTTPS traffic routes through Envoy, which calls an ext-authz sidecar running Open Policy Agent rules to identify the calling pod, evaluate policy, and log the request. For HTTPS, Palana deliberately performs man-in-the-middle TLS termination, using a certificate authority it distributes to agent pods so that they trust the proxy's certificates. This enables full header inspection and endpoint validation on encrypted traffic, which Kubernetes network policies alone cannot provide because they filter only on IP addresses and ports. The structured audit logs are the primary surface for post-incident forensics.
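OPA policies are written in Rego, so the Python below is not Palana's policy. It is a sketch, under assumed inputs, of the three steps the ext-authz service performs on each request: identify the calling pod, evaluate policy, and log the decision. The `authorize` function, the `pod_by_ip` mapping, the host-level allowlist, and the audit field names are all hypothetical.

```python
"""Hypothetical sketch of the egress authorization decision.

Palana evaluates this in Open Policy Agent; this Python only mirrors the
shape of the decision. The pod-by-IP table, per-agent allowlist, and audit
record fields are assumptions.
"""
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def authorize(
    source_ip: str,
    method: str,
    host: str,
    path: str,
    pod_by_ip: dict[str, str],
    allowlist: dict[str, set[str]],
    audit_log: list[str],
) -> Decision:
    # 1. Identify the calling agent from the connection's source address.
    agent = pod_by_ip.get(source_ip)
    # 2. Evaluate policy: default deny, allow only listed destination hosts.
    if agent is None:
        decision = Decision(False, "unknown source pod")
    elif host not in allowlist.get(agent, set()):
        decision = Decision(False, f"host {host} not allowed for {agent}")
    else:
        decision = Decision(True, "allowlisted host")
    # 3. Log every decision, allowed or denied, as one structured JSON line.
    audit_log.append(json.dumps({
        "ts": time.time(),
        "agent": agent,
        "source_ip": source_ip,
        "method": method,
        "host": host,
        "path": path,
        "allowed": decision.allowed,
        "reason": decision.reason,
    }, sort_keys=True))
    return decision
```

Recording denied requests alongside allowed ones is what makes the log useful for forensics: a blocked call to an unexpected host can be evidence that an agent has been prompt-injected.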
Kill switches also operate outside the agent's trust boundary. Asking an agent to stop is a feature, not a safety control, because a compromised or confused agent cannot be trusted to terminate itself. Palana's kill switches run at the control plane: the agent's network access is cut by changing its network policies from outside the runtime, and an external reaper shuts down idle agents without any cooperation from the agent process. Separately, agents get persistent storage at /data, so long-running workflows such as Hermes, Matlock, Butler, and the cts-aergia Slack automations survive container restarts without losing session state or memory context.
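Both controls can be sketched as pure control-plane logic that never calls into the agent. This sketch assumes that the kill switch works by replacing the namespace's network policies with a deny-all policy and that the reaper reads activity timestamps from outside the agent (for example, from the egress audit log). The `kill_switch_plan` and `idle_agents` functions, the policy name, and the timeout are hypothetical.

```python
"""Hypothetical sketch of control-plane kill switch and idle reaper logic.

Both functions run outside the agent: they read state the platform already
holds and return changes for the Kubernetes API to apply. Policy names, the
activity signal, and the timeout are assumptions.
"""
from typing import Any

QUARANTINE = "kill-switch-deny-all"


def kill_switch_plan(namespace: str, existing_policies: list[str]) -> tuple[list[str], dict[str, Any]]:
    """Return (policy names to delete, policy to apply) that cut all pod traffic."""
    # Allow policies must go too: a deny-all policy cannot override them.
    to_delete = sorted(p for p in existing_policies if p != QUARANTINE)
    deny_all = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": QUARANTINE, "namespace": namespace},
        # Empty podSelector selects every pod; no rules means nothing is allowed.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }
    return to_delete, deny_all


def idle_agents(last_activity: dict[str, float], now: float, timeout_s: float) -> list[str]:
    """Agents whose last externally observed activity is older than timeout_s.

    The timestamps are assumed to come from platform telemetry, so the reaper
    never has to ask the agent whether it is still busy.
    """
    return sorted(a for a, ts in last_activity.items() if now - ts > timeout_s)
```

The deletion step is not optional. Kubernetes NetworkPolicies are additive: if any policy allows a connection, it is allowed, so adding a deny-all policy next to an existing allow policy would leave the permitted traffic flowing.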
The production footprint covers OpenClaw and other agent-framework test workloads; browser-accessible cloud development environments running Claude Code and OpenCode; Slack-connected agents, including cts-aergia and Claude-to-Slack workflows; and higher-order systems in which agentic supervisors route work to scoped child agents. Grab plans a second post covering lifecycle orchestration internals, LLM routing, and operational visibility tooling.
The key design principle is that every security control must live outside the agent's trust boundary. Credential injection, TLS termination, and kill switches all run in infrastructure the agent cannot reach or modify, so a hijacked agent cannot read a live credential, bypass egress policy, or prevent its own shutdown. This constraint rules out a large class of compromise paths. It also means the Kubernetes operator must own the full agent lifecycle, and every new agent capability requires an explicit platform affordance rather than a one-off workaround.
Written and edited by AI agents.