Cloudflare AI Agent Infrastructure vs Kubernetes: 2026

Cloudflare Dynamic Workers are the right choice for stateless, high-density edge agents with millisecond cold starts. Kubernetes is the right choice for GPU inference, hardware-isolated multi-tenant deployments, and compliance-heavy enterprise workloads. A hybrid of both platforms covers the full range of production AI agent requirements.

Running AI agent workloads exposes a fundamental mismatch between how Kubernetes was designed to run services and what agents actually need. Traditional pods serve thousands of concurrent users per instance. Agents run one-to-one: one isolated instance per user or task, mostly idle, occasionally bursting. In April 2026, Cloudflare’s Agents Week shipped four products that address this pattern directly. Kubernetes has its own answer through the SIG Apps Agent Sandbox CRD, gVisor and Kata Containers isolation, NVIDIA’s OpenShell runtime, DRA for GPU allocation, and the Gateway API Inference Extension.

This comparison covers the architecture of each approach, how the isolation models differ at the kernel and hardware level, and a decision framework for choosing between them based on workload characteristics.

Why Agent Infrastructure is Different from Application Infrastructure

Traditional applications scale horizontally: a pod serves thousands of concurrent requests. Agent workloads invert this. A coding agent holds a filesystem session for hours, waits for user input, then bursts to run a build. Multiply that by 10,000 users and you have 10,000 agent instances, most consuming zero CPU at any given moment.

Two properties separate agent infrastructure from standard application infrastructure:

Isolation. Each agent operates on behalf of a specific user or task, often executing untrusted or AI-generated code. A security boundary at the agent level is a hard requirement, not an optimization.

Hibernation. Paying for 10,000 always-on containers to serve 100 active agents at peak is not a viable cost model. Infrastructure that bills only for active compute and suspends to zero changes the economics entirely.
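To make the cost gap concrete, here is a back-of-the-envelope sketch in JavaScript using the figures from the paragraph above: 10,000 provisioned agent instances, of which at most 100 are active at peak. It counts billable instance-hours per day rather than dollars, because unit prices vary by provider, and it pessimistically assumes the peak number of agents stays active all day.

```javascript
// Billable instance-hours per day under two billing models.
// Assumption: whole instance-hours only; real billing is usually finer-grained.
function dailyInstanceHours({ provisioned, peakActive, hoursPerDay = 24 }) {
  // Always-on: every provisioned instance is billed for every hour.
  const alwaysOn = provisioned * hoursPerDay;
  // Active-only upper bound: the peak number of agents is active all day.
  const activeOnlyUpperBound = peakActive * hoursPerDay;
  return {
    alwaysOn,
    activeOnlyUpperBound,
    ratio: alwaysOn / activeOnlyUpperBound,
  };
}

const result = dailyInstanceHours({ provisioned: 10000, peakActive: 100 });
console.log(result); // alwaysOn: 240000, activeOnlyUpperBound: 2400, ratio: 100
```

Even under that pessimistic assumption, active-only billing charges for 100x fewer instance-hours; bursty agents that are active only minutes per hour widen the gap further.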

graph LR
    subgraph Traditional["Traditional: One-to-Many"]
        P1[Pod] -->|serves| U1[User 1]
        P1 -->|serves| U2[User 2]
        P1 -->|serves| U3[1000s more]
    end

    subgraph AgentModel["Agents: One-to-One"]
        A1[Instance 1] --> T1[User 1 Session]
        A2[Instance 2] --> T2[User 2 Session]
        A3[...mostly idle] --> T3[User N Session]
    end

Traditional pods amortize their cost across thousands of concurrent users. Agent workloads require one isolated instance per user session, with most instances idle at any given moment. Always-on containers are not viable at this scale.

Cloudflare’s Agent Stack: Edge-Native from Isolates to Linux Environments

Cloudflare’s Agents Week 2026 delivered four discrete infrastructure layers: a sandboxing primitive (Dynamic Workers), a full Linux environment (Sandboxes GA), a private networking layer (Mesh), and an opinionated agent framework (Project Think).

Dynamic Workers: Millisecond Sandboxing with Zero Ambient Authority

Dynamic Workers execute AI-generated code in V8 isolates; V8 is the JavaScript engine that also powers Chrome. An isolate starts in a few milliseconds and uses a few megabytes of memory, making it roughly 100x faster to start and 10-100x more memory-efficient than a container.

The capability model is the core security property. A Dynamic Worker starts with zero ambient authority: no network access, no filesystem access, no environment variables. The parent Worker explicitly grants capabilities through bindings:

// Parent Worker spawns a sandboxed Dynamic Worker
const sandbox = await env.DYNAMIC.create({
  script: aiGeneratedCode,
  globalOutbound: null, // zero network by default
  bindings: {
    KV: env.MY_KV, // explicitly grant KV access
  }
});
const result = await sandbox.fetch(new Request("http://sandbox/run"));
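The zero-ambient-authority idea can be illustrated outside the Workers runtime. The JavaScript sketch below is a toy model, not Cloudflare’s implementation and not a security boundary: a hypothetical `grantOnly` helper builds an `env` object that exposes only explicitly granted bindings and throws on any other name, and an in-memory `Map` stands in for a KV namespace.

```javascript
// Toy model of capability-based bindings: sandboxed code can only reach what
// the parent passes in. This illustrates the idea; it provides no isolation.
function grantOnly(bindings) {
  const granted = Object.freeze({ ...bindings });
  return new Proxy(granted, {
    get(target, name) {
      // Reject any binding name that was not explicitly granted.
      if (typeof name === "string" && !Object.prototype.hasOwnProperty.call(target, name)) {
        throw new Error(`capability not granted: ${name}`);
      }
      return target[name];
    },
  });
}

// Hypothetical stand-in for a KV namespace.
const fakeKV = new Map([["greeting", "hello"]]);

// "Agent code" receives only its env; it has no other route to the store.
function agentTask(env) {
  return env.KV.get("greeting");
}

const env = grantOnly({ KV: fakeKV });
console.log(agentTask(env)); // prints: hello

try {
  env.SECRETS; // never granted
} catch (err) {
  console.log(err.message); // prints: capability not granted: SECRETS
}
```

In the real runtime the boundary is enforced by the isolate and by settings such as `globalOutbound: null`; the sketch only shows why code that is handed nothing can reach nothing.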

Security hardening includes Memory Protection Key (MPK) support, Spectre-specific countermeasures, and V8 security patches deployed within hours of upstream release. Cap’n Proto RPC bridges handle cross-sandbox communication between the Dynamic Worker and its parent without the agent code needing to know it is crossing a security boundary.

Pricing: $0.002 per unique Worker loaded per day (waived during beta), plus standard CPU and invocation costs. Status: Open beta for paid Workers users.
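As a rough sketch of how the per-load fee scales, the JavaScript below assumes the $0.002 rate is charged once per unique Worker per day and ignores the separate CPU and invocation charges, which depend on the workload. It computes in integer tenths of a cent to avoid floating-point rounding.

```javascript
// Daily Dynamic Worker load fee only (CPU and invocation costs excluded).
// $0.002 = 2 tenths of a cent; integer math avoids float rounding.
const FEE_TENTHS_OF_CENT_PER_WORKER = 2;

function dailyLoadFeeUSD(uniqueWorkersPerDay, { beta = false } = {}) {
  if (beta) return 0; // fee waived during the beta
  const tenthsOfCent = uniqueWorkersPerDay * FEE_TENTHS_OF_CENT_PER_WORKER;
  return tenthsOfCent / 1000; // 1000 tenths of a cent = $1
}

console.log(dailyLoadFeeUSD(10000)); // 20
console.log(dailyLoadFeeUSD(10000, { beta: true })); // 0
```

At 10,000 distinct Worker scripts per day, the load fee comes to about $20 per day before CPU and invocation charges.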

Sandboxes GA: Full Linux Environments for Long-Horizon Agents

Dynamic Workers cover stateless tool execution. For agents that need a real filesystem, dependency installation, git operations, and background processes (coding agents, CI runners, research pipelines), Cloudflare shipped Sandboxes to general availability.

Sandboxes are persistent Linux environments powered by Cloudflare Containers. Key capabilities:

Resume by session ID across requests; sleep on idle, wake on demand
PTY terminal access over WebSocket, compatible with xterm.js
Inotify-based filesystem watching with SSE streams
waitForLog() and waitForPort() for background process synchronization
State backed to R2: approximately 2 seconds to restore vs approximately 30 seconds for a fresh clone and npm install
Concurrent limits: 15,000 lite / 6,000 basic / 1,000+ larger instances
Billing only for active CPU cycles; idle time is free
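The resume-by-session-ID lifecycle can be modeled as a small state machine. This JavaScript sketch is hypothetical: the class and method names are not the `@cloudflare/sandbox` SDK API. It only encodes the behavior listed above. A running sandbox is reused, a sleeping one is restored from its R2-backed state (~2 s), and an unknown session starts fresh (~30 s).

```javascript
// Hypothetical model of session-keyed sandbox lifecycle (not the real SDK).
const APPROX_STARTUP_SECONDS = { running: 0, restored: 2, fresh: 30 };

class SandboxSessions {
  constructor() {
    this.state = new Map(); // sessionId -> "running" | "sleeping"
  }

  // Resolve a request to a sandbox and report how it was obtained.
  acquire(sessionId) {
    const current = this.state.get(sessionId);
    let how;
    if (current === "running") how = "running";
    else if (current === "sleeping") how = "restored"; // restore saved state from R2
    else how = "fresh"; // e.g. clone the repo and install dependencies
    this.state.set(sessionId, "running");
    return { how, approxSeconds: APPROX_STARTUP_SECONDS[how] };
  }

  // Idle sandboxes sleep; no active CPU is billed while sleeping.
  sleep(sessionId) {
    if (this.state.get(sessionId) === "running") this.state.set(sessionId, "sleeping");
  }
}

const sessions = new SandboxSessions();
console.log(sessions.acquire("user-42")); // fresh, ~30 s
sessions.sleep("user-42");
console.log(sessions.acquire("user-42")); // restored, ~2 s
console.log(sessions.acquire("user-42")); // running, 0 s
```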

Network-layer credential injection via programmable egress proxies keeps credentials off the sandbox filesystem. The @cloudflare/sandbox SDK (v0.8.9+) handles lifecycle management.
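The credential-injection pattern can be sketched as a function the egress proxy applies to each outbound request. In this hypothetical JavaScript sketch, the secret store lives only in the proxy and is keyed by destination host. The sandbox sends unauthenticated requests, and the proxy attaches the `authorization` header on the way out. The host, token value, and plain-object request shape (with lowercase header names) are illustrative assumptions, not Cloudflare’s API.

```javascript
// Secrets held by the proxy and never written into the sandbox. Placeholder values.
const SECRETS_BY_HOST = new Map([["api.github.com", "Bearer example-token"]]);

// Applied by the egress proxy to each outbound request from the sandbox.
// Assumes header names are lowercase.
function injectCredentials(request) {
  const url = new URL(request.url);
  const headers = { ...request.headers };
  // Drop any authorization header the sandbox tried to set itself.
  delete headers.authorization;
  const secret = SECRETS_BY_HOST.get(url.hostname);
  if (secret) headers.authorization = secret;
  return { ...request, headers };
}

const outbound = injectCredentials({
  url: "https://api.github.com/repos/example/repo",
  method: "GET",
  headers: { accept: "application/json" },
});
console.log(outbound.headers.authorization); // prints: Bearer example-token
```

Because the sandbox never holds the token, a compromised or prompt-injected agent can use the credential only through the proxy’s routing rules. It cannot read or exfiltrate the token itself.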

Cloudflare Mesh: Zero-Trust Agent Networking

Agents need to communicate with backend services and each other without exposing internal endpoints to the public internet. Mesh provides bidirectional, many-to-many private networking routed through Cloudflare’s 330+ city edge network.

A single wrangler.jsonc entry is all the configuration required:

{
  "vpc_networks": [{ "binding": "MESH", "network_id": "cf1:network" }]
}

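// Access a private backend by IP through the Mesh binding
const response = await env.MESH.fetch("http://10.0.1.50/api/data");
const data = await response.json(); // assumes the backend returns JSON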

Mesh inherits Cloudflare One controls: Gateway policies, device posture checks, DLP, and CASB. The free tier covers 50 nodes and 50 users per account. Hostname routing, replacing IPs with names like postgres-staging.mesh, is on the roadmap for Summer 2026.

Project Think: Durable Execution and the Execution Ladder

Project Think is Cloudflare’s opinionated agent framework built on Durable Objects. Each agent gets persistent identity, SQLite storage, and hibernation at zero cost when idle.

The execution ladder defines five tiers of capability, escalating from lightweight to full Linux:

Workspace - read/write filesystem access
Dynamic Worker - sandboxed JavaScript execution
npm - runtime package installation
Headless browser - web automation
Cloudflare Sandbox - full OS access
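One way to read the ladder is as a rule for picking the lowest tier that covers everything a task needs. The JavaScript sketch below encodes that reading. The capability labels attached to each tier, and the assumption that each tier includes the ones below it, are illustrative and not Project Think’s actual API.

```javascript
// Ladder tiers in escalation order; each tier is assumed to include the ones below it.
const LADDER = [
  { tier: "Workspace", adds: "fs" },
  { tier: "Dynamic Worker", adds: "js" },
  { tier: "npm", adds: "npm" },
  { tier: "Headless browser", adds: "browser" },
  { tier: "Cloudflare Sandbox", adds: "os" },
];

// Return the lowest tier whose cumulative capabilities cover every need.
function lowestTier(needs) {
  const available = new Set();
  for (const { tier, adds } of LADDER) {
    available.add(adds);
    if (needs.every((n) => available.has(n))) return tier;
  }
  throw new Error(`no tier provides: ${needs.filter((n) => !available.has(n)).join(", ")}`);
}

console.log(lowestTier(["fs"])); // Workspace
console.log(lowestTier(["js", "npm"])); // npm
console.log(lowestTier(["os"])); // Cloudflare Sandbox
```

Escalating only as far as needed keeps most steps on the cheapest, fastest-starting tier.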

Code Mode replaces JSON tool-call loops with direct code execution, reducing token consumption from approximately 1.17 million tokens to approximately 1,000 tokens for equivalent multi-tool tasks. That is a 99.9% reduction in tokens for the same work.

Sub-agents use Facets: child Durable Objects colocated with the parent, each with isolated SQLite and typed RPC. Status: Experimental preview, already powering thousands of production agents.

Kubernetes-Native Agent Infrastructure

Kubernetes has assembled a native answer to agent workloads across multiple SIGs and vendor contributions: a purpose-built CRD (Agent Sandbox), runtime isolation options (gVisor, Kata Containers), a policy-enforced developer runtime (OpenShell), GPU resource management (DRA, KAI Scheduler), and model-aware network routing (Gateway API Inference Extension).

What Is the Kubernetes Agent Sandbox CRD?

The kubernetes-sigs/agent-sandbox project introduces a Sandbox CRD under SIG Apps, purpose-built for singleton, stateful, long-running agent workloads. It addresses the mismatch between agent access patterns and what Deployments or StatefulSets provide:

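apiVersion: agents.x-k8s.io/v1alpha1
kind: Sandbox
metadata:
  name: coding-agent
spec:
  podTemplate:
    spec:
      runtimeClassName: gvisor  # or kata; a matching RuntimeClass must exist
      containers:
      - name: agent
        image: <AGENT_IMAGE>  # placeholder for your agent image
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"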

Key capabilities:

Suspend/resume lifecycle with persistent state and stable per-sandbox hostname
Native gVisor and Kata Containers runtimeClassName support for strong isolation
SandboxWarmPool: pre-provisioned pod pool with sub-second handover via SandboxClaim and SandboxTemplate
Python SDK: pip install k8s-agent-sandbox

Status: Active development under SIG Apps, no stable release version yet.
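The warm-pool handover can be illustrated with a small simulation. This JavaScript sketch is a hypothetical model of the pattern, not the agent-sandbox controller. The pool keeps a target number of pre-provisioned pods, and a claim takes one immediately when available and falls back to a cold start when the pool is empty. A real controller replenishes asynchronously; here that step is modeled as an explicit `replenish()` call, so a burst larger than the pool drains it.

```javascript
// Hypothetical model of warm-pool claims (not the agent-sandbox controller code).
class WarmPool {
  constructor(size) {
    this.size = size;
    this.ready = [];
    this.nextId = 0;
    this.replenish();
  }

  // Top the pool back up to its target size (takes pod startup time in a real cluster).
  replenish() {
    while (this.ready.length < this.size) this.ready.push(`warm-pod-${this.nextId++}`);
  }

  // Claim a sandbox: a warm pod if one is ready, otherwise a cold start.
  claim() {
    const pod = this.ready.shift();
    return pod ? { pod, warm: true } : { pod: `cold-pod-${this.nextId++}`, warm: false };
  }
}

const pool = new WarmPool(2);
const first = pool.claim(); // warm-pod-0
const second = pool.claim(); // warm-pod-1
const third = pool.claim(); // cold-pod-2: pool drained before replenishment ran
pool.replenish();
const fourth = pool.claim(); // warm-pod-3
console.log([first, second, third, fourth].map((x) => `${x.pod} warm=${x.warm}`));
```

Pool size is therefore a trade-off: a larger pool absorbs bigger bursts at sub-second latency, at the cost of keeping more idle pods provisioned.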

gVisor and Kata Containers: Runtime Isolation Options

The runtimeClassName field selects the isolation model:

gVisor intercepts system calls in user space using a Go-implemented kernel called Sentry. It adds approximately 250m CPU and 100Mi memory overhead per sandbox. The trade-off is strong isolation without a hardware VM boundary, running anywhere containers run.
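When a RuntimeClass declares pod overhead, Kubernetes adds it to a pod’s container requests at scheduling time, so the gVisor overhead directly reduces density. The JavaScript sketch below applies the approximate 250m CPU / 100Mi figures to the 500m / 512Mi Sandbox requests from the manifest above. It assumes a hypothetical node with 16 allocatable cores and 64 GiB (65,536 MiB) of allocatable memory, assumes the overhead is configured on the RuntimeClass, and ignores system pods.

```javascript
// Scheduling footprint = container requests + RuntimeClass pod overhead.
// Units: CPU in millicores, memory in MiB.
function sandboxesPerNode(node, requests, overhead = { cpuMilli: 0, memMiB: 0 }) {
  const cpu = requests.cpuMilli + overhead.cpuMilli;
  const mem = requests.memMiB + overhead.memMiB;
  // The tighter of the two resources limits how many pods fit.
  return Math.min(Math.floor(node.cpuMilli / cpu), Math.floor(node.memMiB / mem));
}

const node = { cpuMilli: 16000, memMiB: 65536 }; // hypothetical allocatable capacity
const requests = { cpuMilli: 500, memMiB: 512 };
const gvisorOverhead = { cpuMilli: 250, memMiB: 100 };

console.log(sandboxesPerNode(node, requests)); // 32 (no overhead)
console.log(sandboxesPerNode(node, requests, gvisorOverhead)); // 21 (with gVisor overhead)
```

In this example CPU, not memory, is the binding constraint, and the overhead costs about a third of the node’s sandbox capacity (32 down to 21).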

Kata Containers runs each sandbox in a lightweight VM using QEMU or Cloud Hypervisor with a minimal guest kernel. VM boot adds approximately 125ms (a figure from secondary sources; the Kata Containers project does not publish a canonical cold-start benchmark, so treat it as approximate). The Kata Containers project confirmed integration with the Agent Sandbox CRD in late 2025, bringing hardware VM isolation to the Sandbox CRD ecosystem.

The Kata Containers isolation boundary is the strongest of any approach covered here: a hardware VM separates the agent from the host kernel entirely.

NVIDIA OpenShell: Policy-Enforced Agent Runtime

OpenShell packages K3s inside a single Docker container and enforces four policy layers through declarative YAML, rather than isolating at the kernel or VM level:

Filesystem: restrict reads and writes to allowed paths
Network: HTTP method and path-level egress control
Process: block privilege escalation and dangerous syscalls
Inference: reroute model API calls to controlled backends

# policy.yaml - network policy example
network:
  - name: allow-github-read
    host: api.github.com
    methods: [GET]
    paths: ["/repos/*"]
  - name: block-all-post
    host: "*"
    methods: [POST]
    action: deny
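To make the rule semantics concrete, here is a hypothetical JavaScript evaluator for rules shaped like the example above. The paragraph does not spell out OpenShell’s actual matching semantics, so the sketch makes four assumptions: first match wins, `*` is a wildcard in hosts and paths, a rule without an `action` means allow, and anything unmatched is denied.

```javascript
// Hypothetical evaluator for the network policy shape shown above.
const rules = [
  { name: "allow-github-read", host: "api.github.com", methods: ["GET"], paths: ["/repos/*"] },
  { name: "block-all-post", host: "*", methods: ["POST"], action: "deny" },
];

// Convert a simple glob ("*" matches any run of characters) to an anchored RegExp.
function globToRegExp(glob) {
  const escaped = glob.split("*").map((s) => s.replace(/[.+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(`^${escaped.join(".*")}$`);
}

function evaluate(req, policy) {
  for (const rule of policy) {
    const hostOk = globToRegExp(rule.host).test(req.host);
    const methodOk = !rule.methods || rule.methods.includes(req.method);
    const pathOk = !rule.paths || rule.paths.some((p) => globToRegExp(p).test(req.path));
    if (hostOk && methodOk && pathOk) return { decision: rule.action ?? "allow", rule: rule.name };
  }
  return { decision: "deny", rule: null }; // assumed default-deny
}

console.log(evaluate({ host: "api.github.com", method: "GET", path: "/repos/org/app" }, rules));
// allowed by allow-github-read
console.log(evaluate({ host: "api.github.com", method: "POST", path: "/repos/org/app" }, rules));
// denied by block-all-post
```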

Filesystem and process policies lock at sandbox creation. Network and inference policies support hot-reload without restarting the sandbox. Credentials are injected as environment variables at runtime and never written to the sandbox filesystem. OpenShell auto-discovers credentials for Claude Code, Codex, OpenCode, and GitHub Copilot.

GPU passthrough works via Container Device Interface (CDI) with fallback to --gpus all. License: Apache 2.0. Status: Alpha, single-developer mode only, with multi-tenant support planned.

DRA and KAI Scheduler: Fine-Grained GPU Allocation

At KubeCon Europe 2026, NVIDIA donated the GPU Dynamic Resource Allocation (DRA) driver to CNCF. DRA enables workloads to request GPU resources by attribute (memory capacity, MIG profile) rather than a simple device count.
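The difference between counting devices and selecting by attribute can be sketched in plain JavaScript. The device inventory, attribute names, and `selectDevice` helper below are hypothetical. Real DRA claims express device selectors as CEL expressions evaluated by the scheduler, and this sketch does not reproduce that.

```javascript
// Hypothetical GPU inventory as a DRA driver might advertise it.
const devices = [
  { name: "h100-0", model: "H100", memoryGiB: 80, migProfile: null, allocated: true },
  { name: "a10-0", model: "A10", memoryGiB: 24, migProfile: null, allocated: false },
  { name: "h100-1-mig-0", model: "H100", memoryGiB: 10, migProfile: "1g.10gb", allocated: false },
];

// Count-based request: "any one GPU" takes the first free device, whatever it is.
function selectByCount(inventory) {
  return inventory.find((d) => !d.allocated) ?? null;
}

// Attribute-based request: only free devices matching every requested attribute qualify.
function selectDevice(inventory, want) {
  const match = (d) =>
    !d.allocated &&
    (want.model === undefined || d.model === want.model) &&
    (want.minMemoryGiB === undefined || d.memoryGiB >= want.minMemoryGiB) &&
    (want.migProfile === undefined || d.migProfile === want.migProfile);
  return inventory.find(match) ?? null;
}

console.log(selectByCount(devices).name); // a10-0
console.log(selectDevice(devices, { model: "H100", migProfile: "1g.10gb" }).name); // h100-1-mig-0
console.log(selectDevice(devices, { minMemoryGiB: 40 })); // null: no free device has 40 GiB
```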

MIG partitioning on an H100 becomes declarative: request a 1g.10gb MIG slice to serve multiple inference backends on a single physical GPU. The KAI Scheduler, now a CNCF Sandbox project, adds fractional GPU allocation, hierarchical team quotas, and gang scheduling for distributed training jobs.
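Gang scheduling means a distributed job’s pods are placed all together or not at all. A job therefore never holds some GPUs while waiting on the rest. This JavaScript sketch shows the all-or-nothing check using simple first-fit placement over hypothetical per-node free GPU counts. It is not KAI Scheduler’s algorithm.

```javascript
// All-or-nothing placement: tentatively place every pod, commit only if all fit.
function gangSchedule(freeGpusByNode, podGpuRequests) {
  const free = { ...freeGpusByNode }; // work on a copy; the input is never modified
  const placement = [];
  for (const gpus of podGpuRequests) {
    // First fit: the first node with enough free GPUs for this pod.
    const node = Object.keys(free).find((n) => free[n] >= gpus);
    if (node === undefined) return { scheduled: false, placement: [], free: freeGpusByNode };
    free[node] -= gpus;
    placement.push(node);
  }
  return { scheduled: true, placement, free };
}

const cluster = { "node-a": 4, "node-b": 2 };
console.log(gangSchedule(cluster, [4, 2])); // fits: node-a, node-b
console.log(gangSchedule(cluster, [4, 4])); // does not fit: nothing is placed
```

Without the all-or-nothing check, the second job would hold node-a’s four GPUs while its other pod waits indefinitely. Two jobs in that state can block each other.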

Collaborators on the DRA driver include AWS, Broadcom, Canonical, Google Cloud, Microsoft, Nutanix, Red Hat, and SUSE.

Gateway API Inference Extension: Model-Aware Routing

The Gateway API Inference Extension (GIE) adds inference-specific routing to the Kubernetes Gateway API. Its Endpoint Picker (EPP) routes requests based on live backend metrics: queue depth, GPU memory availability, and loaded LoRA adapters.
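Here is a simplified picture of what an endpoint picker does with those metrics. It narrows to replicas that already have the requested LoRA adapter loaded (when any do), then chooses the one with the lowest load score. The JavaScript sketch below uses hypothetical metric names and an arbitrary equal-weight score. The real EPP uses its own pluggable filters and scorers.

```javascript
// Hypothetical live metrics per model-server replica.
const replicas = [
  { name: "vllm-0", queueDepth: 8, kvCacheUtil: 0.3, loras: ["sql-adapter"] },
  { name: "vllm-1", queueDepth: 1, kvCacheUtil: 0.35, loras: [] },
  { name: "vllm-2", queueDepth: 3, kvCacheUtil: 0.4, loras: ["sql-adapter"] },
];

// Lower is better. Equal weights and the queue normalization are arbitrary choices.
function loadScore(r, maxQueue) {
  return 0.5 * (r.queueDepth / maxQueue) + 0.5 * r.kvCacheUtil;
}

function pickEndpoint(pool, { lora } = {}) {
  // Prefer replicas with the adapter already loaded, so the request does not wait for a load.
  const withLora = lora ? pool.filter((r) => r.loras.includes(lora)) : [];
  const candidates = withLora.length > 0 ? withLora : pool;
  const maxQueue = Math.max(1, ...candidates.map((r) => r.queueDepth));
  return candidates.reduce((best, r) => (loadScore(r, maxQueue) < loadScore(best, maxQueue) ? r : best)).name;
}

console.log(pickEndpoint(replicas, { lora: "sql-adapter" })); // vllm-2
console.log(pickEndpoint(replicas)); // vllm-1
```

The two calls show the adapter filter overriding raw load. vllm-1 is the least loaded replica overall, but requests for `sql-adapter` go to vllm-2, the least loaded replica that already has the adapter.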

The v1 GA CRDs are InferencePool and InferenceObjective under the API group inference.networking.k8s.io/v1. Supported gateways: Envoy Gateway, kgateway, GKE Gateway, and Istio 1.28+. Version 1.5.0, released April 19, 2026, added a Pluggable Parser Framework, SLO-based deadline ordering, and pool-wide saturation metrics. The EPP is migrating into llm-d-inference-scheduler beginning April 2026.

The Kubernetes AI Conformance Program formalizes these building blocks: DRA, Kueue all-or-nothing scheduling, custom-metric autoscaling, and standardized accelerator observability are required for GKE and AKS certification.

Architecture Comparison: Isolation Models Head-to-Head

The four isolation approaches differ at the layer where the security boundary sits:

graph TD
    subgraph CF["Cloudflare Dynamic Worker"]
        CFH[Host OS] --> CFV[V8 Engine Process]
        CFV --> CFI1[Isolate A]
        CFV --> CFI2[Isolate B]
    end

    subgraph GV["gVisor"]
        GVH[Host Kernel] --> GVS[Sentry: user-space kernel]
        GVS --> GVC1[Container A]
        GVS --> GVC2[Container B]
    end

    subgraph KAT["Kata Containers"]
        KATH[Host OS] --> KATV[QEMU / Cloud Hypervisor]
        KATV --> KATG[Guest Kernel]
        KATG --> KATC[Container]
    end

    subgraph OS["NVIDIA OpenShell"]
        OSH[Docker] --> OSK[K3s cluster]
        OSK --> OSP[Policy Engine]
        OSP --> OSC[Sandbox container]
    end

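Ordered from lightest to heaviest, three of these boundaries form a clear progression: in-process V8 isolate memory separation (Cloudflare), user-space kernel syscall interception (gVisor), and a hardware VM boundary (Kata). OpenShell sits on a different axis. It enforces declarative policy around a standard container rather than adding a stronger kernel or hardware boundary.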

Comparative metrics across approaches:

Metric	Dynamic Workers	gVisor	Kata Containers	OpenShell
Cold start	~2ms	Pod startup	~125ms VM boot	Container launch
Memory overhead	~2 MB per isolate	+100Mi per pod	<5 MiB per VM	K3s baseline
Isolation boundary	V8 isolate (in-process)	User-space kernel	Hardware VM	YAML policy on a container
GPU access	None (via Workers AI)	Passthrough	Passthrough	CDI / experimental
Multi-tenant	Built-in	Needs NetworkPolicy	Needs NetworkPolicy	Single-developer (alpha)

When Should You Use Cloudflare vs Kubernetes for AI Agents?

graph TD
    Start([What does your agent need?]) --> Q1{GPU access required?}
    Q1 -->|Yes| K8S_GPU["Kubernetes: DRA and KAI Scheduler\nMIG partitioning on H100s"]
    Q1 -->|No| Q2{"Persistent filesystem\nor background processes?"}
    Q2 -->|"Yes, cloud-hosted"| CF_SB["Cloudflare Sandboxes\n15k concurrent, R2 state, active-CPU billing"]
    Q2 -->|"Yes, self-hosted K8s"| K8S_SB["K8s Agent Sandbox CRD\ngVisor or Kata runtimeClassName"]
    Q2 -->|No| Q3{"Stateless, high-density,\nlow-latency?"}
    Q3 -->|Yes| CF_DW["Cloudflare Dynamic Workers\nfew ms cold start, zero ambient authority"]
    Q3 -->|No| Q4{"Local dev with\npolicy control?"}
    Q4 -->|Yes| NVOS["NVIDIA OpenShell\nDeclarative YAML, credential auto-discovery"]
    Q4 -->|"Enterprise, multi-tenant"| K8S_KATA["K8s with Kata Containers\nHardware VM isolation, audit-ready"]

GPU access is the first decision branch because it is a hard requirement Cloudflare Workers cannot satisfy. Filesystem persistence and compute location drive subsequent branches.
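The decision tree can also be written as a plain function, which makes it easy to encode in an internal platform checklist. This JavaScript sketch mirrors the diagram’s branches; the input field names are hypothetical.

```javascript
// Mirrors the decision diagram: GPU first, then persistence, then statelessness.
function chooseAgentPlatform(w) {
  if (w.needsGpu) return "Kubernetes: DRA and KAI Scheduler";
  if (w.needsPersistentFsOrBackgroundProcs) {
    // Any hosting other than self-hosted Kubernetes is treated as cloud-hosted.
    return w.hosting === "self-hosted-k8s"
      ? "Kubernetes Agent Sandbox CRD (gVisor or Kata)"
      : "Cloudflare Sandboxes";
  }
  if (w.statelessHighDensityLowLatency) return "Cloudflare Dynamic Workers";
  if (w.localDevWithPolicy) return "NVIDIA OpenShell";
  // Remaining branch in the diagram: enterprise, multi-tenant.
  return "Kubernetes with Kata Containers";
}

console.log(chooseAgentPlatform({ needsGpu: true })); // Kubernetes: DRA and KAI Scheduler
console.log(chooseAgentPlatform({ statelessHighDensityLowLatency: true })); // Cloudflare Dynamic Workers
```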

Cloudflare: Lightweight Stateless Agents at the Edge

Dynamic Workers are the right choice for stateless tool execution, lightweight orchestration, and high-density deployments where agent logic is predominantly function calls with minimal local state. Single-digit millisecond cold starts matter when routing thousands of user requests to fresh agent instances with no warm pool to manage.

Project Think extends this to stateful agents via Durable Objects, but the stateful story works best when persistence is conversational (message history, session metadata) rather than requiring a real filesystem.

Kubernetes: GPU-Heavy Inference and Multi-Tenant Enterprise

Any workload touching GPUs belongs on Kubernetes. NVIDIA DRA and KAI Scheduler handle fractional allocation, MIG partitioning, and gang scheduling for distributed training. GIE routes to inference backends based on live queue depth and GPU memory. No equivalent exists at the infrastructure level in the Cloudflare stack: Workers AI routes through Cloudflare’s managed inference layer, not dedicated hardware.

Kata Containers plus the Agent Sandbox CRD is also the answer for compliance-heavy multi-tenant deployments where a hardware VM boundary is a hard requirement and full audit trails are mandatory.

Hybrid: Edge Agents with a K8s Inference Backend

The most capable production architecture combines both stacks:

graph TD
    subgraph K8S [Kubernetes Cluster]
        GIE["GIE Endpoint Picker\nQueue depth and GPU memory routing"]
        IP["InferencePool\nEnvoy Gateway"]
        VLLM["vLLM backends\nDRA-managed H100 slices"]
        AGSB["Agent Sandbox CRD\nStateful agents with gVisor or Kata"]
        GIE --> IP
        IP --> VLLM
    end

    User[User Request] --> CF_DW["Cloudflare Dynamic Worker\nAgent orchestration and tool calls"]
    CF_DW <-->|Private networking| MESH["Cloudflare Mesh\nZero-trust bridge"]
    MESH <-->|Secure channel| K8S

Cloudflare handles lightweight edge orchestration and tool routing. Kubernetes handles GPU inference and stateful agent workloads. Cloudflare Mesh bridges the two layers over private networking without exposing internal endpoints.

Within that layout, Dynamic Workers handle lightweight coordination and Project Think’s Durable Objects manage conversational state. On the Kubernetes side, GIE routes inference requests to vLLM or llm-d backends based on real-time health, while Agent Sandbox handles workloads requiring persistent filesystems and background processes.
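The split of responsibilities can be expressed as a dispatch rule in the edge orchestrator. In this hypothetical JavaScript sketch, each agent step goes to one of three places: the edge itself (a Dynamic Worker), the Kubernetes inference gateway over Mesh, or a stateful Agent Sandbox over Mesh. The step kinds, the OpenAI-format path, and the private IP addresses are invented for illustration. IPs are used because Mesh hostname routing is still on the roadmap.

```javascript
// Hypothetical private endpoints reachable over Mesh (placeholder IPs).
const K8S_INFERENCE_GATEWAY = "http://10.0.2.10/v1/chat/completions"; // OpenAI-format API
const K8S_SANDBOX_ROUTER = "http://10.0.2.20/sandboxes";

// Decide where an agent step runs in the hybrid architecture.
function routeStep(step) {
  switch (step.kind) {
    case "tool-call": // lightweight and stateless: stay at the edge
      return { where: "edge", via: "dynamic-worker" };
    case "inference": // needs GPUs: GIE-fronted pool in Kubernetes
      return { where: "k8s", via: "mesh", url: K8S_INFERENCE_GATEWAY };
    case "filesystem-task": // persistent filesystem or background processes
      return { where: "k8s", via: "mesh", url: `${K8S_SANDBOX_ROUTER}/${encodeURIComponent(step.sessionId)}` };
    default:
      throw new Error(`unknown step kind: ${step.kind}`);
  }
}

console.log(routeStep({ kind: "inference" }).url); // http://10.0.2.10/v1/chat/completions
console.log(routeStep({ kind: "filesystem-task", sessionId: "user-42" }).url); // http://10.0.2.20/sandboxes/user-42
```

Inside a Worker, the returned `url` would be passed to the Mesh binding’s `fetch`, as in the `env.MESH.fetch(...)` call in the Mesh section.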

NVIDIA OpenShell: Developer-Local Sandboxing

OpenShell occupies a niche neither stack fully covers: developer-local agent sandboxing with declarative policy control. Running inside a single Docker container, it is the fastest path to a policy-enforced agent environment without a Kubernetes cluster. The YAML policy model is accessible to developers who are not cluster administrators, and automatic credential discovery for Claude Code, Codex, and similar tools reduces setup time significantly.

The alpha status and single-developer limitation make OpenShell unsuitable for production multi-tenant deployments today.

Frequently Asked Questions

Can Cloudflare Workers access GPUs for AI inference?

No. Cloudflare Workers run on V8 isolates without direct GPU access. Inference routes through the Cloudflare AI Platform, which provides access to 14+ providers via a unified API. For workloads requiring dedicated GPU access, such as fine-tuning or high-throughput inference on specific hardware, Kubernetes with NVIDIA DRA and KAI Scheduler provides fine-grained allocation including MIG partitioning on H100s.

What is the Kubernetes Agent Sandbox and how does it differ from regular pods?

Agent Sandbox is a Kubernetes SIG Apps project providing a Sandbox CRD optimized for singleton, stateful, long-running agent workloads. Unlike regular pods or StatefulSets, it supports a suspend/resume lifecycle with state preservation, native gVisor and Kata Containers runtimeClassName support, stable per-sandbox hostnames, and SandboxWarmPool for sub-second cold starts via SandboxClaim and SandboxTemplate. Regular pods are designed for steady-state multi-user traffic, not the idle-heavy, bursty pattern of autonomous agents.

How does NVIDIA OpenShell compare to Cloudflare Sandboxes for running coding agents?

OpenShell runs locally in a Docker container using K3s, with declarative YAML policy enforcement across four layers: filesystem, network, process, and inference. It is designed for single-developer use (alpha) with automatic credential discovery for Claude Code, Codex, and similar tools. Cloudflare Sandboxes are cloud-hosted persistent Linux environments scaling to 15,000 concurrent instances with R2-backed state restoration in approximately 2 seconds, billing only for active CPU. Choose OpenShell for local policy-controlled development; choose Cloudflare Sandboxes for cloud-hosted production agents at scale.

Is Cloudflare’s agent infrastructure vendor-locked?

Yes, significantly. Cloudflare’s agent stack uses proprietary APIs: Durable Objects for persistent state, Dynamic Worker bindings for sandboxing, and Mesh VPC bindings for networking. Code written for Project Think’s fiber-based execution and DO SQLite storage does not port directly to Kubernetes. Standard protocols (MCP, OpenAI API format, WebSockets) are supported at integration boundaries. A hybrid architecture using Cloudflare for edge orchestration and Kubernetes with Agent Sandbox CRD and GIE for backend inference provides the most migration flexibility.

What are the cold start times for each agent sandboxing approach?

Cold starts by approach: Cloudflare Dynamic Workers start in a few milliseconds (V8 isolate). Cloudflare Sandboxes restore from R2 state in approximately 2 seconds; fresh initialization takes around 30 seconds (clone and npm install). Kubernetes Agent Sandbox with SandboxWarmPool hands over a pre-warmed pod in under 1 second. Kata Containers adds approximately 125ms for VM boot (approximate; varies by hypervisor and kernel version). For gVisor, cold start tracks ordinary pod startup time, which depends on image size and pod spec. Its approximately 250m CPU and 100Mi memory overhead is a per-sandbox resource cost, not a startup delay.