A2A v1.0 for Platform Engineers on Kubernetes

A2A (Agent-to-Agent Protocol) is the open standard for structured communication between AI agents: a specification that governs how agents discover peer services, delegate tasks through a defined lifecycle, and stream results over HTTPS.

The A2A protocol just hit v1.0 under Linux Foundation governance, with production deployments running in Azure AI Foundry, Amazon Bedrock AgentCore Runtime, and Copilot Studio. Your developers are going to start sending agent-to-agent traffic whether your platform is ready or not.

This post covers the infrastructure questions, not the framework questions. How do you expose agent endpoints on Kubernetes? How do you secure agent-to-agent calls? How do you trace a task across three agents and a tool server? Which deployment model should you choose?

If you need the foundational A2A/MCP distinction or a framework-level walkthrough, our Microsoft Agent Framework 1.0 post covers that ground. This post assumes you know what A2A does and focuses on how platform engineers handle it at the infrastructure layer.

What Does A2A v1.0 Actually Specify?

A2A is an open protocol for agent-to-agent communication built on HTTPS. Agents communicate via JSON-RPC 2.0, gRPC, or HTTP+JSON. Every interaction is a Task with a defined lifecycle: SUBMITTED > WORKING > COMPLETED (or FAILED, CANCELED, REJECTED). Two interrupted states exist for human-in-the-loop scenarios: INPUT_REQUIRED and AUTH_REQUIRED.
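To make the lifecycle concrete, here is a minimal sketch of how platform tooling might group the states listed above. The `TaskState` enum, the `classify` helper, and the "active / interrupted / terminal" labels are illustrative names for this post, not part of any A2A SDK, and the short state names mirror the prose here rather than the prefixed wire values.

```python
from enum import Enum


class TaskState(str, Enum):
    """Task states as named in this post (hypothetical helper, not an SDK type)."""
    SUBMITTED = "SUBMITTED"
    WORKING = "WORKING"
    INPUT_REQUIRED = "INPUT_REQUIRED"
    AUTH_REQUIRED = "AUTH_REQUIRED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELED = "CANCELED"
    REJECTED = "REJECTED"


# States after which the task will not change again.
TERMINAL = {TaskState.COMPLETED, TaskState.FAILED, TaskState.CANCELED, TaskState.REJECTED}
# States where the task is paused waiting on a human or on credentials.
INTERRUPTED = {TaskState.INPUT_REQUIRED, TaskState.AUTH_REQUIRED}


def classify(state: TaskState) -> str:
    """Bucket a state for dashboards and alerting."""
    if state in TERMINAL:
        return "terminal"
    if state in INTERRUPTED:
        return "interrupted"
    return "active"
```

Separating interrupted from active states matters for alerting: a task waiting on human input is not a hung process, so it probably should not count against a stuck-task threshold.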

The protocol stack for a platform engineer looks like this:

graph TD
    A["Application Layer\nAgent Skills · Task Lifecycle · contextId grouping"]
    B["Protocol Layer\nJSON-RPC 2.0 / gRPC / HTTP+JSON"]
    C["Security Layer\nSigned Agent Cards · OAuth2 / mTLS · PKCE / Device Code"]
    D["Transport Layer\nHTTPS / TLS 1.3+"]
    E["Infrastructure Layer\nKubernetes Service · kube-dns · Gateway API"]

    A --> B --> C --> D --> E

The A2A protocol stack on Kubernetes. Each layer maps to existing infrastructure you already operate.

What Changed in A2A v1.0?

Several v1.0 changes have direct infrastructure implications.

Multi-tenancy: A tenant field now appears on all request messages, enabling a single agent deployment to serve multiple tenants without separate Kubernetes Services per tenant. This changes your routing model when serving more than one team from a shared agent platform.

Signed Agent Cards: Agent Cards now support JWS signatures (RFC 7515) over JSON-canonicalized content (RFC 8785). When agents cross trust boundaries - different teams, clusters, or external providers - cryptographic verification replaces network-level trust alone.
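The sign-then-verify idea can be sketched with the standard library alone. This is an illustration under loud assumptions, not the spec's exact procedure: `canonicalize` only approximates RFC 8785 (sorted keys, no whitespace), HS256 with a shared key stands in for the asymmetric keys a real deployment would use, and the returned dictionary shape is hypothetical.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWS requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def canonicalize(card: dict) -> bytes:
    # ASSUMPTION: approximates RFC 8785; full JCS also fixes number
    # formatting and key ordering rules that json.dumps does not match exactly.
    return json.dumps(card, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def _signing_input(protected: str, card: dict) -> bytes:
    # JWS signs ASCII(base64url(header) + "." + base64url(payload)).
    return f"{protected}.{b64url(canonicalize(card))}".encode("ascii")


def sign_card(card: dict, key: bytes) -> dict:
    """Return a detached signature: the payload itself is not embedded."""
    protected = b64url(json.dumps({"alg": "HS256"}, separators=(",", ":")).encode("utf-8"))
    digest = hmac.new(key, _signing_input(protected, card), hashlib.sha256).digest()
    return {"protected": protected, "signature": b64url(digest)}


def verify_card(card: dict, signature: dict, key: bytes) -> bool:
    """Recompute the signature over the canonical card and compare in constant time."""
    digest = hmac.new(key, _signing_input(signature["protected"], card), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(digest), signature["signature"])
```

Canonicalizing before signing is what lets a verifier accept a card whose keys were reordered in transit while still rejecting any change to its content.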

gRPC as first-class: All three bindings (JSON-RPC, gRPC, HTTP+JSON) are now fully specified with equivalence guarantees. A single agent can serve multiple bindings on different endpoints simultaneously. Your Gateway API config needs routes for both HTTP and gRPC traffic types when exposing an agent that supports both.

Breaking changes to note: Enum values changed from lowercase to SCREAMING_SNAKE_CASE (for example, TASK_STATE_COMPLETED). The Part type is now unified - separate TextPart, FilePart, and DataPart are gone, replaced by a single Part message using member-based discrimination. If you have any internal tooling parsing A2A responses, audit it before deploying against v1.0 agents.
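If internal tooling still stores or compares pre-v1.0 state strings, a small shim can help during the audit. The sketch below assumes legacy values were lowercase and possibly hyphenated (for example `input-required`); `to_v1_task_state` is a hypothetical helper, so check it against the payloads your agents actually emit.

```python
def to_v1_task_state(legacy: str) -> str:
    """Map a legacy lowercase state string to the v1.0 enum spelling."""
    value = legacy.strip()
    if value.startswith("TASK_STATE_"):
        # Already in v1.0 form; leave it alone.
        return value
    # ASSUMPTION: legacy values differ only by case, hyphens, and the prefix.
    return "TASK_STATE_" + value.upper().replace("-", "_")
```

Running both old and new values through one normalizer keeps dashboards and alert rules stable while agents migrate at different speeds.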

Task timestamps: createdAt and lastModified (ISO 8601 UTC milliseconds) are now on all Task objects. This is what makes task latency measurable as an SLI. More on this in the observability section.

What Is the Adoption Status?

v1.0 shipped with Linux Foundation governance, 150+ participating organizations (Google, Microsoft, AWS, Cisco, IBM, Salesforce, SAP, ServiceNow), and five production-ready SDKs in Python, JavaScript, Java, Go, and .NET. The core repository has over 22,000 GitHub stars. Production deployments are live in Azure AI Foundry, Copilot Studio, and Amazon Bedrock AgentCore Runtime.

This is not a draft spec you can wait on. It is already running in your cloud providers’ managed services.

How Does Agent Discovery Work on Kubernetes?

Every A2A agent publishes a JSON document at a well-known URI that describes its identity, capabilities, skills, authentication requirements, and service endpoints. This is the Agent Card, served at /.well-known/agent-card.json per RFC 8615.

Agent Cards are the DNS-SD of agentic systems. A client agent resolves the server agent’s domain via kube-dns, fetches the Agent Card via HTTP GET, and knows exactly what the server supports and how to authenticate before sending a single task. The discovery flow on Kubernetes looks like this:

sequenceDiagram
    participant CA as Client Agent
    participant DNS as kube-dns
    participant SA as Server Agent Service
    participant EP as Agent Endpoint

    CA->>DNS: Resolve server-agent.namespace.svc.cluster.local
    DNS-->>CA: Cluster IP
    CA->>SA: GET /.well-known/agent-card.json
    SA-->>CA: Agent Card JSON (capabilities, skills, auth, interfaces)
    CA->>SA: Authenticate (mTLS cert handshake or OAuth2 token)
    CA->>EP: tasks/send (JSON-RPC) - task SUBMITTED
    EP-->>CA: Task ID + state: WORKING
    EP-->>CA: SSE stream (state updates)
    EP-->>CA: state: COMPLETED + result

Agent discovery and task execution flow on Kubernetes. kube-dns handles service resolution; the Agent Card provides everything the client needs to call and authenticate.
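The first two steps of that flow need nothing beyond an HTTP client. A minimal sketch using only the standard library follows; `agent_card_url` and `fetch_agent_card` are hypothetical helper names, and authentication and signature checks are deliberately left out.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN_PATH = "/.well-known/agent-card.json"


def agent_card_url(base_url: str) -> str:
    """Build the well-known card URL from any URL on the agent's origin."""
    parts = urlsplit(base_url)
    # Keep scheme and host:port, drop any path, query, or fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


def fetch_agent_card(base_url: str, timeout: float = 5.0) -> dict:
    """GET the public Agent Card and parse it as JSON."""
    with urllib.request.urlopen(agent_card_url(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

Inside the cluster, `base_url` would typically be the Service DNS name, so the DNS lookup in the diagram happens implicitly when the request is made.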

How Do Agent Cards Map to Kubernetes Primitives?

The Kubernetes architecture for an A2A agent follows the standard microservice pattern with one addition:

Deployment - runs the agent container, typically with multiple replicas
Service - exposes the agent via a stable DNS name
ConfigMap or Secret - stores Agent Card JSON and signing key material
Ingress or Gateway API HTTPRoute - exposes the /.well-known/agent-card.json endpoint externally for cross-cluster or cross-tenant discovery

The v1.0 supportedInterfaces[] array in the Agent Card lets you declare multiple protocol bindings pointing at different ports on the same agent:

from a2a.types import AgentCard, AgentCapabilities, AgentInterface, AgentSkill

agent_card = AgentCard(
    name='Expense Processor Agent',
    description='Processes and validates expense reports',
    version='1.0.0',
    default_input_modes=['text'],
    default_output_modes=['text'],
    capabilities=AgentCapabilities(
        streaming=True,
        extended_agent_card=True,
        push_notifications=True,
    ),
    supported_interfaces=[
        AgentInterface(
            protocol_binding='JSONRPC',
            url='https://expense-agent.default.svc.cluster.local:8080',
        ),
        AgentInterface(
            protocol_binding='GRPC',
            url='https://expense-agent.default.svc.cluster.local:50051',
        ),
    ],
    skills=[
        AgentSkill(
            id='validate-expense',
            name='Validate Expense Report',
            description='Validates expense report against company policy',
            input_modes=['text', 'application/json'],
            output_modes=['text', 'application/json'],
            tags=['finance', 'compliance'],
        )
    ],
)

One agent, two endpoints. Your Gateway API config needs an HTTPRoute for the JSON-RPC port and a GRPCRoute for the gRPC port. The Agent Card tells clients which binding to use.
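One way to keep the two routes in sync is to generate them from the same inputs. The sketch below builds both manifests as Python dictionaries and serializes them to JSON, which kubectl accepts alongside YAML. The Gateway name `agent-gateway` and the route names are assumptions for illustration; the Service name and ports come from the Agent Card above.

```python
import json


def route(kind: str, name: str, namespace: str, gateway: str, service: str, port: int) -> dict:
    """Build a minimal Gateway API route that forwards everything to one backend."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": kind,
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # ASSUMPTION: a Gateway with this name exists in the same namespace.
            "parentRefs": [{"name": gateway}],
            "rules": [{"backendRefs": [{"name": service, "port": port}]}],
        },
    }


routes = [
    route("HTTPRoute", "expense-agent-jsonrpc", "default", "agent-gateway", "expense-agent", 8080),
    route("GRPCRoute", "expense-agent-grpc", "default", "agent-gateway", "expense-agent", 50051),
]

# A v1 List wraps both objects so they can be applied in one call.
manifest = json.dumps({"apiVersion": "v1", "kind": "List", "items": routes}, indent=2)

if __name__ == "__main__":
    print(manifest)
```

Piping the output to `kubectl apply -f -` would create both routes. Hostnames, path matches, and listener section names are omitted here and would need to match your Gateway.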

What Are Extended Agent Cards?

A2A v1.0 supports a second, authenticated Agent Card endpoint for sensitive details. The public card at /.well-known/agent-card.json is available without authentication. The extended card - containing additional skill details, internal endpoints, or pricing information - is served at a separate authenticated URL declared in the base card.

For multi-tenant deployments, extended cards let you expose different capability profiles to different callers based on authentication context. Your mTLS peer certificate or OAuth2 token determines which view of the agent’s capabilities a caller receives.

How Do You Secure Agent-to-Agent Traffic on Kubernetes?

A2A has a layered security model. Transport security, identity verification, and authorization are distinct layers that platform teams implement independently.

Transport: HTTPS required in production. TLS 1.3+ recommended. The spec notes PQC cipher suites as a future direction for long-lived deployments with data sensitivity requirements.

Identity verification: Signed Agent Cards provide cryptographic proof that a card has not been tampered with and originates from the claimed provider. Required when agents cross trust boundaries.

Authentication schemes are declared in the Agent Card's securitySchemes: apiKey, http Bearer, oauth2 (authorization code, client credentials, and device code per RFC 8628), openIdConnect, and mtls. v1.0 removed the deprecated implicit and password OAuth flows and added PKCE support for public clients.

Agent-to-agent authorization: mTLS with X.509 certificates. Each agent holds a certificate issued by a trusted CA, and both agents present certificates during the TLS handshake. Revoking a certificate cuts off that agent's access without redeploying anything, provided peers check revocation status or certificates are short-lived enough to expire quickly.
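For agents that terminate TLS themselves rather than delegating to a proxy, the server side of that handshake might be configured roughly as follows. This is a sketch with Python's `ssl` module; `build_mtls_server_context` is a hypothetical helper, and the file paths are whatever your certificate tooling mounts into the pod.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse anything older than TLS 1.3.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # The handshake fails unless the peer presents a certificate we trust.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # This agent's own identity.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The CA bundle used to validate peer agents.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

In the mesh and sidecar models described next, the proxy performs the equivalent configuration, so agent code never touches certificates.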

You have three viable approaches to implementing this on Kubernetes:

graph LR
    subgraph MeshModel ["Model A: Service Mesh"]
        A1[Agent Pod] --> A2["ztunnel\n(per-node, Rust)"]
        A2 -->|"SPIFFE mTLS"| A3[Agent Pod]
    end

    subgraph DaprModel ["Model B: Dapr Sidecars"]
        B1[Agent Pod] -->|"localhost:3502"| B2["Dapr Sidecar"]
        B2 -->|"Sentry mTLS"| B3["Dapr Sidecar"]
        B3 --> B4[Agent Pod]
    end

    subgraph GWModel ["Model C: Agent Gateway"]
        C1[Agent Pod] --> C2["Agent Gateway\n(centralized, Rust)"]
        C2 -->|"CEL policy + JWT + mTLS"| C3[Agent Pod]
    end

Three Kubernetes deployment models for A2A security. Each handles mTLS differently. Choose based on what your platform already operates.

Model A: Service Mesh

If you already run Istio or Linkerd, extending mTLS coverage to agent namespaces costs little. Kagenti (a Red Hat incubation project) demonstrates this with Istio Ambient mesh: per-node ztunnel proxies written in Rust handle L4 encryption without per-pod sidecars. This matters for agent workloads because sidecar proxies consume memory that competes directly with LLM inference processes on the same node.

SPIFFE identities are injected automatically via a spiffe-helper sidecar on each agent pod. AuthorizationPolicy CRDs enforce namespace-level access control:

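apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: peer-agents
  namespace: beeai-team
spec:
  # No selector: the policy applies to every workload in beeai-team.
  # No action: Istio defaults to ALLOW, so only matching sources get through.
  rules:
  - from:
    - source:
        namespaces: [beeai-team]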

This allows any agent in beeai-team to call any other agent in the same namespace and blocks everything else at L4. TCP connections from unauthorized namespaces are reset before they reach the agent process.

Kagenti caveat: This is an incubation project with alpha-stage limitations in early releases: no dynamic agent discovery (peer URLs hardcoded as environment variables), CRD updates not reconciling Deployments, and no built-in workflow engine. Later alpha versions (v0.2.0-alpha.21+) added an AgentCard CRD and Kubernetes-native discovery. Kagenti shows where the ecosystem is heading; it is not production-ready infrastructure today.

Model B: Dapr Sidecars

Dapr provides automatic mTLS via its Sentry service with SPIFFE IDs, certificate rotation through managed trust domains, and encrypted sidecar-to-sidecar communication without agent code changes. Agents call their local Dapr sidecar instead of calling peer agents directly:

# Direct call (no mTLS, no retry, no tracing):
curl https://server-agent:8080/a2a

# Through local Dapr sidecar (mTLS, retries, circuit breaker, tracing):
curl http://localhost:3502/v1.0/invoke/server-agent/method/a2a

Dapr’s Access Control Lists enforce default-deny with explicit per-endpoint allowlists. Resiliency policies (retries, timeouts, circuit breakers) are declarative YAML applied to the sidecar rather than the agent code. The sidecar also emits OpenTelemetry spans for every interaction automatically.
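From agent code, the same sidecar call might look like the sketch below. It assumes the sidecar's HTTP port is available in the `DAPR_HTTP_PORT` environment variable and falls back to 3500 if it is not; the curl example above uses 3502, so adjust to your configuration. `dapr_invoke_url` and `call_peer` are hypothetical helpers, not part of a Dapr SDK.

```python
import json
import os
import urllib.request


def dapr_invoke_url(app_id: str, method: str) -> str:
    """Build the service-invocation URL on the local sidecar."""
    # ASSUMPTION: the port comes from DAPR_HTTP_PORT, defaulting to 3500.
    port = os.environ.get("DAPR_HTTP_PORT", "3500")
    return f"http://localhost:{port}/v1.0/invoke/{app_id}/method/{method.lstrip('/')}"


def call_peer(app_id: str, method: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST a JSON payload to a peer agent through the sidecar."""
    req = urllib.request.Request(
        dapr_invoke_url(app_id, method),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

The agent addresses its peer by Dapr app ID rather than by hostname, which is what lets the sidecar apply mTLS, retries, and access control without the agent knowing about them.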

The tradeoff: Dapr adds an abstraction layer between agents. If you are already running Dapr for other microservices, extending it to agents is natural. If you are not already running Dapr, evaluate whether the operational overhead of Sentry and the sidecar lifecycle is worth it relative to a service mesh you may already have.

For a detailed Dapr agent setup, see our Dapr agents guide.

Model C: Agent Gateway

The Linux Foundation’s agentgateway is a Rust-based data plane built specifically for AI agent traffic. Donated by Solo.io, with contributors from Microsoft, Apple, AWS, Cisco, and Salesforce, it implements the Kubernetes Gateway API (HTTPRoute, GRPCRoute, TCPRoute, TLSRoute) with protocol-aware A2A and MCP handling.

Install and run:

# Install agentgateway binary
curl https://raw.githubusercontent.com/agentgateway/agentgateway/refs/heads/main/common/scripts/get-agentgateway | bash

# Download example config
curl -sL https://raw.githubusercontent.com/agentgateway/agentgateway/main/examples/basic/config.yaml -o config.yaml

# Run
agentgateway -f config.yaml

Agent Gateway provides JWT authentication, CEL-based access policies, rate limiting, external authorization hooks, and built-in OpenTelemetry integration. It also handles MCP traffic - making it the single data plane for both protocol types, covered further in the MCP section below.

Version note: Confirm Agent Gateway’s A2A v1.0 support status before deploying against v1.0 agents. The gateway is under active development; check the project’s release notes at the time of deployment.

How Do SPIFFE Identity and Network Policies Work?

Regardless of which model you choose, two baseline practices apply at the platform level.

SPIFFE identity: Assign a SPIFFE Verifiable Identity Document to each agent workload. This gives every agent a cryptographically verifiable identity tied to its Kubernetes ServiceAccount, independent of IP address (which changes across pod restarts and node evictions). All three models above use SPIFFE under the hood; the question is which CA manages certificate issuance.

Network policies: Apply default-deny egress NetworkPolicies per agent namespace. Allowlist only required endpoints: peer agent Services, LLM API endpoints, and tool servers. An agent that can communicate only with its declared dependencies has a dramatically smaller blast radius when compromised. Our Agent-Ready Kubernetes Platform guide covers these network policy patterns in detail.
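A default-deny egress policy with an allowlist can be generated per namespace. The sketch below emits the manifest as a Python dictionary. The policy name, the assumption that cluster DNS runs in `kube-system`, and the namespace-level granularity are all illustrative; external LLM endpoints would need additional `ipBlock` rules or an egress gateway.

```python
import json


def ns_selector(namespace: str) -> dict:
    """Select a namespace by the name label Kubernetes sets automatically."""
    return {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": namespace}}}


def default_deny_egress(namespace: str, allowed_namespaces: list) -> dict:
    """Deny all egress from the namespace except DNS and the listed namespaces."""
    dns_rule = {
        # ASSUMPTION: cluster DNS pods live in kube-system.
        "to": [ns_selector("kube-system")],
        "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
    }
    peer_rules = [{"to": [ns_selector(ns)]} for ns in allowed_namespaces]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "agent-egress-allowlist", "namespace": namespace},
        "spec": {
            "podSelector": {},          # every pod in the namespace
            "policyTypes": ["Egress"],  # anything not listed below is dropped
            "egress": [dns_rule] + peer_rules,
        },
    }


if __name__ == "__main__":
    print(json.dumps(default_deny_egress("finance", ["finance", "tools"]), indent=2))
```

Forgetting the DNS rule is the most common way a default-deny egress policy breaks discovery, since the agent can no longer resolve its peers' Service names.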

How Do You Observe Agent-to-Agent Communication?

Standard HTTP monitoring misses the operational concepts that matter for agents. HTTP dashboards show request rates and latency. Agent observability needs task state transitions, skill invocation counts, agent roles (orchestrator vs. worker), and completion rates broken down by skill.

How Does OpenTelemetry Enable Cross-Agent Tracing?

A2A calls are HTTP-based, so W3C Trace Context (the traceparent header) carries trace IDs across agent-to-agent boundaries once both sides are instrumented. When Agent A sends a task to Agent B, the trace follows the full lifecycle across every agent in the chain:

sequenceDiagram
    participant U as User Request
    participant O as Orchestrator Agent
    participant A as Specialist Agent A
    participant B as Specialist Agent B
    participant T as MCP Tool Server

    U->>O: Request (trace-id generated, root span)
    O->>A: tasks/send [inject traceparent header]<br/>Child span: task delegation
    A->>B: tasks/send [inject traceparent header]<br/>Grandchild span: sub-task
    B->>T: tool/call [inject traceparent header]<br/>Tool execution span
    T-->>B: Tool result
    B-->>A: state: COMPLETED
    A-->>O: state: COMPLETED
    O-->>U: Final response

W3C Trace Context propagation across a multi-agent chain. Every agent boundary produces a child span. The full reasoning chain - LLM inference and tool calls included - is visible in a single trace.

A real trace hierarchy from Red Hat’s multi-agent implementation shows what this looks like in practice:

request-manager: POST /api/v1/requests/generic [3.75s]
  agent-service: POST /api/v1/events/cloudevents [3.64s]
    llamastack: /v1/openai/v1/responses [3.54s]
      InferenceRouter.openai_chat_completion [88.26ms]
      InferenceRouter.stream_tokens_openai_chat [1.88s]
    snow-mcp-server: mcp.tool.open_laptop_refresh_ticket [11.93ms]

Three agents, one LLM inference call, one MCP tool invocation, and one trace ID ties it all together.

How Do You Set Up Auto-Instrumentation?

Python agents instrument with two lines:

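from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

app = FastAPI()  # your agent's existing FastAPI application

# Trace outbound httpx calls to peer agents and tool servers
HTTPXClientInstrumentor().instrument()
# Trace inbound requests handled by this agent
FastAPIInstrumentor.instrument_app(app)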

Configure via environment variables in your Deployment spec:

env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.observability.svc.cluster.local:4317"
  - name: OTEL_SERVICE_NAME
    value: "expense-agent"

MCP servers may need manual span creation when they use protocols that standard auto-instrumentation does not cover. In those cases, extract the traceparent from incoming HTTP headers and create spans explicitly with the extracted parent context before making outbound calls.
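To show what "extract the traceparent and continue the trace" means at the header level, here is a standard-library sketch of the version 00 format. In a real agent you would normally let the OpenTelemetry SDK's propagation API do this; `parse_traceparent` and `child_traceparent` are hypothetical helpers, and starting a new sampled trace when the header is missing or invalid is an assumption about desired behavior.

```python
import re
import secrets
from typing import Optional, Tuple

# version 00: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>", lowercase only
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> Optional[Tuple[str, str, str]]:
    """Return (trace_id, parent_id, flags), or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are explicitly invalid in W3C Trace Context.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, flags


def child_traceparent(incoming: str) -> str:
    """Build the header for an outbound call: same trace, new span ID."""
    parsed = parse_traceparent(incoming)
    if parsed is None:
        # ASSUMPTION: start a fresh, sampled trace when nothing valid arrived.
        trace_id, flags = secrets.token_hex(16), "01"
    else:
        trace_id, _, flags = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

The trace ID is the only part that survives every hop, which is why a single ID can tie together the orchestrator, the specialist agents, and the tool server.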

Which Metrics Matter for A2A Observability?

The A2A task lifecycle gives you natural SLIs. The v1.0 createdAt and lastModified timestamps on Task objects make these measurable without custom instrumentation:

Task latency by state: SUBMITTED-to-COMPLETED duration per skill. This is your agent SLO metric. Define it before you have incidents.
Task failure rate by skill: FAILED tasks per skill per time window. An agent failing 20% of validate-expense tasks but 0% of summarize-document tasks has a skill-specific problem.
Agent availability: Successful Agent Card fetches vs. total attempts. Agents that cannot serve their card are not discoverable by peers.
Stuck tasks: Tasks in WORKING state beyond your P99 latency threshold. These are your hung processes. Alert on them.
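The latency and stuck-task SLIs above reduce to timestamp arithmetic. The sketch below assumes each task is available as a flat dictionary with `state`, `createdAt`, and `lastModified` keys; that shape is a simplification for illustration, and real Task objects may nest the state differently.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp with millisecond precision."""
    # Older Python versions reject a trailing "Z", so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def task_latency_seconds(task: dict) -> float:
    """Elapsed time from creation to the last state change."""
    return (parse_ts(task["lastModified"]) - parse_ts(task["createdAt"])).total_seconds()


def is_stuck(task: dict, threshold: timedelta, now: datetime) -> bool:
    """True if the task is still WORKING after more than the threshold."""
    return task["state"] == "WORKING" and now - parse_ts(task["createdAt"]) > threshold


if __name__ == "__main__":
    done = {
        "state": "COMPLETED",
        "createdAt": "2025-01-01T12:00:00.000Z",
        "lastModified": "2025-01-01T12:00:03.750Z",
    }
    print(task_latency_seconds(done))
    print(is_stuck({**done, "state": "WORKING"}, timedelta(seconds=60),
                   datetime(2025, 1, 1, 12, 5, tzinfo=timezone.utc)))
```

For completed tasks, `lastModified` marks the transition into the terminal state, so the difference is the SUBMITTED-to-COMPLETED duration. Group the results by skill before comparing them against an objective.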

Dapr and Agent Gateway both emit infrastructure-level metrics that complement application-level OTel traces. Run both: OTel for task and skill granularity, infrastructure metrics for connection and throughput.

How Does A2A Relate to MCP?

A2A and MCP serve different roles in the same system. MCP handles agent-to-tool communication: an agent discovers and invokes external tools, APIs, and data sources. A2A handles agent-to-agent communication: an agent discovers and delegates tasks to peer agents. The traffic patterns differ, and so do the routing requirements.

flowchart TD
    AG["Central Agent"]

    AG -->|A2A outbound| ADP["Agent Discovery\n/.well-known/agent-card.json"]
    ADP --> AT["Task delegation\ntasks/send - bidirectional, long-running"]
    AT --> AA["Peer Agent"]

    AG -->|MCP outbound| MDP["Tool Discovery\ntools/list"]
    MDP --> MT["Function invocation\ntools/call - request/response"]
    MT --> MS["MCP Tool Server"]

    AA --> GW["Agent Gateway or Service Mesh\nHandles both protocol types"]
    MS --> GW

A single agent generates two types of outbound traffic: A2A to peer agents and MCP to tool servers. Both need routing, security, and observability policies.

In practice, a production agent deployment generates both traffic types simultaneously. Your expense processing agent calls an A2A peer to validate against compliance rules and calls an MCP tool server to look up vendor records. They happen in the same request.

Agent Gateway handles both protocol types through a single data plane. Rather than maintaining separate routing and policy infrastructure for A2A and MCP traffic, one gateway applies consistent CEL policies, JWT authentication, rate limiting, and OpenTelemetry instrumentation to both. This is operationally significant when multiple agent teams are creating both A2A and MCP endpoints across your cluster.

What Should You Set Up Today, and What Can Wait?

Set Up Today

These patterns work with infrastructure you already operate.

Agent Card endpoints. Add /.well-known/agent-card.json to any existing Kubernetes Service that will become an A2A agent. This is an HTTP endpoint your agent serves - no new infrastructure required.

mTLS via your existing service mesh. If you run Istio or Linkerd, extend PeerAuthentication and AuthorizationPolicy to agent namespaces. Agents get mTLS automatically with zero code changes.

OpenTelemetry tracing. Two instrumentation lines per Python agent, one OTEL_EXPORTER_OTLP_ENDPOINT environment variable, and agent task chains appear in your existing Jaeger or Grafana Tempo dashboard. W3C Trace Context handles cross-agent propagation automatically.

NetworkPolicies per agent namespace. Default-deny egress with explicit allowlists. This costs nothing and limits blast radius when an agent misbehaves.

Task latency as an SLI. With v1.0 task timestamps available, define your latency objective now. SUBMITTED-to-COMPLETED duration for your most critical agent skill is the metric to instrument first.

Evaluate Soon

Agent Gateway becomes worth evaluating when you have multiple teams building agents with both A2A and MCP endpoints. The single data plane reduces the number of policy systems you operate. Confirm v1.0 support before deploying.

Kagent (kagent.dev) provides an Agent CRD for deploying agents as Kubernetes-native workloads. Every agent created with Kagent implements A2A automatically, with the endpoint exposed on port 8083 of the kagent controller service and discovery at /api/a2a/{namespace}/{agent-name}/.well-known/agent.json. Skills are defined declaratively in the CRD spec:

apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: expense-processor
  namespace: finance
spec:
  description: "Processes and validates expense reports"
  a2aConfig:
    skills:
      - id: validate-expense
        name: Validate Expense Report
        description: Validates expense report against company policy

This is the Kubernetes-native path for teams building greenfield agent infrastructure who want agents to behave like first-class cluster workloads.

Dapr Agents for teams already running Dapr. Extending Sentry mTLS and Dapr’s observability stack to agent workloads is natural if the sidecar pattern is already established across your microservices.

Watch This Space

Dynamic agent registries: Both the A2A spec and tooling (Kagenti’s later alphas, Agent Gateway’s roadmap) are moving toward dynamic discovery with agent registries rather than static well-known URIs. This replaces hardcoded peer URLs with a registry lookup that stays current as agents scale up, down, and across clusters.

Agent Payments Protocol (AP2): Extends A2A with payment primitives so agents can transact on behalf of users. Infrastructure implications for platform teams are not yet defined, but this will add a new traffic type to manage.

Post-quantum cipher suites: The A2A spec explicitly calls out PQC cipher suites as a future direction for TLS. Start tracking this if your organization has long-term data sensitivity requirements; it will affect your TLS termination configuration before long.

Frequently Asked Questions

How is A2A different from just using REST APIs between agents?

A2A adds a standardized task lifecycle (SUBMITTED, WORKING, COMPLETED, FAILED), Agent Cards for capability discovery, built-in streaming via server-sent events, and push notifications via webhooks. REST gives you request-response. A2A gives you long-running task management with state tracking, multi-turn conversations via contextId, and a machine-readable way for agents to advertise what they can do. Platform teams get consistent observability across all agent communication instead of bespoke API contracts per agent pair.

Do I need a service mesh to run A2A on Kubernetes?

No. A2A runs over standard HTTPS, so any Kubernetes cluster with TLS-enabled Services can handle it. A service mesh (Istio, Linkerd) adds automatic mTLS, traffic policies, and observability without code changes. Alternatives include Dapr sidecars (automatic mTLS with Sentry and SPIFFE, plus resiliency policies) or Agent Gateway (protocol-aware routing for both A2A and MCP). For production environments with multiple agent teams, one of these three approaches is strongly recommended over raw HTTPS without a trust infrastructure.

How do I trace agent-to-agent calls across services?

Use OpenTelemetry with W3C Trace Context propagation. A2A calls are HTTP-based, so the standard traceparent header carries trace IDs across agent boundaries automatically. In Python, HTTPXClientInstrumentor().instrument() captures outbound call traces and FastAPIInstrumentor.instrument_app(app) captures inbound. Set OTEL_EXPORTER_OTLP_ENDPOINT to your collector and full agent call chains - including LLM inference and tool execution spans - appear in your existing tracing backend. Dapr and Agent Gateway both add infrastructure-level traces on top of application instrumentation.

What is a Signed Agent Card and do I need one?

A Signed Agent Card uses JWS (RFC 7515) with a detached signature over JSON-canonicalized content (RFC 8785) to cryptographically prove that the card has not been tampered with and originates from the claimed provider. Use it when agents cross trust boundaries: different teams, different clusters, or external partners. For agents communicating entirely within a single cluster under service mesh mTLS, network-level trust may be sufficient without card signatures. At cloud service boundaries - calling Azure AI Foundry or Amazon Bedrock AgentCore - expect signed cards on the other end.

How does A2A relate to MCP? Do I need both?

MCP handles agent-to-tool communication: discovering and invoking APIs, databases, and tools. A2A handles agent-to-agent communication: discovering and delegating tasks to peer agents. Most production agent deployments generate both traffic types simultaneously. A single agent will have MCP outbound calls to tool servers and A2A calls to collaborating agents within the same request. Agent Gateway from the Linux Foundation handles both protocol types through a single data plane, which reduces the number of security policy and observability systems you need to operate.