Securing AI/ML Supply Chains on Kubernetes

Securing AI/ML supply chains on Kubernetes means protecting every component in your cluster pipeline from compromise: the GitHub Actions that scan your code, the PyPI packages your model servers install, and the credentials that flow between them. When any one component is poisoned, an attacker inherits everything it touches.

On March 19, 2026, TeamPCP force-pushed malicious code to 76 of 77 version tags in the aquasecurity/trivy-action GitHub repository. Within eight days, one harvested credential had cascaded through four vendors, put malicious packages on PyPI, and deployed privileged DaemonSets to Kubernetes clusters across affected organizations. CVE-2026-33634 was assigned with a CVSS 9.4 score.

This post does two things. It traces exactly how CanisterWorm - the malware at the center of this campaign - moved from a poisoned GitHub Action to Kubernetes cluster takeover. Then it maps that kill chain to specific Kubernetes-native controls that would have broken each stage.

Most coverage of TeamPCP has focused on the PyPI package angle: rotate credentials, pin versions, run pip audit. That advice is correct but incomplete. The K8s-specific implications - how the malware detected service account tokens, escalated to cluster-wide secret enumeration, and deployed privileged DaemonSets - deserve a dedicated treatment. That is what this post provides.

How Did the TeamPCP Supply Chain Cascade Unfold?

How Was the Trivy Action Poisoned? (March 19)

The initial compromise traced to incomplete credential rotation at Aqua Security. TeamPCP gained access to the aqua-bot GitHub service account and used it to force-push a malicious payload to 76 of 77 version tags in aquasecurity/trivy-action and all tags in aquasecurity/setup-trivy.

The payload was a 150-line bash script. When a CI runner pulled the compromised action, the script:

Queried the AWS Instance Metadata Service (IMDS) at 169.254.169.254 to harvest cloud credentials
Read $GOOGLE_APPLICATION_CREDENTIALS on GCP-hosted runners
Collected SSH keys, environment variables, and GitHub tokens from the runner environment
Downloaded a Python second-stage component - CanisterWorm (kube.py)

Known-clean versions: trivy-action 0.35.0 and setup-trivy 0.2.6, both published after discovery, and the Trivy 0.69.3 binary (the compromised binary was 0.69.4).

How Did the Attack Cascade Across Four Vendors?

Security scanning tools occupy a structurally privileged position in CI/CD pipelines. They need access to the code and artifacts they analyze, and they run with ambient access to whatever credentials the pipeline has loaded. That is the condition TeamPCP exploited.

GitHub Personal Access Tokens belonging to Checkmarx were harvested from the Trivy runner environment. On March 23 - four days after the initial Trivy compromise - those tokens were used to poison all 35 version tags of kics-github-action. That pipeline, in turn, had access to PyPI publishing tokens for LiteLLM.

On March 24, malicious LiteLLM versions 1.82.7 and 1.82.8 were published to PyPI. LiteLLM's official security update puts the exposure at approximately 40 minutes from publication to PyPI quarantine. Other reports describe a much longer window, from 10:39 to roughly 16:00 UTC. Because the figures disagree, treat any unpinned install between 10:39 and 16:00 UTC on March 24 as potentially exposed.

On March 27, Telnyx SDK versions 4.87.1 and 4.87.2 were compromised using WAV steganography for payload delivery.

sequenceDiagram
    participant AQ as Aqua Security
    participant T as Trivy Action Tags
    participant CX as Checkmarx CI/CD
    participant KC as KICS Action Tags
    participant LL as LiteLLM CI/CD
    participant PY as PyPI
    participant TN as Telnyx SDK

    Note over AQ: Feb 2026 - Credential rotation incomplete
    AQ->>T: TeamPCP gains aqua-bot account access
    Note over T: March 19 - Force-push to 76/77 tags
    T->>CX: Payload executes - harvests Checkmarx GitHub PAT
    Note over CX: March 23
    CX->>KC: Stolen PAT used to poison 35 KICS action tags
    KC->>LL: Payload executes - harvests LiteLLM PyPI token
    Note over LL: March 24
    LL->>PY: Publishes malicious v1.82.7 and v1.82.8
    Note over PY: Official figure approx. 40 min, some reports 10:39 to 16:00 UTC
    PY->>TN: Harvested creds enable Telnyx compromise
    Note over TN: March 27 - WAV steganography payload

One incomplete credential rotation at a security vendor cascaded across four organizations in eight days. Each compromised pipeline yielded the credentials needed to attack the next.

CSA researchers analyzed a 30,000-repository sample and estimated that 474 public repositories executed the malicious trivy-action. Unit 42 reported 16 publicly disclosed victim organizations. Over 300 GB of data and 500,000 credentials were exfiltrated across all four waves.

Why Does AI/ML Tooling Carry Unique Supply Chain Risk?

LiteLLM is a proxy that aggregates API keys for 100+ LLM providers, including OpenAI, Anthropic, Google Gemini, AWS Bedrock, Cohere, and Mistral, into a single dependency. It sees 95-97 million downloads per month on PyPI. Named downstream users include Netflix, Stripe, Google, CrewAI, DSPy, and MLflow.

Compromising a dependency that sits between your application and every LLM provider is categorically different from compromising a typical utility library. A compromised NumPy version exposes your data. A compromised LiteLLM version exposes your data plus every API key for every model your organization runs.

The malicious .pth file (litellm_init.pth) executed automatically on every Python interpreter startup - including pip, Python, and IDE processes - with no import required. This persistence survived uninstallation of the LiteLLM package itself. That is a qualitatively different persistence model than a malicious import in an __init__.py.
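The .pth mechanism is standard Python behavior: at startup the site module reads every .pth file in each site-packages directory and executes any line that begins with "import". The sketch below lists those executable lines for the current interpreter. It is a triage aid under that assumption, not a malware scanner: legitimate packages (editable installs, setuptools' distutils shim) also use import lines, so every hit needs human review rather than automatic deletion.

```python
"""List executable lines in .pth files (a triage sketch, not a malware scanner)."""
import site
from pathlib import Path


def executable_pth_lines(dirs):
    """Return (path, line_no, line) for every .pth line that site.py would execute.

    site.py executes a .pth line when it starts with "import " or "import\t";
    other non-comment lines are treated as directories to add to sys.path.
    """
    findings = []
    for d in dirs:
        base = Path(d)
        if not base.is_dir():
            continue
        for pth in sorted(base.glob("*.pth")):
            try:
                text = pth.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue
            for no, line in enumerate(text.splitlines(), start=1):
                if line.startswith(("import ", "import\t")):
                    findings.append((str(pth), no, line.strip()))
    return findings


if __name__ == "__main__":
    # Cover the system/virtualenv site-packages plus the per-user directory.
    candidates = list(site.getsitepackages()) + [site.getusersitepackages()]
    for path, no, line in executable_pth_lines(candidates):
        flag = "  <-- known IOC" if path.endswith("litellm_init.pth") else ""
        print(f"{path}:{no}: {line[:120]}{flag}")
```

Run it with each Python interpreter on the host (system, virtualenvs, container images), since each has its own site-packages.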

How Did CanisterWorm Move from CI Runner to Kubernetes Cluster Takeover?

When CanisterWorm detected Kubernetes service account tokens at the standard mount path (/run/secrets/kubernetes.io/serviceaccount/), it shifted from credential harvesting to cluster attack. This is the section most post-incident coverage skips.

graph TD
    A["CI Pipeline Runs\nCompromised Trivy Action"] --> B["Bash Script Executes\n150-line payload"]
    B --> C{"Cloud Env Detected?"}
    C -->|"AWS"| D["Query IMDS\n169.254.169.254"]
    C -->|"GCP"| E["Read GOOGLE_APPLICATION_CREDENTIALS"]
    D --> F["Harvest Cloud Creds\nSSH Keys, Env Vars, GitHub Tokens"]
    E --> F
    F --> G["Download CanisterWorm\nkube.py - Python Stage 2"]
    G --> H{"K8s SA Token Found?\n/run/secrets/kubernetes.io/serviceaccount/"}
    H -->|"Yes"| I["Enumerate All Secrets\nkubectl get secrets --all-namespaces"]
    H -->|"No"| J["Harvest Remaining Creds\nExit K8s Path"]
    I --> K["Deploy Privileged DaemonSet\nnode-setup-{node_name} in kube-system\nalpine:latest image"]
    K --> L["Mount Host Filesystem\nInstall sysmon.py Backdoor"]
    L --> M["Exfiltrate to ICP Canister C2\nDecentralized - Cannot Be Seized"]
    L --> N["Wiper Variant kamikaze.sh\nDestructive DaemonSets on Geolocated Targets"]

The complete kill chain from CI runner to cluster-wide backdoor. Each stage feeds the next.

How Did CanisterWorm Harvest Cloud Credentials?

The bash payload queried the cloud metadata service first. On AWS, that is 169.254.169.254. Microsoft’s detection guidance also identifies 169.254.170.2 as a queried endpoint. On GCP, the payload read the credential file path from $GOOGLE_APPLICATION_CREDENTIALS.

This is a well-understood attack vector, yet it remains effective in Kubernetes because the default pod network places no egress restrictions on metadata service access. Any process running in a pod can reach the IMDS endpoint unless a NetworkPolicy or cloud-provider control explicitly blocks it, and most clusters have neither by default.

How Did CanisterWorm Discover Service Account Tokens?

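Kubernetes mounts a service account token into every pod by default, at /var/run/secrets/kubernetes.io/serviceaccount/token (also reachable as /run/secrets/kubernetes.io/serviceaccount/token in images where /var/run is a symlink to /run). That token carries the permissions of the pod's assigned service account.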

CI/CD runners frequently carry broad permissions because they need to deploy to the cluster, pull secrets for configuration, and interact with the Kubernetes API. When CanisterWorm found a token at that path, it had a signed credential for the Kubernetes API server.

The two cluster actions that followed, kubectl get secrets --all-namespaces to read every secret in every namespace and creating DaemonSets in kube-system, required cluster-wide list permission on secrets and create permission on DaemonSets in kube-system. A service account with only namespace-scoped permissions and no access to secrets would have received 403 Forbidden for both.
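To make the 403 argument concrete, here is a deliberately simplified model of RBAC evaluation, assuming only two binding shapes: a Role bound by a RoleBinding grants rights in one namespace, and a ClusterRole bound by a ClusterRoleBinding grants them everywhere, including all-namespaces requests. Real authorization also handles resourceNames, non-resource URLs, aggregation, and ClusterRoles bound through RoleBindings; the Rule and Grant classes are hypothetical names for illustration. The sample grant reuses the rules of the deploy-manager Role shown later in this post.

```python
"""Toy RBAC evaluator: why a namespace-scoped Role cannot run `get secrets -A`."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    api_groups: tuple
    resources: tuple
    verbs: tuple

    def matches(self, api_group, resource, verb):
        def hit(values, wanted):
            return "*" in values or wanted in values
        return (hit(self.api_groups, api_group) and hit(self.resources, resource)
                and hit(self.verbs, verb))


@dataclass(frozen=True)
class Grant:
    """A role's rules plus where its binding applies: a namespace, or None for cluster-wide."""
    rules: tuple
    namespace: Optional[str]


def allowed(grants, api_group, resource, verb, namespace):
    """namespace=None models an all-namespaces request such as `kubectl get secrets -A`."""
    for g in grants:
        in_scope = g.namespace is None or g.namespace == namespace
        if in_scope and any(r.matches(api_group, resource, verb) for r in g.rules):
            return True
    return False  # RBAC is deny-by-default: the API server answers 403 Forbidden


deploy_manager = Grant(
    rules=(
        Rule(("apps",), ("deployments",), ("get", "list", "patch", "update")),
        Rule(("",), ("pods", "pods/log"), ("get", "list")),
    ),
    namespace="production",
)
```

Evaluating the two CanisterWorm requests against this grant, `allowed([deploy_manager], "", "secrets", "list", None)` and `allowed([deploy_manager], "apps", "daemonsets", "create", "kube-system")` both return False.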

How Did CanisterWorm Establish Persistent Node-Level Access?

The malware deployed DaemonSets named node-setup-{node_name} to the kube-system namespace using alpine:latest. These pods were privileged containers with hostPath volume mounts pointing to the root filesystem - giving each container full read/write access to the underlying node.

From the host filesystem, the malware installed sysmon.py as a systemd user service (sysmon.service), creating a persistent backdoor that survived pod restarts and LiteLLM package removal.

A wiper variant - kamikaze.sh - deployed destructive DaemonSets against clusters geolocated in Iran. The entire operation used an Internet Computer Protocol (ICP) canister as C2 infrastructure. Unlike traditional domain-based C2, ICP canisters run on a decentralized blockchain network. Standard incident response playbooks that rely on domain seizure or DNS sinkholing do not apply.

Which Kubernetes-Native Controls Would Have Contained the Blast Radius?

The following controls are built into Kubernetes and need no third-party tooling, with one caveat: NetworkPolicy only takes effect when the cluster's CNI plugin enforces it. Each control maps to a specific attack stage.

graph LR
    subgraph Attacks["Attack Stages"]
        A1["1. Query IMDS\n169.254.169.254"]
        A2["2. Read SA Token\n/run/secrets/.../token"]
        A3["3. List All Secrets\nkubectl get secrets -A"]
        A4["4. Deploy Privileged DaemonSet\nhostPath and privileged=true"]
        A5["5. Egress to C2\nICP Canister Endpoint"]
    end
    subgraph Controls["K8s Controls"]
        D1["NetworkPolicy\ndeny 169.254.169.254/32 egress"]
        D2["automountServiceAccountToken: false\non scanner ServiceAccount"]
        D3["RBAC: namespace-scoped Role\nno secrets list or get"]
        D4["Pod Security Standards Baseline\nenforce on kube-system and app namespaces"]
        D5["Default-deny egress NetworkPolicy\nallowlist known endpoints only"]
    end
    A1 -. "blocked by" .-> D1
    A2 -. "blocked by" .-> D2
    A3 -. "blocked by" .-> D3
    A4 -. "blocked by" .-> D4
    A5 -. "blocked by" .-> D5

Each defense layer intercepts a distinct attack stage. No single control stops the full chain - but all five together break it at multiple points.

How Do Pod Security Standards Block Privileged Pod Deployment?

Pod Security Standards (PSS) are enforced by the built-in Pod Security Admission (PSA) controller, stable since Kubernetes 1.25 and included in every cluster. No additional tooling required.

The Baseline profile blocks:

Privileged containers (spec.containers[*].securityContext.privileged: true)
Host namespace access (spec.hostNetwork, spec.hostPID, spec.hostIPC)
hostPath volumes (spec.volumes[*].hostPath)

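The node-setup-* DaemonSet CanisterWorm deployed ran privileged containers with hostPath mounts of the node's root filesystem, and either one alone violates Baseline. Enforcing Baseline on kube-system and application namespaces would still have admitted the DaemonSet object, but every pod it tried to create would have been rejected at admission, before any of them ran.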
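The three Baseline checks listed above can be sketched as a function over a pod spec given as a plain dict. This covers only those three checks; the real Baseline profile also restricts capabilities, hostPorts, and more, and actual enforcement belongs to the Pod Security Admission controller. The sample spec is a hypothetical reconstruction of a node-setup-* pod based on the behavior described in this post.

```python
"""Subset of the PSS Baseline checks, applied to a pod spec given as a dict."""


def baseline_violations(pod_spec):
    """Return human-readable violations for privileged, host namespace, and hostPath use."""
    problems = []
    for field in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(field):
            problems.append(f"spec.{field}=true")
    for vol in pod_spec.get("volumes", []):
        if "hostPath" in vol:
            problems.append(f"hostPath volume {vol.get('name')!r} -> {vol['hostPath'].get('path')}")
    all_containers = (pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
                      + pod_spec.get("ephemeralContainers", []))
    for c in all_containers:
        if (c.get("securityContext") or {}).get("privileged"):
            problems.append(f"container {c.get('name')!r} privileged=true")
    return problems


# Hypothetical reconstruction of a node-setup-* pod: privileged, host root mounted.
node_setup_spec = {
    "containers": [{
        "name": "setup",
        "image": "alpine:latest",
        "securityContext": {"privileged": True},
        "volumeMounts": [{"name": "host-root", "mountPath": "/host"}],
    }],
    "volumes": [{"name": "host-root", "hostPath": {"path": "/"}}],
}

if __name__ == "__main__":
    for p in baseline_violations(node_setup_spec):
        print("Baseline violation:", p)
```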

Apply PSS via namespace labels:

apiVersion: v1
kind: Namespace
metadata:
  name: ai-workloads
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

The audit and warn labels generate logs and API warnings without blocking - use them to inventory violations in existing namespaces before switching to enforce: restricted. Start with enforce: baseline on every non-system namespace today. Tighten toward Restricted on namespaces where workloads can tolerate it.

How Should You Harden RBAC for CI/CD Service Accounts?

The CanisterWorm secret enumeration succeeded because the compromised service account had cluster-wide read access. Two immediate changes break this:

Disable automatic service account token mounting on pods that do not need Kubernetes API access:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: trivy-scanner
  namespace: ci-cd
automountServiceAccountToken: false

With this set, the token file is not mounted. CanisterWorm’s check of /run/secrets/kubernetes.io/serviceaccount/ finds nothing, and the K8s escalation path closes.
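To confirm the setting took effect, you can check from inside a running pod the same way the malware did. A minimal sketch, assuming the two paths below (the documented mount point and its common /run alias) are the ones worth checking in your images:

```python
"""Report whether a Kubernetes service account token is mounted in this container."""
from pathlib import Path

# Documented mount point, plus the /run alias present in many images (/var/run -> /run).
TOKEN_DIRS = (
    "/var/run/secrets/kubernetes.io/serviceaccount",
    "/run/secrets/kubernetes.io/serviceaccount",
)


def mounted_token(dirs=TOKEN_DIRS):
    """Return the path of the first token file found, or None if none is mounted."""
    for d in dirs:
        token = Path(d) / "token"
        if token.is_file():
            return str(token)
    return None


if __name__ == "__main__":
    found = mounted_token()
    print(f"token mounted at {found}" if found else "no service account token mounted")
```

A scanner pod running under the trivy-scanner ServiceAccount above should report that no token is mounted, unless the pod spec explicitly opts back in.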

Scope deployer roles to the target namespace using Role and RoleBinding - not ClusterRole and ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deploy-manager
  namespace: production
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "patch", "update"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]

No wildcards. No cluster-admin grants. No secret read permissions on roles that don’t explicitly need to consume secrets. The Kubernetes RBAC Good Practices documentation makes the risk concrete: “providing wildcard access gives rights not just to all object types that currently exist in the cluster, but also to all object types which are created in the future.”
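These rules can be checked mechanically. The sketch below takes Role and ClusterRole objects as parsed JSON, for example the items from `kubectl get clusterroles,roles -A -o json`, and flags wildcards and any verb that can read Secrets. It does not resolve which subjects are bound to each role, and many system ClusterRoles legitimately read secrets, so treat the output as a review list. The "risky" criteria are this sketch's assumption.

```python
"""Flag RBAC rules that use wildcards or can read Secrets (input: Role/ClusterRole JSON)."""
import json
import sys

SECRET_READ_VERBS = {"get", "list", "watch", "*"}


def risky_rules(role):
    """Yield (role_name, reason) for risky rules in one Role or ClusterRole object."""
    meta = role.get("metadata", {})
    kind, ns = role.get("kind", "Role"), meta.get("namespace")
    name = f"{kind} {ns}/{meta.get('name', '?')}" if ns else f"{kind} {meta.get('name', '?')}"
    for rule in role.get("rules") or []:  # aggregated ClusterRoles may have rules: null
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = set(rule.get("verbs", []))
        for field, values in (("apiGroups", groups), ("resources", resources), ("verbs", verbs)):
            if "*" in values:
                yield name, f"wildcard in {field}"
        core_or_any = "" in groups or "*" in groups  # Secrets live in the core ("") group
        touches_secrets = "secrets" in resources or "*" in resources
        if core_or_any and touches_secrets and verbs & SECRET_READ_VERBS:
            yield name, f"can read secrets ({', '.join(sorted(verbs & SECRET_READ_VERBS))})"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get clusterroles,roles -A -o json > roles.json && python rbac_audit.py roles.json
    with open(sys.argv[1]) as fh:
        doc = json.load(fh)
    for item in doc.get("items", []):
        for role_name, reason in risky_rules(item):
            print(f"{role_name}: {reason}")
```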

For production CI/CD systems, replace long-lived credentials with short-lived ones: IRSA on EKS, Workload Identity on GKE, Microsoft Entra Workload ID on AKS, or GitHub's OIDC integration with your cloud provider. A token that expires within minutes of being issued, for example 15 minutes, has little value to an attacker who harvests it mid-run.
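You can check how long a harvested token would stay useful: projected Kubernetes service account tokens and OIDC tokens are JWTs carrying an `exp` claim. The sketch decodes the payload without verifying the signature, so it is for inspection only and must never be used to make authorization decisions. Legacy Secret-based service account tokens carry no `exp` at all, which is itself the finding.

```python
"""Inspect a JWT's remaining lifetime (no signature check: inspection only)."""
import base64
import json
import time


def jwt_claims(token):
    """Decode the payload (second) segment of a JWT without verifying it."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_remaining(token, now=None):
    """Seconds until `exp`, or None if the token never expires (e.g. a legacy SA token)."""
    exp = jwt_claims(token).get("exp")
    if exp is None:
        return None
    return int(exp - (time.time() if now is None else now))
```

Inside a pod, `seconds_remaining(open("/var/run/secrets/kubernetes.io/serviceaccount/token").read())` shows how long the mounted token remains valid.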

How Do Network Policies Block IMDS Access and C2 Egress?

The Trivy payload queried IMDS as its first action. A NetworkPolicy blocking that egress from CI/CD pods would have cut the credential harvesting before it reached the Kubernetes token stage.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-imds
  namespace: ci-cd
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32

NetworkPolicy requires a CNI plugin that enforces it. Calico and Cilium both support this fully. The default Kubernetes network does not enforce NetworkPolicy without a supporting CNI.
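Because a NetworkPolicy that the CNI does not enforce fails silently, it is worth testing from a pod in the namespace. A minimal sketch, assuming a plain TCP handshake to port 80 is a good enough signal: it sends no HTTP request, and a failure can also simply mean the host is not on a cloud provider with IMDS.

```python
"""Probe whether this pod can open a TCP connection to the cloud metadata endpoint."""
import socket

IMDS_HOST, IMDS_PORT = "169.254.169.254", 80


def can_connect(host=IMDS_HOST, port=IMDS_PORT, timeout=2.0):
    """True if a TCP handshake succeeds; False on timeout, refusal, or no route."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # socket.timeout and ConnectionRefusedError are OSError subclasses
        return False


if __name__ == "__main__":
    if can_connect():
        print("IMDS reachable: the egress policy is missing or not enforced")
    else:
        print("IMDS not reachable from this pod")
```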

A stronger baseline: add a default-deny egress policy in your CI/CD namespace and explicitly allowlist only the registries, artifact stores, and APIs your runners need. Any future compromised action has no egress path to a C2 server.
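One way to sketch that baseline is to build the policy as a Python dict and emit JSON, which `kubectl apply -f` accepts. Assumptions to verify in your cluster: DNS pods run in kube-system with the label `k8s-app: kube-dns`, and 203.0.113.0/24 (a documentation range) stands in for your registry's real addresses. NetworkPolicy allowlists by IP, not hostname, which is why some teams use Cilium's `toFQDNs` rules or an egress proxy for endpoints whose IPs change.

```python
"""Build a default-deny egress NetworkPolicy with an explicit allowlist, emitted as JSON."""
import json


def egress_allowlist_policy(namespace, allowed_cidrs, allowed_ports=(443,)):
    """Deny all egress from every pod in `namespace` except DNS and the listed CIDRs/ports."""
    dns_rule = {
        "to": [{
            # Assumption: cluster DNS runs in kube-system with the label k8s-app=kube-dns.
            "namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "kube-system"}},
            "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
        }],
        "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
    }
    egress = [dns_rule]
    if allowed_cidrs:  # an empty "to" list would match ALL destinations, so omit the rule
        egress.append({
            "to": [{"ipBlock": {"cidr": c}} for c in allowed_cidrs],
            "ports": [{"protocol": "TCP", "port": p} for p in allowed_ports],
        })
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-egress-allowlist", "namespace": namespace},
        "spec": {
            "podSelector": {},          # selects every pod in the namespace
            "policyTypes": ["Egress"],  # egress not matched by a rule below is dropped
            "egress": egress,
        },
    }


if __name__ == "__main__":
    print(json.dumps(egress_allowlist_policy("ci-cd", ["203.0.113.0/24"]), indent=2))
```

Because 169.254.169.254 is outside every allowlisted CIDR, this policy also blocks IMDS without a separate except clause.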

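On AKS, a preview feature allows restricting IMDS access at the cluster level for all non-host-network pods, providing defense-in-depth beyond NetworkPolicy. Because it is in preview, it carries the usual caveats around unsupported add-ons and production readiness. On EKS, IAM Roles for Service Accounts (IRSA) lets pods obtain AWS credentials by exchanging a projected service account token with AWS STS, so legitimate workloads never need IMDS for credentials. IRSA does not by itself stop a malicious process from querying IMDS for the node's instance role, so pair it with an IMDS block: a NetworkPolicy like the one above, or requiring IMDSv2 with a hop limit of 1 on the nodes.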

How Do Falco and Tetragon Detect CanisterWorm Activity?

Pod Security Standards, RBAC, and NetworkPolicy are preventive controls. Runtime security tools catch what slips past them.

Falco (CNCF graduated) detects anomalous behavior at the Linux syscall level. Rules targeting CanisterWorm behavior would fire on:

bash spawning kubectl with get secrets --all-namespaces arguments from inside a container
File read events on /run/secrets/kubernetes.io/serviceaccount/ from processes that do not have API access in their declared role
Network connections from CI runner containers to unknown external endpoints or unusual IP ranges
DaemonSet creation in kube-system from a service account that is not a cluster component

Microsoft’s detection guidance for this attack includes a hunting query specifically targeting “bash processes spawned from Python initiators executing kubectl get secrets --all-namespaces -o json” - the exact execution pattern CanisterWorm used.

Cilium Tetragon goes further by enforcing in the kernel with eBPF, including at Linux Security Module (LSM) hooks. Where Falco detects and alerts after the fact, a Tetragon TracingPolicy can act synchronously, killing the offending process or overriding a syscall's return value (for example, to stop kubectl from executing inside a scanner container) rather than only logging that it happened. For high-security namespaces running AI inference workloads, Tetragon policy enforcement is worth the operational investment.

How Should You Harden CI/CD Pipelines Running AI/ML Dependencies?

Why Should You Pin Actions to Commit SHAs Instead of Tags?

The TeamPCP Trivy attack relied entirely on Git tag mutability. A version tag is a pointer that can be force-pushed to a different commit at any time. Any workflow pinned to a tag picks up the new code on its next run, with no notification.

# Vulnerable: mutable tag - attacker can change where this resolves
- uses: aquasecurity/trivy-action@<version-tag>

# Secure: immutable commit SHA - cannot be rewritten without changing the hash
- uses: aquasecurity/trivy-action@57a97c7a5ef77f9608dcc5af182e98e43b2e3252

SHA pinning requires keeping those SHAs current as legitimate releases arrive. Dependabot and Renovate both support GitHub Actions SHA pinning with automated PRs that include the changelog diff for review. The workflow is: pin to a verified SHA today, review and merge automated updates as the upstream project ships patches.
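To find references still pointing at tags or branches, a small scan of your workflow files helps. This sketch is regex-based rather than a YAML parser, and it skips local actions (`./...`) and `docker://` image references, which need a separate digest-pinning check; both simplifications are assumptions of the sketch.

```python
"""Flag GitHub Actions `uses:` references that are not pinned to a full commit SHA."""
import re
from pathlib import Path

USES = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_refs(workflow_text):
    """Return (line_no, ref) for each remote action whose ref is not a 40-char SHA."""
    findings = []
    for no, line in enumerate(workflow_text.splitlines(), start=1):
        m = USES.match(line)
        if not m:
            continue
        ref = m.group(1)
        if ref.startswith(("./", "docker://")):
            continue  # local actions and container images are out of scope here
        _, _, version = ref.partition("@")
        if not FULL_SHA.match(version):
            findings.append((no, ref))
    return findings


if __name__ == "__main__":
    for wf in sorted(Path(".github/workflows").glob("*.y*ml")):
        for no, ref in unpinned_refs(wf.read_text()):
            print(f"{wf}:{no}: {ref}")
```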

How Do You Verify Python Dependencies with Hash Pinning?

The malicious LiteLLM .pth file bypassed pip’s default integrity verification because it was legitimately declared in the wheel’s RECORD file. The package had a valid PyPI record. Hash-pinning provides a stronger guarantee:

# Resolve requirements.in into a fully pinned requirements.txt with SHA256 hashes
pip-compile --generate-hashes requirements.in -o requirements.txt

# Install with hash verification - changed package content fails immediately
pip install --require-hashes -r requirements.txt

With --require-hashes, pip verifies the SHA256 hash of every downloaded artifact against the pinned value. A package whose contents differ from the hash in your lockfile fails to install, regardless of whether it has the correct version number. This would not have prevented the .pth persistence from a package already installed before the lockfile was generated, but it blocks future installs of tampered packages.
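The comparison pip makes in this mode is, at its core, a SHA-256 digest of the downloaded archive checked against the `--hash=sha256:...` value in the lockfile. The sketch below mirrors that idea for a local file; it is an illustration, not pip's implementation. Note that pip accepts an artifact if it matches any of the hashes listed for that requirement, since a lockfile often pins one hash per wheel or sdist.

```python
"""Compare a downloaded artifact against a `--hash=sha256:...` pin, as --require-hashes does."""
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large wheels need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_pin(path, pinned):
    """`pinned` is the value from requirements.txt, e.g. 'sha256:ab12...'."""
    algo, _, expected = pinned.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    return sha256_of(path) == expected.lower()
```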

How Should You Separate Scanner and Publisher Credentials?

The TeamPCP cascade worked because Trivy’s CI job ran with ambient access to Checkmarx’s GitHub PAT used for publishing. These credentials had no business being in the same environment.

Three separations that break this structural problem:

PyPI Trusted Publishers - OIDC-based package publishing that replaces long-lived PyPI tokens with short-lived OIDC tokens scoped to a specific repository and workflow. A token that does not exist as a static credential cannot be harvested.

Function-separated service accounts - The scanner service account gets read access to build artifacts. The deployer service account gets write access to the cluster. Neither account has the other’s credentials, and neither has ambient access to publishing tokens for unrelated systems.

Per-run token generation - Use IRSA, Workload Identity, or GitHub's OIDC integration to issue tokens scoped to the current run with a short expiration, such as 15 minutes. A harvested token then expires before an attacker can get much use out of it.

How Do You Sign and Verify Container Images?

Starting from LiteLLM v1.83.0, all Docker images are signed with cosign. Verify the signature before deploying an image to a production cluster:

cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:1.83.0

The official LiteLLM Proxy Docker image (ghcr.io/berriai/litellm) was not affected by the March 24 compromise because it pins dependencies in requirements.txt. Custom K8s deployments that ran pip install litellm without pinning during the exposure window were vulnerable. Deployments using the official image were not.

Cosign verification in admission control - via Policy Controller or Kyverno with cosign support - enforces signature requirements at the cluster level before any image is allowed to run.

Are You Exposed? A Practical Audit Checklist

graph TD
    Start(["Start: Am I Exposed?"]) --> Q1{"Use Trivy in CI/CD?"}
    Q1 -->|"No"| Q2{"Use LiteLLM?"}
    Q1 -->|"Yes"| Q3{"Tag-based refs\nbefore March 19?"}
    Q3 -->|"No - SHA pinned"| LOW1["Low Risk\nVerify SHA is trivy-action 0.35.0+\nor setup-trivy 0.2.6+"]
    Q3 -->|"Yes"| HIGH1["HIGH RISK\nCheck runner logs for IMDS queries\nand kubectl secret enumeration"]
    HIGH1 --> ACT1["Rotate all CI/CD secrets now\nAudit cluster: kubectl get pods -A | grep node-setup\nCheck DaemonSets in kube-system"]
    Q2 -->|"No"| SAFE(["Low Risk - Apply structural hardening below"])
    Q2 -->|"Yes"| Q5{"Installation method?"}
    Q5 -->|"Official Docker image\nghcr.io/berriai/litellm"| LOW2["Low Risk\nVerify running v1.83.0+\nEnable cosign verification"]
    Q5 -->|"pip install litellm"| Q7{"Installed March 24\nbetween 10:39-16:00 UTC?"}
    Q7 -->|"No"| LOW2
    Q7 -->|"Yes"| HIGH2["HIGH RISK\nTreat environment as compromised"]
    HIGH2 --> ACT2["Check site-packages for litellm_init.pth\nRemove sysmon.py and sysmon.service\nRotate all LLM provider API keys"]

Use this flowchart to determine exposure level before running the checks below.

What Are the Immediate Indicators of Compromise to Check?

Check for CanisterWorm Kubernetes artifacts:

# Check for the DaemonSet naming pattern used by the malware
kubectl get pods -A | grep "node-setup-"

# Audit DaemonSets in kube-system for unexpected entries
kubectl get daemonsets -n kube-system

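# Check Kubernetes API audit logs for cluster-wide secret listing (verb=list, resource=secrets, no namespace).
# Process ancestry, such as bash spawned from a Python parent running kubectl get secrets --all-namespaces,
# comes from endpoint or runtime telemetry (EDR, Falco), not from API audit logs.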
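If API audit logging is enabled at Metadata level or higher, the API side of this pattern can be scanned directly. The sketch reads JSON-lines audit events (field names from the audit.k8s.io/v1 Event schema: `stage`, `verb`, `user.username`, `objectRef`) and flags cluster-wide secret list/watch calls and DaemonSet creation in kube-system by identities outside a trusted set. The trusted prefixes are an assumption; human admins and Helm releases that install into kube-system will show up and need tuning.

```python
"""Flag CanisterWorm-style API calls in Kubernetes audit logs (JSON lines, audit.k8s.io/v1)."""
import json

# Assumption: DaemonSets in kube-system are normally created only by these identities.
TRUSTED_PREFIXES = ("system:serviceaccount:kube-system:", "system:kube-controller-manager")


def suspicious(event):
    """Return a reason string for a suspicious audit event, else None."""
    if event.get("stage") not in (None, "ResponseComplete"):
        return None  # each request is logged at several stages; count it once
    ref = event.get("objectRef") or {}
    user = (event.get("user") or {}).get("username", "")
    verb, resource, ns = event.get("verb"), ref.get("resource"), ref.get("namespace")
    if resource == "secrets" and verb in ("list", "watch") and not ns:
        return f"cluster-wide {verb} of secrets by {user}"
    if (resource == "daemonsets" and verb == "create" and ns == "kube-system"
            and not user.startswith(TRUSTED_PREFIXES)):
        return f"DaemonSet {ref.get('name')!r} created in kube-system by {user}"
    return None


def scan(lines):
    """Yield a reason for every suspicious event in an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        reason = suspicious(json.loads(line))
        if reason:
            yield reason
```

Usage: `for reason in scan(open("audit.log")): print(reason)`, pointing at wherever your cluster or managed provider writes audit events.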

Check for LiteLLM persistence on affected Python environments:

# Check for the malicious .pth file - persists after pip uninstall
find / -name "litellm_init.pth" -path "*-packages/*" 2>/dev/null

# Check for persistence backdoor
ls -la ~/.config/sysmon/sysmon.py /root/.config/sysmon/sysmon.py 2>/dev/null

# Check for rogue systemd service
systemctl --user status sysmon.service 2>/dev/null

Known network indicators of compromise include DNS queries to typosquatted domains: scan.aquasecurtiy.org (note the misspelling), checkmarx.zone, and models.litellm.cloud.
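Log formats differ between resolvers and proxies, so a simple approach is to pull hostname-like tokens out of each line and compare them, case-insensitively, against the domain list and its subdomains. In this sketch the hostname regex is a rough assumption to adjust for your log format; note that it deliberately does not match the correctly spelled aquasecurity.org.

```python
"""Match DNS or proxy log lines against the campaign's domain IOCs (including subdomains)."""
import re

IOC_DOMAINS = ("scan.aquasecurtiy.org", "checkmarx.zone", "models.litellm.cloud")
# Rough hostname extractor (assumption): dotted labels ending in a letters-only TLD.
HOSTNAME = re.compile(r"(?i)\b(?:[a-z0-9-]+\.)+[a-z]{2,}\.?")


def ioc_hits(line, iocs=IOC_DOMAINS):
    """Return every hostname in `line` that equals an IOC domain or is a subdomain of one."""
    hits = []
    for host in HOSTNAME.findall(line):
        host = host.lower().rstrip(".")  # DNS logs often print the trailing root dot
        if any(host == d or host.endswith("." + d) for d in iocs):
            hits.append(host)
    return hits
```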

What Structural Hardening Should You Apply Across AI/ML Cluster Namespaces?

Apply these in order across any cluster running AI/ML tooling:

Pod Security Standards - Label every application namespace with enforce: baseline. Use audit: restricted and warn: restricted to surface violations before tightening. Apply to kube-system only with extreme care - system components may legitimately need capabilities that Baseline blocks.
RBAC audit - List all ClusterRoleBindings. Identify any that grant cluster-admin, wildcard resources, or secrets read/list to CI/CD service accounts. Replace with namespace-scoped Roles and RoleBindings.
automountServiceAccountToken - Set to false on ServiceAccounts used by scanners, build tools, and any workload that does not directly call the Kubernetes API. Opt in explicitly on the pods that need it using automountServiceAccountToken: true in the Pod spec.
NetworkPolicy - Add a deny-IMDS egress rule to your CI/CD namespace (see example above). Add a default-deny egress baseline in any namespace running untrusted code. Allowlist only verified registries, artifact stores, and APIs.
GitHub Actions - Replace all tag-based action references with full commit SHAs. On your own action repositories, add a repository ruleset that targets tags and blocks updates and deletions, so release tags cannot be force-pushed.
Python dependencies - Add --require-hashes to pip install steps in CI. Generate lockfiles with pip-compile --generate-hashes for development environments. Move to PyPI Trusted Publishers for any packages you publish.
Runtime security - Deploy Falco or Tetragon. Write rules that alert on secret enumeration, unexpected DaemonSet creation in system namespaces, and IMDS access from application pods. Start in alert-only mode, tighten to enforcement as you validate rules against your workloads.

The structural hardening is the work that makes your cluster resilient to the next campaign, not just this one. The ICP canister C2 model TeamPCP used means conventional domain-seizure-based detection and response is not sufficient. Defense-in-depth within the cluster - PSS, RBAC, NetworkPolicy, runtime monitoring - is the layer that contains the blast radius when a CI tool or supply chain dependency is compromised.

Frequently Asked Questions

How did a vulnerability scanner become the attack vector for Kubernetes cluster compromise?

TeamPCP exploited incomplete credential rotation at Aqua Security to gain access to the aqua-bot service account and poison Trivy GitHub Action tags. Security scanners run with privileged CI/CD pipeline access because they need to analyze artifacts across the entire pipeline - which means they run with whatever cloud credentials, publishing tokens, and Kubernetes service account tokens the pipeline has loaded. Compromising the scanner gave attackers simultaneous access to every secret in that environment. The harvested K8s service account tokens then enabled cluster-wide secret enumeration and privileged DaemonSet deployment. The attack succeeded because a security tool was placed in the most trusted position in the pipeline with no credential isolation.

Would Pod Security Standards have prevented the TeamPCP Kubernetes attack?

Yes, partially. The CanisterWorm DaemonSet required privileged containers and hostPath volume mounts to the root filesystem - both blocked by the Pod Security Standards Baseline profile. Enforcing Baseline on kube-system would have rejected the node-setup-* pods at admission, preventing persistent node-level access. However, PSS alone would not have stopped the initial IMDS credential harvesting (which requires NetworkPolicy) or the cluster-wide secret enumeration (which requires RBAC scoping and disabling automatic service account token mounting). Defense-in-depth - PSS plus RBAC plus NetworkPolicy - breaks the kill chain at three separate stages rather than relying on any single control.

Is the official LiteLLM Docker image safe to use on Kubernetes?

The official LiteLLM Proxy Docker image (ghcr.io/berriai/litellm) was not affected by the March 24 compromise because it pins all dependencies in requirements.txt. The malicious packages only affected deployments that ran an unpinned pip install litellm during the exposure window on March 24, 2026. LiteLLM's official figure for that window is approximately 40 minutes from publication to quarantine, while other reports put it at 10:39 to roughly 16:00 UTC; treat the wider window as the conservative one. Starting from v1.83.0, all LiteLLM Docker images are cryptographically signed with cosign for verification of build provenance. Run the cosign verify command shown in the CI/CD hardening section before deploying v1.83.0+ images to production.

How do I check if my Kubernetes cluster was compromised by CanisterWorm?

Run kubectl get pods -A | grep "node-setup-" to check for the malware’s DaemonSet naming pattern. Also audit DaemonSets in kube-system for any entries not created by your system components. On Python hosts, search for litellm_init.pth in site-packages directories - this file persists after running pip uninstall litellm and is the most reliable local indicator. Check for ~/.config/sysmon/sysmon.py and a systemd unit named sysmon.service. Review DNS and network logs for queries to scan.aquasecurtiy.org, checkmarx.zone, or models.litellm.cloud - all are known C2-adjacent domains from this campaign.

Why are AI/ML dependencies a higher supply chain risk for Kubernetes than regular application dependencies?

AI/ML tooling like LiteLLM acts as a credential aggregator: it holds API keys for every LLM provider your application uses, and it typically runs in the same environment as your Kubernetes service account tokens and cloud credentials. Compromising this dependency exposes not just your application data but every provider API key and cluster secret accessible from that environment. With 95-97 million monthly PyPI downloads and named downstream users including Netflix, Stripe, and Google, the blast radius of a single compromised version is qualitatively larger than a typical utility library. The LiteLLM compromise was not just a package supply chain incident - it was an attack on the credential layer of AI infrastructure.