Vibe coding creates measurable security debt: 45% of AI-generated code contains OWASP Top 10 vulnerabilities, AI-assisted commits leak secrets at twice the industry baseline, and nearly 20% of AI-generated code references packages that do not exist. Addressing these risks requires a security pipeline most teams have not built yet.
In March 2026, Georgia Tech’s Systems Software and Security Lab confirmed 35 CVEs directly attributable to AI-generated code. That single month exceeded everything they had documented for all of 2025.
The researchers at Georgia Tech’s SSLab maintain the Vibe Security Radar, a live tracker that links CVEs to AI coding tool output after analyzing public advisories. As of their March 2026 data pull, they had confirmed 74 CVEs traced to AI-generated code across approximately 50 tracked tools. Researcher Hanqing Zhao emphasized that this figure is a lower bound. “Those 74 cases are confirmed instances where we found clear evidence that AI-generated code contributed to the vulnerability,” Zhao told The Register. His estimate of the actual count is 5 to 10 times higher, roughly 370 to 740 cases, because most AI-generated vulnerabilities are never attributed to the tool that produced the code.
The tool-level breakdown shows where the confirmed CVEs concentrate. Claude Code accounts for 49 of them, 11 rated critical severity. GitHub Copilot accounts for 15, 2 of them critical. Google Jules, Devin, Cursor, and other tools account for the remaining 10.
These aren’t edge cases from experimental AI use. Claude Code appeared in over 4% of all public GitHub commits by early 2026 - more than 15 million commits and 30.7 billion lines of code. At that adoption level, a small per-commit vulnerability rate produces a very large aggregate attack surface.
How fast are AI-attributed CVEs multiplying?
The monthly progression from the Vibe Security Radar makes the trajectory clear:
```mermaid
xychart-beta
    title "AI-Attributed CVEs Confirmed per Month"
    x-axis ["Jan 2026", "Feb 2026", "Mar 2026"]
    y-axis "CVEs Confirmed" 0 --> 40
    bar [6, 15, 35]
```
Monthly CVE counts from Georgia Tech’s Vibe Security Radar. March 2026 alone exceeded all AI-attributed CVEs tracked in 2025 combined.
Six CVEs in January. Fifteen in February. Thirty-five in March. This is not purely organic growth in vulnerabilities; much of it is detection catching up with adoption. AI coding tools saw explosive growth through 2025, but the security research infrastructure needed to track the resulting vulnerabilities took time to develop. The Vibe Security Radar itself only launched in May 2025. The trend reflects both accelerating AI code deployment and improving attribution methodology.
Applying Zhao’s 5-10x multiplier for undetected cases puts the true March 2026 figure at roughly 175-350 AI-attributed vulnerabilities. Most will never be counted as AI-attributed, because no one connects the finished product back to the AI tool that generated it.
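The extrapolation is plain multiplication. The minimal Python sketch below applies the range to each month of the Radar’s data and also computes month-over-month growth in confirmed CVEs. It treats Zhao’s 5-10x range as an input assumption rather than a measured value.

```python
# Monthly AI-attributed CVE counts from the Vibe Security Radar (Jan-Mar 2026)
confirmed = {"Jan 2026": 6, "Feb 2026": 15, "Mar 2026": 35}

# Zhao's estimated undercount: the true count is 5x to 10x the confirmed count
LOW_MULTIPLIER, HIGH_MULTIPLIER = 5, 10


def estimated_range(count: int) -> tuple[int, int]:
    """Return (low, high) estimates of true vulnerabilities for a confirmed count."""
    return count * LOW_MULTIPLIER, count * HIGH_MULTIPLIER


for month, count in confirmed.items():
    low, high = estimated_range(count)
    print(f"{month}: {count} confirmed -> roughly {low}-{high} actual")

# Month-over-month growth in confirmed CVEs
months = list(confirmed)
for prev, cur in zip(months, months[1:]):
    print(f"{prev} -> {cur}: {confirmed[cur] / confirmed[prev]:.1f}x")
```

For March this prints 175-350, matching the figure above. The growth lines show confirmed counts rising about 2.5x from January to February and about 2.3x from February to March.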
What security risks does AI-generated code introduce?
AI coding tools introduce three distinct vulnerability categories that require different detection approaches. Standard security tooling catches one of them reasonably well. The other two require tooling that most teams haven’t deployed.
```mermaid
graph LR
    A[AI Code Generator] --> B[OWASP Vulnerabilities]
    A --> C[Hallucinated Dependencies]
    A --> D[Hardcoded Secrets]
    B --> E["SAST: Semgrep, CodeQL"]
    C --> F["SCA: lockfile-lint, Snyk"]
    D --> G["Secrets scan: ggshield"]
```
The three distinct attack surfaces created by AI-generated code, and the detection layer that addresses each one.
How often does AI-generated code fail OWASP tests?
Veracode’s 2025 GenAI Code Security Report tested code output from over 100 LLMs across 80+ coding tasks in Java, Python, C#, and JavaScript. Forty-five percent of samples contained OWASP Top 10 vulnerabilities.
The language-specific failure rates:
| Language | OWASP Failure Rate |
|---|---|
| Java | 72% |
| Python | 38-45% |
| C# | 38-45% |
| JavaScript | 38-45% |
The two worst-performing vulnerability categories: log injection (CWE-117) at an 88% failure rate, and cross-site scripting (CWE-80) at 86%.
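Both of these categories come from writing untrusted input into an output channel unmodified. The minimal Python sketch below shows the vulnerable and neutralized patterns, assuming user input arrives as a plain string. `html.escape` is the standard-library HTML escaper. Escaping CR and LF is one common log-injection defense, not the only one; structured logging is another.

```python
import html
import logging

# Plain "LEVEL message" lines make forged entries indistinguishable from real ones
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("auth")


def neutralize_for_log(value: str) -> str:
    """Escape CR and LF so user input cannot forge extra log lines (CWE-117)."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


def render_greeting(name: str) -> str:
    """HTML-escape user input before embedding it in markup (CWE-80)."""
    return f"<p>Hello, {html.escape(name)}</p>"


attacker_input = "bob\nINFO login succeeded for admin"

# Vulnerable pattern: the newline lets the attacker inject a fake log entry
log.info("login failed for %s", attacker_input)

# Neutralized: the injected text stays on one line with a visible \n
log.info("login failed for %s", neutralize_for_log(attacker_input))

# The <script> tag is emitted as inert text (&lt;script&gt;...) instead of markup
print(render_greeting("<script>alert(1)</script>"))
```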
A key finding from Veracode’s multi-year testing: newer, larger models did not produce more secure code than smaller ones. The security pass rate held flat at roughly 55% across their 2025-2026 testing cycles. The OWASP failure rate appears structural, not a gap that next-generation model releases will close automatically.
Georgetown University’s Center for Security and Emerging Technology (CSET) published a complementary study in November 2024. It tested five models (GPT-3.5-turbo, GPT-4, Code Llama 7B, WizardCoder 7B, and Mistral 7B) against 67 prompts targeting the MITRE Top 25 CWE list. It found that approximately 48% of generated code contained security bugs, while only about 30% passed as fully secure. (The study used models that are now one to two generations old, so exact percentages may differ for current models, but the directional finding is consistent with Veracode’s more recent data.)
Slopsquatting: when AI hallucinates your dependencies
Researchers at the University of Texas at San Antonio, Virginia Tech, and the University of Oklahoma analyzed 576,000 code samples generated across 16 LLMs - including DeepSeek, Claude, GPT-4, and Mistral. Nearly 20% of those samples referenced packages that do not exist.
This creates a supply chain attack vector called slopsquatting. Attackers register the package names that AI tools consistently hallucinate, publish malicious payloads to npm or PyPI under those names, and wait for developers who trust AI-generated code to run npm install or pip install.
The reason this works at scale: 43% of hallucinated package names reappeared in all 10 re-runs of the same prompt, and 58% reappeared more than once across those 10 runs. The hallucinations follow consistent patterns. 38% are conflations of two real package names (like combining express and mongoose into express-mongoose), 13% are typo variants, and 51% are pure fabrications. That predictability is exactly what makes preemptive name registration viable for attackers.
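Because the hallucinations follow patterns, part of the triage can be mechanical. The minimal sketch below assumes a hypothetical team allowlist of already-vetted packages. It flags any dependency not on the list and labels it as a likely conflation of two known names, a one-edit typo variant, or simply unknown. It cannot prove that a package is malicious; it only routes unfamiliar names to a human.

```python
from itertools import permutations

# Hypothetical allowlist of packages your team has already vetted
KNOWN_PACKAGES = {"express", "mongoose", "lodash", "axios", "react"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def classify(name: str) -> str:
    """Classify a dependency name against the allowlist."""
    if name in KNOWN_PACKAGES:
        return "known"
    # Conflation: two known names joined by a hyphen, e.g. express-mongoose
    for a, b in permutations(KNOWN_PACKAGES, 2):
        if name == f"{a}-{b}":
            return "suspicious: conflation"
    # Typo variant: exactly one insertion, deletion, or substitution away
    if any(edit_distance(name, known) == 1 for known in KNOWN_PACKAGES):
        return "suspicious: typo variant"
    return "unknown: review manually"


for dep in ["express", "express-mongoose", "expres", "left-pad"]:
    print(f"{dep}: {classify(dep)}")
```

The "unknown" bucket matters most. It catches the pure fabrications, which pattern matching cannot recognize.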
Open-source models perform significantly worse: 21.7% hallucination rate versus 5.2% for proprietary models. CodeLlama 7B and 34B hallucinated package names in over a third of outputs. The researchers identified over 205,000 unique hallucinated package names in their dataset.
Secrets sprawl: AI-assisted commits leak at 2x the rate
GitGuardian’s State of Secrets Sprawl 2026 report analyzed 1.94 billion public GitHub commits. Claude Code-assisted commits showed a 3.2% secret-leak rate compared to a 1.5% baseline across all commits. Broader context: 28.65 million new hardcoded secrets were added to public GitHub in 2025, a 34% year-over-year increase and the largest single-year jump on record.
AI-service credentials saw the steepest growth - API keys for LLM providers, embedding services, and AI platforms increased 81% year-over-year to 1,275,105 detected leaks.
Where leaked secrets cause the most damage: 59% of the compromised machines in GitGuardian’s analysis were CI/CD runners. When a secret leaks into a repository that feeds a pipeline, the pipeline becomes the primary attack target.
GitGuardian’s report notes that “developers remain in control of what gets accepted, edited, ignored, or pushed” - the elevated leak rate reflects the combined behavior of AI suggestion plus human acceptance, not AI acting autonomously. But that framing also means the fix requires changing review behavior, not just turning off AI tools.
The quality paradox: why existing tools miss AI-generated bugs
The Cloud Security Alliance’s April 2026 research note synthesizing multiple studies found that AI-assisted developers produce commits 3-4x faster than their peers but introduce security findings at 10x the rate (per the CSA note, citing enterprise analysis data). The breakdown of what AI improves versus what it degrades:
```mermaid
graph TD
    A[AI-Generated Code Impact]
    A --> B[Shallow bugs reduced]
    A --> C[Deep vulnerabilities increased]
    B --> D["Syntax errors: -76%"]
    B --> E["Logic bugs: -60%"]
    C --> F["Privilege escalation: +322%"]
    C --> G["Architectural flaws: +153%"]
```
AI reduces the surface-level bugs that basic static analysis catches. It introduces architectural vulnerabilities that require deeper analysis to detect.
AI eliminates bugs that are easy to detect: syntax errors and simple logic mistakes. Those are also the bugs standard SAST tools were already catching. The bugs AI introduces are architectural - missing authorization checks, broken access control patterns, privilege escalation paths that require understanding the intended system behavior to identify.
A Q1 2026 audit by Kingbird Solutions of over 200 vibe-coded applications found that 91.5% contained at least one AI hallucination-related flaw (note: the study’s sampling methodology was not published; treat this as directional rather than definitive). The most prevalent access control vulnerability was BOLA - Broken Object Level Authorization - where API endpoints authenticate the user but fail to verify they own the object being accessed. Escape’s production scan of 5,600 vibe-coded applications found 2,038 critical vulnerabilities, 400+ leaked secrets, and 175 PII exposures.
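BOLA is easy to miss in review because the vulnerable handler does authenticate the user. The minimal, framework-free sketch below uses a hypothetical in-memory invoice store in place of a database, and passes the authenticated user ID in directly. The vulnerable version trusts the object ID from the request; the fixed version also checks ownership.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: int
    owner_id: int
    amount: float


class Forbidden(Exception):
    """Raised when an authenticated user requests an object they do not own."""


# Hypothetical data store: invoice 1 belongs to user 100, invoice 2 to user 200
INVOICES = {1: Invoice(1, 100, 49.0), 2: Invoice(2, 200, 1250.0)}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Invoice:
    """BOLA: the caller is authenticated, but ownership is never checked."""
    return INVOICES[invoice_id]


def get_invoice_fixed(current_user_id: int, invoice_id: int) -> Invoice:
    """Object-level authorization: verify the caller owns the requested object."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != current_user_id:
        # One error for "missing" and "not yours" avoids revealing which IDs exist
        raise Forbidden(f"invoice {invoice_id} not accessible")
    return invoice


# User 100 changes the ID in the URL to read user 200's invoice
print(get_invoice_vulnerable(100, 2))  # returns another user's data
try:
    get_invoice_fixed(100, 2)
except Forbidden as exc:
    print("blocked:", exc)
```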
The developer perception gap amplifies the risk. JetBrains surveyed 24,534 developers and found approximately 80% believe AI tools generate more secure code than humans write. The gap between that perception and the Veracode, GitGuardian, and CSA data suggests many teams aren’t applying the additional scrutiny AI-generated code actually requires.
What is the SHIELD framework for vibe coding security?
Palo Alto Networks Unit 42 published the SHIELD framework on January 8, 2026 - the first security governance model designed specifically for AI coding tools. Each pillar maps to a concrete control:
| Pillar | What it means in practice |
|---|---|
| S - Separation of Duties | AI agents get dev/test access only; no single agent holds dev and prod access |
| H - Human in the Loop | Mandatory human code review; PR approval required before merging AI-generated code |
| I - Input/Output Validation | SAST required before code merges; prompt sanitization and guardrail partitioning |
| E - Enforce Security-Focused Helper Models | Deploy independent AI models as specialized security reviewers for automated scanning |
| L - Least Agency | Minimum permissions for AI coding agents; no access to sensitive files they don’t need |
| D - Defensive Technical Controls | SCA before consuming dependencies; auto-execution disabled to require human review |
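Two of the pillars, Separation of Duties and Least Agency, are checkable properties of how agent credentials are provisioned. The minimal sketch below assumes a hypothetical inventory format in which each AI agent lists the environments it can reach and the file paths it can read. SHIELD itself does not prescribe this format, and the sensitive-file patterns are illustrative.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch

# Hypothetical policy inputs, not defined by SHIELD itself
ALLOWED_AGENT_ENVIRONMENTS = {"dev", "test"}
SENSITIVE_PATTERNS = [".env", "*.pem", "secrets/*"]


@dataclass
class AgentGrant:
    agent: str
    environments: set[str]
    readable_paths: list[str] = field(default_factory=list)


def violations(grant: AgentGrant) -> list[str]:
    """Return SHIELD-style findings for one agent's permissions."""
    found = []
    # Separation of Duties: agents stay in dev/test, so none can hold prod access
    outside = grant.environments - ALLOWED_AGENT_ENVIRONMENTS
    if outside:
        found.append(f"S: {grant.agent} can reach {sorted(outside)}")
    # Least Agency: agents should not be able to read sensitive files
    for path in grant.readable_paths:
        if any(fnmatch(path, pattern) for pattern in SENSITIVE_PATTERNS):
            found.append(f"L: {grant.agent} can read {path}")
    return found


grants = [
    AgentGrant("codegen-bot", {"dev", "test"}, ["src/app.py"]),
    AgentGrant("deploy-agent", {"dev", "prod"}, [".env", "src/app.py"]),
]
for g in grants:
    print(g.agent, violations(g) or "ok")
```

Running a check like this against your real agent inventory, for example in CI on the file that defines agent permissions, turns the two pillars into a gate instead of a guideline.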
Unit 42’s real-world incidents illustrate why each pillar matters. A sales application was breached because the AI-generated code omitted authentication and rate limiting. An AI agent deleted an entire production database despite explicit instructions to freeze production changes - a failure of the Separation of Duties pillar.
How do you build a security pipeline for AI-generated code?
Mapping SHIELD to CI/CD means adding security gates at three stages: pre-commit, CI pipeline, and Kubernetes admission.
```mermaid
graph LR
    A["Developer + AI Tool"] -->|"pre-commit"| B["ggshield<br/>Secrets scan"]
    B -->|"PR"| C["Human Review<br/>SHIELD: H"]
    C -->|"CI"| D["Semgrep SAST<br/>SHIELD: I"]
    D --> E["Snyk + npm audit<br/>SCA"]
    E -->|"K8s"| F["OPA Gatekeeper<br/>Admission policy"]
    F --> G["Production"]
```
Complete security pipeline for AI-generated code, with SHIELD principles mapped to each gate.
Pre-commit: catch secrets before they enter the repository
Install ggshield as a pre-commit hook to intercept secrets at the earliest possible stage:
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitguardian/ggshield
    rev: v1.51.0
    hooks:
      - id: ggshield
        language_version: python3
```
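This targets the 3.2% secret-leak rate GitGuardian measured for Claude Code-assisted commits. Catching secrets at pre-commit keeps them out of repository history. That matters because once a secret has been committed, you must assume it was already harvested and revoke it. Pre-commit hooks only run on machines where a developer has run `pre-commit install`, and they can be skipped with `git commit --no-verify`, so the CI secrets scan in the next stage is still needed.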
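The basic idea behind a pre-commit secrets check fits in a few lines, although ggshield’s detectors are far more extensive than anything shown here. The toy sketch below is not ggshield’s implementation. It scans the lines a diff adds for two well-known key formats: AWS access key IDs, which start with `AKIA`, and PEM private-key headers. It also flags long, high-entropy strings assigned to credential-like variable names. The patterns and the 4.0-bit entropy threshold are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Illustrative detectors only; production scanners ship hundreds of tuned patterns
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}
# Assignments such as api_key = "..." or TOKEN: '...' with a 16+ character value
ASSIGNMENT = re.compile(
    r"(?i)(?:api[_-]?key|secret|token|password)\w*\s*[:=]\s*['\"]([^'\"]{16,})['\"]"
)
ENTROPY_THRESHOLD = 4.0  # bits per character; an assumed cut-off


def shannon_entropy(s: str) -> float:
    """Average bits of information per character of s."""
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())


def scan_diff(diff: str) -> list[str]:
    """Report likely secrets on lines the diff adds (lines starting with '+')."""
    findings = []
    for lineno, line in enumerate(diff.splitlines(), start=1):
        if not line.startswith("+") or line.startswith("+++"):
            continue  # skip context, removed lines, and file headers
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append(f"line {lineno}: {name}")
        match = ASSIGNMENT.search(line)
        if match and shannon_entropy(match.group(1)) >= ENTROPY_THRESHOLD:
            findings.append(f"line {lineno}: high-entropy credential")
    return findings


diff = """+++ b/config.py
+AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
+api_key = "demo_9fQ2xLp7Zr4TqW8vYb3NcK6m"
+timeout = 30
"""
print(scan_diff(diff))
```

Here the AWS-format key on line 2 and the high-entropy `api_key` value on line 3 are reported, while `timeout = 30` is not. The entropy check is what lets a scanner catch credentials that match no known vendor format, at the cost of some false positives.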
CI: three gates for three attack surfaces
The CI pipeline needs to cover all three attack surfaces: secrets scanning for anything that slipped past pre-commit, dependency auditing for slopsquatting, and SAST for the access control and injection flaws AI code consistently produces:
```yaml
# .github/workflows/ai-code-security.yml
name: AI Code Security Gates

on: [pull_request]

jobs:
  secrets-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so every commit in the PR can be scanned
      - name: GitGuardian scan
        uses: GitGuardian/ggshield-action@v1
        env:
          GITGUARDIAN_API_KEY: ${{ secrets.GITGUARDIAN_API_KEY }}

  dependency-audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Audit dependencies and lockfile sources
        run: |
          # Fail on critical advisories against registered packages
          npm audit --audit-level=critical
          # Fail if any lockfile entry resolves outside the npm registry or over plain HTTP
          npx --yes lockfile-lint --path package-lock.json --type npm --allowed-hosts npm --validate-https
      - name: Snyk dependency scan
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

  sast-scan:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Semgrep SAST
        # --error makes the job fail when Semgrep reports any finding
        run: semgrep scan --config p/owasp-top-ten --config p/javascript --config p/typescript --error
```
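The dependency-audit job is the slopsquatting layer, but each tool covers a different slice of the risk. npm audit and Snyk flag dependencies with published advisories, including packages already reported as malicious. lockfile-lint verifies that every package in your lockfile resolves to an allowed registry host, which blocks lockfile tampering and packages pulled from unexpected sources. None of them reliably flags a package an attacker registered on npm yesterday under a name an AI tool hallucinated. Such a package resolves from the official registry and has no advisory yet. That case still needs a human decision on every newly added dependency, guided by signals such as package age, download history, and maintainer history.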
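A complementary control is to force a human decision on every dependency a pull request introduces. A newly published package can pass registry and advisory checks until someone reports it. The minimal sketch below assumes npm lockfile v2 or v3, whose top-level `packages` map is keyed by install path (for example `node_modules/express`). It compares the base branch’s lockfile with the PR’s and lists packages that were not there before. Fetching the two files, for example with `git show`, is left out.

```python
import json


def package_names(lockfile_text: str) -> set[str]:
    """Extract package names from an npm v2/v3 lockfile's "packages" map."""
    packages = json.loads(lockfile_text).get("packages", {})
    names = set()
    for path in packages:
        if not path:  # the "" key describes the root project itself
            continue
        # Nested installs look like node_modules/a/node_modules/b; keep the last name
        names.add(path.rsplit("node_modules/", 1)[-1])
    return names


def new_dependencies(base_lock: str, head_lock: str) -> list[str]:
    """Packages present in the PR's lockfile but absent from the base branch."""
    return sorted(package_names(head_lock) - package_names(base_lock))


base = json.dumps({"lockfileVersion": 3, "packages": {
    "": {"name": "app"},
    "node_modules/express": {"version": "4.21.2"},
}})
head = json.dumps({"lockfileVersion": 3, "packages": {
    "": {"name": "app"},
    "node_modules/express": {"version": "4.21.2"},
    "node_modules/express-mongoose": {"version": "1.0.0"},
    "node_modules/@types/node": {"version": "22.0.0"},
}})
print(new_dependencies(base, head))
```

Posting this list as a PR comment, or failing the job until a reviewer approves it, gives the slopsquatting gap an explicit owner. The list includes transitive packages, which is deliberate, because a hallucinated top-level name can also pull in attacker-controlled dependencies.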
Admission: Kubernetes policy enforcement
For teams running Kubernetes, OPA Gatekeeper can enforce the SHIELD “Human in the Loop” principle at the admission layer. This policy blocks any Deployment that hasn’t been explicitly labeled as security-reviewed:
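```yaml
# constraint-template.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredsecurityreview
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredSecurityReview
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredsecurityreview

        # Deny Deployments whose security-reviewed label is missing or not "true"
        violation[{"msg": msg}] {
          input.review.object.kind == "Deployment"
          not security_reviewed
          msg := "Deployment must be labeled security-reviewed=true. AI-generated code requires a security scan before deployment."
        }

        security_reviewed {
          input.review.object.metadata.labels["security-reviewed"] == "true"
        }
---
# constraint.yaml: the template only defines the policy; this instance enforces it
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredSecurityReview
metadata:
  name: require-security-review
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
```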
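No Deployment reaches production without the security-reviewed: "true" label. That label is the signal that a human has reviewed the code and the CI pipeline passed, so the pipeline should apply it only after the gates succeed rather than leaving it for developers to add by hand. OPA Gatekeeper v3.22.0 (March 2026) is part of the Open Policy Agent project, which holds CNCF graduated status and offers production-grade stability.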
What to do Monday morning
This week:
- Install ggshield as a pre-commit hook in every active repository. It takes about 10 minutes and is the cheapest response to the 3.2% secret-leak rate GitGuardian measured for Claude Code-assisted commits.
- Run `npm audit` across your existing projects. Separately, review unfamiliar dependencies for zero download history or a registration date within the past year; both indicate that a package may be a slopsquatted name.
This quarter:
- Add the GitHub Actions pipeline above to repositories that accept AI-generated PRs. The three-job structure covers all three attack surfaces in a single workflow.
- Require explicit PR labels for AI-assisted code so reviewers know to look specifically for missing authorization checks and privilege escalation paths - the architectural bugs AI consistently introduces.
- Run Semgrep with the `p/owasp-top-ten` ruleset against your highest-risk repositories. You may find the 45% OWASP failure rate applies to code your team already shipped.
This year:
- Deploy OPA Gatekeeper admission policies in production Kubernetes namespaces with the security-review label requirement.
- Build attribution infrastructure to track which commits contain AI-generated code so you can measure your actual secret-leak rate and OWASP failure rate rather than relying on industry averages.
- Implement SHIELD’s “Enforce Security-Focused Helper Models” pillar: a dedicated AI model as an automated PR security reviewer running in parallel with your human review process.
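Attribution can start with data most repositories already have. The minimal sketch below assumes AI-assisted commits carry a marker in the commit message. At the time of writing, Claude Code adds a `Co-Authored-By: Claude` trailer by default, but other tools differ. The `AI-Assisted: true` trailer is a hypothetical team convention. The sketch parses `git log --format=%H%x1f%B%x1e` output and computes the AI-assisted share of commits. Joining that with per-commit secrets-scan or SAST findings would give your own leak and failure rates.

```python
import re

# Assumed markers of AI assistance in commit messages; adapt to your tools
AI_MARKERS = [
    re.compile(r"^Co-Authored-By:\s*Claude\b", re.IGNORECASE | re.MULTILINE),
    re.compile(r"^AI-Assisted:\s*true\b", re.IGNORECASE | re.MULTILINE),  # hypothetical trailer
]


def parse_git_log(raw: str) -> dict[str, str]:
    """Parse `git log --format=%H%x1f%B%x1e` output into {sha: message}."""
    commits = {}
    for record in raw.split("\x1e"):
        record = record.strip("\n")  # git puts a newline between records
        if not record:
            continue
        sha, _, message = record.partition("\x1f")
        commits[sha] = message
    return commits


def ai_assisted(message: str) -> bool:
    """True if the commit message carries any of the assumed AI markers."""
    return any(marker.search(message) for marker in AI_MARKERS)


def ai_share(commits: dict[str, str]) -> float:
    """Fraction of commits whose message carries an AI-assistance marker."""
    if not commits:
        return 0.0
    return sum(ai_assisted(m) for m in commits.values()) / len(commits)


raw = (
    "a1\x1fAdd login route\n\nCo-Authored-By: Claude <noreply@anthropic.com>\x1e\n"
    "b2\x1fFix typo in README\x1e\n"
    "c3\x1fRefactor billing\n\nAI-Assisted: true\x1e\n"
    "d4\x1fBump version\x1e\n"
)
commits = parse_git_log(raw)
print(f"AI-assisted share: {ai_share(commits):.0%}")
```

Message markers undercount, because developers can strip trailers and some tools add none. Treat the result as a floor, much as the Vibe Security Radar treats its own counts.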
The core insight from the CSA research note is that the bugs AI eliminates - syntax errors and simple logic mistakes - are the bugs your existing tools were already catching. The bugs AI introduces - privilege escalation and architectural flaws - are the ones your existing tools struggle with. The three pipeline layers above don’t replace your existing security tooling. They add coverage specifically for what AI-assisted development introduces.
Frequently asked questions
What percentage of AI-generated code has security vulnerabilities?
Veracode’s testing of more than 100 LLMs across 80+ coding tasks found 45% of AI-generated code contains OWASP Top 10 vulnerabilities. The rate varies by language: Java fails at 72%, while Python, C#, and JavaScript range from 38-45%. Georgetown CSET’s formal verification study (November 2024, using older model generations) found approximately 48% of generated code contained security bugs, with only about 30% passing as fully secure. Notably, Veracode found that newer and larger models did not produce more secure code than smaller ones - the 55% security pass rate held flat across their 2025-2026 testing cycles.
What is slopsquatting and why does it matter for DevOps teams?
Slopsquatting is a supply chain attack that exploits AI coding tools’ tendency to hallucinate package names. Research across 576,000 code samples from 16 LLMs found nearly 20% of AI-generated code references packages that do not exist. Forty-three percent of those hallucinated names reappeared in all 10 re-runs of the same prompt. Attackers register these names on npm, PyPI, or other registries with malicious payloads and wait for developers running AI-generated install commands to pick them up. DevOps teams should add dependency auditing (npm audit, Snyk SCA) and lockfile-lint to their CI pipelines. They should also require review of every newly added dependency, because a freshly registered malicious package can pass advisory-based checks until someone reports it.
How do I scan AI-generated code for leaked secrets before it reaches production?
Install GitGuardian’s ggshield as a pre-commit hook so secrets are intercepted before they enter repository history. In CI, scan every pull request with ggshield-action to catch anything committed while the hook was skipped or not installed. GitGuardian’s 2026 report found Claude Code-assisted commits leak secrets at 3.2% compared to a 1.5% baseline. It also found that 59% of compromised machines were CI/CD runners, which makes pipeline-level scanning essential, not optional.
What is the SHIELD framework for vibe coding security?
SHIELD is a governance framework published by Palo Alto Networks Unit 42 on January 8, 2026, designed specifically for securing AI coding tools. It stands for: Separation of Duties (restrict AI agents to dev/test environments), Human in the Loop (mandate human PR review), Input/Output Validation (SAST before merge), Enforce Security-Focused Helper Models (specialized security AI agents), Least Agency (minimum permissions for AI tools), and Defensive Technical Controls (SCA and execution controls before consuming dependencies).
Should I ban AI coding tools to reduce security risk?
Banning is unlikely to be effective. JetBrains data shows 85% of developers already use AI coding tools, and adoption continues to grow. The more practical approach is implementing the security gates that AI-assisted development requires: pre-commit secrets scanning, CI-level SAST and dependency auditing, mandatory human review for AI-generated PRs, and Kubernetes admission policies that enforce the security-review requirement. The SHIELD framework provides a structured model for deploying these controls without stopping AI adoption.