29 Million Leaked Secrets: How AI Coding Accelerated the Secrets Sprawl Crisis

April 4, 2026 · 9 min read · By William

29 Million Reasons Your Cloud Credentials Are Already Leaked

In 2025, developers pushed 28.65 million hardcoded secrets to public GitHub repositories — a 34% jump from the previous year and the largest single-year spike ever recorded. That number comes from GitGuardian’s State of Secrets Sprawl 2026 report, and it barely scratches the surface. The real problem isn’t just what’s on GitHub. It’s the credentials sitting in Slack messages, CI/CD variables, .env files scattered across laptops, and now MCP configuration files feeding AI coding agents. If your team writes code, deploys to the cloud, or uses any SaaS tool, secrets sprawl is already your problem. Here’s how to understand it, measure it, and actually fix it.

What Is Secrets Sprawl, Really?

Secrets sprawl is the uncontrolled spread of authentication credentials — API keys, service account tokens, database passwords, TLS certificates, SSH keys — across every corner of your infrastructure. It’s not a single leak. It’s a systemic condition.

In cloud-native environments, every microservice needs credentials to talk to databases, message queues, object storage, and external APIs. CI/CD pipelines need tokens to push images and trigger deployments. Infrastructure-as-code tools need cloud provider credentials. AI agents need API keys to call LLMs. The result: a single organization can easily accumulate tens of thousands of non-human identity (NHI) credentials, most of them long-lived, most of them unmapped.

The GitGuardian report highlights a particularly nasty dimension: 28% of secret leak incidents don’t come from code repositories at all. They come from collaboration tools — Slack, Confluence, Google Docs — where someone pasted a token for a colleague and forgot about it. These leaks are harder to detect and often visible to a much broader audience.

The AI Accelerant

The 2026 report reveals something security teams need to internalize quickly: AI-assisted coding is making secrets sprawl worse, fast.

  • AI-service secret leaks surged 81% year over year.
  • 8 of the 10 fastest-growing categories of leaked secrets are tied to AI services.
  • Commits co-authored by Anthropic’s Claude Code leaked secrets at a rate of 3.2%, more than double the human-only baseline of 1.5%.
  • GitGuardian found 24,008 unique secrets exposed in MCP (Model Context Protocol) configuration files in the protocol’s first year alone, with 8.8% of those being valid credentials.

This isn’t an indictment of AI coding tools. It’s a reflection of how they change workflows. Developers using AI assistants iterate faster, commit more frequently, and often spend less time reviewing what they’re pushing. LLM infrastructure — RAG pipelines, vector databases, orchestration layers — also generates a new category of service accounts and tokens that most security teams haven’t cataloged.

Why Traditional Remediation Fails

Here’s the part that should keep security leads up at night: most exposed secrets stay valid for years. GitGuardian’s 2025 analysis found that credentials detected as far back as 2022 — database passwords, cloud keys, API tokens — remain active today. The persistence problem is structural.

When a secret is found in a public repo, the typical response goes like this:

  1. Security gets an alert (maybe — if they’re scanning at all).
  2. Someone files a ticket.
  3. The developer who committed it has moved to another team or left the company.
  4. Nobody is sure which systems use that credential.
  5. The ticket sits open for weeks while the credential stays valid.

Meanwhile, automated scanners on Telegram, dark web forums, and GitHub mirrors pick up the secret within hours. The mean time to exploitation for a publicly exposed cloud credential is measured in minutes, not days.

Mapping Your Actual Attack Surface

Before you can fix secrets sprawl, you need to see it. That means running a discovery phase across every credential store — not just your vault.

Start with these five sources:

  • Public GitHub repos (your org + forks). Use tools like GitGuardian, TruffleHog, or Gitleaks to scan commit history, not just current code.
  • Internal repositories. Mirror your internal Git servers and run the same scanners. Secrets in private repos are one misconfigured permission away from public exposure.
  • CI/CD platforms. Audit environment variables in GitHub Actions, GitLab CI, Jenkins, CircleCI, and any other pipeline tool. These are magnets for long-lived credentials that nobody rotates.
  • Collaboration tools. Run keyword-based searches across Slack, Teams, Confluence, and internal wikis for patterns that match API keys and tokens.
  • Cloud provider consoles. AWS Secrets Manager, GCP Secret Manager, and Azure Key Vault tell you what’s centralized — but IAM access keys not stored in these services are the real blind spot.
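
To make the pattern-matching step concrete, here is a minimal sketch of the kind of search the tools above automate. The three patterns are illustrative assumptions that cover only a few well-known credential formats; production scanners ship far more detectors and also validate what they find.

```python
import re

# Illustrative patterns only; real scanners ship hundreds of detectors.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text):
    """Return (line_number, detector_name) pairs for every match in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# A pasted chat message; the key is the placeholder from AWS documentation.
sample = "\n".join([
    "hey, use this for staging:",
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
    "thanks!",
])
print(find_secrets(sample))  # [(2, 'aws_access_key_id')]
```

The same function can run over exported Slack messages, wiki pages, or CI variable dumps, which is why it takes plain text rather than a repository path.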

A Practical Remediation Playbook

Once you’ve mapped the sprawl, here’s a phased approach that actually works.

Phase 1: Stop the Bleeding (Week 1-2)

  • Revoke and rotate every credential found in public repositories or collaboration tools. Don’t wait to understand the blast radius — rotate first, then investigate.
  • Enable pre-commit hooks (detect-secrets, gitleaks) on all developer machines. This catches secrets before they reach git history.
  • Deploy repository-level scanning as a CI check. Block PRs that contain detected secrets.
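
As a rough illustration of what a blocking check does, the sketch below scans only the lines a unified diff adds and returns a non-zero exit code when one matches. The single regex is a stand-in assumption; in practice the check would call gitleaks or detect-secrets rather than a hand-rolled pattern.

```python
import re

# Stand-in detector (an assumption); a real check would run a proper scanner.
SECRET_RE = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def added_lines(diff_text):
    """Yield (path, text) for each line a unified diff adds."""
    path = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/config.py" names the file the following hunks modify.
            path = line[4:].removeprefix("b/")
        elif line.startswith("+"):
            yield path, line[1:]


def check_diff(diff_text):
    """Return a process exit code: 1 if any added line matches, else 0."""
    flagged = sorted({p for p, t in added_lines(diff_text) if SECRET_RE.search(t)})
    for path in flagged:
        # Report the file only, so the secret is not copied into CI logs.
        print(f"possible secret added in {path}")
    return 1 if flagged else 0
```

In a pre-commit hook the input would be the output of `git diff --cached`; in CI it would be the diff between the PR branch and its base. Scanning only added lines keeps the check fast, but it is not a substitute for a full history scan.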

Phase 2: Centralize (Week 3-6)

  • Pick a primary secrets manager — HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or a managed platform like Infisical or Doppler. The specific tool matters less than committing to one.
  • Migrate application secrets to the chosen vault. Use dynamic secrets (short-lived, auto-rotated) wherever the target system supports them.
  • Eliminate .env files from codebases and static secrets from CI variables. Replace both with vault injection at runtime.
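
On the application side, runtime injection can look something like the sketch below: the code reads a secret that the platform has mounted or exported, and refuses to start if it is absent. The `/run/secrets` directory, the `load_secret` helper, and the upper-cased environment variable fallback are assumptions for illustration, not any particular vault's convention.

```python
import os
from pathlib import Path


class MissingSecret(RuntimeError):
    """Raised when an expected secret was not injected."""


def load_secret(name, secrets_dir="/run/secrets"):
    """Read a secret from a mounted file, else from an environment variable."""
    path = Path(secrets_dir) / name
    if path.is_file():
        return path.read_text().strip()
    value = os.environ.get(name.upper())
    if value:
        return value
    # Fail loudly instead of falling back to a default baked into the code.
    raise MissingSecret(f"{name} was not injected; refusing to start")
```

A call such as `load_secret("db_password")` leaves nothing in the repository to leak: the value exists only where the injector put it.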

Phase 3: Harden (Ongoing)

  • Enforce short TTLs on all secrets. If you’re still using API keys that never expire, you’re accepting unnecessary risk. Aim for a maximum of 90 days; many cloud-native workloads can operate with credentials that last minutes.
  • Implement secretless authentication where possible. Workload identity federation (AWS IAM Roles for Service Accounts, GCP Workload Identity, Azure Managed Identities) eliminates the need to distribute credentials entirely.
  • Monitor NHI lifecycle. Track when service accounts and tokens are created, who has access, and when they were last used. Stale credentials are the ones most likely to be forgotten and exposed.
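
Lifecycle monitoring boils down to two date comparisons per credential. The sketch below assumes you can already export an inventory with rotation and last-use timestamps; the `Credential` shape and the 30-day idle window are hypothetical, while the 90-day maximum age follows the target above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Credential:
    name: str
    last_rotated: datetime
    last_used: Optional[datetime]  # None means no recorded use


def stale(creds, now, max_age=timedelta(days=90)):
    """Names of credentials not rotated within max_age."""
    return [c.name for c in creds if now - c.last_rotated > max_age]


def unused(creds, now, idle=timedelta(days=30)):
    """Names of credentials with no recorded use inside the idle window."""
    return [
        c.name for c in creds
        if c.last_used is None or now - c.last_used > idle
    ]


now = datetime(2026, 4, 4, tzinfo=timezone.utc)
inventory = [
    Credential("ci-deploy", now - timedelta(days=200), now - timedelta(days=1)),
    Credential("old-etl", now - timedelta(days=10), None),
    Credential("api", now - timedelta(days=5), now - timedelta(days=2)),
]
print(stale(inventory, now))   # ['ci-deploy']
print(unused(inventory, now))  # ['old-etl']
```

Credentials that show up in the second list are candidates for revocation rather than rotation.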

The Metrics That Matter

If you’re building a secrets security program, track these:

  • Secrets detection rate: How many hardcoded secrets are you finding per 1,000 commits? Trend this down.
  • Mean time to remediation (MTTR): From detection to rotation. Target: under 24 hours for public exposures.
  • Percentage of secrets in vaults: Of all credentials your organization uses, how many are managed centrally? Target: above 90%.
  • Dynamic vs. static ratio: What percentage of your secrets are short-lived and auto-rotated? Target: above 70% for production workloads.
  • Stale credential count: How many service accounts and tokens haven’t been rotated in 90+ days?
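
The arithmetic behind these metrics is simple enough to sketch. All the input numbers below are made up for illustration, and the function names are hypothetical.

```python
from datetime import timedelta
from statistics import mean


def detection_rate(secrets_found, commits):
    """Hardcoded secrets per 1,000 commits."""
    return 1000 * secrets_found / commits


def mttr(durations):
    """Mean time from detection to rotation, as a timedelta."""
    return timedelta(seconds=mean(d.total_seconds() for d in durations))


def share(part, whole):
    """Percentage of the whole, e.g. vaulted secrets out of all secrets."""
    return 100 * part / whole


# Hypothetical quarter: 12 findings in 8,000 commits, two remediations.
print(detection_rate(12, 8000))                            # 1.5
print(mttr([timedelta(hours=6), timedelta(hours=30)]))     # 18:00:00
print(share(450, 500))                                     # 90.0
```

The same `share` helper covers both the vault percentage and the dynamic-versus-static ratio; only the numerator changes.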

Architectural Patterns That Reduce Sprawl

The best secrets strategy reduces the number of secrets that exist in the first place.

Pattern 1: Workload Identity Federation. Instead of giving a Kubernetes pod an AWS access key, configure IRSA (IAM Roles for Service Accounts). The pod assumes a role via a trust policy, exchanging a short-lived, automatically rotated service account token for temporary credentials. No long-lived access key ever exists on disk or in environment variables.
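
For a sense of what the trust policy contains, the sketch below builds one as a plain dictionary. The helper name and the example values are assumptions; the policy shape (a federated OIDC principal, the `sts:AssumeRoleWithWebIdentity` action, and conditions on the `sub` and `aud` claims) follows the standard IRSA setup.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build a trust policy letting one Kubernetes service account assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Pin the role to exactly one namespace and service account.
                    f"{oidc_provider}:sub":
                        f"system:serviceaccount:{namespace}:{service_account}",
                    f"{oidc_provider}:aud": "sts.amazonaws.com",
                }
            },
        }],
    }


# Placeholder account ID and cluster OIDC issuer, for illustration only.
policy = irsa_trust_policy(
    "111122223333",
    "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    "payments",
    "api",
)
print(json.dumps(policy, indent=2))
```

The `sub` condition is the important part: without it, any service account in the cluster could assume the role.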

Pattern 2: Brokered Access. Use a platform like HashiCorp Vault or AWS IAM to broker access to databases. Applications request a short-lived credential at startup and renew it before it expires. Each credential lasts only minutes, so even if one leaks, the window is tiny.
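
A credential that expires in minutes means the application has to fetch a fresh one before the old one lapses. The sketch below shows one way to wrap that logic. `fetch` is a hypothetical callable standing in for whatever asks your broker for a new credential, and the 30-second margin is an arbitrary assumption.

```python
import time
from dataclasses import dataclass


@dataclass
class Lease:
    username: str
    password: str
    expires_at: float  # seconds, on the same clock the cache uses


class LeaseCache:
    """Hand out a brokered credential, refetching shortly before it expires."""

    def __init__(self, fetch, clock=time.monotonic, margin=30.0):
        self._fetch = fetch    # hypothetical: asks the broker for a new Lease
        self._clock = clock
        self._margin = margin  # renew this many seconds early
        self._lease = None

    def get(self):
        now = self._clock()
        if self._lease is None or now >= self._lease.expires_at - self._margin:
            self._lease = self._fetch()
        return self._lease
```

Callers use `cache.get()` each time they open a connection instead of holding a password in a long-lived variable. The injectable clock is there only so the renewal logic can be tested without sleeping.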

Pattern 3: Secret Injection via Sidecar or Init Container. Rather than baking secrets into application configuration, use a sidecar or init container (Vault Agent Injector) or a cluster operator (External Secrets Operator) that fetches secrets and exposes them to the pod as mounted files or environment variables at pod creation time.

Pattern 4: MCP Configuration Hardening. If your teams use AI coding agents with MCP, treat MCP config files as sensitive artifacts. Store API keys in a vault and reference them via environment variables — never inline in JSON or YAML configs.
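
One way to enforce this is a small lint over the config file. The sketch below assumes a JSON config with a top-level `mcpServers` object whose entries carry an `env` map, and assumes the client expands `${VAR}` placeholders. Both are conventions that vary between MCP clients, so check yours before relying on it.

```python
import json
import re

# Assumption: the client expands ${VAR}; placeholder syntax varies by client.
PLACEHOLDER = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")


def inline_env_values(config_text):
    """Return (server, variable) pairs whose env value is a literal string."""
    config = json.loads(config_text)
    findings = []
    for server, spec in config.get("mcpServers", {}).items():
        for var, value in spec.get("env", {}).items():
            if not PLACEHOLDER.match(str(value)):
                findings.append((server, var))
    return findings


# Hypothetical config: one inline key, one environment reference.
example = json.dumps({
    "mcpServers": {
        "search": {
            "command": "npx",
            "env": {"API_KEY": "sk-live-123", "REGION": "${REGION}"},
        }
    }
})
print(inline_env_values(example))  # [('search', 'API_KEY')]
```

The check is deliberately strict: it flags every literal value, including harmless ones, on the theory that a config file containing only references is easier to audit than one that mixes the two.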

FAQ

Is HashiCorp Vault still the default choice?
Vault remains the most feature-complete option, but it’s operationally heavy. For teams that don’t need its full plugin ecosystem, managed solutions like AWS Secrets Manager, Infisical, or Doppler offer faster time-to-value with less maintenance overhead.

What about secrets in container images?
Scan your container registries. Secrets get baked into images more often than people admit, especially when a layer copies in a .env file that a later layer deletes, since the file remains in the earlier layer. Use tools like Trivy or Snyk Container with secret detection enabled.

How do we handle third-party vendor credentials?
Vendor-provided API keys are often long-lived and poorly scoped. Isolate them in a dedicated vault path, rotate them on a schedule (even if the vendor makes it painful), and use network-level restrictions (IP allowlists, VPC endpoints) to limit blast radius.

Should we block AI coding assistants?
No. But you should treat AI-generated code the same as any other code: mandatory pre-commit scanning, PR-level secret detection, and developer training on what constitutes a secret. The GitGuardian data shows commits co-authored by Claude Code leaking at roughly 2× the human-only baseline (3.2% versus 1.5%), which means your guardrails need to account for that velocity.

The Bottom Line

Secrets sprawl isn’t a tooling problem you can buy your way out of. It’s a lifecycle governance problem that requires discovery, centralization, rotation, and — increasingly — architectural decisions that eliminate the need for distributed credentials in the first place. The 29 million secrets that hit public GitHub last year are a symptom. The disease is treating credentials as configuration instead of what they are: high-value attack surface that needs active management.

If your organization hasn’t mapped its non-human identity inventory, you’re operating blind. Start there. Then centralize, rotate, and move toward secretless patterns wherever the architecture allows. The attackers are already scanning. The question is whether you’re scanning faster.


References

  • GitGuardian, “The State of Secrets Sprawl 2026,” March 2026 — blog.gitguardian.com
  • GitGuardian / Yahoo Finance, “81% Surge of AI-Service Leaks as 29M Secrets Hit Public GitHub,” March 2026 — finance.yahoo.com
  • Oasis Security, “What Is Secret Sprawl?” — oasis.security
  • Akeyless, “Reining in Secrets Sprawl: A Guide to Effective Secrets Management” — akeyless.io
  • Aembit, “Secret Remediation Best Practices: A Step-by-Step Guide” — aembit.io
  • The Hacker News, “The Persistence Problem: Why Exposed Credentials Remain Valid,” May 2025 — thehackernews.com
  • NHI Management Group, “State of Secrets Sprawl 2026” summary — nhimg.org