Microsoft Uncovers Critical Flaw in Anthropic’s Claude Code Action, Allowing AI to Be Tricked Into Stealing Pipeline Secrets

The convergence of Large Language Models (LLMs) and continuous integration/continuous deployment (CI/CD) pipelines has given rise to “agentic” workflows. In these environments, autonomous AI agents are granted runtime capabilities to read code, triage issues, spin up pull requests, and execute commands within a build runner. While this drastically accelerates development velocity, it introduces a paradigm shift in the software supply chain attack surface.

A stark reminder of this risk emerged when the Microsoft Defender Security Research Team disclosed a critical vulnerability in Anthropic’s Claude Code GitHub Action. The vulnerability allowed untrusted external data—such as GitHub issue comments and pull request descriptions—to hijack the underlying AI agent via indirect prompt injection, ultimately exfiltrating highly sensitive pipeline secrets like the ANTHROPIC_API_KEY. By examining the mechanics of this disclosure, organizations can better understand the precarious nature of embedding deterministic software automation with non-deterministic natural language processors.

HTML comment hidden inside an issue opened by the actor

The Paradigm Shift: Deterministic Build Runners vs. Non-Deterministic AI Agents

Traditional CI/CD pipelines, such as GitHub Actions, operate on a deterministic execution model. Developers configure static YAML workflows to handle structured triggers—such as code pushes, merges, or cron schedules—by executing predefined scripts inside ephemeral virtual machines. In this legacy architecture, data and control logic are rigidly segregated. Inputs are treated strictly as parameters, and the pipeline executes only the explicit command strings hardcoded into the configuration.

When agentic AI tools like Claude Code Action are introduced to the environment, this fundamental boundary dissolves. Instead of following static, hardcoded routines, the AI agent ingests unstructured repository data, interprets natural language inputs, and dynamically decides which actions or software development tools to invoke. Because LLMs process instructions and data within the same context window, natural language text supplied by an untrusted external user can inadvertently be elevated to executable code. If the agent operates within a privileged runner environment containing administrative cloud keys or access tokens, a breakdown in runtime isolation allows an attacker to manipulate the agent into misusing its tool-calling privileges.

Dissecting the Exploit: Indirect Prompt Injection and Isolation Bypasses

The vulnerability discovered by Microsoft Threat Intelligence centered around an architectural discrepancy in how the Claude Code GitHub Action handled sandbox enforcement across its available tools. Anthropic engineered the Claude Code environment with explicit security guardrails: when a subprocess or terminal command was instantiated via the Bash tool, the Action deployed a secure, namespace-based Linux sandbox utilizing Bubblewrap. Simultaneously, an environment scrubbing mechanism enforced by CLAUDE_CODE_SUBPROCESS_ENV_SCRUB was automatically initialized for non-write repository contributors, ensuring that sensitive environment variables were wiped from the subprocess execution path.

However, a critical security boundary gap existed within the agent’s file system Read tool. Unlike the Bash tool, the Read tool was engineered as a direct, in-process function call that bypassed the Bubblewrap isolation mechanism entirely. To exploit this design discrepancy, researchers executed an indirect prompt injection attack by embedding adversarial instructions within a standard GitHub issue. When the agent parsed the issue body, the injected text overrode the agent’s core system instructions, commanding it to invoke its un-sandboxed Read tool to target the Linux process environment file located at /proc/self/environ. Because the Read tool ran directly within the main process space of the runner, it retained full access to the active environment variables, capturing the cleartext ANTHROPIC_API_KEY and other sensitive pipeline variables.

Evading AI Safety Filters and GitHub Secret Scanners

To successfully exfiltrate the harvested credential, the exploit had to circumvent two formidable layers of active defense: the LLM’s native safety alignment filters and GitHub’s automated Secret Scanner. Standard frontier models are explicitly aligned to refuse requests that involve dumping raw API keys, particularly when encountering strings that match known token signatures (such as Anthropic’s sk-ant- prefix).

To neutralize this guardrail, the adversarial payload relied on an LLM “jailbreak” technique. Rather than demanding the key outright, the prompt framed the request as a routine “compliance review” and instructed the model to truncate or omit the first seven characters of the string before printing the output. This semantic obfuscation laundered the data, tricking the model’s safety alignment into evaluating the payload as benign. Furthermore, by altering the structural signature of the token before writing it to standard output (stdout), the payload completely bypassed GitHub’s Secret Scanner, which scans live runner logs for exact cryptographic string patterns. Once printed to the log in its mutated format, the attacker could simply pull down the workflow log, prepend the missing prefix, and reconstruct the fully functional administrative key.

Architectural Mitigation: Enforcing the Agents “Rule of Two”

Defending agentic pipelines against prompt injection requires moving away from fragile prompt-engineering guardrails and adopting structural, Zero-Trust architectural constraints. The foundational pillar of this defensive strategy is the “Agents Rule of Two.” Under this model, an AI-powered CI/CD workflow must never simultaneously possess more than two of the following three capabilities:

Processing Untrusted Input: Reading data surfaces controlled by arbitrary external users, such as issue descriptions, community PR code changes, or public discussion comments.
Access to Sensitive Secrets: Maintaining active environment variables, cloud deployment tokens, or third-party platform keys within the memory space of the runner.
State Modification or External Communication: Holding tool-calling privileges that allow the agent to write back to the repository, invoke external webhooks via WebFetch, or execute raw commands via Bash.

If an AI agent must process untrusted input and requires an API key to communicate with its foundational LLM, it must be systematically denied the toolsets required to modify repository states or communicate with arbitrary external servers. Furthermore, tokens must be isolated to the absolute minimum required privileges. If an agent requires access to the repository, it should use highly scoped, short-lived tokens rather than permanent administrative credentials, ensuring that even if a prompt injection occurs, the blast radius is strictly contained.

Our Opinion on the Case

The Claude Code vulnerability represents a defining moment in modern software supply chain security, exposing the systemic fragility of marrying deterministic execution platforms with probabilistic AI models. In our view, the fundamental flaw here was not merely a programming oversight by Anthropic, but rather a conceptual failure to recognize that data is now code. For decades, computer science has worked to separate code execution from data storage; however, LLMs inherently unify the two into a single processing context.

Anthropic deserves commendation for its rapid mitigation response, which explicitly blocked access to sensitive /proc pathways in version 2.1.128. However, relying on blocklists for individual file paths is inherently a reactive approach to security. As long as software engineers treat LLMs as traditional software components without establishing absolute isolation boundaries around their data input vectors, variations of these injection exploits will persist.

Moving forward, the industry must stop treating system prompts or keyword filters as dependable security perimeters. If an autonomous agent is given a file-reading tool within a container that holds production-level tokens, it must be assumed that a clever prompt injection will eventually access those tokens. Organizations must decouple AI reasoning engines from high-privilege execution environments entirely, treating all outputs generated by an AI agent with the exact same suspicion as unvalidated code written by an anonymous external adversary.