Critical Flaws in AI Coding Tools Expose Developers to Silent Sandbox Escapes and Full System Compromise

Artificial Intelligence is rapidly transforming modern software development. AI-powered coding assistants and CLI agents are no longer experimental tools—they are now deeply embedded in developer workflows. From generating code to auditing vulnerabilities, tools like Claude Code, Gemini CLI, Codex CLI, Cursor, and GitHub Copilot are shaping the future of engineering productivity.

However, beneath this convenience lies a growing and largely underestimated security risk.

Recent research has uncovered a new class of vulnerabilities that fundamentally challenge the trust model of these tools. These flaws allow attackers to bypass sandbox protections, execute arbitrary code, and compromise both local and cloud environments.

Source : cymulate

The Rise of AI Agents—and Their Expanding Attack Surface

AI coding agents are designed to simplify complex workflows. They can write, review, and execute code through natural language commands. To mitigate risks, many of these tools implement sandbox environments—isolated execution layers intended to contain potentially harmful operations.

The promise is simple: even if something goes wrong, the damage remains confined.

But the reality is far more concerning.

Across multiple platforms, researchers identified recurring weaknesses in how these tools:

  • Enforce isolation boundaries
  • Handle configuration files
  • Trust user-controlled or LLM-generated input

These weaknesses introduce a completely new attack surface—one that blends traditional software vulnerabilities with AI-driven behaviors.


Understanding Configuration-Based Sandbox Escape (CBSE)

One of the most critical findings is a vulnerability class known as Configuration-Based Sandbox Escape (CBSE).

Unlike traditional sandbox escapes that exploit operating system flaws, CBSE attacks target the application layer itself—specifically, how AI tools manage configuration and execution logic.

How CBSE Works

At a high level, the attack follows a simple but powerful pattern:

  1. The attacker gains execution inside the sandbox (via prompt injection or malicious code).
  2. They modify configuration files that control the AI agent’s behavior.
  3. These configurations are later executed outside the sandbox.
  4. The attacker achieves code execution on the host system.

The key issue?
The sandbox protects the operating system—but not the configuration layer that controls execution.


Why Sandboxes Are Failing

The root cause across multiple tools is consistent:

The sandbox is treated as the security boundary, while the actual control layer (configuration and startup logic) remains writable.

This creates a dangerous gap.

Most AI CLI tools use layered configurations:

  • Enterprise-level policies
  • User-level settings
  • Project-level configurations

The problem lies in the project-level configuration, which:

  • Loads automatically on startup
  • Controls execution behavior
  • Is writable from inside the sandbox

This effectively allows attackers to plant persistent backdoors without breaking the sandbox itself.


Case Study 1: Claude Code Sandbox Escape

One of the most impactful vulnerabilities was identified in Claude Code.

Key Issue

A configuration file (.claude/settings.json) was not protected unless it already existed at startup.

This created a default insecure state.

Attack Flow

  • The attacker executes code inside the sandbox
  • They create a malicious configuration file with a startup hook
  • The sandbox session ends
  • On the next run, the host loads the malicious configuration
  • The payload executes with full user privileges

This results in:

  • Full sandbox escape
  • Persistent execution
  • Access to sensitive data (credentials, tokens, source code)

Why This Matters

The attack does not require breaking the sandbox at the OS level.
Instead, it uses the tool’s own features against itself.


Source : cymulate

Case Study 2: Gemini CLI Vulnerabilities

Gemini CLI introduces even more severe risks, with multiple independent attack paths.

1. Writable Host Configuration

The tool mounts sensitive directories (like .gemini) into the sandbox with write permissions.

This allows attackers to:

  • Modify persistent configuration files
  • Inject malicious startup commands
  • Execute code on the host during future sessions

2. OAuth Token Theft

The sandbox exposes authentication tokens stored in:

  • oauth_creds.json

An attacker can:

  • Extract the token
  • Impersonate the user
  • Access cloud resources
  • Move laterally across infrastructure

This is particularly dangerous because:

  • The attack does not require persistence
  • The compromise extends beyond the local machine

3. Unsafe Executable Resolution (Windows)

Gemini CLI also suffers from a classic vulnerability:

  • It searches for executables in the current directory before system paths

This allows attackers to plant malicious binaries (e.g., where.exe) that execute before the sandbox even initializes.

Source : cymulate

Case Study 3: Codex CLI (OpenAI)

Codex CLI reveals another variation of the same underlying problem: trust boundary failure.

Core Issue

The tool allows file creation inside a .codex directory—even though this directory controls security settings.

If the directory does not exist (default case), it can be created without restrictions.

Attack Techniques

1. Notify-Based Execution

Attackers inject a command that runs after every agent action—outside the sandbox.

2. Full Sandbox Disable

By modifying configuration:

  • Sandbox protections are removed
  • User approvals are disabled
  • Malicious scripts execute automatically

The most alarming aspect?

The tool may still appear to be running in sandbox mode.


A Pattern Across Vendors

Despite differences in implementation, all tools share the same architectural flaw:

  • Writable configuration controls execution
  • Configuration is loaded outside the sandbox
  • No integrity validation is enforced

This creates a universal attack pattern:

Sandbox → Configuration Injection → Host Execution


Vendor Response: A Mixed Picture

The response from vendors highlights a broader issue in the industry.

VendorResponse
AnthropicFixed quickly
GoogleNo fix after 90+ days
OpenAIClosed as informational

This inconsistency raises concerns about how seriously AI security is being treated.


Security Implications for Organizations

Organizations adopting AI coding agents must rethink their security assumptions.

These tools:

  • Have access to source code
  • Handle credentials (AWS, GitHub, SSH)
  • Interact with cloud environments

A compromised agent can become a powerful attack vector.

Key Risks

  • Arbitrary code execution
  • Credential theft
  • Persistent backdoors
  • Cloud compromise
  • Lateral movement

Mitigation Strategies

While vendors work on fixes, organizations should take proactive steps.

1. Treat AI Agents as Privileged Software

Do not treat them as simple tools—they operate with high levels of access.

2. Use External Sandboxing

Rely on:

  • Virtual machines
  • Isolated containers
  • Strict filesystem controls

3. Audit Configuration Files

Regularly inspect:

  • .claude/
  • .gemini/
  • .codex/

4. Restrict Permissions

Apply least-privilege principles:

  • Limit cloud access
  • Avoid admin-level accounts

5. Monitor Behavior

Detect anomalies such as:

  • Unexpected file changes
  • Suspicious outbound traffic
  • Unusual command execution

Our Opinion on This Case

The findings presented in this research expose a critical gap between innovation and security in the AI tooling ecosystem. While AI coding agents are marketed as productivity enhancers—and even security assistants—their underlying architectures often fail to meet basic security expectations.

What stands out most is not just the presence of vulnerabilities, but the systemic nature of the design flaws. These are not isolated bugs; they are architectural oversights that repeat across vendors. This suggests that the industry is prioritizing speed of deployment over secure-by-design principles.

Equally concerning is the inconsistency in vendor response. While some organizations demonstrated responsible disclosure handling, others dismissed or ignored serious risks. This lack of alignment undermines trust and increases exposure for end users.

In our view, the concept of a “sandbox” in AI tools is currently misleading. It creates a false sense of security while leaving critical control layers exposed. Until stronger isolation models and configuration integrity checks are universally adopted, these tools should be treated as high-risk, privileged software.

Organizations must not wait for vendors to catch up. Proactive defense, strict isolation, and continuous monitoring are essential to safely leverage AI in development workflows.