CVE-2026-22807: Pre-Authentication Remote Code Execution via Unsafe Model Loading in vLLM

CVE ID: CVE-2026-22807
Affected Component: vLLM (model loading / auto_map resolution)
Vulnerability Type: Unsafe model loading leading to Remote Code Execution
Attack Vector: Network / Supply chain (model repository)
Authentication Required: No (pre-authentication)
User Interaction: Not required once a malicious model is loaded
CVSS v3.x Score: 8.8 (High)
Severity: High (effectively critical in production environments)
Exploitability: High when untrusted or user-controlled models are loaded
Exploit Availability: No official public exploit has been published, but exploitation is technically straightforward
Patch Status: Fixed in vLLM v0.14.0
Official Patch / Upgrade Link:
👉 https://github.com/vllm-project/vllm/releases/tag/v0.14.0


Overview

vLLM contains a high-severity flaw in how it loads model code during initialization. Under certain conditions, Python code embedded in a model repository runs automatically, without the operator opting in. Because this happens before authentication or API access controls are enforced, the issue is particularly dangerous for production deployments.

The flaw exists in the logic that processes model configuration fields such as auto_map, which are used to dynamically determine which Python classes should be loaded for a given model. Improper validation of these mappings allowed arbitrary Python modules from remote or local model repositories to be imported and executed during startup.

As a result, if a malicious or compromised model is loaded, arbitrary code may be executed on the host system running vLLM.


Root Cause

During model startup, vLLM reads metadata from the model’s configuration file (typically config.json).
This file may contain an auto_map field that tells vLLM where to find Python classes implementing model behavior.

The issue arises because:

  • The auto_map entries were resolved without enforcing trust boundaries
  • Remote or external Python modules could be fetched and imported automatically
  • The trust_remote_code safeguard was bypassed during resolution, so the code could run even when the operator had not enabled it
  • Importing a Python module runs its top-level code, so the import itself was enough to trigger execution

In practical terms, loading a model was enough to execute attacker-supplied Python code, even if the deployment operator never explicitly trusted that code.
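To make the role of auto_map concrete, here is a minimal sketch of a pre-load check that lists every class reference a config.json would ask the loader to import. It does not use vLLM itself. The function names are hypothetical, and the "repo--module.Class" form for cross-repository references is an assumption based on the Hugging Face convention, not something taken from the vLLM patch.

```python
import json
from pathlib import Path


def list_auto_map_targets(config: dict) -> list[dict]:
    """Return every class reference found in a config's auto_map field."""
    auto_map = config.get("auto_map")
    if not isinstance(auto_map, dict):
        return []  # no auto_map (or a malformed one): nothing to resolve

    targets = []
    for auto_class, value in auto_map.items():
        # Values are usually a single string, but may be a list of references.
        refs = value if isinstance(value, (list, tuple)) else [value]
        for ref in refs:
            if not isinstance(ref, str):
                continue  # skip null placeholders
            # Assumption: "repo--module.Class" points at a different repository.
            repo, sep, _ = ref.partition("--")
            targets.append(
                {
                    "auto_class": auto_class,
                    "reference": ref,
                    "external_repo": repo if sep else None,
                }
            )
    return targets


def audit_model_dir(model_dir: str) -> list[dict]:
    """Read config.json from a local model directory and list its auto_map targets."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return list_auto_map_targets(config)
```

Any non-empty result means the model wants custom Python code imported at load time. An entry with external_repo set deserves extra attention, since the code would come from a repository other than the one being loaded.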


Why This Is Dangerous

  • Code execution happens at model load time, not when handling requests
  • No API key, token, or user authentication is required
  • Model loading often occurs automatically in CI/CD pipelines, containers, or startup scripts
  • The vLLM process typically runs with access to GPUs, secrets, model caches, and internal networks

Once exploited, attackers may:

  • Execute arbitrary shell commands
  • Install persistence mechanisms
  • Exfiltrate API keys, credentials, or training data
  • Pivot laterally to other internal systems
  • Tamper with inference results or inject backdoors

Attack Scenarios

Scenario 1 – Malicious Public Model

  • A model repository is created or modified to include malicious Python code
  • The repository appears legitimate (typosquatting, reused namespace, or cloned popular model)
  • A vLLM deployment is configured to load this model by name
  • Code executes automatically during model initialization

Scenario 2 – User-Supplied Model

  • A service allows users to specify model paths or names
  • An attacker points the service to a crafted model repository
  • vLLM loads the model and executes embedded Python code

Scenario 3 – Compromised Local Model Directory

  • An attacker gains write access to a model directory
  • Malicious files are placed alongside model artifacts
  • vLLM loads the model and executes the injected code

Proof of Concept (Educational)

No official exploit code has been released publicly.
However, exploitation is conceptually simple, and the steps are easy to reproduce in a lab for educational or research purposes.

A typical proof-of-concept would involve:

  • Creating a model repository with a manipulated config.json
  • Adding a Python module referenced by auto_map
  • Including code that executes upon import (e.g., command execution, file write)
  • Loading the model via vLLM
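The step that makes this work is ordinary Python behavior: importing a module runs its top-level statements. The sketch below is a harmless stand-in for the steps above. It never touches vLLM or a real model repository, the module name modeling_demo is hypothetical, and the "payload" only writes a marker file into a temporary directory.

```python
import importlib.util
import tempfile
from pathlib import Path


def import_runs_top_level_code() -> bool:
    """Import a throwaway module and report whether its top-level code ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "marker.txt"
        module_path = Path(tmp) / "modeling_demo.py"

        # The module defines a class, but also has a side effect at import time.
        module_path.write_text(
            "from pathlib import Path\n"
            f"Path({str(marker)!r}).write_text('executed at import')\n"
            "class DemoModel:\n"
            "    pass\n",
            encoding="utf-8",
        )

        # Load the file the way a dynamic loader would: by path, at runtime.
        spec = importlib.util.spec_from_file_location("modeling_demo", module_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        # Nobody instantiated DemoModel, yet the side effect already happened.
        return marker.exists()
```

The marker file exists even though the class was never used, which is why resolving an attacker-controlled class reference amounts to running attacker-controlled code.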

Detection & Monitoring Guidance

Because exploitation occurs during model loading, traditional API-level monitoring is insufficient. Detection must focus on startup behavior, process execution, file access, and network activity.


Key Log Sources to Monitor

  1. vLLM Application Logs
    • Model loading messages
    • auto_map resolution logs
    • Unexpected warnings or stack traces during startup
  2. Operating System Process Logs
    • Linux: auditd (execve)
    • Windows: Sysmon (Event ID 1)
    • Detection of Python subprocesses spawned by vLLM
  3. File Integrity Monitoring (FIM)
    • Creation or modification of .py files in model directories
    • Changes to config.json or tokenizer files
  4. Network Logs
    • Outbound HTTP(S) or Git traffic from inference servers
    • Unexpected connections to model hosting platforms
  5. Container Runtime Logs
    • Containers spawning unexpected processes at startup
    • Image pulls triggered unexpectedly
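Where no dedicated FIM tooling is in place, a hash manifest is one simple way to approximate item 3. The following is a minimal sketch under that assumption: it records a SHA-256 digest for every file in a model directory and reports what changed against a saved baseline. The function names are illustrative and not part of vLLM.

```python
import hashlib
from pathlib import Path


def build_manifest(model_dir: str) -> dict[str, str]:
    """Map each file's relative path to its SHA-256 digest."""
    root = Path(model_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Hash in 1 MiB chunks so large weight files are not read into memory at once.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def diff_manifests(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed, or modified relative to the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(name for name in shared if baseline[name] != current[name]),
    }
```

A baseline taken when the model is approved can be stored outside the model directory. A later diff that shows an added .py file or a modified config.json is exactly the kind of change this list calls out.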

Indicators of Exploitation

  • Python processes executing from model cache directories
  • Unexpected shell commands launched by the vLLM process
  • Network access occurring during model load when no deployment was planned
  • Newly created Python files inside model directories
  • Sudden configuration changes without a corresponding release event
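As a quick triage aid for the "newly created Python files" indicator, the sketch below lists .py files under a model directory that were modified after a given point in time, such as the last known-good deployment. It is an illustrative helper, not a vLLM feature, and modification times can be forged by an attacker, so treat a clean result as weak evidence only.

```python
from pathlib import Path


def python_files_changed_since(model_dir: str, since_epoch: float) -> list[str]:
    """Return relative paths of .py files modified after since_epoch (Unix time)."""
    root = Path(model_dir)
    hits = []
    for path in root.rglob("*.py"):
        if path.is_file() and path.stat().st_mtime > since_epoch:
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

For example, passing time.time() - 86400 as since_epoch would list Python files changed in the last 24 hours.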

Sigma Detection Rules

Sigma Rule – vLLM Executing Python from Model Cache

title: vLLM Suspicious Python Execution During Model Load
id: 1f4c2d9a-9c34-4f21-b6d2-vllm22807
status: experimental
description: Detects Python execution originating from vLLM model cache directories.
author: Security Team
logsource:
  product: linux
  category: process_creation
detection:
  selection:
    ParentImage|contains:
      - "vllm"
    Image|contains:
      - "/python"
    CommandLine|contains:
      - ".cache"
      - "huggingface"
  condition: selection
level: high
tags:
  - attack.execution
  - attack.initial_access

Sigma Rule – Unexpected Network Activity at Startup

title: vLLM Unexpected Outbound Network During Startup
id: 9a7b6c1d-vllm-net-22807
status: experimental
description: Detects outbound network connections initiated by vLLM during model loading.
logsource:
  product: linux
  category: network_connection
detection:
  selection:
    Image|contains: "vllm"
    DestinationPort:
      - 443
      - 80
  condition: selection
level: medium
tags:
  - attack.command_and_control

Sigma Rule – Model Directory File Creation

title: vLLM Model Directory Python File Creation
id: 77b0c3aa-vllm-file-22807
status: experimental
description: Detects new Python files created in model directories.
logsource:
  product: linux
  category: file_event
detection:
  selection:
    TargetFilename|endswith: ".py"
    TargetFilename|contains:
      - "/models/"
      - "/.cache/"
  condition: selection
level: high
tags:
  - attack.persistence

Mitigation & Hardening Recommendations

  • Upgrade immediately to vLLM v0.14.0 or later
  • Avoid loading untrusted or user-supplied models
  • Enforce strict allow-lists for model repositories
  • Keep trust_remote_code disabled by default and enable it only for models whose code has been reviewed
  • Run vLLM in isolated containers or VMs
  • Restrict outbound network access from inference hosts
  • Apply file integrity monitoring on model directories
  • Treat model artifacts as executable supply-chain components
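The allow-list recommendation can be as simple as an exact-match check in front of whatever code passes a model name to vLLM. The sketch below assumes a service that accepts a model identifier from a request or config file. The allow-list entry and function name are hypothetical placeholders.

```python
# Hypothetical allow-list: replace with the model identifiers your team has reviewed.
ALLOWED_MODELS = frozenset({"example-org/example-model"})


def resolve_model(requested: str, allowed: frozenset = ALLOWED_MODELS) -> str:
    """Return the model identifier only if it exactly matches an approved entry."""
    name = requested.strip()
    # Exact matching rejects typosquats, arbitrary local paths, and unknown namespaces.
    if name not in allowed:
        raise ValueError(f"model {name!r} is not on the allow-list")
    return name
```

Exact matching is deliberate here. Prefix or pattern matching is easier to maintain but leaves room for look-alike names, which is the typosquatting case described in Scenario 1.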

Security Classification

  • CWE: CWE-94, Improper Control of Generation of Code ('Code Injection')
  • MITRE ATT&CK Tactics:
    • Initial Access
    • Execution
    • Persistence (post-exploitation)
    • Defense Evasion (if malicious code hides activity)

Final Takeaway

This vulnerability highlights a broader industry risk: machine learning models are executable artifacts, not just data.
Any system that dynamically loads model code must treat models with the same level of scrutiny as third-party software dependencies.

Upgrade to the fixed version and put runtime monitoring in place around model loading.
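A startup or CI check can confirm that the installed vLLM is at or above the fixed release. This is a minimal sketch using only the standard library. The helper names are illustrative, and the simplified parser ignores pre-release and local suffixes, so a build such as 0.14.0rc1 would be treated as 0.14.0.

```python
import re
from importlib import metadata

FIXED_RELEASE = (0, 14, 0)  # first patched version, per the advisory above


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric components of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip().lstrip("v"))
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(version: str, fixed: tuple[int, ...] = FIXED_RELEASE) -> bool:
    """Return True if version is at or above the fixed release."""
    release = parse_release(version)
    # Pad both tuples to equal length so "1.0" compares as (1, 0, 0).
    width = max(len(release), len(fixed))
    release += (0,) * (width - len(release))
    fixed += (0,) * (width - len(fixed))
    return release >= fixed


def installed_vllm_is_patched():
    """Check the installed vllm package; returns None if it is not installed."""
    try:
        return is_patched(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None
```

Failing a deployment when installed_vllm_is_patched() returns False keeps an unpatched image from reaching production unnoticed.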


Aegiron

Backed by 11+ years in cybersecurity and incident response, we decode the latest threats shaping today’s digital battlefield. This blog cuts through the noise with clear insights on vulnerabilities, emerging exploits, and the cyber news defenders can’t afford to miss.