CVE-2026-22807: Pre-Authentication Remote Code Execution via Unsafe Model Loading in vLLM

CVE ID: CVE-2026-22807
Affected Component: vLLM (model loading / auto_map resolution)
Vulnerability Type: Unsafe model loading leading to Remote Code Execution
Attack Vector: Network / Supply chain (model repository)
Authentication Required: No (pre-authentication)
User Interaction: Not required once a malicious model is loaded
CVSS v3.x Score: 8.8 (High)
Severity: High / Critical in production environments
Exploitability: High when untrusted or user-controlled models are loaded
Exploit Availability: No official public exploit has been published, but exploitation is technically straightforward and can be reproduced for educational and research purposes
Patch Status: Fixed in vLLM v0.14.0
Official Patch / Upgrade Link: https://github.com/vllm-project/vllm/releases/tag/v0.14.0
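
A quick way to act on the patch status is to compare the installed vLLM version against the fixed release. The sketch below is a minimal illustration, not an official check: the helper names are hypothetical, it relies only on the standard library, and it ignores pre-release suffixes, so a build such as 0.14.0rc1 would be reported as patched and should be verified by hand.

```python
import re
from importlib import metadata

FIXED_VERSION = (0, 14, 0)  # first fixed release according to this advisory


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_patched(version_text):
    """True if the numeric version is at or above the fixed release."""
    return parse_version(version_text) >= FIXED_VERSION


def installed_vllm_status():
    """Return (version, patched) for the local install, or None if vLLM is absent."""
    try:
        version_text = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None
    return version_text, is_patched(version_text)
```

Calling installed_vllm_status() on an inference host returns None when vLLM is not installed in the active environment, which is worth distinguishing from an unpatched install.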

Overview

A high-severity vulnerability was identified in how vLLM loads model code during initialization. Under certain conditions, Python code embedded in a model repository could be executed automatically, without the operator explicitly opting in. This execution occurs before authentication or API access controls are enforced, making the issue particularly dangerous for production deployments.

The flaw exists in the logic that processes model configuration fields such as auto_map, which are used to dynamically determine which Python classes should be loaded for a given model. Improper validation of these mappings allowed arbitrary Python modules from remote or local model repositories to be imported and executed during startup.

As a result, if a malicious or compromised model is loaded, arbitrary code may be executed on the host system running vLLM.

Root Cause

During model startup, vLLM reads metadata from the model’s configuration file (typically config.json).
This file may contain an auto_map field that tells vLLM where to find Python classes implementing model behavior.

The issue arises because:

- The auto_map entries were resolved without enforcing trust boundaries
- Remote or external Python modules could be fetched and imported automatically
- The trust_remote_code safeguard was bypassed during this resolution process
- Importing a Python module runs its top-level code immediately, so the import itself was the execution step

In practical terms, loading a model was enough to execute attacker-supplied Python code, even if the deployment operator never explicitly trusted that code.
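
The reason an import is enough can be shown with plain Python, independent of vLLM. The sketch below is a harmless illustration under stated assumptions: the module name, class name, and marker file are hypothetical, and the "payload" only writes a text file inside a temporary directory. It shows that top-level statements run as soon as a module is imported, before any class is instantiated or any function is called.

```python
import importlib.util
import tempfile
from pathlib import Path

# Stand-in for a module shipped inside a model repository (hypothetical names).
# The only side effect is a marker file written next to the module.
MODULE_SOURCE = '''
from pathlib import Path
Path(__file__).with_name("import_ran.txt").write_text("top-level code executed")

class FakeModel:
    pass
'''


def import_from_path(path):
    """Import a module from an arbitrary file path, as dynamic loaders do."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # top-level statements run here
    return module


def demo():
    """Return (marker existed before import, marker exists after, class was defined)."""
    with tempfile.TemporaryDirectory() as tmp:
        module_path = Path(tmp) / "modeling_demo.py"
        module_path.write_text(MODULE_SOURCE)
        marker = module_path.with_name("import_ran.txt")

        before = marker.exists()
        module = import_from_path(module_path)  # nothing is called or instantiated
        after = marker.exists()
        return before, after, hasattr(module, "FakeModel")
```

In a real attack the marker write would be replaced by whatever the attacker chooses, which is why the trust decision has to happen before the import, not after.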

Why This Is Dangerous

- Code execution happens at model load time, not when handling requests
- No API key, token, or user authentication is required
- Model loading often occurs automatically in CI/CD pipelines, containers, or startup scripts
- The vLLM process typically runs with access to GPUs, secrets, model caches, and internal networks

Once exploited, attackers may:

- Execute arbitrary shell commands
- Install persistence mechanisms
- Exfiltrate API keys, credentials, or training data
- Pivot laterally to other internal systems
- Tamper with inference results or inject backdoors

Attack Scenarios

Scenario 1 – Malicious Public Model

- A model repository is created or modified to include malicious Python code
- The repository appears legitimate (typosquatting, reused namespace, or cloned popular model)
- A vLLM deployment is configured to load this model by name
- Code executes automatically during model initialization

Scenario 2 – User-Supplied Model

- A service allows users to specify model paths or names
- An attacker points the service to a crafted model repository
- vLLM loads the model and executes embedded Python code

Scenario 3 – Compromised Local Model Directory

- An attacker gains write access to a model directory
- Malicious files are placed alongside model artifacts
- vLLM loads the model and executes the injected code

Proof of Concept (Educational)

No official exploit code has been released publicly.
However, exploitation is conceptually simple and can be reproduced for educational or research purposes.

A typical proof-of-concept would involve:

- Creating a model repository with a manipulated config.json
- Adding a Python module referenced by auto_map
- Including code that executes upon import (e.g., command execution, file write)
- Loading the model via vLLM
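
From a defender's point of view, the ingredients listed above are visible on disk before anything is loaded. The following sketch inspects a local model directory and reports its auto_map entries and any bundled Python files. It is an assumption-laden illustration: the function name is hypothetical, and the treatment of values containing "--" as references to code in a different repository reflects the Hugging Face-style convention as generally documented, which should be confirmed against the library version in use.

```python
import json
from pathlib import Path


def audit_model_dir(model_dir):
    """Summarize the code-bearing parts of a local model directory."""
    model_dir = Path(model_dir)
    config_path = model_dir / "config.json"
    auto_map = {}
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        auto_map = config.get("auto_map") or {}

    references = []
    for value in auto_map.values():
        # Values are usually strings; some entries are lists that may contain None.
        items = value if isinstance(value, (list, tuple)) else [value]
        references.extend(item for item in items if isinstance(item, str))

    return {
        "auto_map": auto_map,
        # Assumption: "repo--module.Class" points at code hosted elsewhere.
        "cross_repo_refs": sorted(ref for ref in references if "--" in ref),
        "python_files": sorted(
            path.relative_to(model_dir).as_posix()
            for path in model_dir.rglob("*.py")
        ),
    }
```

A non-empty auto_map or python_files entry does not prove malice, since many legitimate models ship custom code, but it marks the model as one that needs review before it is loaded.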

Detection & Monitoring Guidance

Because exploitation occurs during model loading, traditional API-level monitoring is insufficient. Detection must focus on startup behavior, process execution, file access, and network activity.

Key Log Sources to Monitor

vLLM Application Logs
- Model loading messages
- auto_map resolution logs
- Unexpected warnings or stack traces during startup
Operating System Process Logs
- Linux: auditd (execve)
- Windows: Sysmon (Event ID 1)
- Detection of Python subprocesses spawned by vLLM
File Integrity Monitoring (FIM)
- Creation or modification of .py files in model directories
- Changes to config.json or tokenizer files
Network Logs
- Outbound HTTP(S) or Git traffic from inference servers
- Unexpected connections to model hosting platforms
Container Runtime Logs
- Containers spawning unexpected processes at startup
- Image pulls triggered unexpectedly

Indicators of Exploitation

- Python processes executing from model cache directories
- Unexpected shell commands launched by the vLLM process
- Network access during model load at a time when no deployment or model update was scheduled
- Newly created Python files inside model directories
- Sudden configuration changes without a corresponding release event
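
The file-related indicators above can be checked with a simple hash baseline when a dedicated FIM product is not available. The sketch below is a minimal illustration using only the standard library; the function names and the choice of watched suffixes are assumptions, and a production setup would also need to protect the baseline itself from tampering.

```python
import hashlib
from pathlib import Path

WATCHED_SUFFIXES = {".py", ".json"}  # code and configuration files (assumed scope)


def snapshot(model_dir):
    """Map each watched file (relative path) to its SHA-256 digest."""
    model_dir = Path(model_dir)
    digests = {}
    for path in sorted(model_dir.rglob("*")):
        if path.is_file() and path.suffix in WATCHED_SUFFIXES:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(model_dir).as_posix()] = digest
    return digests


def diff_snapshots(baseline, current):
    """Return files that were added, removed, or modified since the baseline."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(
            name
            for name in set(baseline) & set(current)
            if baseline[name] != current[name]
        ),
    }
```

Taking a snapshot right after a reviewed model is deployed and comparing it before each restart surfaces new .py files or an edited config.json before vLLM imports them.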

Sigma Detection Rules

Sigma Rule – vLLM Executing Python from Model Cache

title: vLLM Suspicious Python Execution During Model Load
id: 1f4c2d9a-9c34-4f21-b6d2-vllm22807
status: experimental
description: Detects Python execution originating from vLLM model cache directories.
author: Security Team
logsource:
  product: linux
  category: process_creation
detection:
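  selection_parent:
    # vLLM usually runs under the Python interpreter, so the parent command line is matched as well
    - ParentImage|contains: "vllm"
    - ParentCommandLine|contains: "vllm"
  selection_child:
    # "/python" also matches versioned binaries such as python3 or python3.11
    Image|contains: "/python"
    CommandLine|contains:
      - ".cache"
      - "huggingface"
  condition: selection_parent and selection_child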
level: high
tags:
  - attack.execution
  - attack.initial_access

Sigma Rule – Unexpected Network Activity at Startup

title: vLLM Unexpected Outbound Network During Startup
id: 9a7b6c1d-vllm-net-22807
status: experimental
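description: Detects outbound HTTP(S) connections initiated by the vLLM process. Correlate with process start time to isolate connections made during model loading.
logsource:
  product: linux
  category: network_connection
detection:
  selection:
    # Matches only if the executable path contains "vllm" (for example a virtualenv named vllm); adjust to your deployment
    Image|contains: "vllm"
    DestinationPort:
      - 443
      - 80
  condition: selection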
level: medium
tags:
  - attack.command_and_control

Sigma Rule – Model Directory File Creation

title: vLLM Model Directory Python File Creation
id: 77b0c3aa-vllm-file-22807
status: experimental
description: Detects new Python files created in model directories.
logsource:
  product: linux
  category: file_event
detection:
  selection:
    TargetFilename|endswith: ".py"
    TargetFilename|contains:
      - "/models/"
      - "/.cache/"
  condition: selection
level: high
tags:
  - attack.persistence

Mitigation & Hardening Recommendations

- Upgrade to vLLM v0.14.0 or later immediately
- Avoid loading untrusted or user-supplied models
- Enforce strict allow-lists for model repositories
- Keep trust_remote_code disabled unless the model's code has been reviewed
- Run vLLM in isolated containers or VMs
- Restrict outbound network access from inference hosts
- Apply file integrity monitoring to model directories
- Treat model artifacts as executable supply-chain components
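
The allow-list recommendation can be enforced in the service layer before a model name ever reaches the loader. The sketch below is one possible shape, not a prescribed implementation: the repository names and revision strings are placeholders, and the idea of pinning each model to a reviewed revision assumes the loader in use supports selecting a specific revision.

```python
# Hypothetical allow-list: reviewed repository -> pinned, reviewed revision.
ALLOWED_MODELS = {
    "example-org/reviewed-model-7b": "rev-placeholder-1",
    "example-org/reviewed-embedder": "rev-placeholder-2",
}


class ModelNotAllowed(ValueError):
    """Raised when a requested model is not on the allow-list."""


def resolve_model(requested, allowed=ALLOWED_MODELS):
    """Return (model, revision) for an allow-listed model, else raise."""
    name = requested.strip()
    # Exact match only: no prefixes, wildcards, or case folding, so that
    # look-alike names and filesystem paths never slip through.
    if name not in allowed:
        raise ModelNotAllowed(f"model {requested!r} is not on the allow-list")
    return name, allowed[name]
```

Exact matching is deliberate. Prefix or pattern rules tend to admit typosquatted names and local paths, which are the entry points described in the attack scenarios.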

Security Classification

CWE: CWE-94, Improper Control of Generation of Code ('Code Injection')
MITRE ATT&CK Tactics:
- Initial Access
- Execution
- Persistence (post-exploitation)
- Defense Evasion (if malicious code hides activity)

Final Takeaway

This vulnerability highlights a broader industry risk: machine learning models are executable artifacts, not just data.
Any system that dynamically loads model code must treat models with the same level of scrutiny as third-party software dependencies.

Upgrade to the fixed version and monitor model-load behavior at runtime.