CVE ID: CVE-2026-22807
Affected Component: vLLM (model loading / auto_map resolution)
Vulnerability Type: Unsafe model loading leading to Remote Code Execution
Attack Vector: Network / Supply chain (model repository)
Authentication Required: No (pre-authentication)
User Interaction: Not required once a malicious model is loaded
CVSS v3.x Score: 8.8 (High)
Severity: High / Critical in production environments
Exploitability: High when untrusted or user-controlled models are loaded
Exploit Availability: No public exploit has been officially released; exploitation is technically straightforward and can be reproduced for educational and research purposes
Patch Status: Fixed in vLLM v0.14.0
Official Patch / Upgrade Link:
👉 https://github.com/vllm-project/vllm/releases/tag/v0.14.0
Overview
A critical security weakness was identified in vLLM related to how model code is loaded during initialization. Under certain conditions, Python code embedded inside a model repository could be executed automatically without explicit user consent. This execution occurs before authentication or API access controls are enforced, making the issue particularly dangerous for production deployments.
The flaw exists in the logic that processes model configuration fields such as auto_map, which are used to dynamically determine which Python classes should be loaded for a given model. Improper validation of these mappings allowed arbitrary Python modules from remote or local model repositories to be imported and executed during startup.
As a result, if a malicious or compromised model is loaded, arbitrary code may be executed on the host system running vLLM.
Root Cause
During model startup, vLLM reads metadata from the model’s configuration file (typically config.json).
This file may contain an auto_map field that tells vLLM where to find Python classes implementing model behavior.
The issue arises because:
- The auto_map entries were resolved without enforcing trust boundaries
- Remote or external Python modules could be fetched and imported automatically
- The trust_remote_code safeguard was bypassed during this resolution process
- Python imports were executed directly by the runtime
In practical terms, loading a model was enough to execute attacker-supplied Python code, even if the deployment operator never explicitly trusted that code.
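To make the mechanism concrete, the sketch below reads a model's config.json and lists its auto_map entries without importing anything. It is an illustrative audit helper, not vLLM's actual resolution logic. The function name list_auto_map_entries and the sample configuration are hypothetical, and the code assumes the Hugging Face convention that a reference of the form repo_id--module.ClassName points at code hosted in a different repository.

```python
import json
import tempfile
from pathlib import Path


def list_auto_map_entries(config_path):
    """Return (key, reference, is_cross_repo) tuples from a config.json.

    Only parses JSON; never imports or executes model code.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    auto_map = config.get("auto_map") or {}
    entries = []
    for key, value in auto_map.items():
        # Some configs store a list of references instead of a single string.
        references = value if isinstance(value, (list, tuple)) else [value]
        for ref in references:
            if not isinstance(ref, str):
                continue  # skip null placeholders
            # Assumption: "repo_id--module.Class" refers to another repository.
            entries.append((key, ref, "--" in ref))
    return entries


if __name__ == "__main__":
    # Hypothetical config, for illustration only.
    sample = {
        "model_type": "example",
        "auto_map": {
            "AutoConfig": "configuration_example.ExampleConfig",
            "AutoModelForCausalLM": "other-org/other-repo--modeling_example.ExampleModel",
        },
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.json"
        path.write_text(json.dumps(sample), encoding="utf-8")
        for key, ref, cross_repo in list_auto_map_entries(path):
            flag = "CROSS-REPO" if cross_repo else "local"
            print(f"{key}: {ref} [{flag}]")
```

A check like this could run before a model is handed to the inference server, so that any auto_map entry is reviewed by a person rather than resolved silently.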
Why This Is Dangerous
- Code execution happens at model load time, not when handling requests
- No API key, token, or user authentication is required
- Model loading often occurs automatically in CI/CD pipelines, containers, or startup scripts
- The vLLM process typically runs with access to GPUs, secrets, model caches, and internal networks
Once exploited, attackers may:
- Execute arbitrary shell commands
- Install persistence mechanisms
- Exfiltrate API keys, credentials, or training data
- Pivot laterally to other internal systems
- Tamper with inference results or inject backdoors
Attack Scenarios
Scenario 1 – Malicious Public Model
- A model repository is created or modified to include malicious Python code
- The repository appears legitimate (typosquatting, reused namespace, or cloned popular model)
- A vLLM deployment is configured to load this model by name
- Code executes automatically during model initialization
Scenario 2 – User-Supplied Model
- A service allows users to specify model paths or names
- An attacker points the service to a crafted model repository
- vLLM loads the model and executes embedded Python code
Scenario 3 – Compromised Local Model Directory
- An attacker gains write access to a model directory
- Malicious files are placed alongside model artifacts
- vLLM loads the model and executes the injected code
Proof of Concept (Educational)
No official exploit code has been released publicly.
However, exploitation is conceptually simple and feasible for educational or research purposes.
A typical proof-of-concept would involve:
- Creating a model repository with a manipulated config.json
- Adding a Python module referenced by auto_map
- Including code that executes upon import (e.g., command execution, file write)
- Loading the model via vLLM
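The key property behind these steps is that importing a Python module runs its top-level statements. The following benign sketch demonstrates only that general behavior; it does not touch vLLM and is not an exploit. The helper names and the modeling_demo.py file are hypothetical, and the "payload" simply writes a marker file inside a temporary directory.

```python
import importlib.util
import tempfile
from pathlib import Path


def import_from_path(module_name, file_path):
    """Import a Python file by path, as a dynamic loader might do."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # top-level statements run here
    return module


def demo_import_side_effect():
    """Show that importing a module runs its top-level code.

    Returns True if the harmless marker file was created by the import.
    """
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "marker.txt"
        module_file = Path(tmp) / "modeling_demo.py"
        # Benign stand-in for attacker code: writes a marker file on import.
        module_file.write_text(
            "from pathlib import Path\n"
            f"Path({str(marker)!r}).write_text('executed at import')\n"
            "class DemoModel:\n"
            "    pass\n",
            encoding="utf-8",
        )
        assert not marker.exists()
        import_from_path("modeling_demo", module_file)
        return marker.exists()


if __name__ == "__main__":
    print("side effect observed:", demo_import_side_effect())
```

No class from the module is ever instantiated here, yet the marker file appears. This is why resolving a class reference from a model repository is equivalent to running that repository's code.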
Detection & Monitoring Guidance
Because exploitation occurs during model loading, traditional API-level monitoring is insufficient. Detection must focus on startup behavior, process execution, file access, and network activity.
Key Log Sources to Monitor
- vLLM Application Logs
  - Model loading messages
  - auto_map resolution logs
  - Unexpected warnings or stack traces during startup
- Operating System Process Logs
  - Linux: auditd (execve)
  - Windows: Sysmon (Event ID 1)
  - Detection of Python subprocesses spawned by vLLM
- File Integrity Monitoring (FIM)
  - Creation or modification of .py files in model directories
  - Changes to config.json or tokenizer files
- Network Logs
  - Outbound HTTP(S) or Git traffic from inference servers
  - Unexpected connections to model hosting platforms
- Container Runtime Logs
  - New containers spawning processes at startup
  - Image pulls triggered unexpectedly
Indicators of Exploitation
- Python processes executing from model cache directories
- Unexpected shell commands launched by the vLLM process
- Network access occurring during model load when no deployment was planned
- Newly created Python files inside model directories
- Sudden configuration changes without a corresponding release event
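Where a dedicated FIM product is not available, two of the indicators above (new Python files and config changes) can be approximated with a hash snapshot of the model directory. The sketch below is a minimal illustration under that assumption; the function names are hypothetical, and a real deployment would store the baseline somewhere the vLLM process cannot write to.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(model_dir):
    """Map each file's path (relative to model_dir) to its SHA-256 digest."""
    root = Path(model_dir)
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(baseline, current):
    """Return files added, removed, or modified since the baseline."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(n for n in shared if baseline[n] != current[n]),
    }


def suspicious_changes(changes):
    """Flag added or modified .py files and any file named config.json."""
    touched = changes["added"] + changes["modified"]
    return [
        name for name in touched
        if name.endswith(".py") or Path(name).name == "config.json"
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "config.json").write_text("{}", encoding="utf-8")
        baseline = snapshot(root)
        # Simulate tampering: a new Python file and an edited config.
        (root / "modeling_extra.py").write_text("x = 1\n", encoding="utf-8")
        (root / "config.json").write_text('{"auto_map": {}}', encoding="utf-8")
        changes = diff_snapshots(baseline, snapshot(root))
        print(suspicious_changes(changes))
```

Running the comparison before each model load, rather than on a timer, matches the point at which the vulnerable code path would execute.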
Sigma Detection Rules
Sigma Rule – vLLM Executing Python from Model Cache
title: vLLM Suspicious Python Execution During Model Load
id: 1f4c2d9a-9c34-4f21-b6d2-vllm22807
status: experimental
description: Detects Python execution originating from vLLM model cache directories.
author: Security Team
logsource:
  product: linux
  category: process_creation
detection:
  selection:
    ParentImage|contains:
      - "vllm"
    Image|endswith:
      - "python"
    CommandLine|contains:
      - ".cache"
      - "huggingface"
  condition: selection
level: high
tags:
  - attack.execution
  - attack.initial_access
Sigma Rule – Unexpected Network Activity at Startup
title: vLLM Unexpected Outbound Network During Startup
id: 9a7b6c1d-vllm-net-22807
status: experimental
description: Detects outbound network connections initiated by vLLM during model loading.
logsource:
  product: linux
  category: network_connection
detection:
  selection:
    ProcessName: "vllm"
    DestinationPort:
      - 443
      - 80
  condition: selection
level: medium
tags:
  - attack.command_and_control
Sigma Rule – Model Directory File Creation
title: vLLM Model Directory Python File Creation
id: 77b0c3aa-vllm-file-22807
status: experimental
description: Detects new Python files created in model directories.
logsource:
  product: linux
  category: file_event
detection:
  selection:
    TargetFilename|endswith: ".py"
    TargetFilename|contains:
      - "/models/"
      - "/.cache/"
  condition: selection
level: high
tags:
  - attack.persistence
Mitigation & Hardening Recommendations
- Upgrade immediately to vLLM v0.14.0 or later
- Avoid loading untrusted or user-supplied models
- Enforce strict allow-lists for model repositories
- Keep trust_remote_code disabled by default, and enable it only for models whose code has been reviewed
- Run vLLM in isolated containers or VMs
- Restrict outbound network access from inference hosts
- Apply file integrity monitoring on model directories
- Treat model artifacts as executable supply-chain components
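The allow-list recommendation can be sketched as a small gate placed in front of whatever code starts the inference server. This is a minimal illustration, not part of vLLM: the ALLOWED_MODELS mapping, the repository ID, the pinned revision value, and the function names are all hypothetical placeholders.

```python
# Hypothetical allow-list: repository ID -> pinned revision (commit hash or tag).
ALLOWED_MODELS = {
    "example-org/example-model": "0123456789abcdef0123456789abcdef01234567",
}


class ModelNotAllowedError(ValueError):
    """Raised when a requested model is not on the allow-list."""


def resolve_allowed_model(requested, allowed=None):
    """Return (repo_id, revision) for an allow-listed model, or raise.

    Only exact repository IDs match, so local paths, URLs, and look-alike
    names are rejected without any special-case parsing.
    """
    allowed = ALLOWED_MODELS if allowed is None else allowed
    name = requested.strip()
    if name not in allowed:
        raise ModelNotAllowedError(f"model {requested!r} is not on the allow-list")
    return name, allowed[name]


if __name__ == "__main__":
    for candidate in ["example-org/example-model", "examp1e-org/example-model"]:
        try:
            repo_id, revision = resolve_allowed_model(candidate)
            print(f"allowed: {repo_id} @ {revision}")
        except ModelNotAllowedError as exc:
            print(f"rejected: {exc}")
```

Pinning a revision alongside each repository ID is a design choice worth considering: it means a later change to an approved repository does not reach production until the allow-list entry is updated.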
Security Classification
- CWE: CWE-94, Improper Control of Generation of Code ('Code Injection')
- MITRE ATT&CK Tactics:
- Initial Access
- Execution
- Persistence (post-exploitation)
- Defense Evasion (if malicious code hides activity)
Final Takeaway
This vulnerability highlights a broader industry risk: machine learning models are executable artifacts, not just data.
Any system that dynamically loads model code must treat models with the same level of scrutiny as third-party software dependencies.
Upgrade to the fixed version, and monitor model loading at runtime for unexpected process, file, and network activity.
