LightLLM — Unauthenticated Remote Code Execution via pickle.loads()
CVE ID: CVE-2026-26220
Product: LightLLM
Affected Component: PD (Prefill-Decode) Disaggregation Mode – PD Master WebSocket Endpoints
Vulnerability Type: Unsafe Deserialization (CWE-502)
Attack Vector: Network
Authentication Required: No
User Interaction: None
Impact: Remote Code Execution (RCE)
CVSS Score: 9.3 (Critical)
Severity: Critical
Exploitability: High
Exploit Availability: Public proof-of-concept code has been observed in security research communities
Technical Description
A critical unsafe deserialization vulnerability was identified in LightLLM when operating in PD (Prefill-Decode) mode. The PD master service exposes WebSocket endpoints that accept binary messages from connected workers. The received binary payloads were passed directly into Python’s pickle.loads() function without authentication, validation, or integrity checks.
Because Python pickle deserialization allows arbitrary object reconstruction, crafted serialized objects can execute system-level commands during the deserialization process. If an attacker establishes a WebSocket connection to the PD master endpoint and submits a malicious pickle payload, arbitrary code can be executed on the server.
The service was intentionally designed to bind to a routable interface in PD deployments, meaning the vulnerable endpoint was reachable over the network. Since no authentication mechanism was enforced before deserialization, exploitation could be performed remotely without credentials.
This vulnerability results in full remote code execution under the privileges of the LightLLM process.
Affected Versions
LightLLM versions up to and including 1.1.0 running in PD mode were affected.
A deployment was considered vulnerable when all of the following applied:
- PD Master was enabled
- The service was reachable over network interfaces
- No network-level isolation was enforced
Root Cause
The root cause was direct deserialization of untrusted network input using:
pickle.loads(untrusted_data)
Python pickle is not a safe format for untrusted input. It supports arbitrary object instantiation and execution of functions via the __reduce__ protocol. When deserialization occurs, embedded callable references can be executed immediately.
No authentication, signature verification, allowlist enforcement, or transport-level validation was implemented before invoking pickle.loads().
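The behavior described above can be reproduced harmlessly. The following is a minimal sketch, not code taken from LightLLM: the class name Demo is hypothetical, and the built-in len stands in for whatever execution primitive an attacker would choose.

```python
import pickle


class Demo:
    """Hypothetical class used only to illustrate the __reduce__ protocol."""

    def __reduce__(self):
        # pickle stores a reference to the callable plus its arguments.
        # On load, the callable is invoked with those arguments.
        return (len, ("untrusted",))


payload = pickle.dumps(Demo())

# The receiver never gets a Demo instance back: it gets whatever the
# referenced callable returned when pickle.loads() invoked it.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

The call happens inside pickle.loads() itself, before the application has any chance to inspect the resulting object, which is why validation after deserialization does not help.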
Attack Scenario
The following exploitation chain was observed:
- A WebSocket connection was established to the PD master endpoint.
- A legitimate-looking registration JSON message was sent.
- A malicious binary frame containing a crafted pickle object was transmitted.
- During deserialization, embedded code was executed.
- Arbitrary commands were executed on the host.
This could result in:
- Reverse shells
- File creation
- Credential theft
- Data exfiltration
- Lateral movement inside the network
- Container escape (if running in weakly configured environments)
Because exploitation required no authentication at any stage, internet-exposed deployments were at extreme risk.
Proof-of-Concept (Educational)
Public security researchers demonstrated exploitation using custom pickle objects that invoked system commands during deserialization.
The PoC structure typically included:
- A malicious class overriding __reduce__
- A reference to os.system, subprocess, or a similar execution primitive
- A serialized payload delivered over WebSocket as a binary frame
The payload did not require bypass techniques because the application directly trusted network input.
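For triage of captured binary frames, a pickle stream can be inspected without being loaded. The sketch below uses the standard pickletools module to list opcodes that import names or invoke callables. The function name, the opcode set, and the "MALFORMED" marker are illustrative choices rather than part of any LightLLM tooling, and an empty result should not be read as proof that a frame is safe.

```python
import pickle
import pickletools
from typing import List

# Opcodes that resolve importable names or invoke callables during unpickling.
RISKY_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST",
    "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD",
}


def risky_opcodes(frame: bytes) -> List[str]:
    """Return risky opcode names found in a frame, without unpickling it."""
    found = []
    try:
        for opcode, _arg, _pos in pickletools.genops(frame):
            if opcode.name in RISKY_OPCODES:
                found.append(opcode.name)
    except Exception:
        # Not a well-formed pickle stream; keep what was seen so far.
        found.append("MALFORMED")
    return found


class Probe:
    """Hypothetical stand-in for a crafted object; it only calls len()."""

    def __reduce__(self):
        return (len, ("x",))


plain = pickle.dumps({"status": "ok", "tokens": [1, 2, 3]})
crafted = pickle.dumps(Probe())

print(risky_opcodes(plain))    # plain containers need no callable
print(risky_opcodes(crafted))  # includes REDUCE
```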
Impact Assessment
If exploited, the attacker gains:
- Full command execution capability
- Access to model memory and inference data
- Ability to modify model behavior
- Access to environment variables and secrets
- Potential pivot into internal GPU clusters
- Persistence via cron jobs, systemd services, or backdoors
In GPU clusters or AI inference environments, this could expose:
- API keys
- Internal model weights
- Customer data
- Distributed worker credentials
MITRE ATT&CK Mapping
Initial Access
T1190 – Exploit Public-Facing Application
Execution
T1059 – Command and Scripting Interpreter
Persistence
T1547 – Boot or Logon Autostart Execution
Defense Evasion
T1027 – Obfuscated Files or Information
Lateral Movement
T1021 – Remote Services
Detection Guidance
Log Sources to Monitor
- Application logs (LightLLM runtime logs)
- WebSocket gateway logs
- Reverse proxy logs (NGINX, Envoy)
- Firewall logs
- EDR telemetry
- Sysmon (Windows)
- auditd (Linux)
- Container runtime logs (Docker / Kubernetes)
Indicators of Exploitation
- WebSocket connections to PD endpoints from unknown IPs
- Binary WebSocket frames immediately after JSON registration
- Python processes spawning shell interpreters
- Unexpected child processes from LightLLM service
- Creation of suspicious files in /tmp
- Outbound network connections from inference servers
- Reverse shell traffic patterns
- Unusual CPU spikes during WebSocket traffic
Detection Rules
WebSocket Endpoint Access
index=web_logs
(uri_path="/pd_register" OR uri_path="/kv_move_status")
| stats count by src_ip, uri_path, status
Suspicious Python Child Processes
index=edr_logs
(ParentImage="*python*" AND (Image="*/sh" OR Image="*/bash" OR Image="*/nc" OR Image="*/curl" OR Image="*/wget"))
| stats count by host, user, CommandLine
Linux auditd Monitoring
type=EXECVE
exe="/usr/bin/python*"
| grep -Ew "sh|bash|nc|curl|wget"
Sysmon Rule Logic
Detect when:
- ParentImage contains python.exe
- ChildImage is cmd.exe, powershell.exe, or bash.exe
- EventID = 1 (Process Create)
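The rule logic above can be expressed as a small predicate over parsed Sysmon records. This is a sketch that assumes events have already been converted to dictionaries keyed by Sysmon field names; the function name is hypothetical. Sysmon Event ID 1 records the new (child) process in the Image field, so that field is used for the child check.

```python
from pathlib import PureWindowsPath

SHELL_IMAGES = {"cmd.exe", "powershell.exe", "bash.exe"}


def is_suspicious_process_create(event: dict) -> bool:
    """Flag Process Create events where a Python parent spawns a shell."""
    if event.get("EventID") != 1:
        return False
    parent = str(event.get("ParentImage", "")).lower()
    # Compare only the executable name of the child, not its full path.
    child = PureWindowsPath(str(event.get("Image", ""))).name.lower()
    return "python.exe" in parent and child in SHELL_IMAGES


example = {
    "EventID": 1,
    "ParentImage": r"C:\Python311\python.exe",
    "Image": r"C:\Windows\System32\cmd.exe",
}
print(is_suspicious_process_create(example))
```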
Network-Based Detection
Alert when:
- WebSocket upgrade request to PD endpoint
- Followed by large binary payload (> 1KB)
- From non-worker IP address
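The three conditions above can be combined into one check. The sketch below assumes connection events are available as dictionaries; the field names (path, is_websocket_upgrade, binary_payload_bytes, src_ip) and the example worker subnet are hypothetical, while the endpoint paths are the ones used in the WebSocket Endpoint Access rule.

```python
import ipaddress

PD_PATHS = {"/pd_register", "/kv_move_status"}
PAYLOAD_THRESHOLD_BYTES = 1024  # "> 1KB" from the alert conditions


def should_alert(event: dict, worker_networks) -> bool:
    """Return True when all three alert conditions hold for an event."""
    if event.get("path") not in PD_PATHS:
        return False
    if not event.get("is_websocket_upgrade"):
        return False
    if event.get("binary_payload_bytes", 0) <= PAYLOAD_THRESHOLD_BYTES:
        return False
    src = ipaddress.ip_address(event["src_ip"])
    # Alert only if the source is outside every known worker network.
    return not any(src in net for net in worker_networks)


# Hypothetical worker subnet for illustration.
workers = [ipaddress.ip_network("10.0.5.0/24")]
event = {
    "path": "/pd_register",
    "is_websocket_upgrade": True,
    "binary_payload_bytes": 4096,
    "src_ip": "203.0.113.7",
}
print(should_alert(event, workers))
```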
Incident Response Recommendations
If exploitation is suspected:
- Immediately isolate the host.
- Capture volatile memory if possible.
- Collect application logs and WebSocket traffic logs.
- Review process execution history.
- Rotate API keys and credentials.
- Rebuild the system from trusted images.
- Validate no persistence mechanisms remain.
Simply restarting the service is not sufficient.
Mitigation
Immediate Mitigation
- Block external access to PD master ports at firewall level.
- Restrict access to trusted worker IPs only.
- Disable PD mode temporarily if possible.
Permanent Fix
Unsafe deserialization using pickle.loads() was removed and replaced with safer serialization mechanisms in the patched version.
All deployments should upgrade immediately.
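This advisory does not describe the exact serialization used in the patched release. As an illustration of the general approach only, the sketch below replaces pickle with JSON and validates the message shape explicitly. The field names, size limit, and function name are hypothetical and do not reflect the actual LightLLM message format.

```python
import json

# Hypothetical message schema: field name -> required type.
EXPECTED_FIELDS = {"node_id": str, "status": str, "progress": int}
MAX_MESSAGE_BYTES = 64 * 1024


def parse_worker_message(raw: bytes) -> dict:
    """Decode a worker message as JSON and reject anything off-schema."""
    if len(raw) > MAX_MESSAGE_BYTES:
        raise ValueError("message too large")
    # Both decode() and json.loads() raise ValueError subclasses on bad input.
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict) or set(data) != set(EXPECTED_FIELDS):
        raise ValueError("unexpected message shape")
    for field, expected_type in EXPECTED_FIELDS.items():
        # Exact type match, so a bool is not accepted where an int is required.
        if type(data[field]) is not expected_type:
            raise ValueError(f"bad type for {field!r}")
    return data


msg = parse_worker_message(b'{"node_id": "w1", "status": "done", "progress": 100}')
print(msg)
```

Unlike pickle, JSON decoding yields only plain data types, so the worst outcome of a malformed message is a rejected message rather than code execution.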
Official Patch / Upgrade
Upgrade to the latest patched release of LightLLM from the official repository:
Official Repository & Releases Page:
https://github.com/ModelTC/LightLLM/releases
Upgrade using:
pip install --upgrade lightllm
Or deploy the latest container image from the official repository.
Only official releases from the LightLLM GitHub repository should be trusted.
Security Hardening Recommendations
- Never expose PD master directly to the internet.
- Enforce mutual TLS between nodes.
- Implement authentication tokens for worker registration.
- Use network segmentation.
- Monitor for unsafe deserialization patterns in code reviews.
- Disable pickle for any network boundary.
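One way to implement authentication tokens for worker registration is an HMAC over the worker identifier with a pre-shared secret. The sketch below is an assumption about how such a check could look, not the mechanism LightLLM ships; the function names and the secret are hypothetical. It offers no replay protection on its own, so it should complement mutual TLS and network segmentation rather than replace them.

```python
import hashlib
import hmac


def make_registration_token(shared_secret: bytes, node_id: str) -> str:
    """Derive a token a worker presents when registering."""
    return hmac.new(shared_secret, node_id.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_registration(shared_secret: bytes, node_id: str, token: str) -> bool:
    """Check the token before processing any further frames from the peer."""
    expected = make_registration_token(shared_secret, node_id)
    # Constant-time comparison on bytes avoids timing side channels.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


secret = b"example-secret-do-not-use"  # hypothetical; load from a secret store
token = make_registration_token(secret, "worker-1")
print(verify_registration(secret, "worker-1", token))
```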
Risk Summary
This vulnerability represents a textbook unsafe deserialization issue with full remote code execution impact. Because authentication was not required and the service was network-accessible by design, exploitation difficulty was low. Public research has already demonstrated real-world exploitability.
Organizations running distributed inference clusters should treat this vulnerability as high priority and verify that no exposed PD master instances remain unpatched.
