CVE-2026-26220: Critical LightLLM Flaw Enables Unauthenticated Remote Code Execution via Unsafe Pickle Deserialization

LightLLM — Unauthenticated Remote Code Execution via `pickle.loads()`

CVE ID: CVE-2026-26220
Product: LightLLM
Affected Component: PD (Prefill-Decode) Disaggregation Mode – PD Master WebSocket Endpoints
Vulnerability Type: Unsafe Deserialization (CWE-502)
Attack Vector: Network
Authentication Required: No
User Interaction: None
Impact: Remote Code Execution (RCE)
CVSS Score: 9.3 (Critical)
Severity: Critical
Exploitability: High
Exploit Availability: Public proof-of-concept code has been observed in security research communities

Technical Description

A critical unsafe deserialization vulnerability was identified in LightLLM when operating in PD (Prefill-Decode) mode. The PD master service exposes WebSocket endpoints that accept binary messages from connected workers. The received binary payloads were passed directly into Python’s pickle.loads() function without authentication, validation, or integrity checks.

Because Python pickle deserialization allows arbitrary object reconstruction, crafted serialized objects can execute system-level commands during the deserialization process. If an attacker establishes a WebSocket connection to the PD master endpoint and submits a malicious pickle payload, arbitrary code can be executed on the server.

The service was intentionally designed to bind to a routable interface in PD deployments, meaning the vulnerable endpoint was reachable over the network. Since no authentication mechanism was enforced before deserialization, exploitation could be performed remotely without credentials.

This vulnerability results in full remote code execution under the privileges of the LightLLM process.

Affected Versions

LightLLM versions up to and including 1.1.0 running in PD mode were affected.

A deployment was considered vulnerable if all of the following applied:

PD master mode was enabled
The PD master endpoints were reachable over a network interface
No network-level isolation restricted which hosts could connect
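The three conditions above can be turned into a quick triage check. The following is a minimal Python sketch. It assumes you already know the installed LightLLM version string, whether PD master mode is enabled, and the address the PD master binds to. The helper names (`likely_vulnerable`, `binds_beyond_loopback`) are hypothetical, and the simple version parser does not handle pre-release or local version suffixes. Firewalls and security groups are invisible to it, so a True result means "verify network isolation", not "confirmed exploitable".

```python
import ipaddress

# Last affected release according to this advisory ("up to and including 1.1.0").
LAST_AFFECTED = (1, 1, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '1.1.0' (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


def binds_beyond_loopback(bind_host: str) -> bool:
    """True if a listener on bind_host would accept connections from other hosts."""
    if bind_host == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(bind_host)
    except ValueError:
        # Unresolved hostname: assume it is routable (conservative choice).
        return True
    # 0.0.0.0 and :: listen on every interface, so they count as exposed.
    return not addr.is_loopback


def likely_vulnerable(version: str, pd_master_enabled: bool, bind_host: str) -> bool:
    """Hypothetical triage helper mirroring the advisory's conditions.

    Network isolation (firewalls, security groups) is not visible here,
    so True means 'verify isolation', not 'confirmed exploitable'.
    """
    return (
        parse_version(version) <= LAST_AFFECTED
        and pd_master_enabled
        and binds_beyond_loopback(bind_host)
    )


print(likely_vulnerable("1.1.0", True, "0.0.0.0"))    # True
print(likely_vulnerable("1.1.0", True, "127.0.0.1"))  # False
print(likely_vulnerable("1.0.1", False, "10.0.0.5"))  # False
```

Treating an unresolved hostname as exposed is a deliberately conservative choice. It produces more findings to verify, but it never silently clears a host that listens on a routable interface.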

Root Cause

The root cause was direct deserialization of untrusted network input using:

pickle.loads(untrusted_data)

Python pickle is not a safe format for untrusted input. Through the __reduce__ protocol, a serialized object can instruct the unpickler to import any callable (for example os.system) and invoke it with arguments chosen by the sender. That call happens inside pickle.loads() itself, before the application ever inspects the returned object.
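The benign sketch below illustrates that mechanism with the standard library only. `Payload` is an illustrative name, and `os.getpid` stands in for the `os.system` call a real payload would carry. Nothing in it is specific to LightLLM. The receiving side only calls `pickle.loads()`, yet a function chosen by the sender runs.

```python
import os
import pickle


class Payload:
    """Illustrative stand-in for an attacker-controlled object."""

    def __reduce__(self):
        # __reduce__ returns (callable, args). A real exploit would use
        # (os.system, ("<command>",)); os.getpid is harmless but still shows
        # that the receiver calls a function chosen by the sender.
        return (os.getpid, ())


# Sender side: serialize the object into bytes (e.g. a WebSocket binary frame).
blob = pickle.dumps(Payload())

# Receiver side: the vulnerable pattern. It never mentions Payload or getpid.
result = pickle.loads(blob)

print(type(result).__name__)  # int, not Payload
print(result == os.getpid())  # True: os.getpid() ran inside pickle.loads()
```

The receiver gets back whatever the callable returned, not a `Payload` instance. By the time any type check could run, the side effect has already happened.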

No authentication, signature verification, allowlist enforcement, or transport-level validation was implemented before invoking pickle.loads().

Attack Scenario

The following exploitation chain was observed:

A WebSocket connection was established to the PD master endpoint.
A legitimate-looking registration JSON message was sent.
A malicious binary frame containing a crafted pickle object was transmitted.
During deserialization, embedded code was executed.
Arbitrary commands were executed on the host.

This could result in:

Reverse shells
File creation
Credential theft
Data exfiltration
Lateral movement inside the network
Container escape (if running in weakly configured environments)

Because no authentication step preceded deserialization, internet-exposed deployments were at extreme risk.

Proof-of-Concept (Educational)

Security researchers publicly demonstrated exploitation using custom pickle objects that invoked system commands during deserialization.

The PoC structure typically included:

A malicious class overriding __reduce__
A reference to os.system, subprocess, or similar execution primitive
A serialized payload delivered over WebSocket as a binary frame

The payload did not require bypass techniques because the application directly trusted network input.

Impact Assessment

If exploited, the attacker gains:

Full command execution capability
Access to model memory and inference data
Ability to modify model behavior
Access to environment variables and secrets
Potential pivot into internal GPU clusters
Persistence via cron jobs, systemd services, or backdoors

In GPU clusters or AI inference environments, this could expose:

API keys
Internal model weights
Customer data
Distributed worker credentials

MITRE ATT&CK Mapping

Initial Access
T1190 – Exploit Public-Facing Application

Execution
T1059 – Command and Scripting Interpreter

Persistence
T1053.003 – Scheduled Task/Job: Cron
T1543.002 – Create or Modify System Process: Systemd Service

Defense Evasion
T1027 – Obfuscated Files or Information

Lateral Movement
T1021 – Remote Services

Detection Guidance

Log Sources to Monitor

Application logs (LightLLM runtime logs)
WebSocket gateway logs
Reverse proxy logs (NGINX, Envoy)
Firewall logs
EDR telemetry
Sysmon (Windows)
auditd (Linux)
Container runtime logs (Docker / Kubernetes)

Indicators of Exploitation

WebSocket connections to PD endpoints from unknown IPs
Binary WebSocket frames immediately after JSON registration
Python processes spawning shell interpreters
Unexpected child processes from LightLLM service
Creation of suspicious files in /tmp
Outbound network connections from inference servers
Reverse shell traffic patterns
Unusual CPU spikes during WebSocket traffic

Detection Rules

WebSocket Endpoint Access

index=web_logs 
(uri_path="/pd_register" OR uri_path="/kv_move_status")
| stats count by src_ip, uri_path, status

Suspicious Python Child Processes

index=edr_logs 
ParentImage="*python*" (Image="*/sh" OR Image="*/bash" OR Image="*/dash" OR Image="*/nc" OR Image="*/ncat" OR Image="*/curl" OR Image="*/wget")
| stats count by host, user, ParentImage, Image, CommandLine

Linux auditd Monitoring

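auditctl -a always,exit -F arch=b64 -S execve -k lightllm_exec
ausearch -k lightllm_exec -i | grep -E "exe=/(usr/)?bin/(sh|bash|dash|nc|ncat|curl|wget)"

Correlate the ppid field of matching records with the PID of the LightLLM process.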

Sysmon Rule Logic

Detect when:

EventID = 1 (Process Create)
ParentImage ends with python.exe
Image ends with cmd.exe, powershell.exe, or bash.exe

Network-Based Detection

Alert when:

WebSocket upgrade request to PD endpoint
Followed by any binary frame (do not rely on a size threshold such as > 1 KB: a command-executing pickle can be only a few dozen bytes)
From a non-worker IP address
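The network alert logic can be prototyped against exported proxy or capture records before being translated into a SIEM rule. This sketch assumes a hypothetical record format (`src_ip`, `path`, `opcode`, `size`) and a hypothetical worker subnet, and it reuses the endpoint paths from the WebSocket detection query earlier. It flags binary frames of any size, since a command-executing pickle can be only a few dozen bytes.

```python
import ipaddress
from dataclasses import dataclass

# Assumptions for illustration: adjust both to the real deployment.
PD_PATHS = {"/pd_register", "/kv_move_status"}
WORKER_NETS = [ipaddress.ip_network("10.20.0.0/24")]


@dataclass
class FrameEvent:
    """One WebSocket frame as exported from a proxy or capture (hypothetical)."""
    src_ip: str
    path: str
    opcode: str  # "text" or "binary"
    size: int


def is_worker(ip: str) -> bool:
    """True if ip falls inside one of the known worker subnets."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WORKER_NETS)


def alerts(events):
    """Yield binary frames sent to PD endpoints from non-worker addresses."""
    for ev in events:
        if ev.path in PD_PATHS and ev.opcode == "binary" and not is_worker(ev.src_ip):
            yield ev


sample = [
    FrameEvent("10.20.0.7", "/pd_register", "binary", 4096),  # known worker
    FrameEvent("203.0.113.9", "/pd_register", "text", 180),   # registration JSON
    FrameEvent("203.0.113.9", "/pd_register", "binary", 48),  # small binary frame
]
for ev in alerts(sample):
    print(ev.src_ip, ev.path, ev.size)  # 203.0.113.9 /pd_register 48
```

The allowlist check carries most of the signal. A stricter variant could also require that the binary frame follows a text registration frame on the same connection, matching the attack chain described above.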

Incident Response Recommendations

If exploitation is suspected:

Immediately isolate the host.
Capture volatile memory if possible.
Collect application logs and WebSocket traffic logs.
Review process execution history.
Rotate API keys and credentials.
Rebuild the system from trusted images.
Validate no persistence mechanisms remain.

Simply restarting the service is not sufficient.

Mitigation

Immediate Mitigation

Block external access to PD master ports at the firewall level.
Restrict access to trusted worker IPs only.
Disable PD mode temporarily if possible.

Permanent Fix

Unsafe deserialization using pickle.loads() was removed and replaced with safer serialization mechanisms in the patched version.
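The advisory does not say which format the patched release uses. As an illustration only, the sketch below shows the general shape of a safer replacement: JSON frames decoded with `json.loads()`, which can only produce dicts, lists, strings, numbers, booleans, and `None`, followed by an explicit field check. The schema fields (`type`, `node_id`, `status`) and the `decode_status` helper are hypothetical and are not LightLLM's wire format.

```python
import json

# Hypothetical schema: field name -> required Python type.
STATUS_SCHEMA = {"type": str, "node_id": int, "status": str}


class MessageError(ValueError):
    """Raised when a worker message is malformed."""


def decode_status(frame: bytes) -> dict:
    """Decode a worker status message without executing any code."""
    try:
        msg = json.loads(frame.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise MessageError(f"not valid UTF-8 JSON: {exc}") from exc
    if not isinstance(msg, dict):
        raise MessageError("top-level value must be an object")
    if set(msg) != set(STATUS_SCHEMA):
        raise MessageError(f"unexpected fields: {sorted(set(msg) ^ set(STATUS_SCHEMA))}")
    for field, expected in STATUS_SCHEMA.items():
        value = msg[field]
        # bool is a subclass of int, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise MessageError(f"field {field!r} must be {expected.__name__}")
    return msg


print(decode_status(b'{"type": "status", "node_id": 3, "status": "done"}'))
# {'type': 'status', 'node_id': 3, 'status': 'done'}
```

A pickle frame sent to this decoder fails at the UTF-8 or JSON step and is rejected as data. No callable is ever looked up or invoked.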

All deployments should upgrade immediately.

Official Patch / Upgrade

Upgrade to the latest patched release of LightLLM from the official repository:

Official Repository & Releases Page:
https://github.com/ModelTC/LightLLM/releases

Upgrade using:

pip install --upgrade lightllm

Or deploy the latest container image from the official repository.

Only official releases from the LightLLM GitHub repository should be trusted.

Security Hardening Recommendations

Never expose PD master directly to the internet.
Enforce mutual TLS between nodes.
Implement authentication tokens for worker registration.
Use network segmentation.
Monitor for unsafe deserialization patterns in code reviews.
Disable pickle for any network boundary.
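For token-based worker registration, the core check is short. This sketch assumes a hypothetical shared secret distributed to workers out of band and a hypothetical registration message carrying a worker ID, a Unix timestamp, and an HMAC-SHA256 tag. It is not LightLLM's protocol. The key properties are that verification runs before any other frame is parsed, that it uses `hmac.compare_digest()` for a constant-time comparison, and that it rejects stale timestamps to limit replay.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 30  # assumption: tolerated clock drift between nodes


def sign_registration(secret: bytes, worker_id: str, ts: int) -> str:
    """Worker side: tag the registration with HMAC-SHA256 over id and time."""
    msg = f"{worker_id}|{ts}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def verify_registration(secret: bytes, worker_id: str, ts: int, tag: str,
                        now: Optional[float] = None) -> bool:
    """Master side: run this before reading any other frame from the socket."""
    now = time.time() if now is None else now
    if abs(now - ts) > MAX_SKEW_SECONDS:
        return False  # stale or future-dated: possible replay
    expected = sign_registration(secret, worker_id, ts)
    # Compare as bytes so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), tag.encode())


secret = b"replace-with-a-random-32-byte-secret"
ts = int(time.time())
tag = sign_registration(secret, worker_id="decode-0", ts=ts)
print(verify_registration(secret, "decode-0", ts, tag))       # True
print(verify_registration(secret, "decode-0", ts, "0" * 64))  # False
```

A shared secret authenticates workers but does not encrypt traffic. It complements mutual TLS rather than replacing it, and it does not make pickle safe to accept from an authenticated peer.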

Risk Summary

This vulnerability represents a textbook unsafe deserialization issue with full remote code execution impact. Because authentication was not required and the service was network-accessible by design, exploitation difficulty was low. Public research has already demonstrated real-world exploitability.

Organizations running distributed inference clusters should treat this vulnerability as high priority and verify that no exposed PD master instances remain unpatched.