Vulnerability Overview
- CVE: CVE-2026-21869
- Severity: High
- CVSS Score: 8.8 (High impact across confidentiality, integrity, and availability)
- Impact: Denial of Service (crash), Memory Corruption, Potential for Remote Code Execution
- Exploitability: Easily reachable via normal HTTP API calls
- Exploit Availability: Known proof-of-concept
What Is This Vulnerability?
CVE-2026-21869 is a programming flaw in the widely used llama.cpp AI model server. Specifically, the server takes input from HTTP requests, converts that input into internal parameters, and then uses those parameters to manage the model’s memory buffer where tokens and context are stored.
The bug occurs when a specific parameter, n_discard, is allowed to be negative. Normally this number tells the server how many tokens to “throw away” when the context window fills up. If a request supplies a negative number, the server's pointer and index calculations land outside the valid memory area, and the server writes there. In programming terms, this is called an out-of-bounds write.
This is serious because a write outside valid memory can cause:
- A crash (quickly rendering the service unavailable)
- Corruption of internal memory
- Remote code execution in builds or environments where memory protections are weak, meaning an attacker might make the server run arbitrary instructions
In short, malformed user input can corrupt server memory and break or take over the server.
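The index arithmetic described above can be illustrated with a toy model. This is a simplified sketch, not llama.cpp source: the function name `shift_context`, its parameters, and the fixed-capacity list standing in for the token buffer are all assumptions made for illustration. Python cannot write outside a list, so the sketch reports the out-of-range index where a C or C++ program would silently write past the buffer.

```python
def shift_context(tokens, n_keep, n_discard, capacity):
    """Toy model of a context shift over a fixed-capacity token buffer.

    Hypothetical and simplified: it keeps the first n_keep tokens,
    drops the next n_discard, and slides the rest down. It is not the
    real server code.
    """
    buf = list(tokens) + [None] * (capacity - len(tokens))
    used = len(tokens)
    for src in range(n_keep + n_discard, used):
        dst = src - n_discard  # a negative n_discard pushes dst forward
        if not (0 <= src < capacity and 0 <= dst < capacity):
            # In C or C++ this would be an unchecked out-of-bounds access.
            raise IndexError(f"write to index {dst} outside buffer of {capacity}")
        buf[dst] = buf[src]
    return buf[: used - n_discard]
```

With eight tokens in an eight-slot buffer, `shift_context(list(range(8)), 2, 3, 8)` compacts the buffer to `[0, 1, 5, 6, 7]`, whereas `shift_context(list(range(8)), 4, -3, 8)` raises `IndexError` because the destination index reaches 8, one past the end of the buffer.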
How an Attacker Could Trigger This
A vulnerable llama.cpp server accepts JSON requests such as:
POST /completions
{
"model": "some-model",
"input": "some text",
"n_discard": <some number>
}
If the value provided for n_discard is allowed to be negative, and if the server needs to shift its internal context window (because it has too many tokens in memory), it attempts to recalculate token positions using a negative number. This results in pointer and index calculations that move outside the allowed memory range.
An attacker does not need credentials or special access. Simply sending a crafted JSON request to the server is enough to hit this vulnerable code path.
As long as the server:
- has context shifting enabled
- is running a vulnerable build
- is reachable over the network
the flaw can be triggered.
The immediate result is usually a server crash. On servers compiled without memory safety checks, the attacker can sometimes overwrite internal data structures in a controllable way, forming the basis for a potential remote code execution path.
This behavior is deterministic and can be reproduced reliably in a laboratory environment.
Proof of Concept (PoC) – Educational Use Only
⚠️ WARNING
This PoC is intended only for learning, validation, and defensive research in a controlled laboratory environment.
Do NOT run this against production systems, public servers, or systems you do not own or have explicit permission to test.
PoC Objective
Demonstrate that supplying a negative n_discard value to a vulnerable llama.cpp HTTP server triggers a deterministic crash or memory corruption when context shifting occurs.
Preconditions
- Vulnerable version of llama.cpp llama-server running locally or in an isolated test environment
- Context shifting enabled (default behavior)
- Server reachable on a test port (example: 8080)
PoC Example (Crash Demonstration)
curl -X POST http://127.0.0.1:8080/completions \
-H "Content-Type: application/json" \
-d '{
"model": "test-model",
"input": "Generate enough text to force the context window to fill and trigger a context shift.",
"n_discard": -32
}'
Expected Outcome
When the server attempts to manage its token context:
- Invalid index calculations occur internally
- Memory is accessed or written outside valid bounds
- The process terminates unexpectedly
Typical observable results include:
- Immediate server crash
- SIGSEGV or SIGABRT in system logs
- Assertion failure or sanitizer output (in debug builds)
- Core dump generation
The behavior is repeatable and deterministic under the same conditions.
Why This PoC Works
The negative n_discard value is accepted without validation and propagated into internal context management logic. During context shifting, this causes incorrect pointer and index calculations, resulting in an out-of-bounds write.
Detection and Monitoring
1. Web Server Logs – Negative n_discard Values
Look for negative values in the n_discard field within HTTP request bodies.
Example pattern:
"n_discard": -32
Any POST request to completion endpoints containing a minus sign in front of the n_discard value should be treated as suspicious. Even basic string matching can be effective.
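The string-matching approach can be sketched in a few lines of Python. The sketch assumes that request bodies are written to the log one request per line, which many web servers do not do by default. The names `NEGATIVE_N_DISCARD` and `find_suspicious_lines` are hypothetical.

```python
import re

# Matches the key "n_discard", a colon, and a value starting with a minus
# sign and a digit. JSON allows whitespace around the colon but not
# between the minus sign and the digits.
NEGATIVE_N_DISCARD = re.compile(r'"n_discard"\s*:\s*-\d')


def find_suspicious_lines(lines):
    """Return (line_number, line) pairs that set a negative n_discard."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if NEGATIVE_N_DISCARD.search(line)
    ]
```

A pattern like this is a cheap first filter rather than a complete control: JSON permits escape sequences inside key names (for example `"n_\u0064iscard"`), which a plain regular expression will miss but a JSON parser will decode. Parsing the body is more robust where it is practical.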
2. Abnormal Crash Activity
Frequent or sudden crashes of the model server — especially during routine requests — may indicate attempted exploitation.
Watch for:
- SIGSEGV or SIGABRT signals
- Core dump generation
- Repeated crash-restart loops
3. Application Log Errors
During context shifting, invalid index calculations may produce internal error messages such as:
pos_min == -1
These messages strongly suggest invalid memory handling caused by malformed input.
4. Network IDS / WAF Indicators
Intrusion detection systems can flag requests such as:
POST /completions with "n_discard": -#
While tuning is necessary to avoid false positives, even basic rules can surface exploitation attempts.
Potential Signs of Exploitation
- A filtered or blocked request containing negative n_discard immediately followed by a crash
- A server crash directly after receiving malformed JSON from a specific client IP
- Core dumps indicating out-of-bounds memory writes
When observed together, these indicators strongly suggest an exploitation attempt.
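Correlating these indicators can be sketched as a time-window join between two event lists. The sketch assumes the suspicious requests and crash times have already been parsed out of the relevant logs. The function name `correlate`, the input shapes, and the 10-second default window are illustrative choices, not values taken from any tool.

```python
from datetime import datetime, timedelta


def correlate(suspicious_requests, crashes, window_seconds=10):
    """Pair each suspicious request with crashes that follow it closely.

    suspicious_requests: list of (timestamp, client_ip) tuples
    crashes: list of crash timestamps
    Both inputs are hypothetical, pre-parsed event lists.
    """
    window = timedelta(seconds=window_seconds)
    hits = []
    for request_time, client_ip in suspicious_requests:
        for crash_time in crashes:
            # Keep only crashes at or after the request, within the window.
            if timedelta(0) <= crash_time - request_time <= window:
                hits.append((client_ip, request_time, crash_time))
    return hits
```

Each returned tuple names a client IP whose negative n_discard request was followed by a crash inside the window, which is a reasonable starting point for triage. The window should be tuned to how long generation typically runs before a context shift occurs.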
How This Could Lead to Code Execution
An out-of-bounds write allows a program to write past its intended memory region. In C or C++ applications, this may overwrite:
- Internal state variables
- Function pointers
- Return addresses
- Virtual function tables (vtables)
If an attacker can predict or influence what gets overwritten, they may redirect execution flow to attacker-controlled instructions.
This typically requires:
- A non-hardened build
- Predictable memory layout
- Missing protections such as ASLR, RELRO, or stack canaries
Hardened production builds are more difficult to exploit but remain susceptible to crashes.
Detection of Proof-of-Concept Attempts
Proof-of-concept testing must only occur in isolated lab environments.
Safe testing involves:
- Running the server with verbose logging
- Sending anomalous JSON with negative n_discard
- Observing crashes or memory errors
Never perform such testing in production or expose exploit techniques publicly.
Indicators of PoC attempts include malformed parameter values followed immediately by application failure.
Recommended Defensive Response
1. Restrict Network Access
Limit access to the server to trusted networks and clients only.
2. Enforce Input Validation
Before requests reach the server:
- Require n_discard to be non-negative
- Reject malformed or unexpected numeric values
- Enforce strict JSON schema validation
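A minimal sketch of such a check, suitable for a reverse proxy or middleware placed in front of the server, might look like the following. The function name `validate_request_body` and the exact policy (n_discard, when present, must be a non-negative integer) are assumptions for illustration. A real deployment would likely validate the other request fields against a full schema as well.

```python
import json


def validate_request_body(raw_body):
    """Return (ok, reason) for a completion request body.

    Illustrative policy: the body must be a JSON object, and n_discard,
    when present, must be a non-negative integer.
    """
    try:
        payload = json.loads(raw_body)
    except (ValueError, TypeError):
        return False, "body is not valid JSON"
    if not isinstance(payload, dict):
        return False, "body must be a JSON object"
    if "n_discard" in payload:
        value = payload["n_discard"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return False, "n_discard must be an integer"
        if value < 0:
            return False, "n_discard must be non-negative"
    return True, "ok"
```

For example, `validate_request_body('{"n_discard": -32}')` returns `(False, "n_discard must be non-negative")`, and the proxy can reject the request before it reaches the model server. Because the check works on parsed JSON, it is not fooled by whitespace or escaped key names the way plain string matching can be.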
3. Apply Official Patch
Upgrade to the patched version as soon as it is released.
Official patch / upgrade:
https://github.com/ggml-org/llama.cpp/security/advisories
Use only official sources for updates.
4. System Hardening
- Use AddressSanitizer in testing
- Enable RELRO, PIE, and stack canaries in production
- Run the service under a restricted user account
Effective Detection Locations
| Log Source | What to Monitor |
|---|---|
| Web / VPN Logs | Negative n_discard values |
| Proxy / Load Balancer | Malformed request patterns |
| Application Logs | Assertion failures, crashes |
| System Logs | Segfaults, restart loops |
| IDS / IPS | JSON anomaly signatures |
Mapping to Weakness and Attack Patterns
This vulnerability is a classic out-of-bounds write, a highly dangerous memory safety issue. The typical attack sequence is:
- Crafted JSON with invalid numeric input
- Server processes input in context management
- Memory is written outside valid bounds
- Crash or memory corruption occurs
- Corruption may be exploited in advanced attack chains
Memory corruption vulnerabilities should always be treated as high risk.
Final Takeaway
- CVE-2026-21869 is a high-risk vulnerability in an AI model serving environment.
- The attack surface is a standard HTTP API with no authentication required.
- Detection relies on identifying malformed input and correlated crashes.
- Mitigation involves input validation, access restriction, and patching.
- While full exploitation requires expertise, causing service disruption is trivial.
