CVE-2026-21869: Unauthenticated Memory Corruption in llama.cpp via Malformed Context Handling

Vulnerability Overview

CVE: CVE-2026-21869
Severity: High
CVSS Score: 8.8 (High impact across confidentiality, integrity, and availability)
Impact: Denial of Service (crash), Memory Corruption, Potential for Remote Code Execution
Exploitability: Easily reachable via normal HTTP API calls
Exploit Availability: Known proof-of-concept

What Is This Vulnerability?

CVE-2026-21869 is a memory-safety flaw in the HTTP server (llama-server) of the widely used llama.cpp inference project. The server takes input from HTTP requests, converts that input into internal parameters, and then uses those parameters to manage the buffers that hold the model's tokens and context.

The bug happens when a specific parameter — called n_discard — is allowed to be negative. Normally this number tells the server how many tokens to “throw away” when the context window fills up. But if someone sends a negative number here, the server will perform incorrect calculations that move pointers and indexes outside the valid memory area. In programming terms, this is called an out-of-bounds write.
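To make the arithmetic concrete, the shell function below models a context shift as a simplified, hypothetical copy loop: tokens from index n_keep + n_discard onward are moved n_discard slots toward the front of an n_tokens-entry buffer. The loop shape and the names n_keep and n_tokens are assumptions for illustration, not llama.cpp source. The model shows why the sign matters: the highest destination index is n_tokens - 1 - n_discard, which lands past the end of the buffer for every negative n_discard.

```shell
#!/bin/sh
# Toy model (hypothetical; NOT copied from llama.cpp): a context shift that
# compacts an array of n_tokens entries by copying tokens[i] -> tokens[i - n_discard]
# for every i in [n_keep + n_discard, n_tokens).
# Prints the source/destination index ranges and whether they stay inside [0, n_tokens).
shift_bounds() {
  local n_keep="$1" n_discard="$2" n_tokens="$3"
  local src_lo=$((n_keep + n_discard))
  local src_hi=$((n_tokens - 1))
  local dst_lo=$((src_lo - n_discard))   # always equals n_keep
  local dst_hi=$((src_hi - n_discard))   # exceeds n_tokens - 1 whenever n_discard < 0
  local ranges="src=[$src_lo,$src_hi] dst=[$dst_lo,$dst_hi] size=$n_tokens"
  if [ "$src_lo" -gt "$src_hi" ]; then
    echo "empty $ranges"                 # nothing to copy, so no access at all
  elif [ "$src_lo" -lt 0 ] || [ "$dst_hi" -ge "$n_tokens" ]; then
    echo "OUT-OF-BOUNDS $ranges"         # a read or write falls outside the buffer
  else
    echo "ok $ranges"
  fi
}
```

For example, shift_bounds 4 8 64 stays in bounds, while shift_bounds 40 -32 64 reports a destination range ending at index 95 in a 64-entry buffer.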

This is serious because a write outside valid memory can cause:

A crash (quickly rendering the service unavailable)
Corruption of internal memory
Remote code execution in builds or environments with weak memory protections, meaning an attacker might make the server run arbitrary instructions

In short, malformed user input can corrupt server memory and break or take over the server.

How an Attacker Could Trigger This

A vulnerable llama.cpp server accepts JSON requests such as:

POST /completions
{
  "model": "some-model",
  "input": "some text",
  "n_discard": <some number>
}

If the value provided for n_discard is allowed to be negative, and if the server needs to shift its internal context window (because it has too many tokens in memory), it attempts to recalculate token positions using a negative number. This results in pointer and index calculations that move outside the allowed memory range.

An attacker does not need credentials or special access. Simply sending a crafted JSON request to the server is enough to hit this vulnerable code path.

As long as the server:

has context shifting enabled
is running a vulnerable build
is reachable over the network

the flaw can be triggered.

The immediate result is usually a server crash. On builds that lack exploit mitigations such as ASLR, RELRO, or stack canaries, the attacker can sometimes overwrite internal data structures in a controllable way, forming the basis for a potential remote code execution path.

This behavior is deterministic and can be reproduced reliably in a laboratory environment.

Proof of Concept (PoC) – Educational Use Only

⚠️ WARNING
This PoC is intended only for learning, validation, and defensive research in a controlled laboratory environment.
Do NOT run this against production systems, public servers, or systems you do not own or have explicit permission to test.

PoC Objective

Demonstrate that supplying a negative n_discard value to a vulnerable llama.cpp HTTP server triggers a deterministic crash or memory corruption when context shifting occurs.

Preconditions

Vulnerable version of llama.cpp
llama-server running locally or in an isolated test environment
Context shifting enabled (default behavior)
Server reachable on a test port (example: 8080)

PoC Example (Crash Demonstration)

curl -X POST http://127.0.0.1:8080/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test-model",
    "input": "Generate enough text to force the context window to fill and trigger a context shift.",
    "n_discard": -32
  }'

Expected Outcome

When the server attempts to manage its token context:

Invalid index calculations occur internally
Memory is accessed or written outside valid bounds
The process terminates unexpectedly

Typical observable results include:

Immediate server crash
SIGSEGV or SIGABRT in system logs
Assertion failure or sanitizer output (in debug or sanitizer-instrumented builds)
Core dump generation

The behavior is repeatable and deterministic under the same conditions.

Why This PoC Works

The negative n_discard value is accepted without validation and propagated into internal context management logic. During context shifting, this causes incorrect pointer and index calculations, resulting in an out-of-bounds write.
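A fix has to reject the value before any shift arithmetic runs. The following is a hypothetical guard in a simplified toy model, not the official patch: it accepts only plain non-negative integers (no sign, decimal point, exponent, or leading zero, as in JSON integer syntax) and clamps the result to the number of tokens eligible for discarding, assumed here to be n_tokens - n_keep.

```shell
#!/bin/sh
# Hypothetical server-side guard (not the official llama.cpp patch).
# effective_n_discard REQUESTED N_KEEP N_TOKENS
# Prints the discard count to use, or prints an error to stderr and returns 1.
effective_n_discard() {
  local requested="$1" n_keep="$2" n_tokens="$3"
  local n_left=$((n_tokens - n_keep))        # tokens eligible for discarding (assumed)
  case "$requested" in
    ''|*[!0-9]*|0[0-9]*)                     # empty, minus sign, '.', 'e', or leading zero
      echo "reject: n_discard must be a non-negative integer" >&2
      return 1 ;;
  esac
  if [ "${#requested}" -gt 9 ]; then         # keep shell arithmetic far from overflow
    echo "reject: n_discard too large" >&2
    return 1
  fi
  if [ "$requested" -gt "$n_left" ]; then
    echo "$n_left"                           # clamp: never discard more than is eligible
  else
    echo "$requested"
  fi
}
```

Clamping versus rejecting an oversized value is a policy choice; rejecting is stricter and easier to audit.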

Detection and Monitoring

1. Web Server Logs – Negative n_discard Values

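Look for negative values in the n_discard field of HTTP request bodies. Most web servers and reverse proxies do not log request bodies by default, so this check requires body logging or a body-inspecting proxy.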

Example pattern:

"n_discard": -32

Treat any POST request to a completion endpoint whose n_discard value begins with a minus sign as suspicious. Even basic string matching can be effective if it tolerates whitespace variations such as "n_discard":-32.
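Assuming your proxy or web server writes each request body on its own line, a plain grep is enough for a first pass. The pattern below tolerates whitespace around the colon. It is a detection aid, not a control: JSON allows \u escapes inside keys, so a key written as "n_disc\u0061rd" is decoded normally by a conforming JSON parser yet never matches a text search. Enforcement belongs in the schema validation described under Enforce Input Validation.

```shell
#!/bin/sh
# flag_negative_n_discard: reads request-body log lines on stdin and prints
# those in which the n_discard value starts with a minus sign.
# Assumes one JSON body per line (a logging setup you must enable yourself).
flag_negative_n_discard() {
  grep -E '"n_discard"[[:space:]]*:[[:space:]]*-'
}
```

For example, flag_negative_n_discard < bodies.log, where bodies.log is a placeholder for your body-logging output.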

2. Abnormal Crash Activity

Frequent or sudden crashes of the model server — especially during routine requests — may indicate attempted exploitation.

Watch for:

SIGSEGV or SIGABRT signals
Core dump generation
Repeated crash-restart loops
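On a systemd-managed Linux host, these symptoms show up as recognisable log lines: the kernel's "segfault at" message, systemd's exit reports such as status=11/SEGV or status=6/ABRT, and "Scheduled restart job" lines when the unit is configured to restart. The counter below assumes those message formats; the unit name llama-server.service used in the usage note is hypothetical.

```shell
#!/bin/sh
# crash_indicators: reads journal/syslog text on stdin and prints how many lines
# match common crash or restart messages (kernel segfault report, systemd
# SIGSEGV/SIGABRT exit status, systemd restart scheduling).
crash_indicators() {
  # grep -c prints 0 and exits 1 when nothing matches; '|| true' keeps that non-fatal.
  grep -Ec 'segfault at|status=11/SEGV|status=6/ABRT|Scheduled restart job' || true
}
```

journalctl -k shows the kernel's segfault lines, and journalctl -u llama-server.service --since "1 hour ago" shows the systemd exit and restart lines; pipe either into crash_indicators. A count that keeps rising between checks points to a crash-restart loop.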

3. Application Log Errors

During context shifting, invalid index calculations may produce internal error messages such as:

pos_min == -1

These messages strongly suggest invalid memory handling caused by malformed input.

4. Network IDS / WAF Indicators

Intrusion detection systems can flag requests such as:

POST /completions requests whose JSON body sets "n_discard" to a negative number

While tuning is necessary to avoid false positives, even basic rules can surface exploitation attempts.

Potential Signs of Exploitation

A filtered or blocked request containing a negative n_discard value, immediately followed by a crash
A server crash directly after a specific client IP sends a request with a negative n_discard value
Core dumps indicating out-of-bounds memory writes

When observed together, these indicators strongly suggest an exploitation attempt.
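The request-then-crash correlation can be sketched as a small filter. It assumes you have already reduced access-log and crash-log entries to a hypothetical one-event-per-line format with epoch timestamps; the 5-second default window is an arbitrary starting point to tune for your environment.

```shell
#!/bin/sh
# correlate_crashes [WINDOW_SECONDS]
# Input on stdin, one event per line, in a hypothetical normalised format:
#   <epoch-seconds> negreq <client-ip>   request carrying a negative n_discard
#   <epoch-seconds> crash                server crash or restart
# Prints the client IP of the most recent negative-n_discard request that was
# followed by a crash within WINDOW seconds (default 5).
correlate_crashes() {
  sort -n -k1,1 | awk -v w="${1:-5}" '
    $2 == "negreq" { last_t = $1; last_ip = $3; next }
    $2 == "crash" && last_ip != "" && ($1 - last_t) <= w {
      print last_ip        # crash shortly after the suspicious request
      last_ip = ""         # report each request at most once
    }'
}
```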

How This Could Lead to Code Execution

An out-of-bounds write allows a program to write past its intended memory region. In C or C++ applications, this may overwrite:

Internal state variables
Function pointers
Return addresses
Virtual function tables (vtables)

If an attacker can predict or influence what gets overwritten, they may redirect execution flow to attacker-controlled instructions.

This typically requires:

A non-hardened build
Predictable memory layout
Missing protections such as ASLR, RELRO, or stack canaries

Hardened production builds are more difficult to exploit but remain susceptible to crashes.

Detection of Proof-of-Concept Attempts

Proof-of-concept testing must only occur in isolated lab environments.

Safe testing involves:

Running the server with verbose logging
Sending anomalous JSON with negative n_discard
Observing crashes or memory errors

Never perform such testing in production or expose exploit techniques publicly.

Indicators of PoC attempts include malformed parameter values followed immediately by application failure.

Recommended Defensive Response

1. Restrict Network Access

Limit access to the server to trusted networks and clients only.
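One way to apply this on a single Linux host, sketched under assumptions: start llama-server bound to the loopback interface with its --host and --port options, and periodically confirm that nothing exposes the port more widely. The helper parses the output of ss -ltn from iproute2; port 8080 matches the earlier example and the model path is a placeholder. Recent llama-server builds also offer an --api-key option; confirm it with llama-server --help on your version.

```shell
#!/bin/sh
# Launch sketch (model path is a placeholder): bind to loopback only.
#   llama-server -m /path/to/model.gguf --host 127.0.0.1 --port 8080
#
# exposed_listeners PORT
# Reads `ss -ltn` output on stdin and prints every local address listening on
# PORT that is NOT loopback (127.0.0.0/8 or [::1]). Empty output means loopback only.
exposed_listeners() {
  awk -v port="$1" '
    {
      addr = $4                                 # column 4: Local Address:Port
      n = split(addr, parts, ":")
      if (parts[n] != port) next                # only the port of interest
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host ~ /^127\./ || host == "[::1]") next
      print addr                                # e.g. 0.0.0.0, [::], * or a LAN IP
    }'
}
```

For example, ss -ltn | exposed_listeners 8080 prints nothing when the server listens on loopback only.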

2. Enforce Input Validation

Before requests reach the server:

Require n_discard to be non-negative
Reject malformed or unexpected numeric values
Enforce strict JSON schema validation
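A strict pre-filter for the n_discard rule can be written with jq, assuming jq is installed on the proxy or gateway host. The function accepts a request only if n_discard is absent or a non-negative integer, and rejects invalid JSON or non-object bodies outright, because jq exits with an error status on those inputs.

```shell
#!/bin/sh
# n_discard_ok: reads one JSON request body on stdin and succeeds (status 0)
# only if n_discard is absent or a non-negative integer. Negative, fractional,
# non-numeric, or null values fail, as do invalid JSON and non-object bodies.
n_discard_ok() {
  jq -e '
    if has("n_discard")
    then (.n_discard | type == "number" and . >= 0 and . == floor)
    else true
    end' >/dev/null 2>&1
}
```

In a JSON Schema validator, the equivalent constraint on the n_discard property is {"type": "integer", "minimum": 0}.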

3. Apply Official Patch

Upgrade to the patched version as soon as it is released.

Official patch / upgrade:
https://github.com/ggml-org/llama.cpp/security/advisories

Use only official sources for updates.

4. System Hardening

Use AddressSanitizer in testing
Enable RELRO, PIE, and stack canaries in production
Run the service under a restricted user account
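To check that a deployed binary actually carries these protections, the sketch below parses readelf output (GNU binutils) for the PIE file type, the GNU_RELRO segment plus BIND_NOW (full RELRO), and references to __stack_chk_fail (stack protector). The CMake line in the comments shows one assumed way to request these protections with standard GCC/Clang and GNU ld flags. For the AddressSanitizer test build, llama.cpp trees have exposed a LLAMA_SANITIZE_ADDRESS CMake option; confirm it in your checkout's CMakeLists.txt. ASLR is a kernel setting: on Linux, /proc/sys/kernel/randomize_va_space should read 2.

```shell
#!/bin/sh
# Hardened build sketch (standard compiler/linker flags; adapt to your toolchain):
#   cmake -B build -DCMAKE_BUILD_TYPE=Release \
#     -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
#     -DCMAKE_C_FLAGS="-fstack-protector-strong -D_FORTIFY_SOURCE=2" \
#     -DCMAKE_CXX_FLAGS="-fstack-protector-strong -D_FORTIFY_SOURCE=2" \
#     -DCMAKE_EXE_LINKER_FLAGS="-pie -Wl,-z,relro,-z,now" \
#     -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-z,relro,-z,now"
#   cmake --build build --target llama-server
#
# hardening_summary: reads `readelf -W -h -l -d -s <binary>` output on stdin and
# reports PIE, RELRO level (no/partial/full), and stack-protector use.
hardening_summary() {
  local out pie=no relro=no canary=no
  out=$(cat)
  if printf '%s\n' "$out" | grep -q 'Type:[[:space:]]*DYN'; then pie=yes; fi
  if printf '%s\n' "$out" | grep -q 'GNU_RELRO'; then
    relro=partial
    # Full RELRO also resolves all symbols at load time (BIND_NOW / FLAGS_1 NOW).
    if printf '%s\n' "$out" | grep -Eq 'BIND_NOW|FLAGS.*NOW'; then relro=full; fi
  fi
  if printf '%s\n' "$out" | grep -q '__stack_chk_fail'; then canary=yes; fi
  printf 'PIE=%s RELRO=%s CANARY=%s\n' "$pie" "$relro" "$canary"
}
```

Run it as readelf -W -h -l -d -s build/bin/llama-server | hardening_summary (the build/bin path is an assumption). If your build links the llama.cpp libraries dynamically, check those shared objects as well; readelf reports Type DYN for ordinary shared libraries too, so PIE=yes is only meaningful for the executable.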

Effective Detection Locations

Log Source	What to Monitor
Web / VPN Logs	Negative `n_discard` values
Proxy / Load Balancer	Malformed request patterns
Application Logs	Assertion failures, crashes
System Logs	Segfaults, restart loops
IDS / IPS	JSON anomaly signatures

Mapping to Weakness and Attack Patterns

This vulnerability is a classic out-of-bounds write (CWE-787), a highly dangerous class of memory safety issue. The typical attack sequence is:

Crafted JSON with invalid numeric input
Server processes input in context management
Memory is written outside valid bounds
Crash or memory corruption occurs
Corruption may be exploited in advanced attack chains

Memory corruption vulnerabilities should always be treated as high risk.

Final Takeaway

CVE-2026-21869 is a high-risk vulnerability in an AI model serving environment.
The attack surface is a standard HTTP API with no authentication required.
Detection relies on identifying malformed input and correlated crashes.
Mitigation involves input validation, access restriction, and patching.
While full exploitation requires expertise, service disruption is trivial.