Product Details

OneFlow is an open-source deep learning framework used for building, training, and serving AI/ML models at scale. It is commonly deployed in GPU-accelerated environments, containerized ML platforms, internal research pipelines, and production inference services.

Because OneFlow frequently processes external datasets, user-controlled inputs, and dynamically generated tensors, weaknesses in input validation can be abused to destabilize systems even without authentication.

Vulnerability Summary Table

CVE Name	CVE ID	CVSS Score	Severity	Attack Vector	Exploitability	Exploit Availability
Malformed Tensor Creation DoS	CVE-2025-71011	7.5	High	Network / Local	Low complexity	No public exploit
Scatter Operation Validation DoS	CVE-2025-71009	7.8	High	Network / Local	Low complexity	No public exploit
Autograd Segmentation Fault DoS	CVE-2025-71008	8.1	High	Network / Local	Medium complexity	No public exploit

Note: Exploitation requires only crafted inputs. No authentication bypass or privilege escalation is needed.

CVE-2025-71011

Malformed Tensor Creation Leading to Denial of Service

Technical Details

Improper validation of tensor parameters has been identified in OneFlow’s tensor creation logic. Tensor dimensions, data types, and shape metadata are not sufficiently constrained before memory allocation occurs.

As a result, memory allocation requests can be issued with:

Excessively large dimension values
Invalid or negative dimensions
Inconsistent shape and datatype combinations

These conditions can cause the runtime to attempt invalid allocations or to exhaust system or GPU memory.

Exploitation Scenario (Educational Context)

When untrusted data is passed into a model (for example, through inference APIs or preprocessing steps), a malicious tensor definition may be embedded within the request. Upon execution, OneFlow attempts to materialize the tensor and the process terminates unexpectedly.

The attack neither requires nor results in code execution. Its only effect is service disruption.

Impact

Application crashes
GPU or system memory exhaustion
Training or inference job termination
Reduced availability of shared ML infrastructure

MITRE ATT&CK Mapping

T1499 – Endpoint Denial of Service
T1190 – Exploit Public-Facing Application

Detection & Monitoring

Log Sources to Enable

OneFlow runtime logs
Application stdout / stderr
GPU driver logs (CUDA, DCGM)
Container runtime logs (Docker / Kubernetes)

Indicators of Exploitation

Repeated tensor allocation failures
Sudden spikes in memory consumption
Abnormal tensor dimension values logged during execution
Frequent container restarts or pod crashes

Splunk SIEM Use Cases

Use Case 1 – Tensor Allocation Failure Detection

Objective: Detect malformed tensor creation attempts.

Log Source: OneFlow runtime logs
Logic:

Monitor error messages related to tensor initialization
Trigger alerts on repeated allocation failures within a short time window

Example SPL (Conceptual):

index=ml_logs "tensor" AND ("allocation failed" OR "invalid shape")
| bin _time span=5m
| stats count by _time, host
| where count > 5

Use Case 2 – Memory Exhaustion Correlation

Objective: Identify DoS attempts via resource abuse.

Log Source: GPU metrics + application logs
Logic:

Correlate memory spikes with tensor creation errors
Alert when abnormal memory growth precedes crashes
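
The correlation logic above can be sketched in a few lines of Python. This is a minimal illustration, not a OneFlow or Splunk feature: the function names, the sample format (timestamp in seconds, used memory in MB), and the thresholds are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, used memory in MB) - assumed format


def find_memory_spikes(samples: Sequence[Sample], growth_mb: float, window_s: float) -> List[float]:
    """Return timestamps at which memory grew by at least growth_mb within window_s seconds.

    Samples are assumed to be sorted by timestamp.
    """
    spikes = []
    start = 0
    for end in range(len(samples)):
        # Slide the window start forward until it spans at most window_s seconds.
        while samples[end][0] - samples[start][0] > window_s:
            start += 1
        lowest = min(used for _, used in samples[start:end + 1])
        if samples[end][1] - lowest >= growth_mb:
            spikes.append(samples[end][0])
    return spikes


def correlate(spike_ts: Sequence[float], error_ts: Sequence[float], lag_s: float) -> List[Tuple[float, float]]:
    """Pair each spike with every tensor error or crash seen up to lag_s seconds after it."""
    return [(s, e) for s in spike_ts for e in error_ts if 0 <= e - s <= lag_s]


# Hypothetical data: memory jumps from about 1 GB to about 5 GB, then an error is logged.
memory = [(0, 1000), (10, 1100), (20, 5200), (30, 5300)]
errors = [25, 200]
spikes = find_memory_spikes(memory, growth_mb=2000, window_s=30)
alerts = correlate(spikes, errors, lag_s=15)
```

Each pair in `alerts` is a memory spike that was followed by an error within the lag window, which is the pattern this use case is meant to surface.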

Mitigation Guidance

Strict limits on tensor size and dimensionality should be enforced at API boundaries
User-supplied data should be validated before being passed into OneFlow operations
Untrusted ML workloads should be isolated from production systems
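
As a rough sketch of the first two points, a tensor specification taken from a request can be checked in plain Python before any framework call is made. The function name, the limits, and the dtype table below are hypothetical values for illustration and are not part of OneFlow.

```python
import math
from typing import Sequence

# Hypothetical limits for illustration; tune them to the deployment.
MAX_DIMS = 8
MAX_ELEMENTS = 2**28   # about 268 million elements
MAX_BYTES = 1 << 30    # 1 GiB per tensor
ITEM_SIZES = {"uint8": 1, "float16": 2, "float32": 4, "int32": 4, "float64": 8, "int64": 8}


def validate_tensor_spec(shape: Sequence[int], dtype: str) -> int:
    """Return the tensor size in bytes, or raise ValueError if the spec looks unsafe."""
    if dtype not in ITEM_SIZES:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    if len(shape) > MAX_DIMS:
        raise ValueError(f"too many dimensions: {len(shape)} > {MAX_DIMS}")
    for dim in shape:
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(dim, bool) or not isinstance(dim, int):
            raise ValueError(f"dimension must be an integer, got {dim!r}")
        if dim < 0:
            raise ValueError(f"negative dimension: {dim}")
    # Python integers do not overflow, so the product is exact even for huge shapes.
    n_elements = math.prod(shape)
    if n_elements > MAX_ELEMENTS:
        raise ValueError(f"too many elements: {n_elements} > {MAX_ELEMENTS}")
    n_bytes = n_elements * ITEM_SIZES[dtype]
    if n_bytes > MAX_BYTES:
        raise ValueError(f"tensor too large: {n_bytes} bytes > {MAX_BYTES}")
    return n_bytes
```

An API handler would call `validate_tensor_spec` on the requested shape and dtype and return a client error on `ValueError`, so that an oversized or negative shape never reaches the allocator.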

Official Patch / Upgrade

👉 https://github.com/Oneflow-Inc/oneflow/security/advisories

CVE-2025-71009

Scatter Operations Input Validation Denial of Service

Technical Details

Scatter operations in OneFlow fail to properly validate index tensors before execution. Index values outside valid tensor boundaries are not consistently rejected.

When invalid indices are processed, out-of-bounds memory access occurs, leading to immediate process termination.

Exploitation Scenario (Educational Context)

A crafted input containing malicious index tensors is supplied to a model using scatter operations (commonly seen in embeddings and sparse updates). During execution, invalid memory is accessed and the runtime crashes.

Impact

Immediate crash of inference or training pipelines
Denial-of-service in shared GPU environments
Reduced system stability

MITRE ATT&CK Mapping

T1499 – Endpoint Denial of Service
T1499.004 – Endpoint Denial of Service: Application or System Exploitation

Detection & Monitoring

Indicators

Scatter-related segmentation faults
Kernel or GPU fault messages
Repeated crashes during embedding layer execution

Splunk SIEM Use Cases

Use Case 3 – Scatter Operation Crash Detection

Objective: Detect repeated crashes tied to scatter operations.

Log Source: Application logs + OS crash logs
Logic:

Identify segmentation faults referencing scatter operators
Alert on repeated occurrences from the same source

Example SPL (Conceptual):

index=ml_logs ("scatter" AND ("segmentation fault" OR "out of bounds"))
| stats count by host
| where count > 3

Mitigation Guidance

Index bounds validation should be enforced before scatter execution
External control over index tensors should be restricted
Defensive checks should be added at preprocessing layers
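
A preprocessing-layer bounds check along these lines could look like the following sketch. It works on plain nested Python lists rather than framework tensors, and the function names are hypothetical. Whether negative indices are meaningful for a given scatter call is left as an assumption, so they are rejected by default.

```python
from typing import Iterator, Sequence, Union

NestedInts = Union[int, Sequence["NestedInts"]]


def _flatten(values: NestedInts) -> Iterator[object]:
    """Yield every leaf value of an arbitrarily nested list or tuple."""
    if isinstance(values, (list, tuple)):
        for item in values:
            yield from _flatten(item)
    else:
        yield values


def check_scatter_indices(index: NestedInts, dim_size: int, allow_negative: bool = False) -> None:
    """Raise ValueError if any index falls outside the target dimension."""
    lower = -dim_size if allow_negative else 0
    for value in _flatten(index):
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"index must be an integer, got {value!r}")
        if not lower <= value < dim_size:
            raise ValueError(f"index {value} out of range for dimension of size {dim_size}")
```

Calling `check_scatter_indices(request_indices, dim_size=target_rows)` before the index data is converted into a tensor turns an out-of-bounds index into an ordinary, catchable error instead of a process crash.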

Official Patch / Upgrade

👉 https://github.com/Oneflow-Inc/oneflow/security/advisories

CVE-2025-71008

Autograd Segmentation Fault Leading to Denial of Service

Technical Details

A flaw has been identified in OneFlow’s autograd engine where malformed computation graphs or inconsistent gradient metadata may trigger segmentation faults during backward propagation.

This condition arises when gradient shapes or graph dependencies are not fully validated before execution.
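
Until a patched release is in place, a pipeline that accepts externally supplied gradients can compare their shapes against the parameter shapes before the backward pass. The sketch below is an assumption-laden illustration in plain Python: the function name and the dictionaries mapping parameter names to shapes are hypothetical, and it does not reflect how OneFlow validates graphs internally.

```python
from typing import Dict, List, Sequence


def check_gradient_shapes(
    param_shapes: Dict[str, Sequence[int]],
    grad_shapes: Dict[str, Sequence[int]],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the shapes agree."""
    problems = []
    for name, grad_shape in grad_shapes.items():
        if name not in param_shapes:
            problems.append(f"{name}: gradient has no matching parameter")
        elif tuple(grad_shape) != tuple(param_shapes[name]):
            problems.append(
                f"{name}: gradient shape {tuple(grad_shape)} "
                f"does not match parameter shape {tuple(param_shapes[name])}"
            )
    return problems


# Hypothetical example: one gradient has the wrong shape, one has no parameter.
params = {"fc.weight": (128, 64), "fc.bias": (128,)}
grads = {"fc.weight": (64, 128), "fc.bias": (128,), "extra": (1,)}
issues = check_gradient_shapes(params, grads)
```

A training service could refuse to start the backward pass whenever `issues` is non-empty and log the list for later investigation.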

Exploitation Scenario (Educational Context)

When training pipelines accept externally influenced model definitions or gradient parameters, malformed graphs may be introduced. During backpropagation, invalid memory references are accessed, causing the process to terminate.

Impact

Training job failures
Loss of model training progress
Service disruption in training-as-a-service platforms

MITRE ATT&CK Mapping

T1499 – Endpoint Denial of Service
T1106 – Native API

Detection & Monitoring

Indicators

Segmentation faults during backward passes
Autograd-related crash traces
Repeated training job restarts

Splunk SIEM Use Cases

Use Case 4 – Autograd Crash Detection

Objective: Detect abnormal backward-pass failures.

Log Source: OneFlow autograd logs
Logic:

Alert on segmentation faults occurring during gradient computation
Correlate with recent model input changes

Example SPL (Conceptual):

index=ml_logs ("autograd" AND "segmentation fault")
| stats count by job_id
| where count > 1

Overall Risk Assessment

These vulnerabilities present a high operational risk due to their ability to cause reliable denial-of-service in AI/ML environments. While no data leakage or privilege escalation has been identified, disruption of ML workloads alone may result in:

Service outages
SLA violations
Financial and reputational impact

All affected systems should be patched promptly, especially those processing untrusted or external inputs.

Official Patch / Upgrade

👉 https://github.com/Oneflow-Inc/oneflow/security/advisories

Final Takeaway

These OneFlow vulnerabilities enable reliable denial-of-service attacks by abusing weak input validation in tensor creation, scatter operations, and the autograd engine. Any ML system that processes untrusted or external data is at risk of crashes, resource exhaustion, and service disruption. While no data exposure is involved, the operational impact is high, especially for shared GPU and ML-as-a-service environments. Immediate patching and stronger input validation are essential to maintain availability and stability.

Critical OneFlow Flaws Expose AI Workloads to Easy Denial-of-Service Attacks

Aegiron