Product Details
OneFlow is an open-source deep learning framework used for building, training, and serving AI/ML models at scale. It is commonly deployed in GPU-accelerated environments, containerized ML platforms, internal research pipelines, and production inference services.
Because OneFlow frequently processes external datasets, user-controlled inputs, and dynamically generated tensors, weaknesses in input validation can be abused to destabilize systems even without authentication.
Vulnerability Summary Table
| CVE Name | CVE ID | CVSS Score | Severity | Attack Vector | Exploitability | Exploit Availability |
|---|---|---|---|---|---|---|
| Malformed Tensor Creation DoS | CVE-2025-71011 | 7.5 | High | Network / Local | Low complexity | No public exploit |
| Scatter Operation Validation DoS | CVE-2025-71009 | 7.8 | High | Network / Local | Low complexity | No public exploit |
| Autograd Segmentation Fault DoS | CVE-2025-71008 | 8.1 | High | Network / Local | Medium complexity | No public exploit |
Note: Exploitation requires only crafted inputs; no authentication bypass or privilege escalation is needed.
CVE-2025-71011
Malformed Tensor Creation Leading to Denial of Service
Technical Details
Improper validation of tensor parameters has been identified in OneFlow’s tensor creation logic. Tensor dimensions, data types, and shape metadata are not sufficiently constrained before memory allocation occurs.
As a result, memory allocation requests can be issued with:
- Excessively large dimension values
- Invalid or negative dimensions
- Inconsistent shape and datatype combinations
These conditions cause the runtime to allocate invalid memory regions or exhaust system resources.
Exploitation Scenario (Educational Context)
When untrusted data is passed into a model (for example, through inference APIs or preprocessing steps), a malicious tensor definition may be embedded within the request. Upon execution, OneFlow attempts to materialize the tensor and the process terminates unexpectedly.
Exploitation does not require code execution; the crafted input alone is enough to disrupt the service.
Impact
- Application crashes
- GPU or system memory exhaustion
- Training or inference job termination
- Reduced availability of shared ML infrastructure
MITRE ATT&CK Mapping
- T1499 – Endpoint Denial of Service
- T1190 – Exploit Public-Facing Application
Detection & Monitoring
Log Sources to Enable
- OneFlow runtime logs
- Application stdout / stderr
- GPU driver logs (CUDA, DCGM)
- Container runtime logs (Docker / Kubernetes)
Indicators of Exploitation
- Repeated tensor allocation failures
- Sudden spikes in memory consumption
- Abnormal tensor dimension values logged during execution
- Frequent container restarts or pod crashes
Splunk SIEM Use Cases
Use Case 1 – Tensor Allocation Failure Detection
Objective: Detect malformed tensor creation attempts.
Log Source: OneFlow runtime logs
Logic:
- Monitor error messages related to tensor initialization
- Trigger alerts on repeated allocation failures within a short time window
SPL:
index=ml_logs "tensor" AND ("allocation failed" OR "invalid shape")
| bin _time span=5m
| stats count by host, _time
| where count > 5
Use Case 2 – Memory Exhaustion Correlation
Objective: Identify DoS attempts via resource abuse.
Log Source: GPU metrics + application logs
Logic:
- Correlate memory spikes with tensor creation errors
- Alert when abnormal memory growth precedes crashes
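The correlation described in Use Case 2 can be sketched in Python to make the alert condition concrete. This is a minimal sketch, assuming memory samples (for example, exported from DCGM or container metrics) and crash timestamps have already been extracted from the logs; the `MemSample` type, the 60-second window, and the 2x growth threshold are hypothetical choices, not values defined by OneFlow or Splunk.

```python
from dataclasses import dataclass


@dataclass
class MemSample:
    ts: float        # sample time in seconds (hypothetical export format)
    used_bytes: int  # GPU or host memory in use at that time


def crashes_preceded_by_growth(samples, crash_times, window_s=60.0, growth_ratio=2.0):
    """Return the crash timestamps whose preceding window shows abnormal memory growth.

    Growth counts as abnormal when the peak usage in the window is at least
    growth_ratio times the earliest sample in that window.
    """
    flagged = []
    for crash_ts in crash_times:
        # Samples taken in the window_s seconds leading up to (and including) the crash.
        window = sorted(
            (s for s in samples if crash_ts - window_s <= s.ts <= crash_ts),
            key=lambda s: s.ts,
        )
        if len(window) < 2:
            continue  # not enough data points to judge growth
        baseline = max(window[0].used_bytes, 1)  # avoid a zero baseline
        peak = max(s.used_bytes for s in window)
        if peak >= growth_ratio * baseline:
            flagged.append(crash_ts)
    return flagged


samples = [
    MemSample(0, 1_000), MemSample(30, 1_200), MemSample(50, 4_000),
    MemSample(200, 1_000), MemSample(230, 1_100),
]
print(crashes_preceded_by_growth(samples, crash_times=[55, 240]))  # [55]
```

Only the crash at t=55 is flagged: memory quadrupled in the minute before it, while the crash at t=240 followed flat usage. In production the same condition would typically be expressed as a Splunk correlation search; the thresholds need tuning against each cluster's normal workload.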
Mitigation Guidance
- Strict limits on tensor size and dimensionality should be enforced at API boundaries
- User-supplied data should be validated before being passed into OneFlow operations
- Untrusted ML workloads should be isolated from production systems
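The size and dimensionality limits above can be enforced with a check that runs before any user-supplied shape reaches a OneFlow tensor constructor. This is a minimal sketch: the limit values, the dtype allow-list, and the `validate_tensor_spec` name are hypothetical and should be tuned per deployment. It computes the byte size the shape implies, so oversized requests are rejected before any allocation is attempted.

```python
import math

# Hypothetical limits; tune per deployment and hardware.
MAX_NDIM = 8
MAX_DIM_SIZE = 1 << 20        # largest single dimension accepted
MAX_TENSOR_BYTES = 1 << 30    # 1 GiB per request tensor

# Bytes per element for the dtypes this endpoint accepts (allow-list).
DTYPE_SIZES = {"float32": 4, "float16": 2, "int64": 8, "int32": 4, "uint8": 1}


def validate_tensor_spec(shape, dtype):
    """Reject untrusted tensor metadata before allocation; return the implied byte size."""
    if dtype not in DTYPE_SIZES:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    if not isinstance(shape, (list, tuple)) or len(shape) > MAX_NDIM:
        raise ValueError(f"shape must be a list/tuple with at most {MAX_NDIM} dims")
    for d in shape:
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(d, int) or isinstance(d, bool):
            raise ValueError(f"non-integer dimension: {d!r}")
        if d < 0 or d > MAX_DIM_SIZE:
            raise ValueError(f"dimension out of range: {d}")
    nbytes = math.prod(shape) * DTYPE_SIZES[dtype]
    if nbytes > MAX_TENSOR_BYTES:
        raise ValueError(f"tensor too large: {nbytes} bytes")
    return nbytes


print(validate_tensor_spec([2, 3], "float32"))  # 24
```

A shape such as `[100000, 100000, 100]` passes the per-dimension check but implies about 4 TB of float32 data, so it is rejected by the total-size limit. Python integers do not overflow, so the product is exact even for hostile inputs; a C++ frontend performing the same multiplication would need explicit overflow checks.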
Official Patch / Upgrade
👉 https://github.com/Oneflow-Inc/oneflow/security/advisories
CVE-2025-71009
Scatter Operation Input Validation Flaw Leading to Denial of Service
Technical Details
Scatter operations in OneFlow fail to properly validate index tensors before execution. Index values outside valid tensor boundaries are not consistently rejected.
When invalid indices are processed, out-of-bounds memory access occurs, leading to immediate process termination.
Exploitation Scenario (Educational Context)
A crafted input containing malicious index tensors is supplied to a model using scatter operations (commonly seen in embeddings and sparse updates). During execution, invalid memory is accessed and the runtime crashes.
Impact
- Immediate crash of inference or training pipelines
- Denial-of-service in shared GPU environments
- Reduced system stability
MITRE ATT&CK Mapping
- T1499 – Endpoint Denial of Service
- T1499.004 – Endpoint Denial of Service: Application or System Exploitation
Detection & Monitoring
Indicators
- Scatter-related segmentation faults
- Kernel or GPU fault messages
- Repeated crashes during embedding layer execution
Splunk SIEM Use Cases
Use Case 3 – Scatter Operation Crash Detection
Objective: Detect repeated crashes tied to scatter operations.
Log Source: Application logs + OS crash logs
Logic:
- Identify segmentation faults referencing scatter operators
- Alert on repeated occurrences from the same source
Example SPL (Conceptual):
index=ml_logs ("scatter" AND ("segmentation fault" OR "out of bounds"))
| stats count by host
| where count > 3
Mitigation Guidance
- Index bounds validation should be enforced before scatter execution
- External control over index tensors should be restricted
- Defensive checks should be added at preprocessing layers
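Index bounds validation before scatter execution can be sketched as a pre-check on the index values. This is a minimal, framework-independent sketch, assuming indices arrive as nested Python lists at the preprocessing layer; `check_scatter_indices` is a hypothetical helper, not a OneFlow API. It conservatively rejects negative indices and checks only the scatter dimension, so the index tensor's shape along the other dimensions should still be validated separately against the destination tensor.

```python
def _flatten(values):
    """Yield scalars from an arbitrarily nested list/tuple of indices."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from _flatten(v)
        else:
            yield v


def check_scatter_indices(index, dim_size):
    """Raise ValueError unless every index i satisfies 0 <= i < dim_size.

    dim_size is the size of the destination tensor along the scatter dimension.
    """
    for i in _flatten(index):
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(i, int) or isinstance(i, bool):
            raise ValueError(f"non-integer index: {i!r}")
        if not 0 <= i < dim_size:
            raise ValueError(f"index {i} out of bounds for dimension of size {dim_size}")


check_scatter_indices([[0, 2], [1, 0]], dim_size=3)  # passes silently
```

Running the check in Python before the request reaches the model means a hostile index such as `3` for a dimension of size 3 produces a handled `ValueError` instead of an out-of-bounds access inside a native kernel.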
Official Patch / Upgrade
👉 https://github.com/Oneflow-Inc/oneflow/security/advisories
CVE-2025-71008
Autograd Segmentation Fault Leading to Denial of Service
Technical Details
A flaw has been identified in OneFlow’s autograd engine where malformed computation graphs or inconsistent gradient metadata may trigger segmentation faults during backward propagation.
This condition arises when gradient shapes or graph dependencies are not fully validated before execution.
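Where a pipeline accepts externally influenced graph definitions or gradients, a structural pre-check can reject malformed input before backpropagation starts. The sketch below assumes a hypothetical graph format (a dict mapping each node name to the names of its inputs) and hypothetical helper names; it is not OneFlow's own validation logic. It rejects dangling references and cycles using Kahn's topological sort, and rejects a supplied gradient whose shape differs from the output it seeds.

```python
from collections import deque


def validate_graph(nodes):
    """nodes: dict mapping node name -> list of input node names (hypothetical format).

    Raises ValueError on dangling references or cycles; returns a topological order.
    """
    for name, inputs in nodes.items():
        for src in inputs:
            if src not in nodes:
                raise ValueError(f"node {name!r} references unknown input {src!r}")
    # Kahn's algorithm: track unresolved inputs per node and who consumes each node.
    pending = {name: len(inputs) for name, inputs in nodes.items()}
    consumers = {name: [] for name in nodes}
    for name, inputs in nodes.items():
        for src in inputs:
            consumers[src].append(name)
    ready = deque(n for n, count in pending.items() if count == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in consumers[n]:
            pending[m] -= 1
            if pending[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("computation graph contains a cycle")
    return order


def check_grad_shape(output_shape, grad_shape):
    """Reject a supplied gradient whose shape differs from the output it seeds."""
    if tuple(grad_shape) != tuple(output_shape):
        raise ValueError(f"gradient shape {tuple(grad_shape)} != output shape {tuple(output_shape)}")


print(validate_graph({"x": [], "w": [], "y": ["x", "w"], "loss": ["y"]}))
# ['x', 'w', 'y', 'loss']
```

Requiring an exact shape match is deliberately stricter than any broadcasting the framework itself might allow; rejecting borderline input at the boundary is cheaper than diagnosing a segmentation fault mid-training.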
Exploitation Scenario (Educational Context)
When training pipelines accept externally influenced model definitions or gradient parameters, malformed graphs may be introduced. During backpropagation, invalid memory references are accessed, causing the process to terminate.
Impact
- Training job failures
- Loss of model training progress
- Service disruption in training-as-a-service platforms
MITRE ATT&CK Mapping
- T1499 – Endpoint Denial of Service
- T1499.004 – Endpoint Denial of Service: Application or System Exploitation
Detection & Monitoring
Indicators
- Segmentation faults during backward passes
- Autograd-related crash traces
- Repeated training job restarts
Splunk SIEM Use Cases
Use Case 4 – Autograd Crash Detection
Objective: Detect abnormal backward-pass failures.
Log Source: OneFlow autograd logs
Logic:
- Alert on segmentation faults occurring during gradient computation
- Correlate with recent model input changes
Example SPL (Conceptual):
index=ml_logs ("autograd" AND "segmentation fault")
| stats count by job_id
| where count > 1
Overall Risk Assessment
These vulnerabilities present a high operational risk because they can reliably cause denial of service in AI/ML environments. While no data leakage or privilege escalation has been identified, disruption of ML workloads alone may result in:
- Service outages
- SLA violations
- Financial and reputational impact
All affected systems should be patched promptly, especially those processing untrusted or external inputs.
Official Patch / Upgrade
👉 https://github.com/Oneflow-Inc/oneflow/security/advisories
Final Takeaway
These OneFlow vulnerabilities enable reliable denial-of-service attacks by abusing weak input validation in tensor creation, scatter operations, and the autograd engine. Any ML system that processes untrusted or external data is at risk of crashes, resource exhaustion, and service disruption. While no data exposure is involved, the operational impact is high, especially for shared GPU and ML-as-a-service environments. Immediate patching and stronger input validation are essential to maintain availability and stability.
