Critical OneFlow Flaws Expose AI Workloads to Easy Denial-of-Service Attacks

Product Details

OneFlow is an open-source deep learning framework used for building, training, and serving AI/ML models at scale. It is commonly deployed in GPU-accelerated environments, containerized ML platforms, internal research pipelines, and production inference services.

Because OneFlow frequently processes external datasets, user-controlled inputs, and dynamically generated tensors, weaknesses in input validation can be abused to destabilize systems even without authentication.


Vulnerability Summary Table

CVE Name | CVE ID | CVSS Score | Severity | Attack Vector | Exploitability | Exploit Availability
--- | --- | --- | --- | --- | --- | ---
Malformed Tensor Creation DoS | CVE-2025-71011 | 7.5 | High | Network / Local | Low complexity | No public exploit
Scatter Operation Validation DoS | CVE-2025-71009 | 7.8 | High | Network / Local | Low complexity | No public exploit
Autograd Segmentation Fault DoS | CVE-2025-71008 | 8.1 | High | Network / Local | Medium complexity | No public exploit

Note: Exploitation requires only crafted inputs. No authentication bypass or privilege escalation is needed.


CVE-2025-71011

Malformed Tensor Creation Leading to Denial of Service

Technical Details

Improper validation of tensor parameters has been identified in OneFlow’s tensor creation logic. Tensor dimensions, data types, and shape metadata are not sufficiently constrained before memory allocation occurs.

As a result, memory allocation requests can be issued with:

  • Excessively large dimension values
  • Invalid or negative dimensions
  • Inconsistent shape and datatype combinations

These conditions cause the runtime to allocate invalid memory regions or exhaust system resources.
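
To make the resource-exhaustion path concrete: the memory a dense tensor needs is the product of its dimensions times the element size, so a request of a few bytes can describe terabytes of storage. The sketch below is illustrative Python, not OneFlow code; `requested_bytes` and the `ITEMSIZE` table are hypothetical names introduced for this example.

```python
from math import prod

# Hypothetical element sizes in bytes; these are plain strings, not OneFlow dtype objects.
ITEMSIZE = {"float32": 4, "float16": 2, "int64": 8, "uint8": 1}

def requested_bytes(shape, dtype):
    """Return how many bytes a dense tensor of `shape` and `dtype` would need."""
    if any(d < 0 for d in shape):
        # A negative dimension has no valid meaning, so flag it instead of computing.
        raise ValueError(f"negative dimension in shape {shape}")
    return prod(shape) * ITEMSIZE[dtype]

# Four integers in a request payload...
shape = (10_000, 10_000, 10_000, 10)
# ...describe 4 * 10**13 bytes (about 40 TB) of float32 storage.
print(requested_bytes(shape, "float32"))
```

This arithmetic is why size limits have to be checked before allocation rather than handled after an allocation fails.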

Exploitation Scenario (Educational Context)

When untrusted data is passed into a model (for example, through inference APIs or preprocessing steps), a malicious tensor definition may be embedded within the request. Upon execution, OneFlow attempts to materialize the tensor and the process terminates unexpectedly.

No code execution is required; the impact is service disruption alone.

Impact

  • Application crashes
  • GPU or system memory exhaustion
  • Training or inference job termination
  • Reduced availability of shared ML infrastructure

MITRE ATT&CK Mapping

  • T1499 – Endpoint Denial of Service
  • T1190 – Exploit Public-Facing Application

Detection & Monitoring

Log Sources to Enable

  • OneFlow runtime logs
  • Application stdout / stderr
  • GPU driver and telemetry logs (CUDA, NVIDIA DCGM)
  • Container runtime logs (Docker / Kubernetes)

Indicators of Exploitation

  • Repeated tensor allocation failures
  • Sudden spikes in memory consumption
  • Abnormal tensor dimension values logged during execution
  • Frequent container restarts or pod crashes

Splunk SIEM Use Cases

Use Case 1 – Tensor Allocation Failure Detection

Objective: Detect malformed tensor creation attempts.

Log Source: OneFlow runtime logs
Logic:

  • Monitor error messages related to tensor initialization
  • Trigger alerts on repeated allocation failures within a short time window

SPL:

index=ml_logs "tensor" AND ("allocation failed" OR "invalid shape")
| bin _time span=5m
| stats count by _time, host
| where count > 5

Use Case 2 – Memory Exhaustion Correlation

Objective: Identify DoS attempts via resource abuse.

Log Source: GPU metrics + application logs
Logic:

  • Correlate memory spikes with tensor creation errors
  • Alert when abnormal memory growth precedes crashes
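
Use Case 2 has no query, so here is a rough Python sketch of its correlation logic. It assumes memory samples (for example, exported from GPU metrics) and crash timestamps (for example, from container restart events) have already been parsed for a single host; `Sample`, `crashes_after_memory_spike`, and the default window and threshold are hypothetical placeholders to tune, not part of any product.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    ts: float        # seconds since epoch
    used_bytes: int  # memory in use at time ts

def crashes_after_memory_spike(samples, crash_times, window_s=60.0,
                               growth_threshold=8 * 2**30):
    """Return the crash times preceded by at least `growth_threshold` bytes of
    memory growth during the `window_s` seconds before the crash."""
    flagged = []
    for crash in crash_times:
        # Samples inside the look-back window, oldest first.
        window = sorted((s for s in samples if crash - window_s <= s.ts <= crash),
                        key=lambda s: s.ts)
        if len(window) < 2:
            continue  # not enough data points to measure growth
        # Growth = peak usage in the window minus the usage at its start.
        growth = max(s.used_bytes for s in window) - window[0].used_bytes
        if growth >= growth_threshold:
            flagged.append(crash)
    return flagged

GiB = 2**30
samples = [Sample(0, 2 * GiB), Sample(30, 3 * GiB), Sample(50, 14 * GiB),
           Sample(100, 2 * GiB), Sample(130, 2 * GiB)]
print(crashes_after_memory_spike(samples, [55, 135]))  # [55]
```

Only the crash at t=55 is flagged: memory grew by 12 GiB in the preceding minute, while the crash at t=135 had flat memory and more likely has another cause.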

Mitigation Guidance

  • Strict limits on tensor size and dimensionality should be enforced at API boundaries
  • User-supplied data should be validated before being passed into OneFlow operations
  • Untrusted ML workloads should be isolated from production systems
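
As a sketch of enforcing limits at an API boundary, the Python function below rejects a user-supplied tensor description before anything reaches OneFlow. It assumes requests arrive as a shape list, a dtype name, and optionally a flat payload. `validate_tensor_spec` and every limit shown are hypothetical placeholders, not OneFlow settings.

```python
from math import prod

# Hypothetical policy limits: placeholders to tune per model and hardware.
MAX_NDIM = 8
MAX_DIM_SIZE = 1 << 20          # largest single dimension accepted
MAX_ELEMENTS = 1 << 28          # about 268M elements in total
MAX_BYTES = 2 * 2**30           # 2 GiB per request tensor
ALLOWED_DTYPES = {"float32": 4, "float16": 2, "int64": 8}  # name -> bytes

def validate_tensor_spec(shape, dtype, data=None):
    """Reject a user-supplied tensor description before anything is allocated.

    Returns the shape as a tuple if it passes every check.
    """
    if dtype not in ALLOWED_DTYPES:
        raise ValueError(f"dtype {dtype!r} is not allowed")
    if not isinstance(shape, (list, tuple)) or len(shape) > MAX_NDIM:
        raise ValueError(f"shape must be a list/tuple of at most {MAX_NDIM} dims")
    for d in shape:
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(d, int) or isinstance(d, bool) or not 0 <= d <= MAX_DIM_SIZE:
            raise ValueError(f"invalid dimension {d!r}")
    n = prod(shape)
    if n > MAX_ELEMENTS or n * ALLOWED_DTYPES[dtype] > MAX_BYTES:
        raise ValueError(f"tensor too large: {n} elements of {dtype}")
    # If a flat payload accompanies the shape, its length must agree with it.
    if data is not None and len(data) != n:
        raise ValueError(f"payload has {len(data)} values, shape implies {n}")
    return tuple(shape)

print(validate_tensor_spec([3, 224, 224], "float32"))  # (3, 224, 224)
```

Calls such as `validate_tensor_spec([-1, 5], "float32")` or `validate_tensor_spec([10**6] * 4, "float32")` raise `ValueError` instead of reaching the allocator. Checking the total element count, not just each dimension, matters because individually modest dimensions can still multiply into an enormous allocation.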

Official Patch / Upgrade

👉 https://github.com/Oneflow-Inc/oneflow/security/advisories


CVE-2025-71009

Scatter Operations Input Validation Denial of Service

Technical Details

Scatter operations in OneFlow fail to properly validate index tensors before execution. Index values outside valid tensor boundaries are not consistently rejected.

When invalid indices are processed, out-of-bounds memory access occurs, leading to immediate process termination.

Exploitation Scenario (Educational Context)

A crafted input containing malicious index tensors is supplied to a model that uses scatter operations (common in embedding layers and sparse updates). During execution, invalid memory is accessed and the runtime crashes.

Impact

  • Immediate crash of inference or training pipelines
  • Denial-of-service in shared GPU environments
  • Reduced system stability

MITRE ATT&CK Mapping

  • T1499 – Endpoint Denial of Service
  • T1499.004 – Endpoint Denial of Service: Application or System Exploitation

Detection & Monitoring

Indicators

  • Scatter-related segmentation faults
  • Kernel or GPU fault messages
  • Repeated crashes during embedding layer execution

Splunk SIEM Use Cases

Use Case 3 – Scatter Operation Crash Detection

Objective: Detect repeated crashes tied to scatter operations.

Log Source: Application logs + OS crash logs
Logic:

  • Identify segmentation faults referencing scatter operators
  • Alert on repeated occurrences from the same source

Example SPL (Conceptual):

index=ml_logs ("scatter" AND ("segmentation fault" OR "out of bounds"))
| stats count by host
| where count > 3

Mitigation Guidance

  • Index bounds validation should be enforced before scatter execution
  • External control over index tensors should be restricted
  • Defensive checks should be added at preprocessing layers
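
A preprocessing-layer check could look like the Python sketch below. It assumes index tensors arrive as nested lists (for example, decoded from JSON) together with the target tensor's shape and the scatter dimension. `validate_scatter_index` and `MAX_INDICES` are hypothetical names. The sketch checks only that index values fall inside the target along `dim`, not the index tensor's own shape, and it rejects negative values outright rather than assuming any wrap-around semantics.

```python
# Hypothetical cap on how many index values one request may carry.
MAX_INDICES = 1_000_000

def _iter_scalars(nested):
    """Yield the scalar entries of a nested list, e.g. decoded from JSON.

    Uses an explicit stack so deeply nested input cannot hit Python's
    recursion limit.
    """
    stack = [nested]
    while stack:
        item = stack.pop()
        if isinstance(item, list):
            stack.extend(item)
        else:
            yield item

def validate_scatter_index(index, target_shape, dim):
    """Reject index values that would fall outside `target_shape` along `dim`.

    Returns the number of index values checked.
    """
    if not isinstance(dim, int) or not 0 <= dim < len(target_shape):
        raise ValueError(f"dim {dim!r} out of range for shape {target_shape}")
    size = target_shape[dim]
    count = 0
    for v in _iter_scalars(index):
        count += 1
        if count > MAX_INDICES:
            raise ValueError("too many index values")
        if not isinstance(v, int) or isinstance(v, bool) or not 0 <= v < size:
            raise ValueError(f"index {v!r} out of bounds for dim {dim} of size {size}")
    return count

print(validate_scatter_index([[0, 1], [3, 2]], (4, 3), dim=0))  # 4
```

The same index tensor is rejected for `dim=1`, because the value 3 is out of bounds for a dimension of size 3.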

Official Patch / Upgrade

👉 https://github.com/Oneflow-Inc/oneflow/security/advisories


CVE-2025-71008

Autograd Segmentation Fault Leading to Denial of Service

Technical Details

A flaw has been identified in OneFlow’s autograd engine where malformed computation graphs or inconsistent gradient metadata may trigger segmentation faults during backward propagation.

This condition arises when gradient shapes or graph dependencies are not fully validated before execution.
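
The missing checks can be illustrated with a short Python sketch that validates graph dependencies and gradient shapes before a backward pass. It assumes a simplified graph representation (a dict mapping each node to the nodes it was computed from) and is not OneFlow's internal autograd structure. `validate_backward_inputs` is a hypothetical name; the cycle detection uses the standard library's `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

def validate_backward_inputs(graph, forward_shapes, grad_shapes):
    """Check a computation graph and supplied gradients before backward.

    graph:          node -> list of input nodes it was computed from
    forward_shapes: node -> shape produced in the forward pass
    grad_shapes:    node -> shape of an externally supplied gradient
    """
    # 1. Every dependency must refer to a node that actually exists.
    for node, inputs in graph.items():
        for src in inputs:
            if src not in graph:
                raise ValueError(f"{node!r} depends on unknown node {src!r}")
    # 2. Backward propagation needs an acyclic graph; prepare() raises on cycles.
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as exc:
        raise ValueError(f"cycle in computation graph: {exc.args[1]}") from exc
    # 3. Each supplied gradient must match the shape of the tensor it flows into.
    for node, gshape in grad_shapes.items():
        if node not in forward_shapes:
            raise ValueError(f"gradient supplied for unknown node {node!r}")
        if tuple(gshape) != tuple(forward_shapes[node]):
            raise ValueError(
                f"gradient shape {tuple(gshape)} != tensor shape "
                f"{tuple(forward_shapes[node])} for {node!r}")

graph = {"x": [], "w": [], "y": ["x", "w"], "loss": ["y"]}
shapes = {"x": (4, 3), "w": (3, 2), "y": (4, 2), "loss": ()}
validate_backward_inputs(graph, shapes, {"y": (4, 2)})  # passes silently
```

Supplying `{"y": (2, 4)}` as the gradient, or a graph such as `{"a": ["b"], "b": ["a"]}`, raises `ValueError` before any backward computation starts.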

Exploitation Scenario (Educational Context)

When training pipelines accept externally influenced model definitions or gradient parameters, malformed graphs may be introduced. During backpropagation, invalid memory references are accessed, causing the process to terminate.

Impact

  • Training job failures
  • Loss of model training progress
  • Service disruption in training-as-a-service platforms

MITRE ATT&CK Mapping

  • T1499 – Endpoint Denial of Service
  • T1106 – Native API

Detection & Monitoring

Indicators

  • Segmentation faults during backward passes
  • Autograd-related crash traces
  • Repeated training job restarts

Splunk SIEM Use Cases

Use Case 4 – Autograd Crash Detection

Objective: Detect abnormal backward-pass failures.

Log Source: OneFlow autograd logs
Logic:

  • Alert on segmentation faults occurring during gradient computation
  • Correlate with recent model input changes

Example SPL (Conceptual):

index=ml_logs ("autograd" AND "segmentation fault")
| stats count by job_id
| where count > 1

Overall Risk Assessment

These vulnerabilities present a high operational risk due to their ability to cause reliable denial-of-service in AI/ML environments. While no data leakage or privilege escalation has been identified, disruption of ML workloads alone may result in:

  • Service outages
  • SLA violations
  • Financial and reputational impact

All affected systems should be patched promptly, especially those processing untrusted or external inputs.


Official Patch / Upgrade

👉 https://github.com/Oneflow-Inc/oneflow/security/advisories


Final Takeaway

These OneFlow vulnerabilities enable reliable denial-of-service attacks by abusing weak input validation in tensor creation, scatter operations, and the autograd engine. Any ML system that processes untrusted or external data is at risk of crashes, resource exhaustion, and service disruption. While no data exposure is involved, the operational impact is high, especially for shared GPU and ML-as-a-service environments. Immediate patching and stronger input validation are essential to maintain availability and stability.


Aegiron

Backed by 11+ years in cybersecurity and incident response, we decode the latest threats shaping today’s digital battlefield. This blog cuts through the noise with clear insights on vulnerabilities, emerging exploits, and the cyber news defenders can’t afford to miss.