Microsoft Calls for Rethinking Security: New Threat Modeling Framework Targets AI-Driven Applications

Modern AI systems—especially generative models and agentic applications—change the fundamentals of how we reason about security risk. Traditional threat modeling assumptions (deterministic execution, fixed code paths, predictable inputs/outputs) no longer hold for systems that interpret language, act autonomously, and behave probabilistically.

Why AI Requires a New Threat Modeling Approach

AI systems differ from classical software in three key ways:

Nondeterministic behavior: Outputs are not fixed for a given input, requiring risk evaluation across distributions of possible outcomes rather than single execution paths.
Instruction-following bias: Models treat user inputs and prompts as blended instructions, expanding the attack surface to include manipulated or adversarial text/images that effectively become commands.
System expansion: AI systems commonly integrate tools, memory, APIs, and autonomous actions, introducing new failure modes and opportunity for cascading misuse.

Because of these factors, external inputs can influence model behavior in ways that look like executable intent. This creates attack surfaces (e.g., prompt or data injection, tool misuse, incorrect outputs treated as fact) that aren’t captured by classic threat categories.

Core Principles of AI Threat Modeling

A robust AI threat model must start with a clear understanding of what needs protection and how the system actually behaves in practice. Key assets include:

User safety and impact from incorrect or harmful outputs
Trust and correctness of responses
Privacy and confidentiality of training and runtime data
Integrity of prompts, memory, and agent actions

Effective modeling goes beyond identifying threats to prioritizing them based on real system behavior and business impact, not just theoretical attack vectors.

Modelling and Analysis Steps

A practical AI threat modeling process typically involves:

Map the actual architecture: Document how prompts are constructed, how memory and external data are accessed, which tools are invoked, and where trust boundaries exist.
Enumerate misuse scenarios: Include both malicious attacks and accidental misuse that could lead to harm.
Assess impact vs likelihood: Rare but high-impact events may require different mitigation strategies than frequent, low-impact ones.
Design architectural mitigations: Focus on reducing potential damage (“blast radius”) rather than assuming perfect safety.
Embed observability: Logging, monitoring, and audit trails help detect misuse and improve models over time.

Architectural Mitigations to Consider

Some common architectural controls for AI threat mitigation include:

Separation of instructions and untrusted input to limit unintended command execution
Least-privilege access for tools, data sources, and operations
Human-in-the-loop approvals for high-risk or irreversible actions
Input validation and output redaction before sensitive data leaves the system
Scoped allow-lists for external APIs and retrieval systems

Unlike traditional software, eliminating all residual risk is not realistic for AI systems; non-determinism means there will always be edge behaviors. Threat modeling helps teams design layered defenses that contain and control risk deliberately rather than reacting late in the lifecycle.