What Happened, How It Worked, and How to Defend Against It
Software supply-chain attacks don’t usually look dramatic.
There’s no pop-up, no crash, no obvious warning. Most of the time, everything looks normal — and that’s exactly why they work.
The PyPI attack that targeted the PyTorch ecosystem is a perfect example. It didn’t break cryptography, exploit kernels, or compromise source repositories. Instead, it quietly abused trust, automation, and default behaviors most developers never think twice about.
This article breaks the incident down end-to-end and then goes deep on prevention, mitigation, and practical analysis, including how to safely audit setup.py, where most of the damage actually happens.
What the attack actually was
This incident was a software supply-chain attack targeting developers who use PyTorch via the Python packaging ecosystem.
Key facts up front:
- PyTorch’s official source code was not compromised
- No GitHub repositories were hacked
- The attack lived entirely in malicious third-party packages
- Distribution happened through the Python Package Index (PyPI)
The attackers uploaded packages that looked legitimate, relied on normal pip install behavior, and executed malicious code during installation.
No exploit required.
No vulnerability scanner would scream.
Everything behaved “normally”.
Why PyTorch and ML environments were ideal targets
Machine-learning environments are unusually valuable and unusually soft.
They often include:
- Cloud GPU instances
- Expensive compute credits
- CI/CD pipelines
- Research data
- Proprietary models
- Long-lived cloud credentials in environment variables
At the same time:
- Dependency trees are large
- Nightly and experimental builds are common
- Security hardening is often secondary to speed
- Developers copy installation commands from blogs, issues, and notebooks
From an attacker’s perspective, this is a high-return environment with low friction.
How the attack worked
The attackers used two main techniques.
1 Typosquatting
They published packages with names that were almost identical to real PyTorch-related packages:
- small spelling changes
- extra hyphens or underscores
- names resembling internal or nightly components
A quick glance wouldn’t reveal the difference.
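A lookalike name that fools the eye can still be caught by a machine. The sketch below is one possible check, not a complete defense: it compares a requested name against an allow-list and reports near misses. The allow-list contents, the near_misses helper, and the 0.8 similarity cutoff are illustrative assumptions, not something taken from the incident.

```python
import difflib
import re

# Hypothetical allow-list; replace with the packages your project really uses.
KNOWN_PACKAGES = {"torch", "torchvision", "torchaudio", "numpy"}


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def near_misses(name: str, known=KNOWN_PACKAGES, cutoff: float = 0.8) -> list:
    """Return allow-listed names that `name` closely resembles without matching exactly."""
    candidate = normalize(name)
    normalized_known = {normalize(k) for k in known}
    if candidate in normalized_known:
        return []  # exact match after normalization: nothing to report
    return difflib.get_close_matches(
        candidate, sorted(normalized_known), n=3, cutoff=cutoff
    )
```

Any non-empty result means the name is close to a trusted one but is not that package, which is a good reason to stop and check before installing.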
2 Dependency confusion
Some builds referenced internal package names that were never meant to exist publicly.
Attackers:
- Discovered those names
- Uploaded packages with the same names to PyPI
- Assigned higher version numbers
- Let pip resolve the dependency automatically
Because pip does not prioritize one index over another and simply takes the highest matching version it can find across all configured indexes, the malicious package won.
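To see why the higher version number matters, here is a deliberately simplified model of that selection step. It is not pip's resolver: the index names, the package name acme-ml-utils, and the version lists are all made up for illustration, and real resolution also weighs things like platform compatibility.

```python
def parse_version(text: str) -> tuple:
    """Parse a plain dotted numeric version such as '2.1.0' (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(name: str, indexes: dict) -> tuple:
    """Pick the highest version of `name` across all indexes, ignoring its origin."""
    candidates = [
        (parse_version(version), index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(name)
    version, index_name = max(candidates)
    return ".".join(map(str, version)), index_name


# Hypothetical data: an internal package and a same-named public upload.
indexes = {
    "internal": {"acme-ml-utils": ["1.2.0", "1.3.0"]},
    "public": {"acme-ml-utils": ["99.0.0"]},
}
chosen = pick_candidate("acme-ml-utils", indexes)
```

In this model the public 99.0.0 upload beats the internal 1.3.0 release purely on version number, which is the whole trick.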
Where the malicious code lived
The most important detail in this entire story:
Installing a Python package executes code.
The malicious logic was placed in:
- setup.py
- custom install hooks
- occasionally top-level __init__.py, which runs on first import rather than at install time
For the install-time cases, that means:
- You didn’t have to import the package
- You didn’t have to run your application
- Running pip install was enough
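This is not a bug. It’s how Python packaging works: when pip installs from a source distribution, it runs the package’s build script (setup.py) on your machine. Prebuilt wheels skip that step, but their code still runs as soon as it is imported.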
What the malicious code actually did
The malware was small, quiet, and deliberate.
Step 1: Execute silently during install
The code ran automatically as soon as installation started and was designed to:
- avoid errors
- avoid output
- let installation succeed
Failed installs attract attention. Successful ones don’t.
Step 2: Collect system information
The code gathered basic context:
- OS type
- username
- execution path
- Python version
- whether it was running in CI or cloud
This helped the attacker understand the environment.
Step 3: Steal environment variables
This was the real payload.
The malware scanned environment variables for:
- cloud credentials (AWS, GCP, Azure)
- CI tokens
- GitHub/GitLab tokens
- database passwords
- API keys
In CI and ML workflows, secrets are often exposed this way by design.
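It is worth knowing what an install step could see in your own environment. The following is a defensive sketch that lists the names (never the values) of environment variables that look like secrets. The name pattern and the exposed_secret_names helper are assumptions chosen for illustration; real secrets can hide behind any name.

```python
import os
import re

# Heuristic pattern for secret-looking variable names (an assumption, not exhaustive).
SENSITIVE_NAME = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|ACCESS_KEY|CREDENTIAL)", re.IGNORECASE
)


def exposed_secret_names(environ=None) -> list:
    """Return the sorted names of variables whose names suggest they hold secrets."""
    environ = os.environ if environ is None else environ
    return sorted(name for name in environ if SENSITIVE_NAME.search(name))


if __name__ == "__main__":
    for name in exposed_secret_names():
        print(f"visible to any install-time code: {name}")
```

Running this inside a CI job just before the install step gives a rough picture of what a malicious installer would have been able to read.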
Step 4: Decide whether to act
If nothing valuable was found, the malware sometimes did nothing at all.
This reduced noise and helped it stay invisible.
Step 5: Exfiltrate quietly
If secrets were found:
- data was encoded
- sent over HTTPS
- sent once
- no retries
- no output
From the developer’s perspective:
“pip install worked fine.”
From the attacker’s perspective:
“We just got cloud credentials.”
Step 6: Exit cleanly
No persistence.
No backdoor.
No dropped files.
That’s why many victims never knew they were compromised.
Why this was hard to detect
Traditional security tooling struggles with attacks like this because:
- Code runs during installation
- Static scanners often don’t inspect setup.py
- There’s no persistence to detect later
- Network traffic looks normal
- Package names look legitimate
This is supply-chain abuse, not classic malware.
Prevention: how to avoid installing malicious packages
Prevention matters more than cleanup.
1 Control where packages come from
- Use private mirrors or internal registries
- Block direct internet installs in CI
- Allow-list approved packages
This alone stops most dependency-confusion attacks.
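One concrete thing to audit is whether a pip configuration quietly adds a second index. pip reads index-url and extra-index-url from its configuration file, so a small check like the one below can flag extra indexes. The extra_indexes helper and the example URLs are hypothetical, and the sketch only looks at configuration text you hand it, not at environment variables or command-line flags.

```python
import configparser


def extra_indexes(pip_conf_text: str) -> list:
    """Return every extra-index-url value found in pip configuration text."""
    # Interpolation is disabled so '%' characters inside URLs are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(pip_conf_text)
    found = []
    for section in parser.sections():
        raw = parser.get(section, "extra-index-url", fallback="")
        found.extend(raw.split())  # one URL per line or whitespace-separated
    return found


# Hypothetical configuration mixing an internal index with the public one.
SAMPLE_CONF = """\
[global]
index-url = https://pypi.internal.example/simple
extra-index-url = https://pypi.org/simple
"""
```

A non-empty result is not automatically wrong, but it means two indexes are competing for every package name, which is the precondition for dependency confusion.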
2 Pin dependency versions
Never allow floating versions in CI or production.
Bad:
torch>=2.0
Good:
torch==2.1.0
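Attackers rely on floating version ranges so that a higher-numbered malicious release gets picked up. An exact pin closes that door, and pip’s hash-checking mode (--require-hashes) goes further by rejecting any file whose hash differs from the one recorded in the requirements file.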
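A pinning rule is easier to keep when CI enforces it. The sketch below is a minimal checker that reports requirement lines that are not exact == pins. The unpinned_requirements helper and its regular expression are illustrative assumptions and cover only simple requirement lines, not URLs or editable installs.

```python
import re

# name, optional [extras], '==', a version without wildcards, optional '; marker'
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s*;]+(\s*;.*)?$"
)


def unpinned_requirements(requirements_text: str) -> list:
    """Return requirement entries that are not pinned to one exact version."""
    problems = []
    for line in requirements_text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments
        if not entry or entry.startswith("-"):
            continue  # skip blank lines and pip options such as -r or --index-url
        entry = re.split(r"\s+--", entry, maxsplit=1)[0]  # drop per-line options like --hash
        if not PINNED.match(entry):
            problems.append(entry)
    return problems
```

Failing the build when this list is non-empty turns "never allow floating versions" from a guideline into a gate.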
3 Be suspicious of install commands
If the command didn’t come from official documentation:
- slow down
- verify the package name
- check the publisher
Speed is the enemy of supply-chain security.
4 Treat CI as hostile terrain
CI systems are prime targets.
Defensive steps:
- minimize secrets
- rotate credentials frequently
- restrict outbound network access
- log DNS and HTTPS traffic during builds
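Minimizing secrets can be as simple as not handing them to the install step in the first place. The following sketch runs pip with an environment built from a short allow-list, so anything not explicitly named is invisible to install-time code. The SAFE_VARS list and both helper functions are assumptions for illustration; your build may need a few more variables to function.

```python
import os
import subprocess
import sys

# Hypothetical allow-list of variables the install step is permitted to see.
SAFE_VARS = ("PATH", "HOME", "LANG", "TMPDIR", "SYSTEMROOT")


def scrubbed_env(environ=None, allowed=SAFE_VARS) -> dict:
    """Build an environment containing only allow-listed variables that are set."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in allowed if name in environ}


def install_requirements(requirements_file: str) -> int:
    """Run pip with the scrubbed environment and return pip's exit code."""
    command = [sys.executable, "-m", "pip", "install", "-r", requirements_file]
    return subprocess.run(command, env=scrubbed_env(), check=False).returncode
```

An allow-list is used rather than a deny-list because you cannot enumerate every secret name in advance, but you can enumerate what the installer legitimately needs.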
Mitigation: limiting damage when prevention fails
Assume something will eventually slip through.
1 Monitor outbound traffic
Most malicious packages need to exfiltrate data.
Unexpected outbound HTTPS during installs is a strong signal.
2 Rotate secrets immediately
If you suspect a malicious install:
- rotate all exposed credentials
- assume environment variables were stolen
- don’t wait for proof
3 Use scanners — but don’t trust them blindly
Dependency scanners help, but:
- many ignore install-time code
- simple malware looks “legitimate”
They are necessary, not sufficient.
How to safely analyze a suspicious Python package
Never analyze suspicious packages on your main machine.
1 Use an isolated environment
- VM or container
- no secrets
- no cloud access
- no SSH keys
- no shared directories
Treat it like malware analysis.
2 Download — don’t install
Do not run pip install.
Instead:
- download the archive
- extract it manually
- inspect files offline
Installing executes attacker code.
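Manual extraction deserves a little care too, since archives can contain links or paths that point outside the target directory. The sketch below unpacks a source archive for reading only. It assumes the archive was fetched directly (for example from the project's file listing on PyPI) rather than with pip download, which may still invoke build code for source distributions. The extract_for_review helper is hypothetical.

```python
import os
import tarfile


def extract_for_review(archive_path: str, dest_dir: str) -> list:
    """Unpack regular files from a tar archive into dest_dir without running anything."""
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            inside = target == dest_root or target.startswith(dest_root + os.sep)
            if not inside or not (member.isfile() or member.isdir()):
                continue  # skip links, devices and paths escaping dest_dir
            if member.isdir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            source = archive.extractfile(member)
            # Write the bytes ourselves so archive permission bits are not applied.
            with open(target, "wb") as out:
                out.write(source.read())
            extracted.append(os.path.relpath(target, dest_root))
    return extracted
```

The function returns the relative paths it wrote, which gives you a file list to review before opening anything.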
How to audit setup.py properly
This is where most supply-chain attacks hide.
1 What normal setup.py looks like
Legitimate scripts usually:
- define metadata
- list dependencies
- maybe compile extensions
They rarely have a good reason to:
- access the network
- read environment variables
- execute shell commands
- decode or execute hidden payloads
2 Red flags to look for
Be suspicious if you see:
- HTTP or socket usage
- environment variable scraping
- exec() or eval()
- base64 or compressed blobs
- dynamically generated code
- custom install hooks doing “extra work”
If the installer is doing more than installing, ask why.
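Several of these red flags can be spotted without ever running the file, by parsing it into a syntax tree. What follows is a rough triage sketch, not a malware detector: the module and call lists are assumptions, the audit_source helper is hypothetical, and obfuscated code can evade checks this simple. The sample script is a harmless stand-in written for this example and is only parsed, never executed.

```python
import ast

# Illustrative lists; tune them for your own review process.
SUSPICIOUS_MODULES = {"socket", "urllib", "http", "requests", "subprocess", "base64", "zlib", "ctypes"}
SUSPICIOUS_CALLS = {"exec", "eval", "compile", "__import__"}


def audit_source(source: str) -> list:
    """Return sorted (line, finding) pairs worth a manual look; never runs the code."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append((node.lineno, f"imports {module}"))
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id in SUSPICIOUS_CALLS:
                findings.append((node.lineno, f"calls {node.func.id}()"))
            if any(keyword.arg == "cmdclass" for keyword in node.keywords):
                findings.append((node.lineno, "overrides commands via cmdclass"))
        if isinstance(node, ast.Attribute) and node.attr in {"environ", "getenv"}:
            findings.append((node.lineno, f"reads environment via .{node.attr}"))
    return sorted(findings)


# Harmless sample that mimics the shape of a suspicious setup.py.
SAMPLE = '''\
import base64, os
from setuptools import setup
token = os.environ.get("CI_TOKEN")
exec(base64.b64decode("cGFzcw=="))
setup(name="demo")
'''

if __name__ == "__main__":
    for line, finding in audit_source(SAMPLE):
        print(f"line {line}: {finding}")
```

A finding is a prompt to read that line carefully, not a verdict. Plenty of legitimate build scripts import subprocess to drive a compiler.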
Typical malicious install flow
- Malicious package uploaded to PyPI
- Developer or CI installs dependency
- setup.py executes automatically
- Secrets are collected
- Data is exfiltrated
- Installation completes normally
Response and ecosystem impact
Once discovered:
- malicious packages were removed
- dependency references were audited
- internal naming practices were hardened
- the Python Software Foundation reviewed registry safeguards
However, stolen credentials can’t be recalled.
Some damage is permanent and invisible.
The most important takeaway
The key lesson is simple but uncomfortable:
Installing a dependency is executing untrusted code.
Once teams internalize that:
- they audit installers
- they isolate builds
- they lock down CI
- they stop trusting names alone
That mindset shift matters more than any single tool.
