PyPI vs npm: The Hidden ML Supply Chain Threat

So, is your carefully curated Node.js dependency graph actually safer than that messy requirements.txt file your data science team swears by? It’s a question most developers aren’t even asking themselves. We’ve all heard the dire warnings about npm’s sprawling dependency tree, the endless alerts from Dependabot, the existential dread of running npm audit. It’s the usual Silicon Valley song and dance, complete with consultants and pricey security tools. But what if all that attention has, ironically, left a bigger, more dangerous blind spot elsewhere?

Look, I’ve been kicking the tires on this tech circus for two decades, and my BS detector goes off when the narrative gets too neat. The latest dust-up comes from a deep dive comparing supply chain vulnerabilities in npm and PyPI. And the conclusion? It’s not where you’d expect. The popular opinion is that npm, with its massive ecosystem and frequent updates, is the prime target. But the data suggests otherwise, and it’s frankly a bit unnerving.

The Uncomfortable Numbers

After simulating dependency landscapes for both Node.js and a Python ML project (specifically PyTorch Lightning), the results paint a starkly different picture than the prevailing narrative. The headline grabber: a significantly longer detection time for simulated attacks on the Python side. We’re talking nearly 12 hours for PyPI versus a little over 4 for npm. That’s not a rounding error; that’s a gaping chasm.

Here’s a quick rundown of what the simulation unearthed:

Metric	npm (Node.js)	PyPI (Python/ML)
Direct packages in my stack	47	23
Total transitive packages	1,247	891
Surface not audited by scanner	34%	61%
Time to manual detection (simulated)	4h 20min	11h 45min
Packages without hash verification enabled	12%	78%
Maintainers with 2FA active (estimated avg)	~60%	~31%

That 78% of PyPI packages without hash verification is a punch to the gut. This isn’t some theoretical musing; it’s a direct measurement against actual production requirements.txt files, cross-referenced with PyPI’s own index. If your Python projects aren’t using a proper lock file (and let’s be honest, pip freeze isn’t it), you’re already skating on thin ice before we even get to the attacker’s toolkit.

Why Is PyPI So Much Slower to Detect Attacks?

This isn’t some arbitrary outcome. The data points to three fundamental differences in how these ecosystems operate:

1. The Installation Model: A Reproducibility Problem

pm, when used with a package-lock.json (or npm-shrinkwrap.json), offers a deterministic installation process. It pins down the exact versions of every single dependency, creating a reproducible build. Python’s pip install -r requirements.txt, on the other hand, resolves dependencies at runtime. This means that two separate installs, even hours apart, could pull down different versions of a package without anyone batting an eye in a code review. While pip install --require-hashes -r requirements-locked.txt gets closer to reproducibility, it’s not the default and has its own quirks.

# npm: this pins the entire tree
npm ci --audit
# Python: this is NOT a real lock file
pip install -r requirements.txt
# This gets closer, but has its own limitations
pip install --require-hashes -r requirements-locked.txt

2. The ML Lifecycle: A Frozen Target

In the Node.js world, tools like Dependabot are constantly churning out pull requests. It’s noisy, sure, but it means dependencies are reviewed frequently. In machine learning, it’s a different beast. Teams are often hesitant to update core libraries like torch, transformers, or lightning because touching those versions can break trained models. This intentional freezing, while understandable from a model stability perspective, creates a massive, extended window for attackers to sneak in malicious code via typosquatting. The simulation introduced a fictional torch-utils package, and it lingered undetected for 11 days, while its npm counterpart was flagged by Snyk in under 18 hours.

3. Security Culture: Not Primarily DevSecOps

This is the hard truth: the majority of engineers writing requirements.txt for ML pipelines come from a background focused on algorithmic convergence, not necessarily software supply chain security. Their primary objective is getting the model to train, to perform. Security is often an afterthought, or worse, relegated to a separate team that doesn’t fully grasp the nuances of their particular stack. This disconnect means vulnerabilities can persist, unexamined, because the immediate pressure is on model performance.

“PyPI lives in an operational blind spot for most backend teams — and that blind spot is exactly the vector attackers are exploiting most consistently in 2025.”

This isn’t to say npm is pristine. Far from it. But the narrative seems to have fixated on the symptoms — the sheer volume of packages — rather than the underlying architectural and cultural factors that make PyPI, particularly in the ML sphere, a more tempting and currently less defended target.

What Does This Mean for Your Stack?

For organizations running Python in production, especially those with significant ML components, this is a wake-up call. Relying solely on automated scanning of npm packages while leaving PyPI dependencies largely unscrutinized is a gamble. It’s time to shift some of that security focus. Properly implementing and enforcing hash verification with lock files for Python is no longer optional; it’s a fundamental security hygiene measure. Furthermore, fostering a security-aware culture among data science and ML teams is paramount. They need to understand that a compromised dependency can invalidate months of work — or worse.

Will this mean more noise, more alerts for your teams? Probably. But isn’t that a small price to pay for not being the next major supply chain attack headline? The convenience of pip install might be costing us more than we think.

🧬 Related Insights

Frequently Asked Questions

What does this simulation actually measure? This simulation compares the time it takes for security tools and manual review to detect a malicious package introduction in two distinct dependency ecosystems: npm (Node.js) and PyPI (Python). It factors in package audit surface, hash verification practices, and maintainer security habits to estimate detection timelines.

Will this replace my job? No. This research highlights potential risks and encourages better security practices, not the replacement of human oversight. Developers and security professionals remain essential for understanding context, implementing defenses, and responding to threats.

Is npm actually secure then? npm faces its own significant supply chain risks, particularly due to its vast number of packages and frequent updates, which can overwhelm audit processes. However, this research suggests that PyPI, especially within ML contexts, currently presents a less scrutinised and therefore potentially more exploitable attack vector due to differences in installation determinism, package lifecycle, and security culture.

PyPI vs npm: The Hidden ML Supply Chain Threat

Key Takeaways

The Uncomfortable Numbers

Why Is PyPI So Much Slower to Detect Attacks?

What Does This Mean for Your Stack?

🧬 Related Insights

Frequently asked questions

Worth sharing?

⚡ Key Takeaways

The Uncomfortable Numbers

Why Is PyPI So Much Slower to Detect Attacks?

What Does This Mean for Your Stack?

🧬 Related Insights

Frequently asked questions

Share this article

Worth sharing?

Related Stories

OCM: 1 Standard for Software Bills of Delivery [Cloud Native]

Deno's Sandbox vs. npm's Wild West

European Gov Sites: 3000 Trackers, 1000 Open DBs [Security Crisis]

Cloudflare's AI Code Review: Orchestrated Agents Deliver Scale

Stay in the loop

Key Takeaways