Notes · Dissecting Real Systems
growing
node_modules Is the Heaviest Object in the Universe
The same "the hash is the identity" idea that powers a build cache also explains why pnpm stores on disk what npm copies a hundred times over.
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
— Andy Hunt & Dave Thomas, The Pragmatic Programmer (1999), Tip 15: "DRY—Don't Repeat Yourself"
Cite this
Mangalapilly, Y. J. (2026, June). node_modules Is the Heaviest Object in the Universe. Saṃhitā Notes. https://yesudeep.com/blog/node-modules-is-the-heaviest-object-in-the-universe/ @online{mangalapilly2026node,
author = {Yesudeep Jose Mangalapilly},
title = {node\_modules Is the Heaviest Object in the Universe},
journal = {Sa\d{m}hit\=a Notes},
year = {2026},
month = {June},
url = {https://yesudeep.com/blog/node-modules-is-the-heaviest-object-in-the-universe/},
urldate = {2026-07-02},
} Yesudeep Jose Mangalapilly. “node_modules Is the Heaviest Object in the Universe.” Saṃhitā Notes, 2026. https://yesudeep.com/blog/node-modules-is-the-heaviest-object-in-the-universe/. TY - ELEC
AU - Mangalapilly, Yesudeep Jose
TI - node_modules Is the Heaviest Object in the Universe
T2 - Saṃhitā Notes
PY - 2026
UR - https://yesudeep.com/blog/node-modules-is-the-heaviest-object-in-the-universe/
Y2 - 2026-07-02
ER - A sequel to The Hash Is the Identity, pointed at a tool every JavaScript developer already fights. By the end you'll see that pnpm and a remote build cache are the same idea wearing different clothes: a file's content hash is its identity. In a build farm that idea lets a result be shared; on your laptop it lets a package be stored once instead of a hundred times — and the joke about node_modules stops being a joke.
There is a running joke that node_modules is the heaviest object in the universe. Like most good jokes it encodes a real grievance: install a handful of JavaScript projects and you will find the same dependency, byte-for-byte identical, copied into each one. Ten projects that all use lodash keep ten copies of lodash. The grievance is not that the packages are large. It is that they are duplicated — and duplication is a choice, not a law of physics.
node_modules is heavy not because packages are big, but because the same bytes are stored over and over.
The previous piece showed how a build farm makes a result shared across a whole team: name an artifact by the hash of its contents, and anyone who needs the same contents gets the same artifact, computed once. pnpm applies the identical principle to the most mundane thing on your machine — the packages in node_modules — and the payoff is the mirror image. Where the build cache used content addressing to share work, pnpm uses it to save space.
The npm model: a copy per project
The default behavior most developers know is npm's. Each project gets its own node_modules, and each node_modules contains its own full copy of every dependency. pnpm's own motivation page states the cost plainly:
If you have 100 projects using a dependency, you will have 100 copies of that dependency saved on disk.
That is the heaviest-object problem in one sentence. The bytes of a given package version are completely determined — version 4.17.21 of a package is the same files everywhere — and yet they are written to disk once per project that asks for them.
Here is the part worth getting exactly right: npm's problem is not that it lacks a content-addressed store. It has one — the cache under ~/.npm/_cacache is built on a library literally named cacache ("content-addressable cache"), keyed by Subresource Integrity hashes. The difference is what happens at install time. npm's own docs: a package "is loaded into the cache, and then unpacked into ./node_modules/foo". Unpacked — copied out. The shared name exists in the cache and is abandoned at the project boundary; each project's copy is thereafter identified by where it sits, not by what it contains. The split between npm and pnpm is not content-addressing-versus-not; it is copy out of the store versus link into it.
The pnpm model: one store, linked
pnpm makes one change with consequences all the way down: it keeps a single, global, content-addressed store, and a project's node_modules is built out of links into that store rather than copies. From the same motivation page:
All the files are saved in a single place on the disk. When packages are installed, their files are hard-linked from that single place, consuming no additional disk space.
Hard link — a second directory entry pointing at the same bytes on disk, not a copy of them. The file has one set of blocks and two (or a hundred) names; deleting one name leaves the bytes alive as long as another points at them. A reflink (copy-on-write clone) shares blocks the same way but gives the new name private blocks the moment either side writes. Learn more.
The title of this essay is a quantity, so weigh it yourself:
The store lives in a fixed place per machine — on Linux, ~/.local/share/pnpm/store; on macOS, ~/Library/pnpm/store — and holds each file once, addressed by a hash of its content. When you install a package, pnpm does not copy its files into your project. It links them. The precise mechanism is a per-filesystem choice — the default packageImportMethod: auto will, per pnpm's settings docs, "try to clone packages from the store. If cloning is not supported then hardlink packages from the store," and copy only as a last resort. On APFS (the macOS default) and other copy-on-write filesystems that means clones; elsewhere, hard links. Either way the effect is the same: your node_modules entry and the store entry share the same blocks on disk. A hundred projects linking the same package add the package to the store once and pay only for a hundred cheap directory entries, not a hundred copies of the bytes.
This is exactly the build-cache move from the previous article, run in reverse. There, FindMissingBlobs meant identical content was uploaded once, ever. Here, identical content is stored once, ever. Both work for the same reason: the content's hash is its identity, so two things with the same bytes are, to the system, the same thing.
Why content addressing pays off twice: per-file dedup
The deepest part is subtle, and it is where naming-by-content earns its keep a second time. The store is keyed by the hash of each file, not by the package the file came from. So the unit of sharing is the file, not the package — and dependency versions that overlap share their unchanged files automatically. From pnpm's docs:
For instance, if it has 100 files, and a new version has a change in only one of those files,
pnpm updatewill only add 1 new file to the store, instead of cloning the entire dependency just for the singular change.
Imagine a library where every book is filed by a fingerprint of its contents, not by who wrote it or which shelf it came from. Two slightly different editions of the same book share every page that didn't change; only the one revised page is filed anew. You never store the same page twice, even across different books.
Note pnpm's own "for instance": the one-file upgrade is an idealization. Any real release changes at least package.json (the version field lives there), so the practical floor is two files, and real releases touch a handful. The point survives intact: the unit of storage is the file, so an upgrade pays for the files that changed, not a fresh copy of the whole package. Because identity is content, the question "do I already have this?" is answered for files, not packages — and the answer is yes far more often than per-package thinking would suggest.
The layout that falls out: a flat store and symlinks
There is a structural bonus that comes free with this design, and it fixes a second old npm complaint. pnpm lays out node_modules as a flat virtual store under node_modules/.pnpm, with the real (hard-linked) package directories there and symlinks arranging them into the dependency graph each package actually sees. The structure docs put it precisely:
Every file of every package inside
node_modulesis a hard link to the content-addressable store.
Phantom dependency — a package your code can require() even though you never declared it, because a flattened node_modules happened to hoist it where your code could reach it. It works until the day the dependency that pulled it in is removed, and then it silently breaks.
Because each package only gets symlinks to the dependencies it actually declared — rather than a single flattened heap where everything is reachable — code cannot import packages it never listed. pnpm calls this out as the payoff:
A great bonus of this layout is that only packages that are really in the dependencies are accessible.
That is the phantom-dependency class of bug, designed out of existence by the same layout that saved the disk space. Content addressing gave a file one identity; the symlink graph gives each package exactly the view it declared, and nothing more.
The honest limits
The mechanism is a lever, not a miracle. Three caveats keep the claim truthful.
- Links need one filesystem. A hard link (or clone) can only point at bytes on the same volume, so pnpm keeps "one store per disk." Put the store on a different drive than your project and, in pnpm's own words, it "will copy packages from the store instead of hard-linking them."
- Hard links cut both ways. Two names for the same blocks means an edit through either name edits both: patch a file inside a hard-linked
node_modulesand you have silently modified the shared store for every project on the machine. pnpm knows this — its settings docs note that with cloning "you may edit files in your node_modules and they will not be modified in the central content-addressable store," andverifyStoreIntegrityexists precisely because store mutation happens: "if a file in the store has been modified, the content of this file is checked before linking it to a project's node_modules." - It dedups whole files, not pieces of files. Sharing is per file: two files that are byte-identical share one copy; two files that differ by a single character are two separate stored objects. pnpm does not diff within a file. "Per file, not per package" is the honest framing — already a large win, but not block-level delta compression.
- The savings are proportional, not absolute. pnpm's disk benefit grows with the number of projects and shared dependencies. One project with no overlap saves little; a laptop with forty projects sharing a deep, common dependency tree saves enormously. The benchmark numbers pnpm publishes are about speed and shift with each release — treat any specific figure as as-of-a-date, not a constant.
Worth one more sentence of breadth: pnpm's is not the only answer to the title's joke. Yarn Plug'n'Play removes node_modules altogether — a single generated loader file tells Node where every package lives — trading filesystem convention for resolution logic rather than deduplicating the convention.
The same idea, twice
Step back and the two articles are one idea applied in two directions. A file's content hash is its identity. Point that at a build farm and identical work is computed once and shared by everyone. Point it at your node_modules and identical files are stored once and deduplicated across every project. Same principle, opposite scarcity: in the cloud it buys time; on your disk it buys space.
Name a thing by its contents and duplication becomes impossible by construction — whether the thing is a compiled artifact or a file in node_modules.
The joke about node_modules was always really a joke about copying. Once the package manager names files by content instead of by location, the copies collapse into links, the heaviest object in the universe goes on a diet, and the grievance turns out to have had a one-line fix all along: stop storing the same bytes twice.
Lessons
- npm has a content-addressed cache too (
cacache, keyed by integrity hashes) — but it unpacks copies into every project'snode_modules; "100 projects using a dependency" means "100 copies on disk." - pnpm keeps one global content-addressed store and links files into each project — copy-on-write clones where the filesystem supports them, hard links otherwise — so identical bytes are stored once and shared.
- Because the store is keyed by file content, dedup is per-file: a new package version that changes one file adds one file, not a fresh copy of the whole package.
- The flat
.pnpmstore plus a declared-only symlink graph also designs out phantom dependencies — you can't import what you didn't list. - It's the same "the hash is the identity" principle as a remote build cache, aimed at space instead of shared work. Limits: hard links need one filesystem; dedup is per-file, not block-level; savings scale with how much your projects overlap.
References
- “pnpm: motivation.” pnpm. — "saving disk space," in pnpm's own words
- “pnpm: the symlinked node_modules structure.” pnpm. — the store-and-symlink layout
- “pnpm: settings (packageImportMethod, verifyStoreIntegrity).” pnpm. — clone vs hardlink vs copy, and the store-integrity check
- “cacache.” npm. — npm's own content-addressable cache
- “Yarn Plug'n'Play.” Yarn. — the eliminate-node_modules alternative
- “Hard links.” — and content-addressable storage — the underlying mechanisms
- “The Hash Is the Identity.” — the same principle, in a remote build cache
How to cite
Mangalapilly, Y. J. (2026, June). node_modules Is the Heaviest Object in the Universe. Saṃhitā Notes. https://yesudeep.com/blog/node-modules-is-the-heaviest-object-in-the-universe/ @online{mangalapilly2026node,
author = {Yesudeep Jose Mangalapilly},
title = {node\_modules Is the Heaviest Object in the Universe},
journal = {Sa\d{m}hit\=a Notes},
year = {2026},
month = {June},
url = {https://yesudeep.com/blog/node-modules-is-the-heaviest-object-in-the-universe/},
urldate = {2026-07-02},
} Yesudeep Jose Mangalapilly. “node_modules Is the Heaviest Object in the Universe.” Saṃhitā Notes, 2026. https://yesudeep.com/blog/node-modules-is-the-heaviest-object-in-the-universe/. TY - ELEC
AU - Mangalapilly, Yesudeep Jose
TI - node_modules Is the Heaviest Object in the Universe
T2 - Saṃhitā Notes
PY - 2026
UR - https://yesudeep.com/blog/node-modules-is-the-heaviest-object-in-the-universe/
Y2 - 2026-07-02
ER - Webmentions
Annotations
Thank you — your note is held for review and will appear once approved.
Thank you — your note is published.
Please sign in below to leave a note.
