Notes · Dissecting Real Systems

growing

The Build Is Proportional to the Change

What every build system is really doing — and the one decision that separates the ones that scale.

Yesudeep Jose Mangalapilly

Published 8 days ago · Updated yesterday · 13 min read

build-systems, bazel, buck2, gradle, dissecting-systems

Civilization advances by extending the number of important operations which we can perform without thinking about them.

— Alfred North Whitehead, An Introduction to Mathematics (1911)

Cite this

APA

Mangalapilly, Y. J. (2026, June). The Build Is Proportional to the Change. Saṃhitā Notes. https://yesudeep.com/blog/build-is-proportional-to-change/

BibTeX

@online{mangalapilly2026the,
          author  = {Yesudeep Jose Mangalapilly},
          title   = {The Build Is Proportional to the Change},
          journal = {Sa\d{m}hit\=a Notes},
          year    = {2026},
          month   = {June},
          url     = {https://yesudeep.com/blog/build-is-proportional-to-change/},
          urldate = {2026-07-02},
        }

Plain

Yesudeep Jose Mangalapilly. “The Build Is Proportional to the Change.” Saṃhitā Notes, 2026. https://yesudeep.com/blog/build-is-proportional-to-change/.

RIS

TY  - ELEC
        AU  - Mangalapilly, Yesudeep Jose
        TI  - The Build Is Proportional to the Change
        T2  - Saṃhitā Notes
        PY  - 2026
        UR  - https://yesudeep.com/blog/build-is-proportional-to-change/
        Y2  - 2026-07-02
        ER  -

The first of five pieces dissecting how build systems work. By the end you'll be able to: describe the four-step algebra every build system shares; explain why content hashes beat timestamps; and articulate the one design decision — a bounded vs. a Turing-complete configuration language — that separates the systems that scale from the ones that don't.

Someone I follow posted a familiar complaint the other day: a C++ project that had grown just large enough to hurt. Compiles crept from seconds to minutes, the IDE lagged, CMake took an age to load. The instinct was to buy a faster laptop — to throw silicon at the problem.

It's the wrong lever. A faster machine shrinks the build a little, and linearly: twice the cores, roughly half the wait, until you hit the next wall. What you want is a different shape of cost altogether — a build whose price is set by what you changed, not by how much exists. That isn't a hardware property. It's an architectural one, and it's the whole reason build systems exist.

Hardware buys a linear discount; the right model changes the shape of the cost.

This is the first in a series taking real systems apart at the source level — reading the code, not the marketing — to see how they actually work. Build systems are a good place to start, because the good ones all converge on the same handful of ideas, and the single decision that separates them is unusually clean.

Ten million files, and you change one

Start with the thought experiment that the whole field is an answer to. Imagine a repository of ten million files. You edit one of them. How much should rebuild?

The lazy answer is "everything downstream" — anything that might, transitively, depend on what you touched. The right answer is far smaller: only the things that genuinely depend on the change, and nothing else. The gap between those two answers is where build systems live.

The way to see it is as a graph. Files and the actions that relate them form a directed graph; editing a node dirties the things reachable from it, and only those. Click a node below and watch the dirty set ripple outward — and notice what doesn't light up.

Edit a.c and the work is a.o, lib, app — three actions, because those are what depend on it. b.c and b.o sit untouched. The cost of the build was proportional to the change, not to the six-node graph, and certainly not to a ten-million-file repository it might have been embedded in. That is the entire promise. Everything that follows is about how different systems keep — or break — it.

The cost of a build should be proportional to the size of the change, not the size of the repository.

What a build system actually is

Strip away the surface and every build system is the same four-step pipeline. The differences between them are entirely in how each step is implemented.

The shared algebra. Every build system is some implementation of these four steps.

You describe the work as a graph of nodes (files) and edges (the actions that turn inputs into outputs). The system decides which nodes are dirty — stale with respect to what changed. It executes the minimal set of actions to make them fresh. And it caches results so the same work is never done twice. Four steps. The art is all in the seams: how you detect dirtiness, what a cache key is made of, whether execution can be sandboxed or shipped to another machine, and — the decision this whole essay turns on — what language you write the graph in.

Task-based or artifact-based?

Before the field divides on those seams, it divides on a more basic question: what does the user actually describe? There are two answers, and they shape everything downstream.

In a task-based system you describe tasks — named, imperative procedures — and the dependencies between tasks. make test depends on make build; a Gradle assemble task depends on compile. You run a task by name, and the system orders the tasks it depends on. Make, Gradle, npm run, and Rake all work this way. The mental model is "a graph of things to do."

In an artifact-based (or rule-based) system you describe outputs — the artifacts you want — and what each is derived from. You don't write "compile then link"; you declare that app is produced by linking these objects, and each object is produced by compiling a source. The system derives the action graph from the artifact dependencies. Bazel and Buck2 work this way. The mental model is "a graph of things that exist, and the rules that relate them."

A task-based build is a to-do list: "wash, chop, fry, plate" — and when something changes, you re-run the list from wherever you feel unsure. An artifact-based build is a pantry inventory: "a plated dish exists if fried vegetables exist, which exist if chopped ones do" — hand it what changed and it computes the shortest path back to done. Lists re-run; inventories reconcile.

Task-based: a graph of procedures you invoke. Artifact-based: a graph of outputs, from which the actions are derived.

The distinction matters because the artifact framing is what makes "proportional to the change" achievable. When the graph is over files that exist, the system can hash those files and skip an action whose inputs are unchanged — the decision is local and content-based. When the graph is over tasks to run, the system only knows "task B depends on task A"; whether B can be skipped is up to B's own up-to-date check, which is easy to get wrong and historically often is. Task graphs tell you an order; artifact graphs tell you a derivation — and you can only cache a derivation.

This is why the modern, scale-oriented systems are all artifact-based, and why Gradle — task-based at heart — has to work hard (a per-task input/output fingerprint, a build cache, a configuration cache) to claw back the cacheability that artifact systems get by construction. The two models aren't equal; one of them makes the central promise easy and the other makes it a feature you bolt on.

The field, read at the source

Make (1976) is the baseline everyone inherits. A target is stale if its modification time is older than a prerequisite's; rules are per-file; the model is one machine and a filesystem. It is genuinely fine for small projects, and its failure modes — timestamp lies, recursive-make losing the global graph (the failure Peter Miller named in 1998's Recursive Make Considered Harmful) — are the bugs every successor set out to fix. (The ./configure dance of autoconf and Automake grew up around exactly these limitations.)

Ninja (2012) is deliberately not a configuration language. Its own manual calls it an "assembler" for builds: a generator (CMake, Meson, gn) emits the .ninja files, and Ninja just executes the graph as fast as possible. It tracks staleness by modification time plus a hash of the command line — so changing a compiler flag correctly forces a rebuild, something plain Make misses. It is single-machine by design and proud of it.

Bazel (2015) is where the model goes industrial. Dirtiness is decided by content digests, with timestamps demoted to a hint: where the filesystem can't produce a fast digest, Bazel tracks a ctime/mtime/inode proxy (FileContentsProxy, in the source) whose movement flags the file for another look; the digest itself is recomputed when the file is next consumed as an action input (DigestUtils' manual fallback, cached by that proxy) — so the hash, not the clock, makes the final call. Its incremental engine — Skyframe, in the source — memoizes a graph of pure functions and recomputes only what genuinely changed. Actions can be sandboxed for hermeticity and shipped to a remote cluster for execution and caching. And its build files are written in Starlark, a language designed with one remarkable restriction we'll come to.

Buck2 (2023), Meta's ground-up rewrite, takes the same ideas further. Its engine, DICE, holds target nodes, configured nodes, and action nodes in a single incremental graph — no separate "load then analyze" phases. Remote execution isn't bolted on; local execution is treated as the special case. It, too, uses Starlark (a Rust implementation), and is honest in its own README that local-only builds aren't yet hermetic — the guarantee holds under remote execution.

Gradle uses content hashes like Bazel — but writes its build scripts in Groovy or Kotlin, full general-purpose languages. That one choice, as we'll see, costs it something the others don't pay.

Scored against the four steps, the field looks like this:

System	Dirty by	Graph engine	Cache	Hermetic	Config language
Make	mtime	per-file rules	none	no	Make
Ninja	mtime + cmd hash	Node/Edge	build log	no	(generated)
Bazel	content digest	Skyframe	local + remote CAS	sandbox + RE	Starlark — bounded
Buck2	content digest	DICE (one graph)	remote-first	RE-only	Starlark — bounded
Gradle	content digest	task graph	local + remote HTTP	no	Groovy/Kotlin — full

The first two columns of that table have a formal map: Build Systems à la Carte (Mokhov, Mitchell & Peyton Jones, ICFP 2018) factors any build system into a rebuilder — how "dirty" is decided — crossed with a scheduler — how the graph is walked — and reconstructs Make, Bazel, and even Excel as points in that grid.

Read down the last column. Three of the five chose a deliberately limited language; one chose an unlimited one. That column is the lede.

The restriction is the design

Here is the insight the whole survey was built to reach. The best build systems rest on a deliberate restriction — a configuration language chosen to be general enough to be useful but bounded enough to evaluate predictably during analysis.

Starlark — as specified, and as Bazel enforces it — is, on purpose, not Turing-complete. You can read this directly in the source.

Turing-complete — able to express any computation a general-purpose computer can, including loops that may never end. A language that is not Turing-complete trades that power for a guarantee that every program halts. Learn more.

Both implementations refuse while loops outright — Bazel's parser answers "'while' not supported, use 'for' instead" — so iteration is bounded by a collection that already exists. Recursion is where they differ: Bazel forbids it at runtime ("function called recursively"), while Buck2's Rust implementation permits it and stops runaway programs with a call-stack cap instead — fuel, not a language rule. Either way an evaluation always halts; in Bazel's dialect that is a guarantee of the language itself. A later piece dissects the split and what each side of it buys.

That sounds like a limitation. It is the opposite. Because evaluating a BUILD file is guaranteed to finish — and can touch nothing outside the file — the build graph can be computed by bounded evaluation — read, cached, parallelized, shared across a team — without running code that might never come back. An incremental build is, mathematically, the least fixed point of the dirty-propagation function over that graph; it's computable cheaply precisely because the space is bounded.

Gradle made the opposite choice, and you can watch it pay for it. Its build scripts are arbitrary Groovy and Kotlin, so Gradle cannot know the task graph without running the scripts to completion. Having run them, it then has to bolt on a "Configuration Cache" — a whole subsystem that serializes the result of that execution so it can be skipped next time, complete with several distinct failure modes for the cases where arbitrary user code did something un-serializable. The existence and complexity of that cache is the proof: when your configuration language can do anything, you have to execute it to learn anything, and then cache the execution itself.

A bounded language is evaluated into a graph; a Turing-complete one must be run, then its run re-cached.

This is a general principle wearing build-system clothes. The right primitive for a hard problem is usually the most general structure that stays decidable — the bounded fixed point, not the unlimited one. A DAG forbids cycles so the build always terminates. A CRDT forbids the operations that wouldn't commute so merges can't conflict. Starlark forbids unbounded computation so the graph can be discovered by bounded evaluation rather than arbitrary execution. In each case the restriction is the feature.

The honest limits

None of this makes the graph-based systems the right answer everywhere.

For a small project, Make is correct. The machinery of content hashing, sandboxing, and a bounded configuration language earns its keep only once the graph is large enough that "rebuild what changed" saves real time; below that threshold it's overhead and ceremony.

And there's a deeper limit that no build system can engineer its way past: it can only be as fine-grained as the language it builds for lets it be. Bazel scans C++ includes to track header dependencies — but at whole-header granularity. Change one declaration in a hundred-line header and every file that includes it must recompile, even the ones that never named the symbol you touched. The build graph is honest about the dependency the language actually exposes, and C++'s preprocessor exposes a coarse one. (C++20 modules are the long road out.)

Edit one symbol in a header; every includer rebuilds, because the dependency is tracked per-header, not per-symbol.

So the honest version of the original advice isn't "buy a faster Mac" and it isn't "use Bazel." It's: past a certain size, make your build's cost proportional to your changes — and understand that how granular that can get is set partly by your build system and partly by the language you feed it.

Where this goes

The survey gives us the shared algebra and the decisive restriction. The machines that implement them are worth opening up on their own, and the next pieces in this series will:

The Build That Restarts Itself — how Bazel's Skyframe actually computes the incremental graph, and what a "Skyframe restart" is doing.
There Are No Phases — Buck2's bet that one graph beats two phases, and why Meta rebuilt rather than adopted.
A Language That Can't Loop Forever — the Starlark language-design essay this survey keeps gesturing at.
The Hash Is the Identity — the content-addressed store and Merkle trees that let a whole team share one build cache, where "your build is proportional to the change" becomes "to anyone's change."

The thread through all of them is the one this piece started with: the cost of building should track what you changed. Everything good in a build system is in service of that, and the cleverest part is usually something it has decided not to let you do.

Lessons

A build is a graph; its cost should be proportional to what changed, not to how much code exists. That's an architectural property, not a hardware one.
Every build system is the same four steps — graph, decide-dirty, execute, cache — and differs only in how each is implemented.
Content hashes beat timestamps for deciding staleness, because a value that recomputes unchanged can halt the rebuild.
The decisive design choice is the configuration language: a bounded one (Starlark) can be evaluated into a graph predictably; a Turing-complete one (Gradle's Groovy/Kotlin) must be executed, then that execution cached.
The restriction is the design. The right primitive is the most general one that stays decidable.
A build can only be as fine-grained as the language it builds for allows — C++'s per-header dependency tracking is the standing example.

References

Mokhov, Mitchell & Peyton Jones. “Build Systems à la Carte.” ICFP, 2018. — the rebuilder × scheduler taxonomy this survey walks through; the JFP 2020 extension adds cloud builds
Miller. “Recursive Make Considered Harmful.” AUUGN, 1998. — why a fragmented graph can't be correct — the case for one global graph, a quarter-century early
“GNU Make manual.” — · Ninja manual (the "assembler for builds" design)
“Bazel: an introduction.” — · Buck2: why? · Gradle user guide
“Starlark.” — Bazel's build language · the Starlark spec
“Build automation.” — and incremental computing for the broader background

How to cite

APA

Mangalapilly, Y. J. (2026, June). The Build Is Proportional to the Change. Saṃhitā Notes. https://yesudeep.com/blog/build-is-proportional-to-change/