Notes · Dissecting Real Systems
Designing an API That Outlives You
I wrote watchdog in 2010. Fifteen years and three maintainers later it still ships the same public API I designed — here's what made it last.
Every module in the second decomposition is characterized by its knowledge of a design decision which it hides from all others. Its interface or definition was chosen to reveal as little as possible about its inner workings.
— D. L. Parnas, "On the Criteria To Be Used in Decomposing Systems into Modules", CACM 15(12), 1972
Cite this

Mangalapilly, Y. J. (2026, February). Designing an API That Outlives You. Saṃhitā Notes. https://yesudeep.com/blog/designing-an-api-that-outlives-you/

    @online{mangalapilly2026designing,
      author = {Yesudeep Jose Mangalapilly},
      title = {Designing an API That Outlives You},
      journal = {Sa\d{m}hit\=a Notes},
      year = {2026},
      month = {February},
      url = {https://yesudeep.com/blog/designing-an-api-that-outlives-you/},
      urldate = {2026-07-01},
    }

Yesudeep Jose Mangalapilly. “Designing an API That Outlives You.” Saṃhitā Notes, 2026. https://yesudeep.com/blog/designing-an-api-that-outlives-you/.

    TY  - ELEC
    AU  - Mangalapilly, Yesudeep Jose
    TI  - Designing an API That Outlives You
    T2  - Saṃhitā Notes
    PY  - 2026
    UR  - https://yesudeep.com/blog/designing-an-api-that-outlives-you/
    Y2  - 2026-07-01
    ER  -

A retrospective on a library I wrote, and what its longevity taught me about API design. By the end you'll see the one structural decision that let watchdog's public API survive fifteen years and a complete rewrite of everything beneath it: a small, stable facade held rigidly apart from the swappable machinery underneath.
Around 2009-2010 I needed to watch a directory for file changes in Python, across Linux, macOS, and Windows, with one piece of code. Nothing did it well, so I wrote watchdog — and the reason I needed it is a more interesting story than the library. Fifteen years later it's still on PyPI — version 6.0, released in late 2024 — maintained now by Mickaël Schoentgen, long after it passed from me to Thomas Amland to him. The internals have been rewritten more than once. The thing a user writes has barely changed:
    from watchdog.observers import Observer
    from watchdog.events import FileSystemEventHandler

    class MyHandler(FileSystemEventHandler):
        # Called for every event, whatever its kind.
        def on_any_event(self, event):
            print(event)

    observer = Observer()
    observer.schedule(MyHandler(), ".", recursive=True)
    observer.start()  # runs on a background thread; stop() then join() to shut down

That code from a 2010 tutorial still runs today. I didn't know, writing it, that I was making a thing that would outlive my involvement entirely. But looking back, one decision is why it lasted, and it's the decision this essay is about.
The half of a system that users touch should be the half that never has to change.
Why it existed: a design studio's slowest hour
watchdog wasn't born from a computer-science itch. It was born from a bottleneck in a design business. At the time I was running HappyChickoo, a small studio doing print and online media design for labor institutions and enterprises — clients like the offshore-maritime group Greatship Global, the Maharashtra Institute of Labour Studies, the Nitaai Gauras art gallery. A lot of that work was iconography and logos: artwork that had to be resized into a dozen variants and then seen in context — what does this mark actually look like sitting in a real web page?
And the loop to find out was agonizing. An illustrator would tweak a file, save it, hand it to a developer, who would drop it into a page and reload the browser to see the result — then report back, and around it went. Every small visual decision paid that whole round-trip in full. For a studio whose product was fast iteration on how things look, the slowest hour of the day was the one spent shuttling files between people just to see them rendered.
The bottleneck wasn't the design work. It was the wait between changing a file and seeing the change.
So watchdog was the fix for that: have the machine notice the instant a file changed and trigger the rebuild-and-reload itself, so the artwork appeared in the page the moment it was saved. The save-share-render ceremony collapsed into a save. It cut our delivery timelines substantially — and only later did "watch a directory and react to changes" turn out to be a general enough need that the library outgrew the studio that prompted it. The most durable tools often start as something specific that annoyed someone enough to automate.
The problem: one job, five mechanisms
Watching a filesystem for changes sounds simple until you notice every operating system does it completely differently. Linux has inotify. macOS has FSEvents (and kqueue before it). The BSDs have kqueue. Windows has ReadDirectoryChangesW — which watchdog drives as a plain blocking call on a dedicated thread, interrupted with CancelIoEx (read winapi.py: the overlapped-I/O and completion-port machinery is defined there but the read path never uses it). And on a platform that offers nothing, the fifth mechanism is the fallback: poll — snapshot the directory, wait, snapshot again, diff.
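The polling fallback is the easiest of the five to show in a few lines. What follows is a minimal stdlib sketch of "snapshot, wait, snapshot again, diff", not watchdog's actual PollingEmitter; the names `snapshot` and `diff`, and the choice of `(mtime_ns, size)` as the file signature, are assumptions made for illustration.

```python
import os
from pathlib import Path


def snapshot(root):
    """Map every file path under root to its (mtime_ns, size) signature."""
    signatures = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                stat = path.stat()
            except OSError:
                continue  # the file vanished between listing and stat
            signatures[str(path)] = (stat.st_mtime_ns, stat.st_size)
    return signatures


def diff(before, after):
    """Compare two snapshots and return (created, deleted, modified) path sets."""
    created = after.keys() - before.keys()
    deleted = before.keys() - after.keys()
    modified = {p for p in before.keys() & after.keys() if before[p] != after[p]}
    return created, deleted, modified
```

A watcher built on this would call `snapshot` on a timer and emit whatever `diff` reports. Note what the sketch cannot see: a rename shows up as one delete plus one create, and anything that changes and changes back between two snapshots is invisible.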
Five mechanisms, five sets of quirks, five entirely different programming models — for one conceptual task: "tell me when a file changes." A naïve library would expose some of that difference to the user. The whole value of watchdog is that it exposes none of it.
Imagine five countries that each deliver mail a totally different way — one by bicycle, one by drone, one by pneumatic tube. You just want to send a letter without learning five postal systems. watchdog is the clerk at the counter: you hand them the letter and the address, and they deal with whichever system the destination happens to use. Your side of the counter never changes, no matter how the back room works.
The decision: a facade held apart from its backends
The design is a hard line between two halves. On the user's side is a small, stable facade: an Observer you start, and a FileSystemEventHandler you subclass to receive on_created, on_modified, on_moved, on_deleted, or the catch-all on_any_event. On the other side, hidden, is a backend per platform — and the two are deliberately not allowed to leak into each other.
Underneath, watchdog is a producer/consumer system. The architecture file names the two roles exactly:

- An EventEmitter: "Producer thread base class subclassed by event emitters that generate events and populate a queue with them."
- An EventDispatcher: "Consumer thread base class subclassed by event observer threads that dispatch events from an event queue to appropriate event handlers."

(Both docstrings quoted from src/watchdog/observers/api.py.)
Facade: a single, simplified interface placed in front of a more complex subsystem, so callers depend on the simple front and never on the moving parts behind it. The OS-specific emitters are the moving parts; the Observer is the front.
Each platform supplies an emitter — InotifyEmitter, FSEventsEmitter, KqueueEmitter, the Windows ReadDirectoryChangesW backend, or the PollingEmitter — and every emitter's job is to translate its operating system's idiosyncratic signal into one uniform FileSystemEvent and push it onto a shared queue. The Observer drains that queue and dispatches to your handler. Your code talks only to the queue's far side; it never learns which emitter ran.
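To make the seam concrete, here is a minimal stdlib sketch of the producer/consumer shape described above. It is not watchdog's code: `Event`, `FakeEmitter`, `dispatch`, `linux_like`, `windows_like`, and `run_backend` are all hypothetical names, and the raw signal shapes are simplified stand-ins for what a real backend would receive from the operating system.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Uniform event every backend must produce (hypothetical, not watchdog's class)."""
    kind: str
    path: str


class FakeEmitter(threading.Thread):
    """Producer: translates backend-specific signals into Events on a shared queue."""

    def __init__(self, raw_signals, translate, out):
        super().__init__(daemon=True)
        self._raw_signals = raw_signals
        self._translate = translate
        self._out = out

    def run(self):
        for raw in self._raw_signals:
            self._out.put(self._translate(raw))
        self._out.put(None)  # sentinel: this emitter is done


def dispatch(events, handler):
    """Consumer: drains the queue and calls the handler until the sentinel arrives."""
    while (event := events.get()) is not None:
        handler(event)


# Two stand-in "platforms" with different raw shapes, one translation each.
def linux_like(raw):
    return Event({"IN_CREATE": "created", "IN_MODIFY": "modified"}[raw[0]], raw[1])


def windows_like(raw):
    return Event({1: "created", 3: "modified"}[raw["action"]], raw["name"])


def run_backend(raw_signals, translate):
    """Run one emitter to completion and return what the handler saw."""
    events, seen = queue.Queue(), []
    emitter = FakeEmitter(raw_signals, translate, events)
    emitter.start()
    dispatch(events, seen.append)
    emitter.join()
    return seen
```

The handler passed to `dispatch` receives the same `Event` values whichever translation ran, which is the property the facade depends on: only the `translate` function knows what the platform's signal looked like.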
Figure: a small facade (Observer + handler) over five swappable per-platform backends. Your code talks to the facade; the right native mechanism is chosen and hidden underneath. That seam is the whole design.

This is the thing that made it last. The user-facing surface (Observer, schedule(), FileSystemEventHandler, the event classes) is small, and it sits behind a wall from the part that actually had to change. Operating systems evolved, backends were rewritten, the polling fallback was tuned, emitters came and went. None of it crossed the wall. The facade stayed put because nothing about it depended on how the job was actually done.
Longevity comes from the seam: put everything volatile on one side of a narrow interface, and let the stable side face the user.
The small decisions that aged well
The big wall is the headline, but a handful of smaller choices are the reason the facade itself never needed to grow awkward. Each one, looking back, bought years.
- A watch is a handle, not a re-description. schedule() returns an ObservedWatch object representing that watch; to stop it you pass the handle back to unschedule(). Callers never have to re-specify a watch they already created: the library remembers, and hands you an opaque token. Opaque handles age well because they can grow new internals without changing how you hold them.
- One boolean spans every backend. The recursive flag ("watch the whole subtree, or just this directory") means the same thing on inotify, FSEvents, and polling alike. A single, uniform knob over five wildly different mechanisms is exactly the leak that didn't happen.
- A catch-all alongside the specifics. You can override the precise on_created / on_modified hooks, or just on_any_event, documented plainly as the "catch-all event handler." Offering both the firehose and the filtered taps, without forcing a choice, let simple users stay simple and serious users get precision from the same handler contract.
- Events are immutable, on purpose. A FileSystemEvent is immutable so that it "can be used as keys in dictionaries or be added to sets." That one property makes deduplicating a storm of events trivial, and it's a guarantee, not a convention, so fifteen years of callers could rely on it.
None of these is clever. Each is just a small refusal to make the user's side depend on anything that might move.
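The immutability point is easy to demonstrate in isolation. This is a small sketch using a frozen dataclass as a stand-in; `ChangeEvent` and `dedupe` are hypothetical names, and watchdog's own event classes are defined differently.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical immutable event: frozen, so instances hash by value."""
    kind: str
    path: str


def dedupe(events):
    """Collapse repeated events while keeping first-seen order."""
    # dict keys are unique and ordered, which only works because events are hashable.
    return list(dict.fromkeys(events))
```

Three identical "modified" events for one save collapse to one entry, and because the class is frozen, no caller can mutate an event after it has been used as a key.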
If I designed it today
The facade aged well; the shape of the call shows its age. In 2010, Python had no async / await, so the natural way to deliver a stream of events was a background thread pushing into callbacks you registered by subclassing. Control inverts — the library calls you. It works, but it's the idiom of its decade. So rather than guess at a modern replacement, I did what the method of this series demands: read what today's libraries actually do, and find the structure they converge on.
What the field settled on
I read the public surface of six current file-watchers. Each delivers a change a different way: notify (Rust, the backend much of the ecosystem wraps) hands you a channel; fsnotify (Go) gives you two channels, one for events and one for errors; chokidar (Node) is an event emitter you attach .on('change', …) to; @parcel/watcher calls you back with a batch array; and watchdog, of course, runs a thread and calls your subclassed method.
One of them states the modern shape plainly. watchfiles — Python over a Rust core — is simply:
    from watchfiles import awatch

    # inside a coroutine:
    async for changes in awatch("./src"):
        for change in changes:
            print(change)

No handler class, no thread you start and join, no callback. The events are an asynchronous stream; you iterate it.
The structure underneath all of them
Line those six idioms up and the same object is under every one. A channel is a stream you receive from. An event emitter is a stream fanned out by event name — .on('change', …) is "filter to changes, then for-each." A thread-plus-callback is the stream with control inverted: the library runs the loop and calls you. A batch callback is a stream whose element is a set of changes rather than one. An async generator is that stream stated as itself, with nothing wrapped around it.
A watch is an asynchronous stream of immutable change events. Callbacks, threads, channels, and emitters are all just encodings of that one thing.
So the async iterator isn't a stylistic preference; it's the fixed point — the form the others are degenerate cases of, and the only one that needs no adapter. That settles the core contract. The rest of the design is then a matter of reading what each library learned the hard way and folding it in.
The design, with each choice earned
async for: Python's syntax for consuming an asynchronous generator: a producer that yields values over time without blocking the event loop, and a consumer that reads them in a plain loop. It turns "register a callback and wait" into "iterate."
The event is immutable — frozen, hashable, usable in a set — because every library that thought about it made it so (watchdog's events are "required to be immutable"; watchfiles puts them in a set). And each tick yields a batch, not a lone event, because coalescing redundant changes is mandatory, and a set is what coalescing produces — watchfiles yields a set per tick, @parcel/watcher guarantees "only one notification per file."
    async for batch in watch(
        "src/",
        changes={Change.created, Change.modified},
        ignore=["*.pyc", ".git/"],
        debounce=0.4,
    ):
        rebuild(batch)

Every parameter there is something a real library already ships: changes is notify's kernel-level event-kind mask; ignore is chokidar's and @parcel's ignore lists; debounce is watchfiles' debounce and the entire reason notify's separate debouncer and chokidar's awaitWriteFinish exist. Debouncing is a parameter, not your chore, precisely because fsnotify, which has none, has to send users to a hand-rolled dedup loop and can still drop events as an overflow. And because the core is just a stream, filtering can also be ordinary iterator composition: an async def that wraps watch(…) and re-yields the subset. The language's own async-iterator machinery is the transform algebra; chokidar had to invent an emitter, we inherit it.
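Filtering by composition can be sketched with nothing but the standard library. The proposed watch(…) does not exist, so `fake_watch` below is a stand-in that yields pre-made batches; `only` and `collect` are likewise hypothetical helper names, and events are modelled as plain `(kind, path)` tuples for brevity.

```python
import asyncio
from fnmatch import fnmatch


async def fake_watch(batches):
    """Stand-in for the hypothetical watch(): yields one set of (kind, path) per tick."""
    for batch in batches:
        await asyncio.sleep(0)  # give the event loop a turn, as a real watcher would
        yield set(batch)


async def only(source, pattern):
    """Re-yield each batch narrowed to paths matching a glob; skip empty batches."""
    async for batch in source:
        kept = {(kind, path) for kind, path in batch if fnmatch(path, pattern)}
        if kept:
            yield kept


async def collect(source):
    """Drain an async iterator into a list (handy for demos and tests)."""
    return [batch async for batch in source]
```

Calling `asyncio.run(collect(only(fake_watch(ticks), "*.py")))` returns only the batches that still contain Python files. The filter is an ordinary async generator, so further stages (mapping, throttling, merging) would stack the same way.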
Two old footguns disappear by construction. Cancellation is just leaving the loop — exit the async for by break, exception, timeout, or task cancellation and the watcher tears itself down, so watchdog's stop() then join() two-step (forget the join() and you leak a thread) has nothing left to forget. And errors arrive at the loop boundary: a fatal condition raises out of the async for into the same try as your handler, and lost-event overflow is delivered as a typed event in the stream rather than on a separate channel you might forget to wire up — the split that both fsnotify and chokidar make you remember.
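One Python detail is worth pinning down in a sketch. An async generator's cleanup lives in its `finally` block, but after a bare `break` that block runs only when the generator is finalized, which may be later than you expect. Closing it explicitly with `aclose()` (or `contextlib.aclosing` on Python 3.10+) makes teardown deterministic. `watch_sketch` and `take_two` are hypothetical names, and the list stands in for a real OS resource.

```python
import asyncio


async def watch_sketch(log):
    """Hypothetical watcher: acquires a resource, yields ticks, always releases it."""
    log.append("watch opened")
    try:
        tick = 0
        while True:
            await asyncio.sleep(0)
            tick += 1
            yield tick
    finally:
        log.append("watch closed")  # runs on close, exception, or cancellation


async def take_two(log):
    """Consume two ticks, leave the loop, and close the stream deterministically."""
    stream = watch_sketch(log)
    try:
        async for tick in stream:
            if tick == 2:
                break
    finally:
        await stream.aclose()  # triggers the generator's finally block right now
    return log
```

A library following this design would probably hide the `aclose()` call behind an `async with` block or a wrapper, so the user-visible rule stays "leave the loop and the watch is gone."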
What the issue tracker says it would fix — and what it wouldn't
It's tempting to claim a redesign fixes everything users complain about. Watchdog's issue tracker won't let me, and the honest split is the interesting part. Some of the most-reported pain is the 2010 API's shape, and the async stream removes it structurally:
- *"Modified files trigger more than one event" (#346, #1003).* One save, three callbacks, because there's no built-in place to coalesce, so every user reinvents it. First-class batching per quiet window collapses the redundant ones into a single yielded set.
- *"Introduce a debouncer" (#315).* Literally a request for the debounce parameter. (Watchdog eventually added one, but only to its CLI auto-restart wrapper, not the event API. Debouncing belongs in the contract, not bolted onto one command.)
- *"Consuming events in a recurring loop rather than on occurrence?" (#392).* A user asking, in so many words, for a pull model instead of pushed callbacks. async for is exactly that. The existence of a whole wrapper library (hachiko, "an asyncio-compatible wrapper around Watchdog") is the market confirming the callback model taxes async users.
- *"EventEmitter threads never terminate" (#64); execution stuck (#700).* The stop() / join() handshake that hangs, the thread you must check is alive. When cancellation is just leaving the loop, the user has no thread to mismanage.
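"Batching per quiet window" from the first item above can be sketched as a small async generator. This is one plausible reading of the idea, not how watchfiles or any other library implements debouncing; the function name `batches`, the `None` end-of-stream sentinel, and the use of an `asyncio.Queue` as the event source are assumptions.

```python
import asyncio


async def batches(events, quiet=0.05):
    """Yield a set of events once no new event has arrived for `quiet` seconds.

    `events` is an asyncio.Queue; None is a sentinel meaning the source ended.
    """
    while True:
        first = await events.get()  # block until a burst starts
        if first is None:
            return
        batch = {first}
        while True:
            try:
                nxt = await asyncio.wait_for(events.get(), timeout=quiet)
            except asyncio.TimeoutError:
                break  # quiet window elapsed: the burst is over
            if nxt is None:
                yield batch  # flush what we have, then stop
                return
            batch.add(nxt)  # duplicates collapse because the batch is a set
        yield batch
```

Three identical "modified" events arriving within the window come out as a single element of one yielded set. The cost is visible in the code: nothing is delivered until `quiet` seconds of silence have passed, which is the latency trade the essay describes.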
But the tracker is just as clear about what a redesign can't touch, and an honest essay has to say so. A new API fixes how events are delivered; it cannot fix what the operating system emits.
The "inotify watch limit reached" failures are a kernel resource cap, identical under any delivery model. Move-and-rename unreliability sits at the seam between kernel and library: for a file moved out of the watched tree, the kernel reports an IN_MOVED_FROM carrying a pairing cookie (the man page notes the cookie "allows the resulting pair of IN_MOVED_FROM and IN_MOVED_TO events to be connected"), but the matching IN_MOVED_TO never arrives, because the destination is unwatched. The library can only degrade the unpaired half-move to a delete, and macOS's FSEvents gives no reliable rename pairing at all. Network filesystems typically emit no events for changes made on another host, so there is nothing to deliver. And even the duplicate-event complaint has a hard half that the parameter only mitigates: when the three modified events arrive because a large file is still being written (#309), no debounce window truly knows the write is done. There is no portable "file closed" signal, so debouncing trades the symptom for latency rather than curing it.
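The cookie pairing is simple to sketch once the raw records are in hand. The `Raw` class below is a simplified stand-in for an inotify record, not the real struct, and `pair_moves` is a hypothetical helper. Treating an unpaired IN_MOVED_TO as a "created" event is my assumption for the sketch; the degrade-to-delete rule for an unpaired IN_MOVED_FROM follows the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raw:
    """Simplified stand-in for one inotify record (not the real struct layout)."""
    mask: str    # "IN_MOVED_FROM" or "IN_MOVED_TO" in this sketch
    cookie: int  # the kernel gives both halves of one rename the same cookie
    path: str


def pair_moves(records):
    """Join MOVED_FROM/MOVED_TO halves by cookie; degrade unpaired halves."""
    pending = {}  # cookie -> source path still waiting for its destination
    out = []
    for r in records:
        if r.mask == "IN_MOVED_FROM":
            pending[r.cookie] = r.path
        elif r.mask == "IN_MOVED_TO":
            src = pending.pop(r.cookie, None)
            if src is not None:
                out.append(("moved", src, r.path))
            else:
                # No matching source: the file arrived from an unwatched place.
                out.append(("created", r.path))
    # Sources whose destination never showed up left the watched tree.
    out.extend(("deleted", path) for path in pending.values())
    return out
```

This version decides "unpaired" only at the end of the input. A live watcher has no end of input, so it would need a timeout to give up on a pending cookie, and that timeout is one more place where latency is traded for correctness.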
Note
The dividing line is clean: the redesign fixes the ergonomic complaints (no coalescing point, no pull model, hand-managed threads) and inherits the platform ones (kernel limits, rename semantics, network filesystems) unchanged. Worth knowing which half any "rewrite" is actually addressing.
Is "async all the way down" even possible?
The redesign delivers events through an async iterator — but is the watch itself async underneath, or is there a thread hiding? The honest answer is platform-dependent, and it's worth being precise because the easy claim ("it's all async now!") is false.
On Linux, yes: inotify is a readable file descriptor, so an event loop can watch it directly (loop.add_reader) with no thread at all — async to the metal. But macOS and Windows can't. macOS's FSEvents delivers through a Core Foundation run loop callback, not a pollable descriptor; Windows's ReadDirectoryChangesW completes through overlapped I/O for which stock asyncio offers no public registration API (its proactor loop is IOCP-based internally, but that machinery isn't exposed for arbitrary handles). On both, something must run a native loop and forward events across — a bridge thread is unavoidable in practice.
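The Linux half of that claim can be shown without touching inotify itself. The sketch below uses an `os.pipe()` as a stand-in for the inotify descriptor (opening a real one needs ctypes or a binding) and waits on it with `loop.add_reader`, with no helper thread involved. It assumes a Unix selector event loop; `read_when_ready` and `demo` are hypothetical names.

```python
import asyncio
import os


async def read_when_ready(fd, nbytes=1024):
    """Wait, without a helper thread, until fd is readable, then read from it."""
    loop = asyncio.get_running_loop()
    ready = loop.create_future()

    def on_readable():
        loop.remove_reader(fd)  # one-shot: stop watching once data arrives
        if not ready.done():
            ready.set_result(os.read(fd, nbytes))

    loop.add_reader(fd, on_readable)
    try:
        return await ready
    finally:
        loop.remove_reader(fd)  # harmless if the callback already removed it


async def demo():
    """Write to a pipe shortly after we start waiting on its read end."""
    r, w = os.pipe()  # stand-in for an inotify descriptor
    try:
        asyncio.get_running_loop().call_later(0.01, os.write, w, b"changed")
        return await read_when_ready(r)
    finally:
        os.close(r)
        os.close(w)
```

With a real inotify descriptor the bytes read would be packed event records to parse rather than a message, but the waiting mechanism would be the same: the event loop's own selector does the blocking.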
This is exactly why watchfiles runs its watcher on a worker thread and surfaces it as an async generator: not laziness, but the only portable option. So the precise claim is narrower than "async all the way down":
Thread-free watching is achievable on Linux and, in practice, unavailable on macOS and Windows. The async-iterator contract is the portable part, not the absence of a thread.
And that's fine — it's the whole point of the seam. The thread becomes a hidden implementation detail with no user-visible lifecycle. "There's a thread, but you can't touch it or deadlock on it" is precisely what retires the stop() / join() bug class from your code.
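Here is a minimal sketch of such a hidden bridge, assuming a blocking producer that reports events through a callback. It is not how watchfiles does it internally; `bridged`, `produce`, `emit`, and `stopped` are hypothetical names. The one real constraint is that a foreign thread must hand items to the loop through `call_soon_threadsafe`.

```python
import asyncio
import threading


async def bridged(produce):
    """Expose a blocking, callback-style producer as an async iterator.

    `produce(emit, stopped)` runs on a private thread, calls emit(item) per event,
    and should return soon after `stopped` is set.
    """
    loop = asyncio.get_running_loop()
    inbox = asyncio.Queue()
    stopped = threading.Event()
    done = object()  # private end-of-stream marker

    def emit(item):
        # asyncio.Queue is not thread-safe, so hop onto the loop's thread first.
        loop.call_soon_threadsafe(inbox.put_nowait, item)

    def run():
        try:
            produce(emit, stopped)
        finally:
            emit(done)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    try:
        while (item := await inbox.get()) is not done:
            yield item
    finally:
        stopped.set()  # ask the producer to stop
        await loop.run_in_executor(None, thread.join)  # join without blocking the loop
```

The thread's whole lifecycle (start, stop signal, join) sits inside the generator, so the consumer only ever sees an iterator. As with any async generator, closing it explicitly with `aclose()` after an early `break` makes the join happen immediately rather than at finalization.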
The four ways to deliver a change
Lined up, the delivery models trade off along the same axes — and the async iterator isn't best because it hides a thread (they all do, on macOS and Windows) but because it hands you caller-driven control with clean cancellation:
| | thread + callback | channel | event emitter | async iterator |
|---|---|---|---|---|
| example | watchdog | notify, fsnotify | chokidar | watchfiles |
| control flow | inverted | caller-driven | inverted | caller-driven |
| cancellation | manual; can hang | clean (close it) | manual | clean (leave loop) |
| backpressure | none (unbounded) | bounded channel | poor | bounded — if the bridge builds it |
| coalescing | none built-in | raw | opt-in | first-class |
| filtering | subclass + ifs | over the channel | glob option | parameters |
| loop integration | poor (needs glue) | good | native (JS) | native |
| thread hidden? | no — you manage it | yes | yes | yes |
What stays exactly the same
Here is the part that matters for an essay about durable design. Change all of that on top, and the design underneath is untouched: one facade over per-platform emitters feeding a single stream of immutable events. The async iterator is a nicer doorway onto the very same room. The wall between the stable surface and the swappable backends is what makes the surface replaceable — you could ship the async API as a thin new facade over the unchanged emitter machinery. It's the same thin-surface-over-deep-room pattern an SDK uses when its stable core names only an abstraction and lets plugins supply the swappable instances.
A good seam doesn't only let the implementation evolve. It lets the interface evolve too, when the language itself moves on.
What I'd tell my 2010 self
The honest part: I didn't design watchdog expecting it to outlive me. I designed it to hide an annoying cross-platform mess behind one clean call, because that's what I wanted to use. The longevity was a side effect of that instinct — keep the surface small, keep the volatile machinery behind a wall — applied without my fully realizing it was a durability strategy.
But that is the strategy, and it generalizes past this one library. The lifespan of a public API is set by how little of your implementation it reveals. Every internal detail you let leak into the signature is a future migration you've signed up for; every one you hide behind a facade is a freedom to change your mind later without breaking anyone — the freedom you forfeit the moment a consumer you can't reach pins a version of you. A small, honest surface over a wall is a gift to whoever maintains the thing after you — and, fifteen years on, the documentation still carries my name on its title page next to "and contributors," which is the nicest possible proof that the wall held.
You don't make an API last by predicting the future. You make it last by refusing to let the future leak through the door.
Lessons
- The half users touch should be the half that never changes. Put a small, stable facade in front, and everything volatile behind it.
- watchdog's seam is a producer/consumer wall: per-platform emitters (inotify, FSEvents, kqueue, ReadDirectoryChangesW, polling) push uniform events onto a queue; one Observer dispatches them. The user never learns which backend ran.
- Small choices compound: opaque watch handles, one recursive flag across all backends, a catch-all beside the specific hooks, and immutable events usable as set keys. Each refuses to leak the implementation.
- An API's lifespan is set by how little it reveals. Every leaked detail is a future migration; every hidden one is a freedom to change later.
- A redesign fixes delivery, not physics. The issue tracker is split cleanly: an async stream cures the ergonomic complaints (no coalescing point, no pull model, hand-managed threads) but inherits the platform ones (kernel watch limits, rename semantics, network filesystems) unchanged, and "async all the way down" is real only on Linux, since FSEvents and ReadDirectoryChangesW force a hidden bridge thread.
- You can't predict the future; you can only refuse to let it through the door. That's what let one 2010 design survive three maintainers and fifteen years of OS churn.
References
- “watchdog.” GitHub. The API reference and changelog.
- “watchfiles.” The async for shape; the rest of the modern field read for the redesign: notify (Rust, the common backend), fsnotify (Go channels), chokidar (Node emitter), @parcel/watcher (guaranteed coalescing).
- “watchdog issue tracker.” The evidence for what a redesign would and wouldn't fix · hachiko, the asyncio wrapper users reach for.
- “facade pattern.” And the producer–consumer structure underneath it.
- “inotify(7).” man7.org. The kernel's own account of move cookies and their limits.
- “The Hash Is the Identity.” Another case where a narrow interface is the whole design.
- “Reading a Codebase.” The method, including "what did the design refuse to allow?"
