Notes · Dissecting Real Systems

growing

Everything Is an Action

The architecture of Firebase Genkit rests on one primitive — a self-describing, observable, callable function — and the whole SDK is layers of specializations of it.

9 min read

architecture, genkit, gen-ai, sdk-design, software-design, dissecting-systems

I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.

— Fred Brooks, The Mythical Man-Month (1975), ch. 4

Cite this
APA
Mangalapilly, Y. J. (2025, January). Everything Is an Action. Saṃhitā Notes. https://yesudeep.com/blog/everything-is-an-action/
BibTeX
@online{mangalapilly2025everything,
          author  = {Yesudeep Jose Mangalapilly},
          title   = {Everything Is an Action},
          journal = {Sa\d{m}hit\=a Notes},
          year    = {2025},
          month   = {January},
          url     = {https://yesudeep.com/blog/everything-is-an-action/},
          urldate = {2026-07-01},
        }
Plain
Yesudeep Jose Mangalapilly. “Everything Is an Action.” Saṃhitā Notes, 2025. https://yesudeep.com/blog/everything-is-an-action/.
RIS
TY  - ELEC
        AU  - Mangalapilly, Yesudeep Jose
        TI  - Everything Is an Action
        T2  - Saṃhitā Notes
        PY  - 2025
        UR  - https://yesudeep.com/blog/everything-is-an-action/
        Y2  - 2026-07-01
        ER  - 

An architecture walk through Firebase Genkit, from the design I worked on for its Python implementation. By the end you'll see the single primitive the whole SDK is built on — the Action — how models, tools, flows, retrievers, and evaluators are all instances of it, and how the layers above it (the generation veneer, the RAG pipeline, the dev-time reflection server) are specializations rather than separate systems.

Genkit is Google's open-source SDK for building generative-AI features — calling models, composing prompts, doing retrieval-augmented generation, running evaluations — across JavaScript, Go, Python, and a preview Dart SDK. I worked on the architecture of the Python implementation in early 2025, before its public alpha that April. This is a dissection of how it's put together, and the thing worth dissecting is that it's put together from one idea.

Genkit has many capabilities and one primitive. Everything you can call is an Action; everything else is layers of specialization over it.

The primitive: an Action

Start where the method says to — the one type everything hangs from. In Genkit it is the Action, and its own source comment — the doc comment on Action in js/core/src/action.ts — is the cleanest definition you'll find:

Self-describing, validating, observable, locally and remotely callable function.

Unpack those four adjectives, because each is load-bearing. Self-describing: an Action carries its name and a schema for its inputs and outputs. Validating: those schemas are enforced at the boundary, so a malformed call fails before it runs. Observable: every invocation emits a trace span, so you can see what happened. Locally and remotely callable: the same Action can run in-process or be invoked over the wire. A function with those four properties is not just a function; it is a function the rest of the system can reason about without knowing what it does.

Think of an Action like a labeled, sealed appliance with a spec sheet on the side. You don't need to know what's inside; the label tells you what goes in, what comes out, and it has a little meter showing it's running. Because every appliance in the kitchen follows that same convention, the kitchen's wiring — power, monitoring, remote control — is built once and works for all of them.
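To make the four properties concrete, here is a minimal sketch in plain Python. It is not Genkit's implementation: the Action class, the SPANS list, and call_remote are hypothetical names of my own, a bare Python type stands in for a real schema, and a JSON string stands in for the wire.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

# Illustrative only: these names are hypothetical, not Genkit's API.
SPANS: list[dict[str, Any]] = []  # stand-in for a real trace exporter


@dataclass
class Action:
    name: str                   # self-describing: a name ...
    input_type: type            # ... plus an input schema (a bare type here)
    output_type: type           # ... and an output schema
    fn: Callable[[Any], Any]    # the wrapped function

    def describe(self) -> dict[str, str]:
        """Return the spec sheet: what goes in and what comes out."""
        return {
            "name": self.name,
            "input": self.input_type.__name__,
            "output": self.output_type.__name__,
        }

    def __call__(self, value: Any) -> Any:
        # Validating: a malformed call fails before the function runs.
        if not isinstance(value, self.input_type):
            raise TypeError(f"{self.name}: expected {self.input_type.__name__}")
        result = self.fn(value)
        if not isinstance(result, self.output_type):
            raise TypeError(f"{self.name}: returned a non-{self.output_type.__name__}")
        # Observable: each successful invocation records a span.
        SPANS.append({"action": self.name, "input": value, "output": result})
        return result

    def call_remote(self, payload: str) -> str:
        # Remotely callable: the same Action, with JSON in and JSON out,
        # as it would look behind a network endpoint.
        return json.dumps(self(json.loads(payload)))


shout = Action("shout", str, str, lambda text: text.upper())
```

Calling shout("hi") and shout.call_remote('"hi"') go through the same validation and tracing path, which is the point: the caller needs only the spec sheet from describe(), never the body of the function.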

Why one primitive is the whole design

Here is the move that makes the architecture cohere. Genkit doesn't have a model system, and a tool system, and a flow system, and a retriever system. It has Actions — and a model, a tool, a flow, a retriever are all kinds of Action. The registry enumerates the kinds (ACTION_TYPES in js/core/src/registry.ts), and the core of the list is the architecture in miniature: model, tool, flow, prompt, retriever, indexer, embedder, reranker, evaluator. Each is the same primitive with a different schema and meaning. The enum has kept growing since this piece was first planted — as of mid-2026 it holds twenty-one kinds, including background-model, resource, an agent trio, and a versioned tool.v2 — and the accretion is an honest datapoint about the design: the closed list of kinds is exactly where a one-primitive architecture absorbs new requirements, and a .v2 entry is the registry conceding that even envelopes eventually need migrations.

Action kinds — the registry tags each Action with its kind. A model takes a prompt and returns a completion; a retriever takes a query and returns documents; a flow wraps your own logic. Same envelope, different contents — which is why one set of machinery serves all of them.

The payoff is leverage. Because tracing is defined once on Action, every model call, tool call, and retrieval is observable for free. Because remote callability is defined once on Action, the dev tooling can invoke any of them over HTTP without special-casing each kind. Because schema validation is defined once on Action, every kind gets type-safety at its boundary. Build the envelope well, once, and every capability you add inherits all of it. That is what it means for an architecture to be one idea applied until the features fall out.
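The leverage argument can be sketched in a few lines. This is a toy under stated assumptions, not Genkit's registry: the Registry class, the "/kind/name" key format, and the four-entry ACTION_KINDS set are hypothetical stand-ins. What it shows is tracing written once, at registration, and inherited by a model and a retriever alike.

```python
from typing import Any, Callable

# Hypothetical sketch; a small subset of the kinds named in the text.
ACTION_KINDS = {"model", "tool", "flow", "retriever"}


class Registry:
    def __init__(self) -> None:
        self._actions: dict[str, Callable[[Any], Any]] = {}
        self.trace: list[str] = []  # stand-in for emitted trace spans

    def register(self, kind: str, name: str, fn: Callable[[Any], Any]) -> None:
        # The closed list of kinds is the one place new requirements land.
        if kind not in ACTION_KINDS:
            raise ValueError(f"unknown action kind: {kind}")
        key = f"/{kind}/{name}"  # assumed key format for this sketch

        def traced(value: Any) -> Any:
            # Tracing lives here once; every kind inherits it.
            self.trace.append(key)
            return fn(value)

        self._actions[key] = traced

    def lookup(self, key: str) -> Callable[[Any], Any]:
        return self._actions[key]

    def keys(self) -> list[str]:
        return sorted(self._actions)


registry = Registry()
registry.register("model", "echo", lambda prompt: f"echo: {prompt}")
registry.register("retriever", "docs", lambda query: [f"doc about {query}"])
```

Neither lambda mentions tracing, yet a call through registry.lookup(...) to either one leaves an entry in registry.trace. Adding a fifth kind would mean one more string in the set, not another tracing implementation.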

The layers above it

If the Action is the floor, the rest of the SDK is layers stacked on it, each a narrower, friendlier specialization of the one below. It helps to name the tiers — these are my descriptive labels for the shape, not Genkit's own terms — from the floor up.

Genkit's layered SDK, redrawn from the original design. The Action/Flow/Registry substrate (Core) carries everything; the AI layer adds model and document abstractions; a thin Veneer is what most users actually call; plugins slot concrete providers in from the side.

Core is the substrate: Actions, the Flows that wrap your own AI logic into typed, streamable, deployable functions, the Registry that every Action registers itself with, and the telemetry integration. A Flow, in Genkit's words, "wraps your AI logic to provide type-safe inputs and outputs, streaming support, developer-UI integration, and easy deployment" — which is to say, a Flow is an Action you wrote, dressed for production.

The AI layer adds the domain vocabulary: a Model abstraction and a Document abstraction, and on top of them the retriever, indexer, reranker, and embedder; tools; prompts; chat sessions and agents; output formats; and model middleware. The middleware deserves a callout because it's a tidy idea — small wrappers that give every model capabilities the underlying provider may lack.

The model middleware functions, by their real names (exports of js/ai/src/model/middleware.ts), are simulateSystemPrompt (a system prompt for models without one), simulateConstrainedGeneration (structured output by injected instruction), augmentWithContext (append retrieved documents: RAG for models with no native context slot), downloadRequestMedia (fetch referenced media URLs), and validateSupport (fail fast when a request needs a capability the model lacks), plus retry and fallback, deprecated in place. Each is a function wrapping a model Action, so middleware is just Actions composing Actions.
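The wrapping pattern is easy to show in miniature. The sketch below is a toy, not Genkit's code: the request is a plain dict of my own invention, and simulate_system_prompt and augment_with_context are simplified stand-ins named after two of the real exports, whose actual signatures and behavior are not reproduced here. with_middleware is a hypothetical helper.

```python
from typing import Callable

# Hypothetical request shape: {"system": str | None, "messages": list[str]}
Model = Callable[[dict], str]
Middleware = Callable[[dict, Model], str]


def simulate_system_prompt(request: dict, next_model: Model) -> str:
    """Fold the system prompt into the messages for a model with no system slot."""
    system = request.get("system")
    if system is None:
        return next_model(request)
    messages = [f"SYSTEM INSTRUCTIONS: {system}"] + list(request["messages"])
    return next_model({"system": None, "messages": messages})


def augment_with_context(docs: list[str]) -> Middleware:
    """Build a middleware that appends retrieved documents to the request."""
    def middleware(request: dict, next_model: Model) -> str:
        context = "Context:\n" + "\n".join(docs)
        messages = list(request["messages"]) + [context]
        return next_model({**request, "messages": messages})
    return middleware


def _bind(mw: Middleware, next_model: Model) -> Model:
    def call(request: dict) -> str:
        return mw(request, next_model)
    return call


def with_middleware(model: Model, middleware: list[Middleware]) -> Model:
    """Wrap a model so the first middleware in the list runs outermost."""
    wrapped = model
    for mw in reversed(middleware):
        wrapped = _bind(mw, wrapped)
    return wrapped


def bare_model(request: dict) -> str:
    # A fake provider with no system-prompt support: it just joins messages.
    return " | ".join(request["messages"])
```

The wrapped result has the same call shape as bare_model, which is what lets middleware stack: the output of with_middleware can itself be passed back in as the model.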

The Veneer is the top: the small, friendly surface most users ever touch. generate() — "the primary interface through which you interact with generative AI models" — plus the retrieval helpers retrieve(), index(), rerank(), embed(). Underneath, generate() orchestrates model Actions, tools, formats, and middleware; on the surface, it's one call. The veneer is thin on purpose: it's the small stable doorway onto a deep room.

Plugins: concrete providers behind the abstractions

The abstractions would be empty without implementations, and Genkit keeps those at arm's length in plugins — separate packages, one per provider. google-genai, vertex-ai, ollama and the rest each register concrete models, embedders, and retrievers as Actions into the registry. The framework depends on the abstraction; the plugin supplies the instance. Separate packages, yes — but not separate repositories: the core and every plugin live in one workspace, so a change to the core's interface breaks its plugins in CI, not in a user's install. (A companion piece makes that case.)

The package layout: a thin user-facing genkit over core and ai; prompts authored as dotprompt files rendered through Handlebars; a plugin depending on genkit to register a concrete provider.

This is the dependency-inversion principle (which a later piece takes up at module scale) applied to a whole SDK: the stable core never names a vendor, and a new provider is a new package, not a change to the framework. Prompts get the same treatment — authored as Dotprompt files (.prompt, with YAML frontmatter), "an executable prompt template file format… agnostic to programming language and model provider" that "extends the popular Handlebars templating language with GenAI-specific features." The prompt is data, rendered through a template engine, not code baked into the SDK.
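Here is the inversion in a minimal sketch, assuming a made-up plugin contract. Registry, Plugin, define_model, initialize, create_app, and FakeProviderPlugin are all hypothetical names; Genkit's real plugin interface is not shown. The thing to notice is which side names the other.

```python
from typing import Callable, Protocol


class Registry:
    """Core side: knows about models in the abstract, never about vendors."""

    def __init__(self) -> None:
        self.actions: dict[str, Callable[[str], str]] = {}

    def define_model(self, name: str, fn: Callable[[str], str]) -> None:
        self.actions[f"/model/{name}"] = fn


class Plugin(Protocol):
    """The only contract the core depends on (hypothetical)."""

    name: str

    def initialize(self, registry: Registry) -> None: ...


class FakeProviderPlugin:
    """Plugin side: stands in for a separate vendor package."""

    name = "fakeprovider"

    def initialize(self, registry: Registry) -> None:
        # The plugin supplies the concrete instance behind the abstraction.
        registry.define_model(f"{self.name}/tiny", lambda prompt: f"[tiny] {prompt}")


def create_app(plugins: list[Plugin]) -> Registry:
    # Core code iterates over whatever it is handed; no vendor is imported.
    registry = Registry()
    for plugin in plugins:
        plugin.initialize(registry)
    return registry


app = create_app([FakeProviderPlugin()])
```

Registry and create_app would compile and ship with no provider present. Supporting a second provider means writing a second class like FakeProviderPlugin and leaving everything above it untouched.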

The reflection server: the registry, made inspectable

One more piece ties the architecture to the developer experience, and it falls straight out of "everything is a registered Action." Because every Action is named and remotely callable, and because the registry knows them all, you can expose the whole catalog over a local HTTP server and get a developer UI for free. That's the Reflection API — in the Python implementation, a Starlette app with routes like /api/actions (list the catalog) and /api/runAction (invoke one), which the dev UI talks to.

Because every capability is a named, callable, self-describing Action, the tooling can list and run all of them without knowing what any of them is.

It runs only in development — it's deliberately disabled in sandboxed runtimes, since exposing the registry is exactly the unrestricted access production shouldn't grant. But in development it's the reason you can open a UI, see every flow and model and prompt your app registered, and run any of them by hand. The introspection tooling isn't a separate feature built alongside the SDK; it's a consequence of the SDK's one primitive being self-describing — the same payoff a build engine gets from making every computation a uniform, named node in one registry.
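A stripped-down version of the idea, with the HTTP layer replaced by a plain function so it runs on the standard library alone. The two paths come from the text above; everything else is an assumption for illustration, including the HTTP methods, the {"key", "input"} request body, the response shapes, and the ACTIONS table. The real Python implementation is a Starlette app, which this does not reproduce.

```python
import json
from typing import Any

# Hypothetical catalog: each entry carries its own description.
ACTIONS: dict[str, dict[str, Any]] = {
    "/flow/greet": {
        "fn": lambda name: f"Hello, {name}!",
        "input": "string",
        "output": "string",
    },
    "/tool/add": {
        "fn": lambda pair: pair[0] + pair[1],
        "input": "array of two numbers",
        "output": "number",
    },
}


def reflect(method: str, path: str, body: str = "") -> tuple[int, str]:
    """Dispatch a reflection request; returns (status code, JSON body)."""
    if method == "GET" and path == "/api/actions":
        # List the catalog using only what each action says about itself.
        catalog = {
            key: {"input": meta["input"], "output": meta["output"]}
            for key, meta in ACTIONS.items()
        }
        return 200, json.dumps(catalog)
    if method == "POST" and path == "/api/runAction":
        request = json.loads(body)
        action = ACTIONS.get(request["key"])
        if action is None:
            return 404, json.dumps({"error": f"no action {request['key']}"})
        # Run it without knowing whether it is a flow, a tool, or a model.
        return 200, json.dumps({"result": action["fn"](request["input"])})
    return 404, json.dumps({"error": "unknown route"})
```

The dispatcher contains no branch per kind. Registering a new action in ACTIONS makes it listable and runnable with no change to reflect, which is the sense in which the tooling comes for free.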

The Python implementation: async-first

Worth a note since it's where I worked. The Python SDK is async-first — its API is asynchronous, with strong typing through Pydantic schemas playing the role that the JavaScript implementation gives to Zod. Actions are async def; flows and the reflection server are ASGI apps. The shape follows the domain: a generation call is mostly waiting — on a model, on a tool, on retrieval — so an async core lets one process keep many requests in flight without a thread per request. The same "schemas at every boundary" discipline that makes an Action self-describing in JavaScript is what Pydantic provides in Python, so the architecture ports without distortion.
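The "mostly waiting" claim is easy to check with asyncio alone. In this sketch fake_model_call, handle_many, and timed_run are hypothetical names, and asyncio.sleep stands in for the network round trip to a model; no Genkit API is involved.

```python
import asyncio
import time


async def fake_model_call(prompt: str, latency: float = 0.1) -> str:
    # Stand-in for a provider call: the coroutine only waits, as a real
    # generation request mostly does.
    await asyncio.sleep(latency)
    return f"reply to {prompt!r}"


async def handle_many(prompts: list[str]) -> list[str]:
    # All calls are in flight at once on one thread; results keep input order.
    return await asyncio.gather(*(fake_model_call(p) for p in prompts))


def timed_run(prompts: list[str]) -> tuple[list[str], float]:
    """Run the batch and report wall-clock seconds."""
    start = time.perf_counter()
    replies = asyncio.run(handle_many(prompts))
    return replies, time.perf_counter() - start
```

Five prompts at 0.1 s each would take about half a second back to back. Run through timed_run they finish in roughly the time of one call, because the event loop overlaps the waits instead of dedicating a thread to each.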

Catto in a hat. Cat images generated by a Gemini model using Genkit Python.
Catto in a top hat. Same Gemini model, via Genkit Python.

The whole thing, in one sentence

Step back and Genkit is a single primitive elaborated. An Action is a self-describing, observable, callable function. Models, tools, flows, retrievers, and evaluators are kinds of Action. The AI layer specializes them into a domain vocabulary; the veneer wraps the common cases into a few friendly calls; plugins supply the concrete providers; and because the primitive is self-describing, the dev tooling falls out for free. Many capabilities, one idea — which is the only way an SDK this broad stays comprehensible.

Lessons

  • Find the one primitive. Genkit's is the Action — a self-describing, validating, observable, locally-and-remotely-callable function.
  • Make everything an instance of it. Models, tools, flows, retrievers, indexers, embedders, rerankers, evaluators are all kinds of Action, so tracing, validation, and remote-callability are defined once and inherited by all.
  • Layer specializations, don't add systems. Core (Action/Flow/Registry) → AI abstractions → a thin generate() veneer; each tier narrows the one below rather than introducing a parallel mechanism.
  • Invert the provider dependency. The framework names abstractions; plugins supply concrete models and stores; prompts are data (Dotprompt/Handlebars), not code.
  • Self-describing primitives give you tooling for free. The reflection server is just the registry of named, callable Actions exposed over HTTP.

References

  1. “Genkit.” GitHub. The docs, Flows, RAG, and evaluation.
  2. “Genkit for Python and Go.” Firebase blog, April 2025. The public alpha.
  3. “Dotprompt.” The executable prompt-file format over Handlebars.
  4. “Designing an API That Outlives You.” The thin-veneer principle, elsewhere.
  5. “Utils Is Where Modularity Goes to Die.” Dependency inversion at module scale.
