zlow: The Algorithmic Skeleton of an LLM Pipeline
A few months ago we wrote about schema-first LLM pipelines — the pattern we’d been using in kingmaker where schemas declare what data to fetch and what side effects to run, and the LLM just fills boxes. The idea was sound. The implementation worked in production.
The problem was that the implementation was kingmaker. Hardcoded Convex queries. A 600-line step runner with if-ladders for every extension type. A trigger bus baked into the database. Nothing you could take somewhere else.
We wanted the pattern without the host. That’s zlow.
What We Were Extracting
The core insight from the previous post was the inversion: in a tool-calling agent, the LLM decides what to call. In schema-first pipelines, the schema declares what happens. The LLM fills fields. Algorithms execute effects.
Tool-calling:  LLM decides "I need to call search()"
Schema-first:  schema declares .rag$webSearch()
               LLM fills the field. Algorithms execute the search.
In kingmaker this manifests as:
// Schema declares input enrichment
game: Z.string().rag$getGameMetadata()
// Schema declares routing
worthFollowingUp: Z.boolean().emit$on("followup:needed")
// Schema declares side effects
videoCommand: Z.string().job$video()
The LLM sees structured input, produces structured output. The annotations declare what happens before and after. The runner — which we call the step processor — is dumb. It reads the schema and does what it says.
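To make "reads the schema and does what it says" concrete, here is a rough sketch of pulling `namespace$method` annotations out of schema source text. It is a stand-in only: the real parsing is done by Zontax, and the `Annotation` shape, the `extractAnnotations` name, and the per-line regex are all assumptions made for illustration.

```typescript
// Hypothetical stand-in for a real schema parser: a per-line regex scan.
interface Annotation {
  field: string
  namespace: string
  method: string
  arg: string | undefined // a single quoted argument, if one is present
}

function extractAnnotations(schemaSource: string): Annotation[] {
  const found: Annotation[] = []
  for (const line of schemaSource.split('\n')) {
    // Only lines of the form "fieldName: ..." can carry annotations.
    const fieldMatch = /^\s*(\w+)\s*:/.exec(line)
    if (!fieldMatch) continue
    // Matches ".ns$method()" or ".ns$method("arg")"; plain calls like ".string()" do not match.
    const pattern = /\.(\w+)\$(\w+)\(\s*(?:"([^"]*)")?\s*\)/g
    let m: RegExpExecArray | null
    while ((m = pattern.exec(line)) !== null) {
      found.push({ field: fieldMatch[1], namespace: m[1], method: m[2], arg: m[3] })
    }
  }
  return found
}
```

Run over the three kingmaker fields above, this yields one annotation each for `rag$getGameMetadata`, `emit$on`, and `job$video`. A regex like this would not survive nested arguments or multi-line declarations, which is one reason to use a proper schema language.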
This pattern works. We wanted to know if it was genuinely general, or if it was accidentally working because kingmaker’s data was in Convex, the extensions were all game-industry-specific, and the bus was baked in.
The Architecture
zlow has three layers:
The extension registry — a simple map from namespace$method keys to handler functions. You register handlers; the step processor looks them up when it sees matching annotations in the schema.
registerInputExtension('rag', 'getGameMetadata', async (game) => {
  // fetch from wherever your data lives
  return { score: 0.93, reviews: 42180, tags: ['Strategy', 'Medieval'] }
})
registerExtension('notify', 'send', async (value) => {
  // post to wherever notifications go
  return { sent: true }
})
Input extensions run before the model call. Output extensions run after. Which phase a handler runs in is determined by the registry it was registered in, not by any annotation syntax in the schema.
The step processor — takes a schema definition, an input, and some options. Runs input extensions, assembles the enriched context, calls the model (or a mock), validates output against the schema, runs output extensions.
The pipeline — parses all schemas upfront, builds a routing table from _config.subscribesTo and _config.onComplete, runs steps in sequence following emitted triggers, threads context between steps.
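Of the three layers, the registry is the easiest to picture. A minimal sketch, assuming two plain maps keyed by `namespace$method`: the two `register*` names match the calls shown above, but the `Handler` type, the `lookup` helper, and everything about the internals are assumptions rather than zlow's actual source.

```typescript
// Hypothetical sketch: two registries keyed by "namespace$method".
type Handler = (value: unknown) => unknown | Promise<unknown>

const inputExtensions = new Map<string, Handler>()
const outputExtensions = new Map<string, Handler>()

function key(ns: string, method: string): string {
  return ns + '$' + method
}

// Handlers registered here run before the model call.
function registerInputExtension(ns: string, method: string, handler: Handler): void {
  inputExtensions.set(key(ns, method), handler)
}

// Handlers registered here run after the model call.
function registerExtension(ns: string, method: string, handler: Handler): void {
  outputExtensions.set(key(ns, method), handler)
}

// The phase is decided by which map holds the handler, not by the annotation text.
function lookup(annotation: string): { phase: 'input' | 'output'; handler: Handler } | undefined {
  const input = inputExtensions.get(annotation)
  if (input) return { phase: 'input', handler: input }
  const output = outputExtensions.get(annotation)
  if (output) return { phase: 'output', handler: output }
  return undefined
}
```

In this sketch an annotation with no registered handler returns `undefined`, leaving the caller to decide whether that is an error or a no-op.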
The _config field in the schema is the control plane:
Z.object({
  _config: Z.object({
    subscribesTo: Z.literal("generate_video"),
    onComplete: Z.literal("script:ready"),
    systemPrompt: Z.literal("You are HYPE BEAST..."),
    model: Z.literal("google/gemini-3-flash-preview"),
    temperature: Z.literal(0.85),
  }),
  game: Z.string().rag$getGameMetadata(),
  script: Z.string(),
  // ...
})
Routing, model selection, system prompt, temperature — all in the schema. The Step interface is exactly two fields: name and schema. Nothing else. If it isn’t in the schema, it doesn’t exist.
This matters more than it sounds. When the schema IS the control plane, a step is a database record. It can be stored, versioned, diffed, edited at runtime. Kingmaker already does this — schemas live in Convex and can be patched via mutations. zlow is the runtime that can execute any of them.
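Because routing lives in `_config`, a routing table can be derived from the step records alone. The sketch below assumes the schemas have already been parsed into a hypothetical `ParsedStep` shape; `buildRoutingTable`, `planPath`, and the trigger names used to exercise them are illustrative, not zlow's API.

```typescript
// Hypothetical parsed form of a step's _config.
interface ParsedStep {
  name: string
  subscribesTo: string
  onComplete?: string
}

// Map each trigger to the names of the steps that subscribe to it.
function buildRoutingTable(steps: ParsedStep[]): Map<string, string[]> {
  const table = new Map<string, string[]>()
  for (const step of steps) {
    const subscribers = table.get(step.subscribesTo) ?? []
    subscribers.push(step.name)
    table.set(step.subscribesTo, subscribers)
  }
  return table
}

// Follow onComplete triggers from a starting trigger and return the run order.
function planPath(steps: ParsedStep[], startTrigger: string): string[] {
  const table = buildRoutingTable(steps)
  const byName = new Map<string, ParsedStep>()
  for (const step of steps) byName.set(step.name, step)

  const path: string[] = []
  const seen = new Set<string>() // guards against trigger cycles
  const queue: string[] = [startTrigger]
  while (queue.length > 0) {
    const trigger = queue.shift() as string
    for (const name of table.get(trigger) ?? []) {
      if (seen.has(name)) continue
      seen.add(name)
      path.push(name)
      const next = byName.get(name)?.onComplete
      if (next) queue.push(next)
    }
  }
  return path
}
```

This only covers the static `subscribesTo` and `onComplete` wiring. Triggers emitted by field-level annotations depend on model output, so they can only be resolved at run time.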
The Thing That Made It Testable
The previous post mentioned testability. It’s worth being concrete about what that means in practice.
A zlow pipeline can run with any step replaced by a mock function:
const result = await pipeline([analysisStep, ragStep, publishStep]).run(
  { topic: 'test input' },
  {
    mocks: {
      analysisStep: () => ({ topic: 'test', confidence: 0.9, needsMoreInfo: null }),
      ragStep: () => ({ enrichedContext: 'found docs' }),
      publishStep: () => ({ published: true, url: 'http://...' }),
    },
  }
)
assert.deepEqual(result.path, ['analysisStep', 'publishStep']) // ragStep skipped — needsMoreInfo was null
That test runs in milliseconds. Zero API calls. It tests real routing logic — the emit$on annotation on needsMoreInfo is what caused ragStep to be skipped. It tests real schema validation — if the mock returns an invalid shape, the test throws with a clear error. It tests real context threading — each mock receives the previous step’s output as input.
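A toy version of that mocked run shows why routing, validation, and context threading are still exercised when the model is swapped out. Everything here is an assumption made for illustration: the `SketchStep` shape, the trigger names, a `required` field list standing in for real schema validation, and the rule that an emit annotation fires when its field is non-null.

```typescript
type Output = Record<string, unknown>
type Mock = (input: Output) => Output

interface SketchStep {
  name: string
  subscribesTo: string
  required: string[]            // stand-in for real schema validation
  emits: Record<string, string> // field name -> trigger fired when the field is non-null
  onComplete?: string
}

// Assumes the triggers do not form a cycle.
function runWithMocks(
  steps: SketchStep[],
  startTrigger: string,
  input: Output,
  mocks: Record<string, Mock>
): { path: string[]; output: Output } {
  const path: string[] = []
  let context = input
  const queue: string[] = [startTrigger]
  while (queue.length > 0) {
    const trigger = queue.shift() as string
    for (const step of steps.filter((s) => s.subscribesTo === trigger)) {
      const mock = mocks[step.name]
      if (!mock) throw new Error('no mock for ' + step.name)
      const output = mock(context) // the mock replaces the model call only
      for (const field of step.required) {
        if (!(field in output)) throw new Error(step.name + ': missing field "' + field + '"')
      }
      path.push(step.name)
      for (const field of Object.keys(step.emits)) {
        if (output[field] != null) queue.push(step.emits[field])
      }
      if (step.onComplete) queue.push(step.onComplete)
      context = output // thread this step's output into the next step
    }
  }
  return { path, output: context }
}
```

With `needsMoreInfo: null` the emitted trigger never enters the queue, so the step subscribed to it never runs. A mock that omits a required field fails at the validation loop, the same place a malformed model response would.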
We also built a schema fuzzer that generates one test case per combination of routing outcomes:
const cases = await fuzz(analysisStep.schema)
// → 2 cases: one where needsMoreInfo is null (ragStep skipped)
// one where needsMoreInfo has a value (ragStep runs)
The fuzzer only varies fields with emit$ annotations, the fields that affect routing. Non-routing fields get representative values. A schema with one routing field yields two cases; a schema with two yields four, one per combination of routing outcomes. The test count grows with routing complexity, not schema size.
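The case counts fall out of a simple enumeration. This sketch assumes each routing field has exactly two outcomes (null or set) and takes the routing field names directly rather than a schema; `fuzzRoutingCases` and the placeholder values are hypothetical, not the real `fuzz` implementation.

```typescript
type TestCase = Record<string, unknown>

// One case per combination of routing fields being null or set.
function fuzzRoutingCases(routingFields: string[], representative: TestCase): TestCase[] {
  const cases: TestCase[] = []
  const total = 1 << routingFields.length // 2^n combinations
  for (let mask = 0; mask < total; mask++) {
    // Non-routing fields keep the same representative values in every case.
    const testCase: TestCase = { ...representative }
    routingFields.forEach((field, bit) => {
      const isSet = (mask & (1 << bit)) !== 0
      testCase[field] = isSet ? '<' + field + '>' : null
    })
    cases.push(testCase)
  }
  return cases
}

const oneField = fuzzRoutingCases(['needsMoreInfo'], { topic: 'test', confidence: 0.9 })
const twoFields = fuzzRoutingCases(['needsMoreInfo', 'worthFollowingUp'], { topic: 'test' })
console.log(oneField.length, twoFields.length) // 2 4
```

Adding non-routing fields to `representative` changes the shape of each case but not the number of cases.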
This is the concrete version of the testability claim: you can exercise every routing branch of a 10-step pipeline without a single LLM call. When you do test with real models, you’re testing one thing: does the LLM produce valid schema output? That’s a much smaller, cheaper, faster test than “does the whole pipeline work.”
Input Extensions: The Piece That Was Missing
The previous post described a pipeline where rag$ extensions run before the model call. The initial zlow implementation didn’t actually do this — extensions only ran after the model call. The LLM never benefited from the enrichment it was supposed to have.
This was the most important architectural gap. Without input extensions, the step processor’s flow is:
model call → validate → output extensions
With them, it’s:
input extensions → assemble context bundle → model call → validate → output extensions
The difference: the model sees a $context object containing all the input extension results. If rag$getGameMetadata fetches a game’s score and review count before the model runs, those numbers are in the prompt. The LLM can use them.
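The five stages can be sketched as one function. This is an illustration under stated assumptions, not zlow's step processor: `StepPlan` is a hypothetical pre-parsed form of the schema, validation is reduced to a required-field check, and the model is any async function that receives the input together with `$context`.

```typescript
type Handler = (value: unknown) => unknown | Promise<unknown>
type Fields = Record<string, unknown>
type Model = (prompt: { input: Fields; $context: Fields }) => Promise<Fields>

// Hypothetical pre-parsed view of a schema's annotations.
interface StepPlan {
  inputAnnotations: { field: string; key: string }[]  // e.g. key "rag$getGameMetadata"
  outputAnnotations: { field: string; key: string }[] // e.g. key "notify$send"
  requiredFields: string[]                            // stand-in for schema validation
}

async function processStep(
  plan: StepPlan,
  input: Fields,
  model: Model,
  inputHandlers: Map<string, Handler>,
  outputHandlers: Map<string, Handler>
): Promise<{ output: Fields; $context: Fields; effects: Fields }> {
  // 1. Input extensions run first, each fed the annotated input field.
  const $context: Fields = {}
  for (const { field, key } of plan.inputAnnotations) {
    const handler = inputHandlers.get(key)
    if (!handler) throw new Error('no input extension registered for ' + key)
    // 2. Results are assembled into the context bundle under the annotation key.
    $context[key] = await handler(input[field])
  }
  // 3. The model (or a mock) sees the input plus everything the extensions fetched.
  const output = await model({ input, $context })
  // 4. Validate before any side effect is allowed to run.
  for (const field of plan.requiredFields) {
    if (!(field in output)) throw new Error('model output is missing "' + field + '"')
  }
  // 5. Output extensions run last, each fed the annotated output field.
  const effects: Fields = {}
  for (const { field, key } of plan.outputAnnotations) {
    const handler = outputHandlers.get(key)
    if (!handler) throw new Error('no output extension registered for ' + key)
    effects[key] = await handler(output[field])
  }
  return { output, $context, effects }
}
```

Deleting steps 1 and 2 gives the original flow, in which the model call receives an empty `$context` and has nothing to ground its numbers in.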
We ran a test to verify this was actually happening:
registerInputExtension('rag', 'getGameMetadata', (game) => ({
  score: 0.93, reviews: 42180, tags: ['City Builder', 'Strategy', 'Medieval']
}))
// No mock — real Gemini call
const result = await pipeline([ingressStep, scriptwriterStep]).run(
  { game: 'Manor Lords', angle: 'Hit 1M sales in 48 hours after early access launch' }
)
Gemini’s output:
One developer just shook the entire world. Manor Lords just smashed one million sales in only forty-eight hours! Forget the hype, this medieval masterpiece is actually the real deal. You get deep base building and massive strategy battles with a 93% positive score! Over 42,000 players are already obsessed with this historical sim.
“93% positive score.” “Over 42,000 players.” Those aren’t numbers the model hallucinated from training data — they came through $context.rag$getGameMetadata. The schema declared the enrichment. The algorithm fetched it. The model used it.
That’s the inversion working end to end.
The Kingmaker Validation
We wanted to know if zlow could express the actual workflow that runs in kingmaker — not a toy version, but the real scriptwriter step with real input enrichment.
The kingmaker step runner is a ~600-line file with hardcoded extension handlers:
if (ragExtensions.getGameMetadata) {
  // ... hardcoded Convex query
}
if (ragExtensions.getRecentProductions) {
  // ... hardcoded Convex query
}
// ... six more if-blocks
In zlow, those eight if-blocks become:
registerInputExtension('rag', 'getGameMetadata', async (game) => { /* ... */ })
registerInputExtension('rag', 'getRecentProductions', async () => { /* ... */ })
// ...
The schema stays identical. The runner stays dumb. The handlers live wherever they belong — in your application code, connected to your actual database.
The scriptwriter step, with its HYPE BEAST system prompt, expressed as a zlow Step:
const scriptwriterStep: Step = {
  name: 'scriptwriterStep',
  schema: `Z.object({
    _config: Z.object({
      subscribesTo: Z.literal("generate_video"),
      onComplete: Z.literal("script:ready"),
      systemPrompt: Z.literal("You are HYPE BEAST..."),
      temperature: Z.literal(0.85),
    }),
    game: Z.string().rag$getGameMetadata(),
    angle: Z.string(),
    hookLine: Z.string(),
    script: Z.string(),
    tagline: Z.string(),
    title: Z.string(),
  })`,
}
No system prompt on the Step object. No configuration outside the schema. The Step is exactly two fields: name and schema. The schema is the full control plane.
Where This Is Honest
We haven’t replaced kingmaker with zlow. We’ve proven the architecture handles kingmaker’s core workflow in a test environment. The real extension handlers — the ones that actually hit Convex queries, actually fetch Steam data, actually create video jobs — aren’t wired up.
The capability ladder (routing to more expensive models when the schema demands it, skipping the model call entirely for algorithmic-only steps) is partially done. The model is schema-selectable, but the “skip model entirely” step type isn’t declared yet.
Real-world validation against a live kingmaker run — actual game data, actual deduplication logic, actual job creation — hasn’t happened.
Those aren’t excuses. They’re the next tests the architecture has to pass.
Why This Approach
The field mostly optimizes for how much the LLM can do. More tools. Better reasoning. Longer context windows. zlow optimizes for a different constraint: how much of the system can exist in algorithmic space, where it’s testable, deterministic, and cheap.
The answer is: most of it. Routing is algorithmic. Context assembly is algorithmic. Side effects are algorithmic. Circuit breakers are algorithmic. The only thing that isn’t algorithmic is the reasoning — the part where structured input becomes structured output through something we can’t fully predict or control.
zlow puts the seam at the schema. Everything on one side is deterministic. Everything on the other side visits latent space and comes back with an answer.
That’s a narrow seam. But it’s in exactly the right place.
zlow is an internal library in active development. The schema-first pipelines post covers the underlying pattern. Zontax is the schema language that makes the annotations possible.