View as .md

Durable workflows

Status: engine complete, schedule live. Proven live (2026-06-09): the engine, the conductor daemon, the least-privilege role, read + spawn, the send seam, the one-bot invariant (createThread seeds the driving bot triggerMode: 'always'), the finish seam both ways — inline in /api/wf/step and the reconcile backstop (POST /api/wf/reconcile, called by the conductor on a 30s timer) — plus durable sleep and the human-gate step. Proof 1: conductor killed mid-run; the step event emitted while it was dead; restart resumed exactly-once; thread reads clean. Proof 2: a run slept 20s at zero compute, parked on a human gate, and continued seconds after a human replied — released by the sweep, which is the gate’s only path (a human reply fires no workflow hook). Proof 3 (schedule): a workflow with schedule = '* * * * *' fired at five consecutive minute slots, each spawning a run the conductor completed in one attempt; clearing the column removed the cron job. Operational (2026-06-09): the conductor runs as a supervised sprite service on arbe1 (auto-restart on crash + boot, kill-proven), heartbeats via the reconcile sweep, and arbe wf runs reports its liveness — closing arbe-5793. Console (arbe-ea32, 2026-06-12): the /workflows playground — create a recipe, spawn, live stepper (active / sleep / gate / failed per step), conductor health badge, Absurd-internals debug pane (checkpoints, events, await state), embedded run thread. Thread-driven: the conductor never runs tools; it sends an instruction into the run thread and the bound bot does the work. This doc is the design and the decisions as built.

Why

Now that we have agents, threads, tools, and sandboxes (dispatch_task, pi inside a coding agent), the missing piece is time and durability: a thing that wakes on a schedule or an event, runs a sequence of steps that may span minutes or days, survives restarts, and can pause to wait for a human. Cron jobs but composed of agent work, and observable as conversation.

arbe has an unfair advantage over Temporal / Inngest / pg_durable: every run is already a thread you can watch, interrupt, and talk to. We don’t get opaque background jobs — we get durable workflows that are also threads. Human-in-the-loop is the part generic engines are worst at, and for us it’s free: “wait for a human reply” is just waiting on a stream entry.

Goal — done when

A user can describe a recurring job once — e.g. “every morning run this command suite in a sandbox, and if a check fails have pi investigate and post findings to a thread; wait for my 👍 before taking the next action” — close the laptop, and trust that:

it fires on its own (schedule or event), with no one watching;
each run is a thread they can open, read, and interrupt;
it survives restarts — a crash mid-run resumes from the last finished step, never re-runs a completed one, never silently drops;
it can sleep for hours or days and block on a human without holding any process open;
every step is just run_command or dispatch_task — no new language to learn.

One line: a scheduled-or-triggered chain of run_command / dispatch_task steps that runs unattended, resumes after crashes, can wait on time or a human, and is watchable as a thread the whole way through.

The sharp test that proves it: kill the conductor halfway through a run, bring it back, and the workflow finishes correctly — exactly once per step — and the thread reads like nothing went wrong.

Note: examples here avoid repository / GitHub actions on purpose — we have not wired GitHub tokens into arbe yet. The model applies to repos too once that exists.

Decision — adopt Absurd

Use Absurd as the durable-execution engine via its TypeScript SDK (absurd-sdk). Absurd is Postgres-native durable execution: a SQL schema holds the engine (queue, checkpoints, retries, sleep, events) and a thin SDK (~2K LOC) is a client over it. We write a small daemon (the conductor) that maps Absurd’s primitives onto arbe’s two tool calls.

Key clarification that settled it: Absurd is not a layer on pgmq — it ships its own queue. So pgmq / Supabase Queues and Absurd are alternatives at the same layer, not a stack. We pick Absurd because it gives us the step journal, sleep, and events for free; pgmq would give us only the mailbox and leave the durable layer to hand-roll.

It also stays inside arbe’s no-DSL line: Absurd steps are ordinary code around checkpoints (ctx.step(...)), not a SQL DSL like pg_durable. It re-runs around cached checkpoints rather than doing deterministic replay.

Naming: arbe surfaces speak workflow / run / step / attempt and never leak engine words (task, queue, checkpoint, lease). Our run is Absurd’s task — the stable identity that survives retries; Absurd’s own “run” (one try) surfaces as attempt. The wf_* RPCs key on it (migration 20260609080000). User-facing articulation: docs/workflows.md.

Mapping Absurd → arbe:

Need	Absurd	arbe wrapping
Workflow definition	`app.registerTask({name}, handler)`	a `workflow` record
A run	`app.spawn(name, params, {idempotencyKey})`	a run thread (child room)
A step	`ctx.step('name', () => …)`	send an instruction into the run thread; the bound bot runs `run_command` / `dispatch_task`
Durable sleep	`ctx.sleepFor` / `ctx.sleepUntil`	—
Wait for human	`ctx.awaitEvent(...)`	a step whose done-check is “a human replied”
Step done	`ctx.awaitEvent('wf.step.done:<runId>:<step>')`	finish: the step’s done-check passes → www emits (inline + reconcile)
Schedule	pg_cron → `absurd.spawn_task(...)`	cron entry per workflow
Exactly-once-ish	`startWorker()` daemon + step caching	`apps/workflow-conductor` on `arbe1`
Agent loop	`beginStep`/`completeStep` over a message log	one `dispatch_task` / pi turn-loop

The agent-loop fit is exact: Absurd’s AI-agent pattern appends each message_end to a durable step log and resumes the loop when the last message isn’t the assistant’s — i.e. when it’s waiting on a human. That is dispatch_task / pi, durable.

License is permissive (confirmed); we pin absurd-sdk@0.4.0 and the matching schema (sql/absurd.sql @ tag 0.4.0) checked into our migrations.

The hard constraint: no new primitive, no DSL

This is the design line, straight from what-not-to-build and primitives:

No workflow DSL. pg_durable is the cautionary example — a whole SQL DSL (~>, &, ?>, df.loop, df.race). That is exactly what arbe says not to build. Activation policies + permissions + tools already compose into workflows.
A workflow is a record (type: 'workflow'), not a seventh primitive. A run is a thread (a child room). Steps are dispatches and stream entries. We already have these.
Durability is operational, like signals. The step queue / timers / resume tokens are the system witnessing its own execution — closer to the signals plane than to content. They are not a new content type and not stored as stream messages.

The model to copy is Armin Ronacher’s Absurd Workflows: minimal, Postgres-only, SELECT ... FOR UPDATE SKIP LOCKED, steps journaled and replayed, agent loops as a single self-iterating step rather than a static DAG. That last point matters for us: an agent doing N turns is one durable step that checkpoints between turns, not a graph we have to author.

Mechanism: Absurd’s schema + a daemon we own

Absurd’s SQL schema is the durable substrate — same DB local and prod, no extra infra; the hand-rolled durable table done well, with a TS client. Steps cache their result (exactly-once-per-step), sleepFor/sleepUntil give durable sleep without holding a process, awaitEvent/emitEvent are the wait-for-human/-signal mechanism, and a crashed worker’s lease expires so the task is reclaimed on restart (steps stay idempotent via caching + idempotencyKey).

Schedule ≠ run — two jobs people conflate. The schedule is pure SQL (cron.schedule(...) → select absurd.spawn_task(...), enqueues a row, no process). The run needs a JS runtime to execute the TS handler, so we run a long-poll daemon: apps/workflow-conductor runs app.startWorker() on the arbe1 sprite over its absurd_worker connection. Nothing calls in; it only calls out to Postgres (claim work) and www (send steps). We rejected a pg_net → /tick model (an endpoint, its auth, the extension, cold-start hops for a few cents of always-on); the daemon’s poll loop keeps the sprite warm, and the lease reclaims if it ever pauses.

The engine and the run thread

A workflow run lives in two durable substrates at once, and the design is the bridge between them:

The engine — Absurd, in our Postgres. The durable program: queue, step journal, sleep, events, leases. The generic arbe-workflow handler runs here, executed by the conductor (apps/workflow-conductor) over its absurd_worker connection. The engine decides what step is next and survives crashes.
The run thread — entries in durable streams. Where the work happens and is watchable: the System agent’s instruction, the bound bot’s replies, the dispatch signals. It does the work and narrates it.

One run, two faces — an Absurd task and a run thread sharing one identity: runId is the Absurd task id, and wf_run_threads maps it to the thread. Each run gets its own fresh run thread — a nightly workflow opens a new thread every night; the thread is per run, not per workflow.

The engine and the run thread meet at exactly two seams:

Send (engine → run thread). Per step, the conductor’s ctx.step(…) calls www POST /api/wf/step; www posts the instruction into the run thread authored by the house System agent, and the bound bot takes a turn with its own run_command / dispatch_task tools. Built and proven.
Finish (run thread → engine). When the turn ends, www emits wf.step.done:<runId>:<step>; the conductor — parked on ctx.awaitEvent for exactly that key — wakes and resumes to the next step. Built: inline (finishStep in /api/wf/step) over the pure done-check (@arbe/core/workflow-steps). The reconcile backstop remains.

Between the seams the conductor is suspended: awaitEvent frees the worker slot, so a run can sleep for days or block on a human while holding no process open. Crash-resume is free — every finished step is a cached checkpoint, so a restarted run never repeats work.

The done-check: a step finishes by reading the run thread

The fact that shapes everything: turn-end lives only on the external stream, never in Postgres. A bound bot’s turn ends with signal.dispatch.completed/failed appended to the run thread’s stream; it flips no Postgres row (only the detached dispatch_task/sandbox path emits signal.thread.status_changed, so the existing findLatestTerminal reconciler never fires here and threads.status stays idle). There is no row-change for a Database Webhook to ride. The only way to know a turn ended is the done-check: read the run thread’s tail and run stepOutcome(tail, step) → completed | failed | null. What it looks for varies by step kind — the findLatest* helpers are its internals:

a command / dispatch step — the bound bot’s signal.dispatch.{completed,failed} is the latest terminal: findLatestDispatchTerminal(tail, botId);
a human-gate step — a human replied after the instruction: findLatestHumanReply(tail, afterTs);
(future) silence — no terminal and no activity past a threshold → failed.

Same shape every time: the step parks on awaitEvent, and the bridge emits wf.step.done:<awaitKey> once that step’s done-check passes. This is Absurd’s own external-completion pattern — a step suspends on an event named for the thing it waits on, and something outside emits when that thing is done.

Two ways to finish: inline (primary) + reconcile (backstop)

Reading the stream needs the stream token and service role — which www holds and the conductor deliberately does not. So the finish runs in www, two ways:

Inline (primary). The send request already triggered the turn: submitThreadEntries hands back a dispatch promise that resolves when the in-process turn finishes. /api/wf/step awaits it, reads the tail, runs the done-check, and emits — all in the request it already owns. Lowest latency, no extra process.
Reconcile (backstop). A www instance can be evicted mid-turn (waitUntil is not durable), losing the inline emit. So a sweep — POST /api/wf/reconcile, called by the conductor on a 30s timer — lists runs still awaiting, reads each tail, and emits any missed completion. Load-bearing, not optional: it is also the human-gate’s only completion path (a human reply fires no workflow hook). We chose the conductor over pg_cron → pg_net: the daemon already holds the secret, and a dead conductor means no progress regardless, so the coupling costs nothing — and no HTTP plumbing or secrets land in Postgres.

They can’t conflict: Absurd events are first-write-wins, so whichever emits wf.step.done:<awaitKey> first resolves the step and the other is a no-op. It’s the same “stream is truth, materialize a missed terminal” shape as pruneStuckThreads — a sibling, not a reuse (different signal, different action).

Failed step ≠ crash: complete-with-payload, don’t throw

First-write-wins has a sharp edge on failure. When a step’s done-check returns { outcome: 'failed' }, that outcome is cached in two places keyed by the run’s task_id, both surviving every retry attempt: the first-write-wins wf.step.done event, and the checkpoint await_event commits from the resolved payload (absurd_init await_event, returns the committed checkpoint before it even reads the event). So an in-place retry (retry_task spawn_new:false, which is what a bounded max_attempts auto-retry does) replays the identical failure and re-runs nothing — the misleading “retrying” loop (the 2026-07-06 incident capped it at 5 attempts; the cap bounds the waste but can’t recover).

So the conductor does not throw on a failed step — a throw hits that retry. It completes the run task with a failure payload (completed_payload = { …, failed: { step, reason } }): no retry fires, and wf_list_runs / wf_show_run read that marker back as state = failed with a synthesized failure_reason. Recovery is a fresh spawn (new task_id → clean keys → whole workflow re-runs). Only a genuine crash throws, and that legitimately resumes from the last checkpoint. The max_attempts policy is therefore correct as written: it now retries only crashes. (In-place per-step retry — invalidate just the failed step’s checkpoint + event so a bounded retry re-runs it — is a deliberate follow-up, gated on the turn-end signal becoming reliable: task arbe-eadf.)

The foundational invariant: one run thread, one bot, always-reply

“The turn ended” is only well-defined because a run thread is bound to exactly one worker bot that always replies. Two reacting bots make turn-end ambiguous; a relevance/mention gate makes it never fire — the one live failure we hit (signal.dispatch.skipped: filtered, until the bot was @-mentioned). So the run thread is created single-bot with the worker bot forced to triggerMode: 'always', enforced structurally at thread creation — not by convention. Without this the bridge is built on sand.

State of record: `wf_run_threads`

The mapping table is the bridge’s source of truth — what the backstop reads, since the conductor’s state is only in-process and the sweep has no memory:

column	role
`run_id` (pk)	the Absurd task id; one row per run
`thread_id`	the run thread
`worker_bot_id`	the one bot whose terminal = turn-end (so the sweep needn’t read `workflows`)
`await_key`	the current step’s key (`<runId>:<step>`), upserted on every send
`awaiting`	true while parked on this step’s turn
`last_emitted_key`	idempotency / drift guard for the sweep
`updated_at`	sweep eligibility; leaves room for a silence-timeout

Steps are strictly sequential per run (the conductor blocks on each awaitEvent), so there’s never more than one live await_key per run — the sequential invariant is the concurrency control, no locking. await_key is authoritative: every send upserts the current key (plus worker_bot_id, awaiting: true), and the inline finish clears awaiting after it emits — migration 20260609090000. The backstop only has to read this table and run the same done-check.

Runtime flow

  pg_cron ──▶ spawn_task('arbe-workflow', { workflowId, steps, payload })
                  │
  ── ENGINE · Absurd / Postgres ──────────────────────────────────
                  ▼
       conductor (startWorker daemon, pg-only) — for each step:
         1. ctx.step ───────────────── SEND ──────────▶ POST /api/wf/step
         2. ctx.awaitEvent('wf.step.done:<runId>:<step>')   ◀── parks here
                                                              ▲
  ── RUN THREAD · threads / durable streams ──────────────────┼───
                                                              │
       run thread (one bound bot, always-reply):              │
         System posts the instruction → bot takes a turn →    │
         signal.dispatch.{completed,failed} on the stream tail │
                  │                                            │
       FINISH = done-check over the tail ─────────────────────┘
         • inline    (primary)   /api/wf/step awaits its dispatch promise
         • reconcile (backstop)  pg_cron → /api/wf/reconcile sweeps awaiting runs
         → wf_emit_event('wf.step.done:<runId>:<step>', { outcome })  [first-write-wins]

The conductor stays thin: only its absurd_worker pg connection (zero public access), APP_URL, and a shared CONDUCTOR_SECRET. All privileged arbe work — open the run thread, post as the house’s kind:'system' “System” agent (not the worker bot, which the dispatcher excludes as the trigger author; not a human, to keep attribution honest), trigger dispatch, emit the terminal event — lives in www. A new workflow is a row, not a deploy.

Status & the two gaps (closed)

The two things that blocked the SDK are solved. Raw Postgres connection: a least-privilege absurd_worker role (migration 20260609020000) with DML on the absurd tables, execute on its functions, and zero public privileges, reached over the Supabase session pooler (port 5432, custom role as absurd_worker.<project-ref>); the secret lives only in the sprite’s .env. No long-lived worker: app.startWorker() on arbe1 (poll + lease, no endpoint, no pg_net). Schema rides our migrations (absurd.sql @ 0.4.0); upgrades use absurdctl migrate --dump-sql → commit → push. pg_cron is enabled; absurd.cleanup_all_queues() is a TODO.

Proven end to end: spawn_task('ping') → the daemon auto-runs it → completed, zero poke; wf_spawn → run enqueued as arbe-workflow, inspectable via arbe wf runs / wf show; and POST /api/wf/step → a run thread opens with the instruction authored by System, the bound bot runs it, signal.dispatch.completed lands — the full thread-driven step.

Operate it

The daemon auto-runs anything spawned, so the loop is spawn + watch:

arbe --local wf spawn <workflowId>  # wf_spawn enqueues an arbe-workflow run
arbe --local wf runs                # daemon picks it up; watch state advance
arbe --local wf show <run_id>       # steps (checkpoints) + events

Two gotchas: the absurd schema isn’t on PostgREST, so inspect through the wf_* RPCs / arbe wf, never the management API; and runs sitting pending mean the conductor is down — arbe wf runs says so itself (it leads with the conductor heartbeat and warns when the beat is >90s old or runnable runs sit unclaimed; wf_health RPC, fed by the reconcile sweep upserting wf_conductors). The conductor is a supervised sprite service on arbe1 (sprite-env services get workflow-conductor) that auto-restarts on crash and on boot — kill--9 proven 2026-06-09. Ops one-liners: debugging.md.

Triggers — one door, not a plane

The sharpening (2026-06-13): triggers don’t grow like steps do. A step is data — a new workflow is a row. A trigger kind is an integration — auth, ingest, a payload schema — so each one is real work and there will be few. Don’t model triggers[] as an open plugin plane; model the one thing they all bottom out in: a call to wf_spawn(id, payload). Cron is that door with no caller; a manual spawn is a human calling it; an inbound webhook is an external system calling it with its body as payload. “More triggers” then costs nothing — there aren’t more triggers, just more callers of the same door, and the GitHub/Linear-specific part (map their body into these steps) is per-workflow data, not engine code.

What the door carries is the only genuinely new mechanism: a payload. A cron run starts empty; an event run starts with context (the PR, the issue). Steps reach it via {{path}} placeholders the conductor renders before posting each instruction (render() in the conductor; migration 20260613150000 snapshots the payload into spawn params beside steps). An unresolved path fails the step early with the path named — silent-"" drift is a debugging tax we refuse. Payload is run-scoped (off the recipe row), so editing a recipe never disturbs a run in flight, same as steps.

run_command templates like everything else — no escaping. Within the recipe-author trust boundary that’s no new surface: a house member can already run anything in a sandbox. The provenance that would matter is the external caller, so when the inbound webhook lands the control is webhook-caller auth (signed/scoped URL), not payload sanitization — that keeps escaping out of the hot path.

Sources, in build order — schedule (built, migration 20260609120000: a schedule column mirrored to one pg_cron job wf:<id> → select public.wf_spawn('<id>'), validated UTC at write time, one firing per slot so no idempotency key); events / signals (a thread entry, tool result, or activation relays to emitEvent; foreshadowed by activation policies); inbound webhook (a stable URL → wf_spawn with the body as payload — the one seam left to build, and once built it is the events router too; unlike schedule it does need an idempotency key, derived from the caller’s delivery id, since webhooks retry — see thinking/workflow-triggers).

The reconcile sweep rides the conductor’s own 30s timer (not pg_cron/pg_net — see the finish seam above) and emits any completion the inline path missed. Database Webhooks are not used for turn-end: the terminal lives on the external stream, not a Postgres row, so there’s nothing for a webhook to ride — why the earlier “webhook relay” plan was dropped. Where execution lives: the conductor sends an instruction; the bound bot runs the tools. Cron only feeds and sweeps; don’t scatter step logic into Edge Functions.

Authoring: data vs. agent-driven

Key constraint from deployments.md: Absurd tasks are registered handlers keyed by name, and a worker that meets an unregistered name just defers it (two-phase rollout: workers first, producers second). So we do not register a task per user-workflow — that would make every new workflow a deploy.

Instead: one generic arbe-workflow handler. It reads the workflow record (steps as data) and, per step, posts the instruction into the run thread via www and ctx.awaitEvents the turn’s terminal event. The workflow id is the task params. wf_spawn snapshots the steps into the spawn params, so the conductor never reads public.workflows mid-run — a run executes the plan as it was at spawn time, and editing a recipe never changes a run in flight. The step vocabulary (run_command / dispatch_task / sleep / human_gate) is the discriminated union in @arbe/core/schemas/workflow — an app-level invariant, not a DB constraint. This collapses both surfaces:

Declarative recipe — the record’s steps are data the generic handler interprets. No deploy per workflow.
Agent-driven — the agent loop is itself a step inside the same handler (the pi-ai-agent pattern). The thread is still the program; the handler is fixed.

Keep the engine dumb and the authoring layer swappable (cognitive layer, per thesis). New handler shapes follow the two-phase rollout; new workflows never do.

Use cases by user type

Solo dev / indie — nightly check: wake → sandbox → run a command suite → if something fails, dispatch pi to investigate → post findings to a thread for morning review.
Eng team / lead — gated job: run a validation step in a sandbox, agent drafts a summary, pause for a human 👍 in the thread, then proceed to the next action. Flakiness hunter: run a check N times in a sandbox, file a task with the offenders.
Researcher / analyst — recurring data pull → analysis thread → cited weekly summary. The thread is the deliverable.
Ops / SRE — scheduled health probes; on anomaly, spin a sandbox, run diagnostics, escalate by @-mentioning a human and waiting.
Non-technical / PM — Monday competitive scan or inbox digest; a workflow that watches a condition and, when it trips, opens a setup thread humans join.

Open questions

A dropped terminal hangs a step (mostly closed). signal.dispatch.failed is best-effort — it can be swallowed after retries (dispatch/signals.ts). The done-check now falls back to the durable fact that survives a lost signal: the bot’s final pi.assistant entry and its stopReason (lostTerminalOutcome in @arbe/core/workflow-steps). A turn that ended stop/length past a grace window reads as completed; error/aborted as failed; an over-cap toolUse turn stalled as the thread tail past TOOL_USE_STALL_MS fails the run with a reason pointing at backgrounding (arbe-5041). Still open: a step that produces no terminal and no final assistant entry at all — a general silence timeout keyed on wf_run_threads.updated_at would cover that, not yet built.
Idempotency for side-effecting steps. The lease auto-extends on each checkpoint write, but a stalled worker still allows brief overlapping execution — a dispatch_task could fire twice. ctx.step caching covers in-task retries; genuinely external side effects need a key derived from ctx.taskID (Absurd’s guidance in concepts.md).
Step log vs. stream. Absurd’s step/checkpoint log is operational (its own tables); the run’s narration — “ran checks”, “waiting for your 👍” — belongs as entries on the run thread. Keep the two cleanly separated.

References

Absurd — the engine we’re adopting. concepts, database/migrations, TS SDK, deployments/rollout, AI-agent pattern, cron pattern, comparison.
Absurd Workflows — Armin Ronacher’s post that frames the minimal-Postgres-durable-execution thesis.
Supabase Queues / pgmq quickstart — the queue substrate we considered and did not pick (Absurd ships its own queue).
microsoft/pg_durable USER_GUIDE — powerful, but a SQL DSL; the thing we are deliberately not building.

See: thinking/primitives, thinking/what-not-to-build, thinking/thesis.