Workflows
A workflow is a recipe; a run is a conversation. The recipe — an agent plus an ordered list of steps — is a row a house owns. Spawning it opens a fresh thread and the run plays out there: each step is a message posted into the thread, the agent’s reply is the work, and when the reply lands the next step goes out. Between steps, nothing runs anywhere.
Four step kinds (@arbe/core/schemas/workflow):
- run_command: the bot runs a shell command in the thread’s sandbox;
- dispatch_task: a natural-language prompt the bot acts on with its normal tools (e.g. dispatching a pi coding agent);
- sleep: a durable pause of seconds; nothing posts to the thread and no process is held;
- human_gate: posts a prompt and parks until a human replies in the thread.
A step’s intent is natural language and the bot’s tools do the work, so a workflow can do anything an agent can — and a new workflow is a row, not a deploy.
A step’s command/prompt may carry {{path}} placeholders, rendered per run against the run’s payload — a JSON object the trigger supplies at spawn (arbe wf spawn <id> --payload '{"branch":"main"}', or payload on POST /api/wf). The payload is run-scoped, not part of the recipe, so editing it never touches the row; cron and bare spawns pass {}. An unresolved {{path}} fails the step early with that path named, rather than posting a half-blank instruction.
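The rendering rule above can be sketched in a few lines. This is a minimal illustration, not the engine's code: the names `render`, `lookup`, and `UnresolvedPlaceholder` are hypothetical, and treating a path as dot-separated keys into nested objects is an assumption the text does not confirm.

```python
import re

# Matches {{path}}, where path is assumed to be dot-separated keys.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_.]+)\s*\}\}")


class UnresolvedPlaceholder(Exception):
    """Raised with the missing path as its message."""


def lookup(payload, path):
    """Walk a dotted path through nested dicts; raise if any segment is missing."""
    node = payload
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            raise UnresolvedPlaceholder(path)
        node = node[key]
    return node


def render(template, payload):
    """Replace every {{path}} with its payload value, failing on the first gap."""
    return PLACEHOLDER.sub(lambda m: str(lookup(payload, m.group(1))), template)
```

Rendering raises before anything is posted, which is what lets a step fail early with the path named instead of sending a half-blank instruction.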
What workflows add to threads is time and durability:
- a run can fire on its own (schedule or event), with no one watching;
- it can sleep for hours or days, or gate on a human reply, at zero compute and holding no process open;
- every finished step is checkpointed, so a crashed run resumes where it left off and never repeats a step;
- a retried run keeps its identity: same run id, the attempts count ticks up.
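The checkpoint and retry behaviour in the list above can be simulated with a toy model. Everything here is an assumption made for illustration: an in-memory dict stands in for durable storage, and `run_attempt`, `attempt`, and `flaky` are hypothetical names, not the engine's API.

```python
def run_attempt(steps, checkpoints, execute):
    """Run steps in order, skipping any whose name is already checkpointed."""
    for step in steps:
        name = step["name"]
        if name in checkpoints:
            continue  # finished in an earlier attempt, so never repeated
        checkpoints[name] = execute(step)  # recorded only after the step finishes


def attempt(run, steps, execute):
    """One try at the run: same run id, attempts ticks up."""
    run["attempts"] += 1
    run_attempt(steps, run["checkpoints"], execute)


# Simulate a crash during the second step, then a retry of the same run.
run = {"id": "run-1", "attempts": 0, "checkpoints": {}}
steps = [{"name": "sweep"}, {"name": "report"}]
executed = []
crash_once = {"report"}


def flaky(step):
    if step["name"] in crash_once:
        crash_once.discard(step["name"])
        raise RuntimeError("worker crashed")
    executed.append(step["name"])
    return "ok"


try:
    attempt(run, steps, flaky)
except RuntimeError:
    pass  # the first attempt dies mid-run
attempt(run, steps, flaky)  # the retry resumes at "report"
```

After the retry, `executed` holds each step name once and `run["attempts"]` is 2, while `run["id"]` is unchanged.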
Author and watch them in the console at /workflows: create at /workflows/new, edit at /workflows/<id>/edit, delete from the workflows page. Spawn a workflow, follow the live stepper, and reply to a gate in the embedded run thread. The same surface is GET/POST /api/workflows and GET/PATCH/DELETE /api/workflows/<id>. From the CLI:

    printf '%s' '{"name":"nightly","steps":[]}' | arbe wf create --stdin   # create a recipe
    arbe wf spawn <workflow_id>    # fire a run
    arbe wf runs                   # recent runs, newest first (+ conductor heartbeat)
    arbe wf show <run_id>          # one run: steps + events (short-id prefix ok)
    arbe wf proof <workflow_id>    # spawn + verify it completes: e2e health check, ~30s–3m

Each run gets its own thread, auto-named <workflow> <YYYY-MM-DD>, so a nightly workflow opens a new one every night. The thread is the run’s log and its control surface: open it, read it, interrupt it, reply to unblock a gate.
A real recipe, end to end — sweep the website’s public routes every night and report (POST /api/workflows):

    {
      "houseId": "<house>",
      "agentId": "<bot>",
      "name": "nightly-route-health",
      "schedule": "0 5 * * *",
      "steps": [
        {
          "kind": "run_command",
          "name": "sweep",
          "command": "for p in / /about /login /guide; do curl -s -o /dev/null -w \"%{http_code} %{time_total}s $p\\n\" \"https://arbe.0sk.ar$p\"; done"
        },
        {
          "kind": "dispatch_task",
          "name": "report",
          "prompt": "The previous step swept the public routes (status, total time, path per line). Write a one-paragraph health report: flag any non-200 or any route slower than 1.5s as a regression; otherwise say all is well and name the slowest route."
        }
      ]
    }

Every night a fresh thread opens, the sandbox runs the sweep, the bot reads the numbers and writes the verdict: the thread is the report. Add a human_gate step after report and the run parks until someone replies; that’s an approval flow, same recipe shape.
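In the recipe the bot applies the regression rule from a natural-language prompt. For readers who want the rule stated precisely, here is a rough sketch of the same check, assuming each sweep line has the shape the curl format string produces (status, total time with an `s` suffix, path). The function names are hypothetical and no such code runs in the workflow.

```python
def parse_sweep(output):
    """Turn 'status time_total path' lines into (status, seconds, path) tuples."""
    rows = []
    for line in output.strip().splitlines():
        status, total, path = line.split(maxsplit=2)
        rows.append((int(status), float(total.rstrip("s")), path))
    return rows


def regressions(rows, max_seconds=1.5):
    """Paths that returned non-200 or took longer than max_seconds."""
    return [path for status, secs, path in rows if status != 200 or secs > max_seconds]


def slowest(rows):
    """Path of the route with the largest total time."""
    return max(rows, key=lambda row: row[1])[2]
```

A curl status of 000 parses to 0, so it counts as a regression under this rule.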
Caveat (2026-06): sandbox egress is allowlisted at our Daytona tier — github, npm, and LLM APIs work, but arbitrary hosts (including arbe.0sk.ar) get a TLS reset, so this exact sweep returns 000s today (arbe-5783). The recipe exists as nightly-route-health, unscheduled until egress is solved.
Four gotchas:
- a run_command step is one bot turn, so it must finish inside the pi turn cap (~2 min). A longer command loses the turn; the run then fails with a reason telling you to restructure it (arbe-5041), rather than parking silently. For slow work (installs, builds, suites), background it: one step starts it with nohup … & echo started writing to a log, a sleep step waits, a third step tails the log;
- a step’s name is its checkpoint identity: renaming a step makes it run again, and names must be unique within a workflow;
- steps are snapshotted at spawn, so editing or deleting a recipe never touches a run already in flight (historical runs carry their own snapshot);
- runs sitting pending mean the conductor daemon is down; arbe wf runs and the console’s health badge both say so.
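Two of these gotchas lend themselves to a small helper: the start / sleep / tail pattern for slow commands, and the unique-name rule. The sketch below is illustrative only. `background_steps` and `duplicate_names` are hypothetical helpers, the `-start` / `-wait` / `-tail` suffixes and the log redirect are assumptions, and the step fields follow the recipe shown earlier plus the sleep step's seconds.

```python
from collections import Counter


def background_steps(name, command, log_path, wait_seconds):
    """Split one slow command into three short steps: start, sleep, tail."""
    return [
        {
            "kind": "run_command",
            "name": f"{name}-start",
            # Returns at once; output goes to the log (redirect is an assumption).
            "command": f"nohup {command} > {log_path} 2>&1 & echo started",
        },
        {"kind": "sleep", "name": f"{name}-wait", "seconds": wait_seconds},
        {
            "kind": "run_command",
            "name": f"{name}-tail",
            "command": f"tail -n 50 {log_path}",
        },
    ]


def duplicate_names(steps):
    """Step names that appear more than once; a valid workflow has none."""
    counts = Counter(step["name"] for step in steps)
    return sorted(name for name, n in counts.items() if n > 1)
```

Because each name is a checkpoint identity, it is worth running a check like `duplicate_names` before writing a recipe, and keeping the generated names stable across edits so finished steps are not re-run.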
Triggers
Every trigger does one thing: call wf_spawn(id, payload). So there is one door, not a growing list of trigger types — cron is the door with no caller, a manual spawn is a human calling it, and an inbound webhook (future) is an external system calling it with its body as payload. A new integration is “point a caller at the door and map its body into the recipe’s {{...}}”, not new engine code.
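The one-door idea can be pictured as three thin callers around a single function. This is a toy model under stated assumptions: `wf_spawn` here only records the call, and `on_cron`, `on_manual`, and `on_webhook` are hypothetical names for the callers described above (the webhook is a future feature in the text).

```python
spawned = []  # stand-in for the run queue


def wf_spawn(workflow_id, payload):
    """The single door: every trigger ends up here."""
    spawned.append({"workflow_id": workflow_id, "payload": payload})


def on_cron(workflow_id):
    """A schedule slot: the door with no caller, so the payload is empty."""
    wf_spawn(workflow_id, {})


def on_manual(workflow_id, payload=None):
    """A human spawning a run, optionally with a payload."""
    wf_spawn(workflow_id, payload or {})


def on_webhook(workflow_id, body):
    """An external system: its request body becomes the payload."""
    wf_spawn(workflow_id, body)
```

Adding an integration in this model means adding another thin caller that maps its input into a payload; `wf_spawn` itself does not change.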
A schedule is part of the recipe: set schedule on the workflow row to a cron expression (UTC) and each slot spawns a run with an empty payload, exactly as a bare arbe wf spawn does. Clear it to stop; an invalid expression is rejected at write time. A database trigger mirrors the column into one pg_cron job per workflow (wf:<id>), so there is no scheduler daemon: Postgres fires, the conductor executes.
Under the hood
Absurd — Postgres-native durable execution — is the engine, driven by the conductor (apps/workflow-conductor), a daemon that polls the queue. The wrapper is one word each way, and engine vocabulary stays below this line:
| arbe says | Absurd says |
|---|---|
| run | task (the stable identity) |
| attempt | run (one try at it) |
| step | checkpoint |
| sleep / gate | sleepFor / awaitEvent |
Design and decisions: thinking/durable-workflows.