# Workflows

A workflow is a recipe; a run is a conversation. The recipe — an agent plus an ordered list of steps — is a row a house owns. Spawning it opens a fresh [thread](./system/threads.md) and the run plays out there: each step is a message posted into the thread, the agent's reply is the work, and when the reply lands the next step goes out. Between steps, nothing runs anywhere.

Four step kinds (`@arbe/core/schemas/workflow`):

- `run_command` — the bot runs a shell command in the thread's sandbox;
- `dispatch_task` — a natural-language prompt the bot acts on with its normal tools (e.g. dispatching a pi coding agent);
- `sleep` — a durable pause of `seconds`; nothing posts to the thread and no process is held;
- `human_gate` — posts a prompt and parks until a human replies in the thread.
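
The four kinds can be read as a discriminated union. The sketch below is an illustration, not the real schema: the field names `kind`, `name`, `command`, `prompt` and `seconds` appear in this document, but giving `human_gate` a `prompt` field and every kind a `name` is an assumption, and `waitsFor` is a hypothetical helper that only restates what each kind blocks on.

```typescript
// Hypothetical shapes; the real definitions live in @arbe/core/schemas/workflow.
type Step =
	| { kind: "run_command"; name: string; command: string }
	| { kind: "dispatch_task"; name: string; prompt: string }
	| { kind: "sleep"; name: string; seconds: number }
	| { kind: "human_gate"; name: string; prompt: string };

type WaitsFor = "agent_reply" | "timer" | "human_reply";

// What a run waits on before the next step goes out.
function waitsFor(step: Step): WaitsFor {
	switch (step.kind) {
		case "run_command":
		case "dispatch_task":
			return "agent_reply"; // the bot's reply is the work
		case "sleep":
			return "timer"; // nothing is posted, no process is held
		case "human_gate":
			return "human_reply"; // parked until a human replies in the thread
	}
}
```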

A step's intent is natural language and the bot's tools do the work, so a workflow can do anything an agent can — and a new workflow is a row, not a deploy.

A step's `command`/`prompt` may carry `{{path}}` placeholders, rendered per run against the run's **payload** — a JSON object the trigger supplies at spawn (`arbe wf spawn <id> --payload '{"branch":"main"}'`, or `payload` on `POST /api/wf`). The payload is run-scoped, not part of the recipe, so editing it never touches the row; cron and bare spawns pass `{}`. An unresolved `{{path}}` fails the step early with that path named, rather than posting a half-blank instruction.
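
A minimal sketch of that rendering rule, to make the fail-early behaviour concrete. It is not the engine's code: `render` and `lookup` are hypothetical names, and treating `path` as a dot-separated lookup, tolerating whitespace inside the braces, and JSON-encoding non-string values are all assumptions.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Resolve a dotted path such as "repo.name" against the payload (assumed semantics).
function lookup(payload: Json, path: string): Json | undefined {
	let node: Json | undefined = payload;
	for (const key of path.split(".")) {
		if (node === null || typeof node !== "object" || Array.isArray(node)) return undefined;
		node = Object.prototype.hasOwnProperty.call(node, key) ? node[key] : undefined;
		if (node === undefined) return undefined;
	}
	return node;
}

// Render every {{path}} in the template, or throw naming the first path that is missing.
function render(template: string, payload: Json): string {
	return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match, path: string) => {
		const value = lookup(payload, path);
		if (value === undefined) throw new Error(`unresolved placeholder: ${path}`);
		return typeof value === "string" ? value : JSON.stringify(value);
	});
}
```

With the payload `{"branch":"main"}`, `render("git checkout {{branch}}", payload)` yields `git checkout main`; with `{}` it throws and names `branch`, so no half-blank instruction is produced.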

What workflows add to threads is time and durability:

- a run can fire on its own (schedule or event), with no one watching;
- it can sleep for hours or days, or gate on a human reply, at zero compute and holding no process open;
- every finished step is checkpointed, so a crashed run resumes where it left off and never repeats a finished step;
- a retried run keeps its identity — same run id, the `attempts` count ticks up.
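
The checkpoint and retry behaviour in the last two bullets can be sketched with an in-memory stand-in. Everything here is hypothetical (`RunState`, `NamedStep`, `attempt`); the real store is Postgres, and this sketch assumes checkpoints are keyed by step name.

```typescript
// In-memory stand-in for the durable run record (hypothetical shape).
interface RunState {
	id: string; // stays the same across retries
	attempts: number; // ticks up on every try
	done: Map<string, string>; // step name -> recorded result
}

interface NamedStep {
	name: string;
	run: () => string;
}

// One attempt: skip every step already checkpointed, run the rest in order.
// If a step throws, earlier checkpoints survive and the next attempt resumes there.
function attempt(state: RunState, steps: NamedStep[]): void {
	state.attempts += 1;
	for (const step of steps) {
		if (state.done.has(step.name)) continue; // finished earlier, not repeated
		state.done.set(step.name, step.run()); // checkpoint as soon as the step finishes
	}
}
```

Calling `attempt` again on the same `state` after a failure is the retry: the id is unchanged, `attempts` goes from 1 to 2, and only the unfinished steps run.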

Author and watch them in the console at `/workflows`: create at `/workflows/new`, edit at `/workflows/<id>/edit`, delete from the workflows page. From there you can spawn a workflow, follow the live stepper, and reply to a gate in the embedded run thread. The same surface is available over HTTP as `GET/POST /api/workflows` and `GET/PATCH/DELETE /api/workflows/<id>`. From the CLI:

```bash
printf '%s' '{"name":"nightly","steps":[]}' | arbe wf create --stdin  # create a recipe
arbe wf spawn <workflow_id>   # fire a run
arbe wf runs                  # recent runs, newest first (+ conductor heartbeat)
arbe wf show <run_id>         # one run: steps + events (short-id prefix ok)
arbe wf proof <workflow_id>   # spawn + verify it completes — e2e health check, ~30s–3m
```

Each run gets its own thread, auto-named `<workflow> <YYYY-MM-DD>` — a nightly workflow opens a new one every night. The thread is the run's log and its control surface: open it, read it, interrupt it, reply to unblock a gate.

A real recipe, end to end — sweep the website's public routes every night and report (`POST /api/workflows`):

```json
{
	"houseId": "<house>",
	"agentId": "<bot>",
	"name": "nightly-route-health",
	"schedule": "0 5 * * *",
	"steps": [
		{
			"kind": "run_command",
			"name": "sweep",
			"command": "for p in / /about /login /guide; do curl -s -o /dev/null -w \"%{http_code} %{time_total}s $p\\n\" \"https://arbe.0sk.ar$p\"; done"
		},
		{
			"kind": "dispatch_task",
			"name": "report",
			"prompt": "The previous step swept the public routes (status, total time, path per line). Write a one-paragraph health report: flag any non-200 or any route slower than 1.5s as a regression; otherwise say all is well and name the slowest route."
		}
	]
}
```

Every night a fresh thread opens, the sandbox runs the sweep, the bot reads the numbers and writes the verdict — the thread is the report. Add a `human_gate` step after `report` and the run parks until someone replies; that's an approval flow, same recipe shape.
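
In the recipe the bot applies the report rule from the prompt alone; no code is involved. The sketch below only makes that rule concrete, assuming the sweep prints one `<status> <seconds>s <path>` line per route as the `curl -w` format above does. `parseSweep`, `regressions` and `slowest` are hypothetical helpers.

```typescript
interface RouteResult {
	status: number;
	seconds: number;
	path: string;
}

// Parse lines like "200 0.123456s /about"; lines that do not match are ignored.
function parseSweep(output: string): RouteResult[] {
	const results: RouteResult[] = [];
	for (const line of output.split("\n")) {
		const m = /^(\d{3}) (\d+(?:\.\d+)?)s (\S+)$/.exec(line.trim());
		if (m) results.push({ status: Number(m[1]), seconds: Number(m[2]), path: String(m[3]) });
	}
	return results;
}

// A route is a regression if it is non-200 or slower than the threshold (1.5s in the prompt).
function regressions(results: RouteResult[], maxSeconds = 1.5): RouteResult[] {
	return results.filter((r) => r.status !== 200 || r.seconds > maxSeconds);
}

// The slowest route, or undefined for an empty sweep.
function slowest(results: RouteResult[]): RouteResult | undefined {
	return results.reduce<RouteResult | undefined>(
		(worst, r) => (worst === undefined || r.seconds > worst.seconds ? r : worst),
		undefined,
	);
}
```

A `000` line, which curl prints when the connection fails, parses as status `0` and is flagged like any other non-200.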

Caveat (2026-06): sandbox egress is allowlisted at our Daytona tier — github, npm, and LLM APIs work, but arbitrary hosts (including arbe.0sk.ar) get a TLS reset, so this exact sweep returns `000`s today (arbe-5783). The recipe exists as `nightly-route-health`, unscheduled until egress is solved.

Four gotchas:

- a `run_command` step is one bot turn, so it must finish inside the pi turn cap (~2 min). A longer command loses the turn; the run then fails with a reason telling you to restructure it (arbe-5041), rather than parking silently. For slow work (installs, builds, suites), background it: one step starts it with `nohup … & echo started` writing to a log, a `sleep` step waits, a third step tails the log;
- a step's `name` is its checkpoint identity — renaming a step makes it run again, and names must be unique within a workflow;
- steps are snapshotted at spawn, so editing or deleting a recipe never touches a run already in flight (historical runs carry their own snapshot);
- runs sitting `pending` mean the conductor daemon is down; `arbe wf runs` and the console's health badge both say so.
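
The background pattern from the first gotcha can be generated rather than hand-written. This is a sketch under assumptions: `backgrounded` and `shellQuote` are hypothetical helpers, the `/tmp/<name>.log` path and the `tail -n 50` window are arbitrary choices, `name` is assumed to be shell-safe, and the log is assumed to survive between steps because they share the thread's sandbox. The three suffixed names also keep step names unique, as the second gotcha requires.

```typescript
type Step =
	| { kind: "run_command"; name: string; command: string }
	| { kind: "sleep"; name: string; seconds: number };

// Wrap text in single quotes for sh, escaping any embedded single quotes.
function shellQuote(text: string): string {
	return "'" + text.replace(/'/g, "'\\''") + "'";
}

// Expand one slow command into start / wait / tail so no single bot turn hits the cap.
function backgrounded(name: string, command: string, waitSeconds: number): Step[] {
	const log = `/tmp/${name}.log`; // hypothetical log location
	return [
		{
			kind: "run_command",
			name: `${name}-start`,
			command: `nohup sh -c ${shellQuote(command)} > ${log} 2>&1 & echo started`,
		},
		{ kind: "sleep", name: `${name}-wait`, seconds: waitSeconds },
		{ kind: "run_command", name: `${name}-tail`, command: `tail -n 50 ${log}` },
	];
}
```

`backgrounded("build", "npm run build", 300)` returns three steps that can be spliced into a recipe's `steps` array.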

## Triggers

Every trigger does one thing: call `wf_spawn(id, payload)`. So there is one door, not a growing list of trigger types — cron is the door with no caller, a manual spawn is a human calling it, and an inbound webhook (future) is an external system calling it with its body as payload. A new integration is "point a caller at the door and map its body into the recipe's `{{...}}`", not new engine code.
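
A sketch of the one-door idea, with a stub in place of the real `wf_spawn`. The three callers are hypothetical adapters; in particular the webhook one is illustrative only, since inbound webhooks are described above as future work. It assumes a GitHub-style push body whose `ref` looks like `refs/heads/main` and a recipe that uses `{{branch}}`.

```typescript
type Payload = Record<string, unknown>;

interface Spawned {
	workflowId: string;
	payload: Payload;
}

// Stub for the single entry point; the real one creates a run and opens its thread.
const spawned: Spawned[] = [];
function wf_spawn(workflowId: string, payload: Payload): Spawned {
	const run = { workflowId, payload };
	spawned.push(run);
	return run;
}

// Cron: no caller, empty payload.
const fromCron = (id: string): Spawned => wf_spawn(id, {});

// Manual spawn: a human supplies the payload as JSON text.
const fromCli = (id: string, payloadJson: string): Spawned =>
	wf_spawn(id, JSON.parse(payloadJson) as Payload);

// Hypothetical webhook adapter: map the external body onto the recipe's placeholders.
const fromWebhook = (id: string, body: { ref: string }): Spawned =>
	wf_spawn(id, { branch: body.ref.replace(/^refs\/heads\//, "") });
```

Each adapter differs only in where the payload comes from; none of them needs a change to the engine.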

A **schedule** is part of the recipe: set `schedule` on the workflow row to a cron expression (UTC) and each slot spawns a run with an empty payload, exactly as a bare `arbe wf spawn` does. Clear it to stop; an invalid expression is rejected at write time. A database trigger mirrors the column into one pg_cron job per workflow (`wf:<id>`), so there is no scheduler daemon: Postgres fires, the conductor executes.
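
To make the UTC reading concrete, here is a small sketch that computes the next slot for a daily schedule such as `0 5 * * *`. It is an illustration only: `nextDailySlot` is a hypothetical helper that handles just the `M H * * *` form, and it is neither the validation applied at write time nor how pg_cron evaluates expressions.

```typescript
// Next fire time for a daily "M H * * *" expression, evaluated in UTC.
function nextDailySlot(schedule: string, after: Date): Date {
	const fields = schedule.trim().split(/\s+/);
	if (fields.length !== 5) throw new Error(`expected 5 cron fields, got ${fields.length}`);
	const [minute = "", hour = "", ...rest] = fields;
	if (!/^\d+$/.test(minute) || !/^\d+$/.test(hour) || rest.some((f) => f !== "*")) {
		throw new Error("this sketch only handles daily 'M H * * *' schedules");
	}
	const m = Number(minute);
	const h = Number(hour);
	if (m > 59 || h > 23) throw new Error("minute or hour out of range");
	// Build today's slot in UTC, then roll to tomorrow if it has already passed.
	const slot = new Date(Date.UTC(after.getUTCFullYear(), after.getUTCMonth(), after.getUTCDate(), h, m));
	if (slot.getTime() <= after.getTime()) slot.setUTCDate(slot.getUTCDate() + 1);
	return slot;
}
```

For the recipe above, a clock reading of 04:00 UTC gives a next slot at 05:00 UTC the same day, and any time at or after 05:00 UTC rolls over to the following day.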

## Under the hood

[Absurd](https://github.com/earendil-works/absurd) (Postgres-native durable execution) is the engine, driven by the conductor (`apps/workflow-conductor`), a daemon that polls the queue. The wrapper maps each arbe word to exactly one Absurd word, and engine vocabulary stays below this line:

| arbe says | Absurd says |
|---|---|
| run | task (the stable identity) |
| attempt | run (one try at it) |
| step | checkpoint |
| sleep / gate | `sleepFor` / `awaitEvent` |

Design and decisions: [thinking/durable-workflows](./thinking/durable-workflows.md).
