Workflows

A workflow is a recipe; a run is a conversation. The recipe — an agent plus an ordered list of steps — is a row a house owns. Spawning it opens a fresh thread and the run plays out there: each step is a message posted into the thread, the agent’s reply is the work, and when the reply lands the next step goes out. Between steps, nothing runs anywhere.

Four step kinds (@arbe/core/schemas/workflow):

  • run_command — the bot runs a shell command in the thread’s sandbox;
  • dispatch_task — a natural-language prompt the bot acts on with its normal tools (e.g. dispatching a pi coding agent);
  • sleep — a durable pause for a set number of seconds; nothing posts to the thread and no process is held;
  • human_gate — posts a prompt and parks until a human replies in the thread.

A step’s intent is natural language and the bot’s tools do the work, so a workflow can do anything an agent can — and a new workflow is a row, not a deploy.
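The four kinds above can be pictured as a small tagged union. This is a minimal sketch, not the actual schema in @arbe/core/schemas/workflow: kind, name, command and prompt appear in the recipe example on this page, while the seconds field on sleep and the prompt field on human_gate are assumed names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class RunCommand:
    name: str
    command: str  # shell command run in the thread's sandbox
    kind: str = "run_command"


@dataclass(frozen=True)
class DispatchTask:
    name: str
    prompt: str  # natural-language instruction for the bot
    kind: str = "dispatch_task"


@dataclass(frozen=True)
class Sleep:
    name: str
    seconds: int  # assumed field name for the pause length
    kind: str = "sleep"


@dataclass(frozen=True)
class HumanGate:
    name: str
    prompt: str  # assumed field name for the posted prompt
    kind: str = "human_gate"


Step = Union[RunCommand, DispatchTask, Sleep, HumanGate]


def waits_on(step: Step) -> str:
    """What has to happen before the next step goes out."""
    if isinstance(step, Sleep):
        return "timer"
    if isinstance(step, HumanGate):
        return "human reply"
    return "agent reply"
```

For example, waits_on(Sleep("cooldown", 3600)) returns "timer", while a run_command or dispatch_task step waits on the agent's reply.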

A step’s command/prompt may carry {{path}} placeholders, rendered per run against the run’s payload — a JSON object the trigger supplies at spawn (arbe wf spawn <id> --payload '{"branch":"main"}', or payload on POST /api/wf). The payload is run-scoped, not part of the recipe, so editing it never touches the row; cron and bare spawns pass {}. An unresolved {{path}} fails the step early with that path named, rather than posting a half-blank instruction.
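The rendering rule above can be sketched in a few lines. This is an illustration under assumptions, not the engine's code: the dotted-path lookup into nested objects, the tolerance for spaces inside the braces, and the UnresolvedPlaceholder name are all hypothetical.

```python
import re
from typing import Any

# Matches {{path}}, allowing optional spaces inside the braces (assumed).
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


class UnresolvedPlaceholder(Exception):
    """Raised when a {{path}} has no value in the run's payload."""


def lookup(payload: dict, path: str) -> Any:
    """Walk a dotted path (assumed syntax) through nested JSON objects."""
    current: Any = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise UnresolvedPlaceholder(path)  # name the missing path
        current = current[key]
    return current


def render(template: str, payload: dict) -> str:
    """Fill every {{path}} from the payload, or fail before anything is posted."""
    return PLACEHOLDER.sub(lambda m: str(lookup(payload, m.group(1))), template)
```

With the payload from the spawn example, render("git checkout {{branch}}", {"branch": "main"}) gives "git checkout main"; with an empty payload the same call raises UnresolvedPlaceholder("branch") instead of returning a half-blank command.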

What workflows add to threads is time and durability:

  • a run can fire on its own (schedule or event), with no one watching;
  • it can sleep for hours or days, or gate on a human reply, at zero compute and holding no process open;
  • every finished step is checkpointed, so a crashed run resumes where it left off and never repeats a step;
  • a retried run keeps its identity — same run id, the attempts count ticks up.
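The checkpoint and retry behaviour in the list above can be simulated in memory. This is a sketch under assumptions: a dict stands in for the durable store, the Run class and the run id are hypothetical, and the crash is faked with an exception.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Run:
    id: str
    attempts: int = 0
    checkpoints: dict = field(default_factory=dict)  # step name -> stored result


def attempt(run: Run, steps: list[tuple[str, Callable[[], str]]]) -> None:
    """Make one attempt at the run, skipping steps that are already checkpointed."""
    run.attempts += 1
    for name, work in steps:
        if name in run.checkpoints:
            continue  # finished on an earlier attempt, so it is not repeated
        run.checkpoints[name] = work()  # recorded only once the step has finished


# Simulate a crash in the second step on the first attempt.
calls = []
crash_once = {"armed": True}


def sweep() -> str:
    calls.append("sweep")
    return "4 routes checked"


def report() -> str:
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("simulated crash")
    calls.append("report")
    return "all is well"


run = Run(id="run-1")  # hypothetical run id
steps = [("sweep", sweep), ("report", report)]
try:
    attempt(run, steps)
except RuntimeError:
    pass  # the attempt died; the checkpoint for "sweep" survives
attempt(run, steps)  # retry: same run id, "sweep" is skipped
```

After the retry, sweep has run once, report has run once, and the run still has the id it started with while its attempts count is 2.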

Author and watch them in the console at /workflows: create at /workflows/new, edit at /workflows/<id>/edit, delete from the workflows page. Spawn a workflow, follow the live stepper, and reply to a gate in the embedded run thread. The same surface is GET/POST /api/workflows and GET/PATCH/DELETE /api/workflows/<id>. From the CLI:

printf '%s' '{"name":"nightly","steps":[]}' | arbe wf create --stdin # create a recipe
arbe wf spawn <workflow_id> # fire a run
arbe wf runs # recent runs, newest first (+ conductor heartbeat)
arbe wf show <run_id> # one run: steps + events (short-id prefix ok)
arbe wf proof <workflow_id> # spawn + verify it completes — e2e health check, ~30s–3m

Each run gets its own thread, auto-named <workflow> <YYYY-MM-DD> — a nightly workflow opens a new one every night. The thread is the run’s log and its control surface: open it, read it, interrupt it, reply to unblock a gate.

A real recipe, end to end — sweep the website’s public routes every night and report (POST /api/workflows):

{
  "houseId": "<house>",
  "agentId": "<bot>",
  "name": "nightly-route-health",
  "schedule": "0 5 * * *",
  "steps": [
    {
      "kind": "run_command",
      "name": "sweep",
      "command": "for p in / /about /login /guide; do curl -s -o /dev/null -w \"%{http_code} %{time_total}s $p\\n\" \"https://arbe.0sk.ar$p\"; done"
    },
    {
      "kind": "dispatch_task",
      "name": "report",
      "prompt": "The previous step swept the public routes (status, total time, path per line). Write a one-paragraph health report: flag any non-200 or any route slower than 1.5s as a regression; otherwise say all is well and name the slowest route."
    }
  ]
}

Every night a fresh thread opens, the sandbox runs the sweep, the bot reads the numbers and writes the verdict — the thread is the report. Add a human_gate step after report and the run parks until someone replies; that’s an approval flow, same recipe shape.
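In the recipe the bot applies the report rule in natural language. For readers who want the rule spelled out, here is a deterministic sketch of the same check over the sweep's output lines (status, total time, path). The function names are hypothetical and any sample lines fed to it are made up, not real measurements.

```python
def parse_sweep(output: str) -> list[tuple[int, float, str]]:
    """Parse lines shaped like '200 0.210s /about' into (status, seconds, path)."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        code, seconds, path = line.split(maxsplit=2)
        rows.append((int(code), float(seconds.rstrip("s")), path))
    return rows


def regressions(rows: list[tuple[int, float, str]], max_seconds: float = 1.5):
    """A route regresses if it is not a 200 or is slower than the threshold."""
    return [row for row in rows if row[0] != 200 or row[1] > max_seconds]


def slowest(rows: list[tuple[int, float, str]]) -> tuple[int, float, str]:
    """The route with the largest total time."""
    return max(rows, key=lambda row: row[1])
```

An empty regressions list corresponds to the "all is well" branch of the prompt, in which case slowest names the route to mention.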

Caveat (2026-06): sandbox egress is allowlisted at our Daytona tier — github, npm, and LLM APIs work, but arbitrary hosts (including arbe.0sk.ar) get a TLS reset, so this exact sweep returns 000s today (arbe-5783). The recipe exists as nightly-route-health, unscheduled until egress is solved.

Four gotchas:

  • a run_command step is one bot turn, so it must finish inside the pi turn cap (~2 min). A longer command loses the turn; the run then fails with a reason telling you to restructure it (arbe-5041), rather than parking silently. For slow work (installs, builds, suites), background it across three steps: the first starts the command with nohup … & echo started, redirecting its output to a log; a sleep step waits; a third step tails the log;
  • a step’s name is its checkpoint identity — renaming a step makes it run again, and names must be unique within a workflow;
  • steps are snapshotted at spawn, so editing or deleting a recipe never touches a run already in flight (historical runs carry their own snapshot);
  • runs sitting pending mean the conductor daemon is down; arbe wf runs and the console’s health badge both say so.
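The background pattern from the first gotcha, together with the unique-name rule from the second, can be sketched as a small builder. Everything specific here is an assumption for illustration: the helper names, the "-start/-wait/-tail" naming scheme, the redirect and tail commands, and the seconds field on the sleep step.

```python
def background_steps(label: str, command: str, log_path: str, wait_seconds: int) -> list[dict]:
    """Split one slow command into start, sleep and tail steps."""
    return [
        {
            "kind": "run_command",
            "name": f"{label}-start",
            # Returns at once, so the step fits inside one bot turn.
            "command": f"nohup {command} > {log_path} 2>&1 & echo started",
        },
        # "seconds" is an assumed field name for the sleep duration.
        {"kind": "sleep", "name": f"{label}-wait", "seconds": wait_seconds},
        {
            "kind": "run_command",
            "name": f"{label}-tail",
            "command": f"tail -n 50 {log_path}",
        },
    ]


def check_unique_names(steps: list[dict]) -> None:
    """Names are checkpoint identities, so duplicates within a workflow are rejected."""
    names = [step["name"] for step in steps]
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"duplicate step names: {duplicates}")
```

For instance, background_steps("suite", "npm test", "/tmp/suite.log", 600) yields three steps named suite-start, suite-wait and suite-tail, which pass check_unique_names.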

Triggers

Every trigger does one thing: call wf_spawn(id, payload). So there is one door, not a growing list of trigger types — cron is the door with no caller, a manual spawn is a human calling it, and an inbound webhook (future) is an external system calling it with its body as payload. A new integration is “point a caller at the door and map its body into the recipe’s {{...}}”, not new engine code.

A schedule is part of the recipe: set schedule on the workflow row to a cron expression (UTC) and each slot spawns a run with an empty payload, exactly as arbe wf spawn does. Clear it to stop; an invalid expression is rejected at write time. A database trigger mirrors the column into one pg_cron job per workflow (wf:<id>), so there is no scheduler daemon: Postgres fires, the conductor executes.
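The mirroring of the schedule column into a named job can be sketched as a pure function that decides what should happen to the wf:<id> job. This is not the actual mirroring code: the function names and the returned tuples are hypothetical, and the five-field check is a crude stand-in for real cron validation. In pg_cron the two actions would correspond to cron.schedule and cron.unschedule on the named job.

```python
from typing import Optional


def job_name(workflow_id: str) -> str:
    """One job per workflow, named wf:<id>."""
    return f"wf:{workflow_id}"


def mirror_schedule(workflow_id: str, schedule: Optional[str]) -> tuple:
    """Decide what to do with the workflow's job when its schedule is written."""
    name = job_name(workflow_id)
    if not schedule:
        # Cleared schedule: stop firing.
        return ("unschedule", name)
    if len(schedule.split()) != 5:
        # Crude stand-in for rejecting an invalid expression at write time.
        raise ValueError(f"invalid cron expression: {schedule!r}")
    return ("schedule", name, schedule)
```

For the recipe on this page, mirror_schedule("abc", "0 5 * * *") returns ("schedule", "wf:abc", "0 5 * * *"), and passing None returns ("unschedule", "wf:abc").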

Under the hood

Absurd — Postgres-native durable execution — is the engine, driven by the conductor (apps/workflow-conductor), a daemon that polls the queue. The wrapper is one word each way, and engine vocabulary stays below this line:

arbe says | Absurd says
run | task (the stable identity)
attempt | run (one try at it)
step | checkpoint
sleep / gate | sleepFor / awaitEvent

Design and decisions: thinking/durable-workflows.