Daytona runtime
delegate_task (in @arbe/core/dispatch/agent-tools) is arbe’s one coding tool. It
runs a coding agent inside ctx.sandbox. ctx.sandbox is one interface with two
implementations, sprite and daytona; environment.runtime picks one per environment,
default daytona.
This doc is the daytona implementation. It runs pi in a Daytona sandbox, mirrors pi’s
session into one durable arbe thread, and resumes a run from that thread. The code is in
packages/sandbox/src/daytona/. Each layer has a runnable proof at proofs/prove-*.ts
(bun --env-file=.env proofs/<file> from that dir).
The shape
The thread is the only durable handle, and the source of truth. A thread is a durable stream of entries. Everything else is a cache you can lose — pi’s session files on disk, the in-process sandbox handle, any real-time watcher — and every cache writes to the thread or reads from it.
The parts:
- thread: the durable stream and source of truth. Entry types: chat, pi.*, signal.*.
- finished-signal: the signal.thread.* entry that ends a run. Status is one of status_changed:completed, pi_failed, or pi_session_orphaned.
- pi: the coding-agent process.
- CodingAgent: the harness descriptor (binary, file extension, how a run ends) that selects pi, codex, or claude-code. Not arbe’s Agent, which is the persona (model plus instructions) composed onto a CodingAgent.
- arbe-pi-runner: starts pi inside the sandbox, owns pi’s exit code, and posts the finished-signal when pi exits. One per harness; lives in src/runner.ts.
- the mirror: a pi extension that posts pi.* entries plus a heartbeat every ~5s. One extension, posting directly; no relay process.
- stream-proxy: a Supabase Edge Function the sandbox posts to because it can’t reach the CF worker. Covered under egress below.
A run, end to end:
    launchCodingAgent(pi, sandbox, task, { thread? })   spawns a session (or uses thread),
            │  provision + fire DETACHED                returns the thread id at once
            ▼
    ╭──────────────────── Daytona sandbox ────────────────────╮
    │  arbe-pi-runner ─▶ pi ─▶ tool calls                     │
    │  owns exit code    ╰─ pi extension (the mirror)         │
    │  posts terminal on exit                                 │
    ╰─────────────────────────────┬───────────────────────────╯
                                  │  pi.* + heartbeat, via scoped JWT
                                  ▼
                       edge fn (stream-proxy)
                                  │
                                  ▼
        ╭───────────────────── thread ──────────────────────╮
        │ source of truth · entries: chat | pi.* | signal.* │
        │ a finished-signal ends a run                      │
        ╰───────────────────────────────────────────────────╯

A run is never silently terminal. Whoever is still alive when the run ends posts the finished-signal:
    the mirror       soft error (pi emits stopReason:"error")    inside pi
    arbe-pi-runner   pi crashed, sandbox alive (has exit code)   → pi_failed
    pull-confirm     the sandbox itself died (no exit code)      → pi_session_orphaned

The heartbeat makes “dead” decidable from “still thinking” using the thread alone, so no watcher is needed.
The thread’s Postgres row is a lazy cache of the stream. Dispatch claims the child
idle → running when it fires the runner (launchOnDaytona posts the status_changed
readiness milestone), and that claim arms the read-path reconcile (reconcileStuckThread,
run on every thread GET): it adopts the stream’s finished-signal onto the row, or — when the
stream has gone silent past the orphan threshold — flips the row to failed with
pi_session_orphaned. Without the claim the row stays idle and reconcile does nothing, so
a run that dies in bootstrap, before pi ever mirrors, would hang dark forever. For a stuck
running thread, reconcile also asks Daytona whether the box still exists, so a dead box is
settled at once instead of waiting out the threshold — see box lifecycle.
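The reconcile rule above can be sketched as a pure decision function. This is a minimal sketch, not the real reconcileStuckThread: the types, the reconcileDecision name, the claimedAt baseline, and the 60-second threshold are all assumptions made for illustration, and the Daytona liveness probe is left out.

```typescript
// Hypothetical shapes; the real row and entry types live in arbe and may differ.
type RowStatus = "idle" | "running" | "completed" | "failed";
type FinishedStatus = "completed" | "pi_failed" | "pi_session_orphaned";

interface StreamEntry {
  ts: number; // epoch milliseconds
  finished?: FinishedStatus; // set only on a finished-signal entry
}

type Decision =
  | { action: "none" }
  | { action: "adopt"; status: FinishedStatus }
  | { action: "orphan"; status: "pi_session_orphaned" };

// Assumed value for illustration; the document does not state the threshold.
const ORPHAN_THRESHOLD_MS = 60000;

function reconcileDecision(
  rowStatus: RowStatus,
  claimedAt: number,
  entries: StreamEntry[],
  now: number,
  thresholdMs: number = ORPHAN_THRESHOLD_MS,
): Decision {
  // Without the idle -> running claim, reconcile does nothing.
  if (rowStatus !== "running") return { action: "none" };

  // A finished-signal on the stream is adopted onto the row.
  for (const entry of entries) {
    if (entry.finished !== undefined) {
      return { action: "adopt", status: entry.finished };
    }
  }

  // Otherwise measure silence from the newest entry (heartbeats included),
  // falling back to the claim time when nothing was ever mirrored.
  let lastSeen = claimedAt;
  for (const entry of entries) lastSeen = Math.max(lastSeen, entry.ts);

  if (now - lastSeen > thresholdMs) {
    return { action: "orphan", status: "pi_session_orphaned" };
  }
  return { action: "none" };
}
```

Using the claim time as the baseline is what lets a run that dies in bootstrap, with no mirrored entries at all, still cross the threshold.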
When the reconciled thread is a delegated child (a delegate_task run,
parent.kind:'thread'), that same reconcile notifies the parent: it posts a
signal.thread.child_finished plus a brain-authored chat carrying the child’s result onto
the parent thread, with no dispatch re-fire. See dispatch.
Resume rebuilds the same handles from the thread, not from disk. pi’s disk --continue is
a cache that goes stale once the thread is continued on another device. The pi session
format and what reconstruction can and can’t recover are worked out in
resume notes.
    same sandbox    { resume }            pi --continue <disk cache>        L5
    fresh sandbox   { resume, hydrate }   rebuild session from thread,      L7
                                          upload, pi --session <file>
                                          (no disk --continue)

pi’s session is one portable JSONL file, but the mirror posts a lossy projection of it:
user turns become chat, assistant turns become pi.assistant, and tree ids, labels, and
model entries are dropped. So a fresh-sandbox resume can’t replay the thread directly. It
rebuilds a loadable linear session with pi.reconstructSession and loads it with
--session (not --continue, which scans a directory filtered by cwd and raced by mtime).
The prompt is canonical on the thread: run() posts it as a chat entry before driving
pi, so a rebuild reads it from the thread.
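A rough sketch of the two resume paths and of the linear projection a rebuild starts from. Only the --continue and --session flags come from the text; ResumeOptions, piResumeArgs, linearTurns, and the entry shape are hypothetical, and the output of linearTurns is not pi’s session format (that is pi.reconstructSession’s job).

```typescript
// Hypothetical option shape mirroring the { resume } / { resume, hydrate } rows.
interface ResumeOptions {
  resume: boolean;
  hydrate?: boolean;
  sessionFile?: string; // path of the rebuilt session after upload
}

// Picks the pi flags for a run: nothing for a first run, the disk cache on
// the same sandbox, an explicit session file on a fresh sandbox.
function piResumeArgs(opts: ResumeOptions): string[] {
  if (!opts.resume) return [];
  if (opts.hydrate) {
    if (!opts.sessionFile) {
      throw new Error("hydrate needs the uploaded session file path");
    }
    return ["--session", opts.sessionFile];
  }
  return ["--continue"];
}

// Hypothetical, simplified thread entry: just a type and some text.
interface ThreadEntry {
  type: string;
  text?: string;
}

type Turn = { role: "user" | "assistant"; text: string };

// Inverts the mirror's projection into a linear list of turns.
// Assumes every chat entry is a user turn, which a real rebuild may refine.
function linearTurns(entries: ThreadEntry[]): Turn[] {
  const turns: Turn[] = [];
  for (const entry of entries) {
    if (entry.type === "chat") {
      turns.push({ role: "user", text: entry.text ?? "" });
    } else if (entry.type === "pi.assistant") {
      turns.push({ role: "assistant", text: entry.text ?? "" });
    }
    // Heartbeats and signal.* entries carry no conversational turn.
  }
  return turns;
}
```

Tree ids, labels, and model entries never reach the thread, so nothing in this projection can restore them; the result is linear by construction.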
Library (src/)
- thread.ts: the write side. openThread() mints a scoped grant and returns the stream endpoint; create, post, and read delegate to @arbe/core/entries.
- sandbox.ts: the compute side. createSandbox() returns a fresh Daytona sandbox with .exec and .provision(agent).
- coding-agent.ts: the pi CodingAgent descriptor.
- decide-pi-outcome.ts: decidePiOutcome maps pi’s stopReason and exit code to an outcome plus the signal.thread.* entries to post.
- run.ts: run(), the synchronous host driver. It provisions, drives pi, and reads the thread back; the host posts the terminal entry. Used by the proofs and cli.ts.
- launch-coding-agent.ts: launchCodingAgent(), the daytona body of delegate_task. It spawns a session (or uses a given thread), fires arbe-pi-runner detached, and returns the thread id at once. The CF worker can’t block for minutes, so dispatch needs the detached shape. Its default thread is an openThread() grant and stream with no DB row; the parented child row with environmentId is core createThread’s job.
- runner.ts: arbe-pi-runner, running in the sandbox. It owns pi’s exit code, reads the thread back, runs decidePiOutcome, and posts the terminal entry on exit.
run and launchCodingAgent are two callers of the same handles. run blocks on exec
and the host posts the terminal; launchCodingAgent returns at once and the in-sandbox
runner posts the terminal. Both feed decidePiOutcome the same two inputs, the thread and
the exit code. A second harness (codex, claude-code) is a second CodingAgent with its own
decideMessageType; the orchestrator does not change.
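To make the shared decision concrete, here is a minimal sketch of an outcome function with the same two inputs. It is not decidePiOutcome itself: the name, the input shape, and the choice to treat a soft error (stopReason "error") as pi_failed are assumptions, and the real function also returns the signal.thread.* entries to post.

```typescript
type Outcome = "completed" | "pi_failed" | "pi_session_orphaned";

interface OutcomeInput {
  // Last stopReason read back from the thread, if pi mirrored one.
  stopReason?: string;
  // pi's exit code, or null when nobody observed pi exit.
  exitCode: number | null;
}

function decideOutcomeSketch(input: OutcomeInput): Outcome {
  // No exit code means the sandbox died under pi: nobody owned the exit.
  if (input.exitCode === null) return "pi_session_orphaned";

  // A non-zero exit is a crash with the sandbox still alive.
  if (input.exitCode !== 0) return "pi_failed";

  // Assumption: a clean exit after a soft error still counts as a failure.
  if (input.stopReason === "error") return "pi_failed";

  return "completed";
}
```

Because the function sees only the thread-derived stopReason and the exit code, it gives the same answer whether the host (run) or the in-sandbox runner (launchCodingAgent) calls it.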
Egress and env
Daytona egress is whitelist-only and matched by domain. The sandbox can reach npm,
github*, cloudflare.com, and our Supabase host *.supabase.co (plus the other
default-allowlisted services: package
managers, git hosts, container registries, LLM APIs). It cannot reach the CF worker
arbe.0sk.ar, workers.dev, or any other host, and a tunnel doesn’t help because the
block is by domain. At our org tier the restriction cannot be overridden per sandbox;
lifting it means Daytona tier 3 (~400 EUR/mo), which we’ve decided against. So: the open
web is reachable only via a proxy on an allowlisted host, if a workload ever truly needs
it — none does today, since the harness itself only needs the allowlist (arbe-5783).
The signature of a blocked host: HTTPS gets a TLS reset (curl exit 35, 000 status),
plain HTTP gets a proxy 403.
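The blocked-host signature is easy to check mechanically. A small sketch, assuming a probe has already been run with curl and its exit code and status captured; ProbeResult and looksBlocked are hypothetical names.

```typescript
// Result of one curl probe from inside the sandbox (hypothetical shape).
interface ProbeResult {
  scheme: "http" | "https";
  curlExit: number;   // curl's process exit code
  httpStatus: number; // 0 when curl reports "000", i.e. no HTTP response
}

// Matches the signature described above: a TLS reset for HTTPS
// (curl exit 35 with no status), a proxy 403 for plain HTTP.
function looksBlocked(probe: ProbeResult): boolean {
  if (probe.scheme === "https") {
    return probe.curlExit === 35 && probe.httpStatus === 0;
  }
  return probe.httpStatus === 403;
}
```

The HTTP branch is only a heuristic: an origin that legitimately answers 403 looks the same as the egress proxy, so the HTTPS probe is the more reliable of the two.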
That is why the stream-write path runs through a shim. *.supabase.co is reachable, so the
proxy is deployed as a Supabase Edge Function at supabase/functions/stream-proxy. It is a
dumb transport proxy: it forwards the request (method, path, query, body, and the original
Authorization: Bearer <scoped-jwt>) verbatim to the CF worker at arbe.0sk.ar/api/stream,
which owns all JWT verification, thread-scope enforcement, and secret-swap logic. The edge fn
requires one secret — ARBE_WORKER_URL=https://arbe.0sk.ar/api/stream — and no longer holds
DURABLE_STREAMS_SECRET. Deploy with bunx supabase functions deploy stream-proxy (config
pins verify_jwt=false).
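The forwarding step can be sketched as a pure mapping from the incoming call to the upstream one. This is an illustration, not the deployed function: ProxiedCall, UpstreamCall, and toUpstream are hypothetical, and how the edge function strips its own mount path to obtain streamPath is deliberately left out.

```typescript
// The parts of the incoming request the proxy forwards (hypothetical shape).
interface ProxiedCall {
  method: string;
  streamPath: string;        // e.g. "arbe-thread-<threadId>"
  query: string;             // without the leading "?", may be empty
  authorization: string;     // the original "Bearer <scoped-jwt>" header value
  body: string | undefined;
}

interface UpstreamCall {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string | undefined;
}

// Rewrites only the host; method, path, query, body, and Authorization
// pass through untouched. No JWT is inspected here: the worker verifies it.
function toUpstream(call: ProxiedCall, workerUrl: string): UpstreamCall {
  const base = workerUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const query = call.query === "" ? "" : `?${call.query}`;
  return {
    url: `${base}/${call.streamPath}${query}`,
    method: call.method,
    headers: { Authorization: call.authorization },
    body: call.body,
  };
}
```

Here workerUrl stands in for ARBE_WORKER_URL. Keeping the function free of secrets and verification is the point of the design: the proxy can stay dumb because the worker owns every trust decision.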
Two more sandbox facts: pi installs in-sandbox with
npm i -g @earendil-works/pi-coding-agent@0.78.0 (node is present via
language: 'typescript'), and the sandbox runs as a non-root user, so upload artifacts to
a HOME-relative path, not /root.
The mirror’s write path:
    mint:  mintStreamWriteJwt(houseId, threadId, DURABLE_STREAMS_SECRET) -> { jwt, expiresAt }   (2h TTL)
    write: POST <url>/api/stream/arbe-thread-<threadId>
           Authorization: Bearer <jwt>
           body = { id, ts, authorId?, payload }   (payload: chat | pi.* | signal.*)
    read:  GET <url>/api/stream/arbe-thread-<threadId>?offset=0
           Authorization: Bearer <jwt>   (NDJSON)

A scoped token can only touch its own thread; a cross-thread write returns 403.
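The write and read shapes above translate into two small helpers. A sketch under assumptions: MirrorEntry’s field types, the type discriminator on the payload, the Content-Type header, and both function names are hypothetical; only the URL layout, the Bearer header, the body fields, and the NDJSON read format come from the text.

```typescript
// Entry body as listed above; field types are assumed.
interface MirrorEntry {
  id: string;
  ts: number;
  authorId?: string;
  payload: { type: string } & Record<string, unknown>;
}

interface HttpCall {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the POST for one entry; the caller hands the result to fetch().
function buildWriteCall(
  baseUrl: string,
  threadId: string,
  jwt: string,
  entry: MirrorEntry,
): HttpCall {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/stream/arbe-thread-${threadId}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${jwt}`,
      "Content-Type": "application/json", // assumed; not stated in the text
    },
    body: JSON.stringify(entry),
  };
}

// Parses an NDJSON read response: one JSON document per non-empty line.
function parseNdjson(text: string): unknown[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
}
```

The thread id appears only in the URL, which is where the scoped token is checked; a token minted for one thread cannot be replayed against another path.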
The pi extension reads its config from these environment variables (via
readPiThreadMirrorEnv in packages/sandbox/src/pi-extension/thread-mirror.ts). The
sandbox must export the same canonical names dispatch uses:
| Env var | Required | Meaning |
|---|---|---|
| ARBE_THREAD_ID | yes | target thread |
| ARBE_STREAM_URL | yes | stream-write base URL: the Supabase edge fn from the sandbox; the CF worker only off-sandbox |
| ARBE_STREAM_TOKEN | yes | scoped stream:write JWT (the jwt from mintStreamWriteJwt) |
| ARBE_AUTHOR_ID | no | author stamped on entries |
| ARBE_PI_MIRROR_NEXT_INDEX | no | resume offset |
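A reader for this table might look like the following. It is a sketch, not readPiThreadMirrorEnv: the MirrorEnv shape, the function name, and the default of 0 for a missing resume offset are assumptions. The env is passed in as a plain record so the function stays testable.

```typescript
interface MirrorEnv {
  threadId: string;
  streamUrl: string;
  streamToken: string;
  authorId?: string;
  nextIndex: number;
}

function readMirrorEnvSketch(env: Record<string, string | undefined>): MirrorEnv {
  // Required variables fail loudly rather than mirroring to nowhere.
  const need = (name: string): string => {
    const value = env[name];
    if (value === undefined || value === "") {
      throw new Error(`${name} is required`);
    }
    return value;
  };

  // Assumed default: start at index 0 when no resume offset is exported.
  const rawIndex = env["ARBE_PI_MIRROR_NEXT_INDEX"];
  const nextIndex =
    rawIndex === undefined || rawIndex === "" ? 0 : parseInt(rawIndex, 10);
  if (isNaN(nextIndex) || nextIndex < 0) {
    throw new Error("ARBE_PI_MIRROR_NEXT_INDEX must be a non-negative integer");
  }

  const result: MirrorEnv = {
    threadId: need("ARBE_THREAD_ID"),
    streamUrl: need("ARBE_STREAM_URL"),
    streamToken: need("ARBE_STREAM_TOKEN"),
    nextIndex,
  };
  const authorId = env["ARBE_AUTHOR_ID"];
  if (authorId !== undefined && authorId !== "") result.authorId = authorId;
  return result;
}
```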
Box lifecycle
A sandbox is a machine, with a lifecycle independent of any run. pi is one process inside it; many threads, commands, and runs share a box, and the box outlives any single run (schema-v2 req 4 & 6). A pi terminal ends a process, never the machine — box teardown is never keyed on a run’s terminal, and never on a wall-clock timeout against the run.
There is no run timeout. A coding agent runs until it finishes; ARBE_PI_TIMEOUT is a
~3-day runaway guard for a hung process, not a run length.
Reaping is arbe-owned and idle-based. A box stays up while any thread on it
(threads.sandbox_id) is running; once none is, the reconcile/prune sweep
(reconcileStuckThread / pruneStuckThreads) deletes it — the same seam that clears stale
threads. When a run finishes there, reapSandboxIfIdle deletes the box (via an injected
reapBox, so @arbe/core stays provider-free) and tombstones its row. Resume re-resolves
to the environment’s live box, or makes a fresh one, so idle means delete; a stopped box
buys nothing. The discriminator is sandboxes.ephemeral: delegate_task boxes are
ephemeral=true and reapable, while an environment’s shared box (inline run_command) is
ephemeral=false and never touched.
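The reap rule reduces to one predicate. A minimal sketch with hypothetical row shapes (the real sandboxes and threads rows carry more columns); it shows only the decision, not reapSandboxIfIdle’s delete and tombstone steps.

```typescript
// Hypothetical, trimmed-down rows.
interface SandboxRow {
  id: string;
  ephemeral: boolean;
}

interface ThreadRow {
  sandboxId: string | null; // threads.sandbox_id
  status: string;           // "running", "completed", "failed", ...
}

// A box is reapable only if it is ephemeral and no thread on it is running.
function shouldReap(box: SandboxRow, threads: ThreadRow[]): boolean {
  // Shared environment boxes (ephemeral=false) are never touched.
  if (!box.ephemeral) return false;
  return !threads.some(
    (thread) => thread.sandboxId === box.id && thread.status === "running",
  );
}
```

Note what the predicate does not look at: a run’s terminal entry or any wall-clock age. Only the set of running threads on the box matters.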
Death is pull-confirmed, never webhook-driven — one org-wide webhook is a forged-event risk
against other houses’ boxes. confirmSandboxLiveness(sandboxId, probeBox) asks Daytona
about a house-scoped provider_ref through that house’s own runtime: a missing box (404) or
terminal-fault state (error/build_failed/removing) tombstones the sandboxes row to
dead and orphans the threads on it; a live/unknown verdict is a no-op, so a network
blip never tombstones a real box. It rides the reaper’s two seams: the read-path/prune
reconcile (the current thread’s box) and arbe sandbox list --reconcile (the cold-row sweep
— there is no cron). probeBox is injected by the worker, mirroring reapBox.
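The verdict mapping can be sketched as below. The Probe and Verdict types and the function name are hypothetical; the state names and the rule that only a confirmed-missing or faulted box counts as dead come from the paragraph above.

```typescript
// What the injected probe reports back (hypothetical shape).
type Probe =
  | { kind: "found"; state: string } // Daytona answered with a box state
  | { kind: "not_found" }            // Daytona answered 404
  | { kind: "unknown" };             // network error, timeout, ambiguous reply

type Verdict = "dead" | "live" | "unknown";

// States that mean the box will never run again.
const TERMINAL_FAULT_STATES = new Set(["error", "build_failed", "removing"]);

function livenessVerdict(probe: Probe): Verdict {
  if (probe.kind === "not_found") return "dead";
  // An inconclusive probe must never tombstone a real box.
  if (probe.kind === "unknown") return "unknown";
  return TERMINAL_FAULT_STATES.has(probe.state) ? "dead" : "live";
}
```

Only "dead" leads to a tombstone and orphaned threads; "live" and "unknown" are both no-ops, which is what keeps a network blip from killing a healthy run.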
Daytona’s own auto-stop/delete is the dumb backstop. createSandbox sets autoStopInterval
from the runaway guard plus a buffer (Daytona counts only API calls as activity, and a
detached run makes none, so the clock runs from launch — the value must clear the guard or
it would stop a live run) and autoDeleteInterval: 0. This catches an abandoned box (worker
dead, runner crashed) after ~3 days; it is not the primary reaper.
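The interval arithmetic is simple enough to show. This sketch assumes both intervals are expressed in minutes and that the buffer is one hour; neither the unit of ARBE_PI_TIMEOUT nor the actual buffer is stated here, so treat the numbers as placeholders.

```typescript
// ~3-day runaway guard from the text, expressed in minutes (assumed unit).
const GUARD_MINUTES = 3 * 24 * 60;

// Hypothetical buffer so the backstop can never fire before the guard.
const BUFFER_MINUTES = 60;

interface BackstopIntervals {
  autoStopInterval: number;
  autoDeleteInterval: number;
}

function backstopIntervals(
  guardMinutes: number = GUARD_MINUTES,
  bufferMinutes: number = BUFFER_MINUTES,
): BackstopIntervals {
  if (bufferMinutes <= 0) {
    // With no buffer, auto-stop could race the guard and stop a live run.
    throw new Error("buffer must be positive so auto-stop clears the guard");
  }
  return {
    // The clock runs from launch because a detached run makes no API calls.
    autoStopInterval: guardMinutes + bufferMinutes,
    // 0 as in the text: no delay between stop and delete.
    autoDeleteInterval: 0,
  };
}
```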
Spawned boxes are labelled arbe.house / arbe.thread / arbe.environment on create, so
arbe sandbox list --runtime daytona shows which run owns a box. Driving the broader
lifecycle is the Daytona epic (arbe-7959).
Formerly an open question, now answered (2026-06): the egress whitelist cannot include arbe.0sk.ar at our
org tier. Tier 1/2 restrictions are not overridable, and tier 3/4 (networkAllowList, max
10 IPv4 CIDRs) is out of budget. So all sandbox-to-thread traffic flows through the edge
function permanently, and the two proxies must be kept from drifting.
See sprite runtime, dispatch, environments, secrets, runtime.