# Daytona runtime

`delegate_task` (in `@arbe/core/dispatch/agent-tools`) is arbe's one coding tool. It
runs a coding agent inside `ctx.sandbox`. `ctx.sandbox` is one interface with two
implementations, `sprite` and `daytona`; `environment.runtime` picks one per environment,
default `daytona`.

This doc covers the daytona implementation. It runs pi in a Daytona sandbox, mirrors pi's
session into one durable arbe thread, and resumes a run from that thread. The code is in
`packages/sandbox/src/daytona/`. Each layer has a runnable proof at `proofs/prove-*.ts`
(`bun --env-file=.env proofs/<file>` from that dir).

## The shape

The thread is the only durable handle, and the source of truth. A thread is a durable
stream of entries. Everything else is a cache you can lose — pi's session files on disk,
the in-process sandbox handle, any real-time watcher — and every cache writes to the
thread or reads from it.

The parts:

- `thread` — the durable stream and source of truth. Entry types: `chat`, `pi.*`,
  `signal.*`.
- finished-signal — the `signal.thread.*` entry that ends a run. Status is one of
  `status_changed:completed`, `pi_failed`, or `pi_session_orphaned`.
- pi — the coding-agent process.
- `CodingAgent` — the harness descriptor (binary, file extension, how a run ends) that
  selects pi, codex, or claude-code. Not arbe's `Agent`, which is the persona (model plus
  instructions) composed onto a `CodingAgent`.
- `arbe-pi-runner` — starts pi inside the sandbox, owns pi's exit code, and posts the
  finished-signal when pi exits. One per harness; lives in `src/runner.ts`.
- the mirror — a pi extension that posts `pi.*` entries plus a heartbeat every ~5s. One
  extension, posting directly; no relay process.
- `stream-proxy` — a Supabase Edge Function the sandbox posts to because it can't reach
  the CF worker. Covered under egress below.
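To make the entry vocabulary concrete, here is a minimal sketch of it as TypeScript types plus a finished-signal check. It assumes payloads form a union discriminated on a `type` field and that the `status_changed:completed` form means a `signal.thread.status_changed` entry with `status: "completed"`. The field names and `isFinishedSignal` are hypothetical, not the real `@arbe/core/entries` schema.

```typescript
// Hypothetical payload shapes; the real schema lives in @arbe/core/entries.
type ChatPayload = { type: "chat"; text: string };
type PiPayload = { type: `pi.${string}`; [k: string]: unknown };
type SignalPayload = { type: `signal.${string}`; status?: string };
type Payload = ChatPayload | PiPayload | SignalPayload;

interface Entry {
  id: string;
  ts: number; // epoch ms (assumed)
  authorId?: string;
  payload: Payload;
}

// The three run-ending statuses listed above, written as "<kind>[:<status>]".
const FINISHED = new Set([
  "status_changed:completed",
  "pi_failed",
  "pi_session_orphaned",
]);

// True when the entry is a finished-signal, i.e. it ends the run.
function isFinishedSignal(e: Entry): boolean {
  const type: string = e.payload.type;
  const prefix = "signal.thread.";
  if (!type.startsWith(prefix)) return false;
  const kind = type.slice(prefix.length);
  const status = (e.payload as SignalPayload).status;
  return FINISHED.has(status ? `${kind}:${status}` : kind);
}
```

Note that not every `signal.thread.*` entry ends a run: a `status_changed` readiness milestone or a `child_finished` notice is a signal but not a finished-signal.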

A run, end to end:

```diagram
  launchCodingAgent(pi, sandbox, task, { thread? })   spawns a session (or uses thread),
        │  provision + fire DETACHED                       returns the thread id at once
        ▼
  ╭──────────────────── Daytona sandbox ────────────────────╮
  │   arbe-pi-runner ─▶ pi ─▶ tool calls                     │
  │     owns exit code  ╰─ pi extension (the mirror)         │
  │     posts terminal on exit                               │
  ╰─────────────────────────────┬───────────────────────────╯
                                │  pi.* + heartbeat, via scoped JWT
                                ▼
                     edge fn (stream-proxy)
                                │
                                ▼
        ╭───────────────────── thread ──────────────────────╮
        │  source of truth · entries: chat | pi.* | signal.* │
        │  a finished-signal ends a run                      │
        ╰────────────────────────────────────────────────────╯
```

A run never ends silently. Whoever is still alive when the run ends posts the
finished-signal:

```
  the mirror       soft error (pi emits stopReason:"error")   inside pi
  arbe-pi-runner   pi crashed, sandbox alive (has exit code)  → pi_failed
  pull-confirm     the sandbox itself died (no exit code)     → pi_session_orphaned
```

The heartbeat makes "dead" distinguishable from "still thinking" using the thread alone, so
no watcher is needed.
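A minimal sketch of that decision, assuming entries carry an epoch-ms `ts` and that any entry (the ~5s heartbeat included) counts as a sign of life. `classifyRun` and the `orphanAfterMs` parameter are hypothetical; the doc does not state the orphan threshold's value.

```typescript
type Payload = { type: string; status?: string };
interface Entry { ts: number; payload: Payload } // ts: epoch ms (assumed)

type Liveness = "finished" | "alive" | "presumed_dead";

const TERMINAL = new Set(["status_changed:completed", "pi_failed", "pi_session_orphaned"]);

function isFinished(p: Payload): boolean {
  const prefix = "signal.thread.";
  if (!p.type.startsWith(prefix)) return false;
  const kind = p.type.slice(prefix.length);
  return TERMINAL.has(p.status ? `${kind}:${p.status}` : kind);
}

// Decide from the thread alone. While pi runs, the heartbeat guarantees an entry
// every ~5s even when pi is silently thinking, so silence past the threshold means dead.
// An empty thread is treated as silent (e.g. a run that died in bootstrap).
function classifyRun(entries: Entry[], nowMs: number, orphanAfterMs: number): Liveness {
  if (entries.some((e) => isFinished(e.payload))) return "finished";
  const last = entries.reduce((m, e) => Math.max(m, e.ts), -Infinity);
  return nowMs - last > orphanAfterMs ? "presumed_dead" : "alive";
}
```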

The thread's Postgres row is a lazy cache of the stream. Dispatch claims the child
`idle → running` when it fires the runner (`launchOnDaytona` posts the `status_changed`
readiness milestone), and that claim arms the read-path reconcile (`reconcileStuckThread`,
run on every thread GET): it adopts the stream's finished-signal onto the row, or — when the
stream has gone silent past the orphan threshold — flips the row to `failed` with
`pi_session_orphaned`. Without the claim the row stays `idle` and reconcile does nothing, so
a run that dies in bootstrap, before pi ever mirrors, would hang dark forever. For a stuck
`running` thread, reconcile also asks Daytona whether the box still exists, so a dead box is
settled at once instead of waiting out the threshold — see [box lifecycle](#box-lifecycle).
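The reconcile rules above can be sketched as a pure decision function. This is an illustration, not `reconcileStuckThread`'s real signature: the input fields and action names are hypothetical, and the Daytona liveness check is reduced to a precomputed verdict.

```typescript
type RowStatus = "idle" | "running" | "completed" | "failed";
type Outcome = "completed" | "pi_failed" | "pi_session_orphaned";

interface ReconcileInput {
  rowStatus: RowStatus;
  finishedSignal?: Outcome;           // the stream's finished-signal, if any
  msSinceLastEntry: number;           // silence on the stream
  orphanThresholdMs: number;
  box: "live" | "dead" | "unknown";   // what Daytona says about the box
}

type Action =
  | { kind: "noop" }
  | { kind: "adopt"; outcome: Outcome }
  | { kind: "fail"; reason: "pi_session_orphaned" };

function reconcile(i: ReconcileInput): Action {
  // Only a claimed row is reconciled; an unclaimed (idle) row is left alone.
  if (i.rowStatus !== "running") return { kind: "noop" };
  // The stream is the source of truth: copy its finished-signal onto the row.
  if (i.finishedSignal) return { kind: "adopt", outcome: i.finishedSignal };
  // A box Daytona confirms dead settles at once, without waiting out the threshold.
  if (i.box === "dead") return { kind: "fail", reason: "pi_session_orphaned" };
  // Otherwise a stream silent past the threshold is presumed orphaned.
  if (i.msSinceLastEntry > i.orphanThresholdMs) return { kind: "fail", reason: "pi_session_orphaned" };
  return { kind: "noop" };
}
```

The first guard is why the `idle → running` claim matters: without it, a run that dies before mirroring anything never reaches the orphan branch.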

When the reconciled thread is a delegated child (a `delegate_task` run,
`parent.kind:'thread'`), that same reconcile notifies the parent: it posts a
`signal.thread.child_finished` plus a brain-authored `chat` carrying the child's result onto
the parent thread, with no dispatch re-fire. See [dispatch](./dispatch.md).
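A minimal sketch of what that notification posts, assuming entry payloads are plain objects with a `type` discriminant. `notifyParent`, the `ChildThread` shape, and the payload fields (`childThreadId`, `outcome`, `authorId: "brain"`, `text`) are hypothetical; only the two entry kinds and the `parent.kind: 'thread'` condition come from this doc.

```typescript
interface ChildThread {
  id: string;
  parent?: { kind: string; threadId?: string };
  outcome: "completed" | "pi_failed" | "pi_session_orphaned";
  result: string; // the child's final answer text
}

interface ParentNotice {
  parentThreadId: string;
  entries: Array<{ type: string; [k: string]: unknown }>;
}

// Entries to post on the parent thread, or null when the child was not delegated from a thread.
function notifyParent(child: ChildThread): ParentNotice | null {
  const p = child.parent;
  if (!p || p.kind !== "thread" || !p.threadId) return null;
  return {
    parentThreadId: p.threadId,
    entries: [
      { type: "signal.thread.child_finished", childThreadId: child.id, outcome: child.outcome },
      // A brain-authored chat, so the parent sees the result as an ordinary turn.
      { type: "chat", authorId: "brain", text: child.result },
    ],
  };
}
```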

Resume rebuilds the same handles from the thread, not from disk. pi's disk `--continue` is
a cache that goes stale once the thread is continued on another device. The pi session
format and what reconstruction can and can't recover are worked out in
[resume notes](../thinking/pi-resume.md).

```
  same sandbox   { resume }           pi --continue <disk cache>     L5
  fresh sandbox  { resume, hydrate }  rebuild session from thread,   L7
                                      upload, pi --session <file>    (no disk --continue)
```
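The two resume modes can be sketched as a choice of pi invocation. Only the `--continue` and `--session <file>` flags come from this doc; `piLaunch`, its option names, and the `needsReconstruction` flag are illustrative assumptions.

```typescript
interface ResumeOpts { resume?: boolean; hydrate?: boolean }

interface PiLaunch {
  argv: string[];
  needsReconstruction: boolean; // fresh sandbox: rebuild + upload the session first
}

function piLaunch(opts: ResumeOpts, sessionFile?: string): PiLaunch {
  if (!opts.resume) return { argv: ["pi"], needsReconstruction: false };
  // Same sandbox: the disk cache is still there, so pi can pick it up.
  if (!opts.hydrate) return { argv: ["pi", "--continue"], needsReconstruction: false };
  if (!sessionFile) throw new Error("hydrate needs the uploaded session file path");
  // Fresh sandbox: load an explicit file rebuilt from the thread.
  return { argv: ["pi", "--session", sessionFile], needsReconstruction: true };
}
```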

pi's session is one portable JSONL file, but the mirror posts a lossy projection of it:
user turns become `chat`, assistant turns become `pi.assistant`, and tree ids, labels, and
model entries are dropped. So a fresh-sandbox resume can't replay the thread directly. It
rebuilds a loadable linear session with `pi.reconstructSession` and loads it with
`--session` (not `--continue`, which scans a directory filtered by cwd and raced by mtime).
The prompt is canonical on the thread: `run()` posts it as a `chat` entry before driving
pi, so a rebuild reads it from the thread.
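Because the projection is lossy, reconstruction can only produce a linear chain of turns. A minimal sketch of the idea, assuming hypothetical shapes on both sides: thread entries with a `type` and optional `text`, and session lines of `{ type: "message", id, parentId, role, text }`. pi's real session format is worked out in the resume notes and is not reproduced here; this is not `pi.reconstructSession`.

```typescript
interface ThreadEntry { type: string; text?: string }

interface SessionLine {
  type: "message";
  id: string;
  parentId: string | null;
  role: "user" | "assistant";
  text: string;
}

// Rebuild a linear JSONL session: each kept turn's parent is the previous kept turn.
// Tree ids, labels, and model entries were never mirrored, so they are not recovered.
function reconstructLinear(entries: ThreadEntry[]): string {
  const lines: SessionLine[] = [];
  for (const e of entries) {
    const role = e.type === "chat" ? "user" : e.type === "pi.assistant" ? "assistant" : null;
    if (!role || e.text === undefined) continue; // heartbeats, signals, other pi.* entries
    const parentId = lines.length > 0 ? lines[lines.length - 1].id : null;
    lines.push({ type: "message", id: `m${lines.length}`, parentId, role, text: e.text });
  }
  return lines.map((l) => JSON.stringify(l) + "\n").join("");
}
```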

## Library (`src/`)

- `thread.ts` — the write side. `openThread()` mints a scoped grant and returns the stream
  endpoint; create, post, and read delegate to `@arbe/core/entries`.
- `sandbox.ts` — the compute side. `createSandbox()` returns a fresh Daytona sandbox with
  `.exec` and `.provision(agent)`.
- `coding-agent.ts` — the pi `CodingAgent` descriptor.
- `decide-pi-outcome.ts` — `decidePiOutcome` maps pi's stopReason and exit code to an
  outcome plus the `signal.thread.*` entries to post.
- `run.ts` — `run()`, the synchronous host driver: provision, drive pi, read the thread
  back; the host posts the terminal entry. Used by the proofs and `cli.ts`.
- `launch-coding-agent.ts` — `launchCodingAgent()`, the daytona body of `delegate_task`:
  spawn a session (or use a given thread), fire `arbe-pi-runner` detached, return the
  thread id at once. The CF worker can't block for minutes, so dispatch needs the detached
  shape. Its default thread is an `openThread()` grant and stream with no DB row; the
  parented child row with `environmentId` is core `createThread`'s job.
- `runner.ts` — `arbe-pi-runner`, running in the sandbox: owns pi's exit code, reads the
  thread back, runs `decidePiOutcome`, and posts the terminal entry on exit.

`run` and `launchCodingAgent` are two callers of the same handles. `run` blocks on `exec`
and the host posts the terminal; `launchCodingAgent` returns at once and the in-sandbox
runner posts the terminal. Both feed `decidePiOutcome` the same two inputs, the thread and
the exit code. A second harness (codex, claude-code) is a second `CodingAgent` with its own
`decideMessageType`; the orchestrator does not change.
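A minimal sketch of the shape of that shared decision, assuming the thread has already been reduced to pi's last `stopReason` (or none). The real `decidePiOutcome` signature and entry payloads are not shown in this doc; the mapping below encodes only the cases the doc names, and treating every non-`"error"` stop reason with exit 0 as success is an assumption. Pull-confirmed orphaning is not decided here, since it has no exit code.

```typescript
type Outcome = "completed" | "pi_failed";

interface Decision {
  outcome: Outcome;
  post: Array<{ type: string; status?: string; exitCode?: number }>;
}

// Pure: the same two inputs from either caller (run() on the host, or the in-sandbox runner).
function decideOutcome(lastStopReason: string | undefined, exitCode: number): Decision {
  // A soft error (stopReason "error") or a crash (non-zero exit) both fail the run.
  if (exitCode !== 0 || lastStopReason === "error") {
    return { outcome: "pi_failed", post: [{ type: "signal.thread.pi_failed", exitCode }] };
  }
  return {
    outcome: "completed",
    post: [{ type: "signal.thread.status_changed", status: "completed" }],
  };
}
```

Keeping the decision free of I/O is what lets `run` and `launchCodingAgent` differ only in who posts the result.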

## Egress and env

Daytona egress is whitelist-only and matched by domain. The sandbox can reach `npm`,
`github*`, `cloudflare.com`, and our Supabase host `*.supabase.co` (plus the other
[default-allowlisted services](https://www.daytona.io/docs/en/network-limits.md): package
managers, git hosts, container registries, LLM APIs). It cannot reach the CF worker
`arbe.0sk.ar`, `workers.dev`, or any other host, and a tunnel doesn't help because the
block is by domain. At our org tier the restriction cannot be overridden per sandbox;
lifting it means Daytona tier 3 (~400 EUR/mo), which we've decided against. So the open
web is reachable only through a proxy on an allowlisted host, should a workload ever need
it; none does today, since the harness itself needs only allowlisted hosts (arbe-5783).
The signature of a blocked host: HTTPS gets a TLS reset (curl exit 35, `000` status),
plain HTTP gets a proxy 403.
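The blocked-host signature can be turned into a small classifier for probe results, for example when checking a new dependency host from inside a sandbox. The probe shape is hypothetical (curl's exit code plus whatever `curl -w '%{http_code}'` printed); only the two signatures come from this doc. A real origin answering 403 over plain HTTP would be misread as blocked, so treat that verdict as a hint.

```typescript
interface Probe {
  scheme: "https" | "http";
  curlExit: number; // curl's process exit code
  httpCode: string; // what `curl -w '%{http_code}'` printed
}

type EgressVerdict = "reachable" | "blocked" | "inconclusive";

function classifyEgress(p: Probe): EgressVerdict {
  // HTTPS to a blocked host: TLS reset, curl exit 35 (SSL connect error), no status.
  if (p.scheme === "https" && p.curlExit === 35 && p.httpCode === "000") return "blocked";
  // Plain HTTP to a blocked host: the egress proxy answers 403.
  if (p.scheme === "http" && p.curlExit === 0 && p.httpCode === "403") return "blocked";
  if (p.curlExit === 0 && p.httpCode !== "000") return "reachable";
  return "inconclusive"; // DNS failure, timeout, etc.
}
```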

That is why the stream-write path runs through a shim. `*.supabase.co` is reachable, so the
proxy is deployed as a Supabase Edge Function at `supabase/functions/stream-proxy`. It is a
dumb transport proxy: it forwards the request (method, path, query, body, and the original
`Authorization: Bearer <scoped-jwt>`) verbatim to the CF worker at `arbe.0sk.ar/api/stream`,
which owns all JWT verification, thread-scope enforcement, and secret-swap logic. The edge fn
requires one secret — `ARBE_WORKER_URL=https://arbe.0sk.ar/api/stream` — and no longer holds
`DURABLE_STREAMS_SECRET`. Deploy with `bunx supabase functions deploy stream-proxy` (config
pins `verify_jwt=false`).
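A minimal sketch of such a dumb forwarder using the standard `Request`/`fetch` APIs. It assumes the function's name appears in the incoming path (`/stream-proxy/...`, with or without a `/functions/v1` prefix) and that everything after it is appended verbatim to `ARBE_WORKER_URL`. `forwardUrl`, `forward`, and the injectable `fetchImpl` are illustrative names, not the deployed function's code.

```typescript
type Fetch = (url: string, init: RequestInit) => Promise<Response>;

// Map the edge-fn URL onto the worker URL, keeping the path tail and query verbatim.
function forwardUrl(incoming: string, workerUrl: string, fnName = "stream-proxy"): string {
  const u = new URL(incoming);
  const marker = `/${fnName}`;
  const at = u.pathname.indexOf(marker);
  if (at < 0) throw new Error(`path does not contain ${marker}`);
  const tail = u.pathname.slice(at + marker.length); // e.g. "/arbe-thread-42"
  return workerUrl.replace(/\/$/, "") + tail + u.search;
}

// Forward method, headers (including Authorization: Bearer <scoped-jwt>), and body.
// Nothing is verified here; the worker owns JWT checks and thread scoping.
async function forward(req: Request, workerUrl: string, fetchImpl: Fetch = fetch): Promise<Response> {
  const headers = new Headers(req.headers);
  headers.delete("host"); // precaution: let fetch set the worker's host
  const hasBody = req.method !== "GET" && req.method !== "HEAD";
  return fetchImpl(forwardUrl(req.url, workerUrl), {
    method: req.method,
    headers,
    body: hasBody ? await req.arrayBuffer() : undefined,
  });
}
```

In a Deno edge function this would be wired up as a request handler that calls `forward(req, workerUrl)`, with `workerUrl` read from `ARBE_WORKER_URL`.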

Two more sandbox facts: pi installs in-sandbox with
`npm i -g @earendil-works/pi-coding-agent@0.78.0` (node is present via
`language: 'typescript'`), and the sandbox runs as a non-root user, so upload artifacts to
a HOME-relative path, not `/root`.

The mirror's write path:

```
mint:   mintStreamWriteJwt(houseId, threadId, DURABLE_STREAMS_SECRET) -> { jwt, expiresAt }  (2h TTL)
write:  POST <url>/api/stream/arbe-thread-<threadId>   Authorization: Bearer <jwt>
        body = { id, ts, authorId?, payload }          (payload: chat | pi.* | signal.*)
read:   GET  <url>/api/stream/arbe-thread-<threadId>?offset=0   Authorization: Bearer <jwt>  (NDJSON)
```
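A minimal sketch of the mirror's write and read calls against that path, using the standard `fetch` API. The `arbe-thread-<threadId>` path, the Bearer header, the body fields, and the NDJSON read come from the block above. The helper names, the injectable `fetchImpl`, and the `Content-Type` header are assumptions. `streamBase` is taken to be the prefix that `arbe-thread-<id>` is appended to (e.g. `https://arbe.0sk.ar/api/stream` off-sandbox).

```typescript
type Fetch = (url: string, init: RequestInit) => Promise<Response>;

interface StreamEntry {
  id: string;
  ts: number;
  authorId?: string;
  payload: { type: string; [k: string]: unknown }; // chat | pi.* | signal.*
}

const threadUrl = (streamBase: string, threadId: string) =>
  `${streamBase.replace(/\/$/, "")}/arbe-thread-${threadId}`;

async function postEntry(
  streamBase: string, threadId: string, jwt: string, entry: StreamEntry, fetchImpl: Fetch = fetch,
): Promise<void> {
  const res = await fetchImpl(threadUrl(streamBase, threadId), {
    method: "POST",
    headers: { Authorization: `Bearer ${jwt}`, "Content-Type": "application/json" },
    body: JSON.stringify(entry),
  });
  // A 403 here means the token is scoped to a different thread.
  if (!res.ok) throw new Error(`stream write failed: ${res.status}`);
}

// The read returns NDJSON: one JSON entry per non-empty line.
function parseNdjson(text: string): StreamEntry[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as StreamEntry);
}

async function readThread(
  streamBase: string, threadId: string, jwt: string, fetchImpl: Fetch = fetch,
): Promise<StreamEntry[]> {
  const res = await fetchImpl(`${threadUrl(streamBase, threadId)}?offset=0`, {
    headers: { Authorization: `Bearer ${jwt}` },
  });
  if (!res.ok) throw new Error(`stream read failed: ${res.status}`);
  return parseNdjson(await res.text());
}
```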

A scoped token can only touch its own thread; a cross-thread write returns 403.
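To make the scope rule concrete, here is a minimal HS256 sketch with `node:crypto`. It assumes the token carries `house`, `thread`, `scope`, and `exp` claims and that `expiresAt` is in milliseconds. The claim names, `mintStreamJwt`, and `authorizeWrite` are hypothetical; `mintStreamWriteJwt`'s real claims are not documented here, and real verification lives in the CF worker.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical claim names.
interface StreamClaims {
  house: string;
  thread: string;
  scope: string; // "stream:write"
  exp: number;   // seconds since epoch
}

const TTL_SECONDS = 2 * 60 * 60; // 2h, as above

const b64url = (s: string): string => Buffer.from(s, "utf8").toString("base64url");
const sign = (data: string, secret: string): Buffer =>
  createHmac("sha256", secret).update(data).digest();

function mintStreamJwt(houseId: string, threadId: string, secret: string, nowSec = Math.floor(Date.now() / 1000)) {
  const exp = nowSec + TTL_SECONDS;
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const claims: StreamClaims = { house: houseId, thread: threadId, scope: "stream:write", exp };
  const body = b64url(JSON.stringify(claims));
  const sig = sign(`${header}.${body}`, secret).toString("base64url");
  return { jwt: `${header}.${body}.${sig}`, expiresAt: exp * 1000 };
}

// The HTTP status a verifier would answer for a write to `threadId` with this token.
function authorizeWrite(jwt: string, threadId: string, secret: string, nowSec = Math.floor(Date.now() / 1000)): 200 | 401 | 403 {
  const parts = jwt.split(".");
  if (parts.length !== 3) return 401;
  const [header, body, sig] = parts;
  const expected = sign(`${header}.${body}`, secret);
  const given = Buffer.from(sig, "base64url");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return 401; // bad signature
  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as StreamClaims;
  if (claims.exp <= nowSec) return 401; // expired
  if (claims.scope !== "stream:write" || claims.thread !== threadId) return 403; // another thread
  return 200;
}
```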

The pi extension reads its config from these environment variables (via
`readPiThreadMirrorEnv` in `packages/sandbox/src/pi-extension/thread-mirror.ts`). The
sandbox must export the same canonical names dispatch uses:

| Env var | Required | Meaning |
|---------|----------|---------|
| `ARBE_THREAD_ID` | yes | target thread |
| `ARBE_STREAM_URL` | yes | stream-write base URL — the Supabase edge fn from the sandbox; the CF worker only off-sandbox |
| `ARBE_STREAM_TOKEN` | yes | scoped `stream:write` JWT (the `jwt` from `mintStreamWriteJwt`) |
| `ARBE_AUTHOR_ID` | no | author stamped on entries |
| `ARBE_PI_MIRROR_NEXT_INDEX` | no | resume offset |
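A minimal sketch of reading that table, assuming a missing required variable should fail fast and that `ARBE_PI_MIRROR_NEXT_INDEX` is a non-negative integer defaulting to 0. `readMirrorEnv` illustrates the contract; it is not `readPiThreadMirrorEnv` itself.

```typescript
interface MirrorEnv {
  threadId: string;
  streamUrl: string;
  streamToken: string;
  authorId?: string;
  nextIndex: number;
}

function readMirrorEnv(env: Record<string, string | undefined>): MirrorEnv {
  const required = (name: string): string => {
    const v = env[name];
    if (!v) throw new Error(`missing required env var ${name}`);
    return v;
  };
  // Resume offset: absent or empty means start from the beginning (assumed).
  const rawIndex = env.ARBE_PI_MIRROR_NEXT_INDEX;
  const nextIndex = rawIndex === undefined || rawIndex === "" ? 0 : Number(rawIndex);
  if (!Number.isInteger(nextIndex) || nextIndex < 0) {
    throw new Error(`bad ARBE_PI_MIRROR_NEXT_INDEX: ${rawIndex}`);
  }
  return {
    threadId: required("ARBE_THREAD_ID"),
    streamUrl: required("ARBE_STREAM_URL"),
    streamToken: required("ARBE_STREAM_TOKEN"),
    authorId: env.ARBE_AUTHOR_ID || undefined,
    nextIndex,
  };
}
```

Inside the sandbox this would be called as `readMirrorEnv(process.env)`.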

## Box lifecycle

A sandbox is a machine, with a lifecycle independent of any run. pi is one process inside
it; many threads, commands, and runs share a box, and the box outlives any single run
(schema-v2 req 4 & 6). A pi terminal ends a process, never the machine — box teardown is
never keyed on a run's terminal, and never on a wall-clock timeout against the run.

There is no run timeout. A coding agent runs until it finishes; `ARBE_PI_TIMEOUT` is a
~3-day runaway guard for a hung process, not a run length.

Reaping is arbe-owned and idle-based. A box stays up while any thread on it
(`threads.sandbox_id`) is `running`; once none is, the reconcile/prune sweep
(`reconcileStuckThread` / `pruneStuckThreads`) deletes it — the same seam that clears stale
threads. When that sweep settles a finished run, `reapSandboxIfIdle` deletes the box (via an injected
`reapBox`, so `@arbe/core` stays provider-free) and tombstones its row. Resume re-resolves
to the environment's live box, or makes a fresh one, so idle means delete; a stopped box
buys nothing. The discriminator is `sandboxes.ephemeral`: `delegate_task` boxes are
`ephemeral=true` and reapable, while an environment's shared box (inline `run_command`) is
`ephemeral=false` and never touched.
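The reap rule reduces to a predicate over the box row and the threads on it. A sketch with hypothetical row shapes: only `ephemeral`, `sandbox_id`, and the `running` status are named in this doc, and the `"dead"` tombstone value is borrowed from the liveness path below as an assumption. `reapBox` and `tombstone` are injected, mirroring the provider-free split.

```typescript
interface SandboxRow { id: string; ephemeral: boolean; status: "live" | "dead" }
interface ThreadRow { id: string; sandbox_id: string | null; status: string }

// True when the box may be deleted: a delegate_task box with no running thread on it.
function isReapable(box: SandboxRow, threads: ThreadRow[]): boolean {
  if (!box.ephemeral) return false;        // an environment's shared box: never touched
  if (box.status === "dead") return false; // already tombstoned
  return !threads.some((t) => t.sandbox_id === box.id && t.status === "running");
}

async function reapIfIdle(
  box: SandboxRow,
  threads: ThreadRow[],
  reapBox: (id: string) => Promise<void>,   // provider call, injected
  tombstone: (id: string) => Promise<void>, // marks the sandboxes row
): Promise<boolean> {
  if (!isReapable(box, threads)) return false;
  await reapBox(box.id);
  await tombstone(box.id);
  return true;
}
```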

Death is pull-confirmed, never webhook-driven — one org-wide webhook is a forged-event risk
against other houses' boxes. `confirmSandboxLiveness(sandboxId, probeBox)` asks Daytona
about a house-scoped `provider_ref` through that house's own runtime: a missing box (404) or
terminal-fault state (`error`/`build_failed`/`removing`) tombstones the `sandboxes` row to
`dead` and orphans the threads on it; a `live`/`unknown` verdict is a no-op, so a network
blip never tombstones a real box. It rides the reaper's two seams: the read-path/prune
reconcile (the current thread's box) and `arbe sandbox list --reconcile` (the cold-row sweep
— there is no cron). `probeBox` is injected by the worker, mirroring `reapBox`.
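A minimal sketch of the verdict mapping, assuming the injected probe reports either a 404, a Daytona state string, or a transport failure. Only the terminal-fault state names and the dead versus no-op split come from this doc; the `ProbeResult` shape and function names are hypothetical.

```typescript
type ProbeResult =
  | { kind: "not_found" }            // Daytona answered 404
  | { kind: "state"; state: string } // the box's reported state
  | { kind: "unreachable" };         // network error or timeout

type Verdict = "live" | "dead" | "unknown";

const TERMINAL_FAULT = new Set(["error", "build_failed", "removing"]);

function verdictFor(r: ProbeResult): Verdict {
  switch (r.kind) {
    case "not_found":
      return "dead";
    case "state":
      return TERMINAL_FAULT.has(r.state) ? "dead" : "live";
    case "unreachable":
      return "unknown"; // a network blip must never tombstone a real box
  }
}

// Only "dead" has side effects: tombstone the sandboxes row and orphan its threads.
const shouldTombstone = (v: Verdict): boolean => v === "dead";
```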

Daytona's own auto-stop/delete is the dumb backstop. `createSandbox` sets `autoStopInterval`
from the runaway guard plus a buffer (Daytona counts only API calls as activity, and a
detached run makes none, so the clock runs from launch — the value must clear the guard or
it would stop a live run) and `autoDeleteInterval: 0`. This catches an abandoned box (worker
dead, runner crashed) after ~3 days; it is not the primary reaper.
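The backstop arithmetic, sketched under stated assumptions: `ARBE_PI_TIMEOUT` is taken to be in milliseconds, Daytona's `autoStopInterval` is in minutes, and the 60-minute buffer is an illustrative value, not the one `createSandbox` uses. As we understand Daytona's semantics, `autoDeleteInterval: 0` deletes the box as soon as it stops.

```typescript
interface BackstopIntervals {
  autoStopInterval: number;   // minutes
  autoDeleteInterval: number; // minutes after stop; 0 = delete on stop
}

function backstopIntervals(piTimeoutMs: number, bufferMinutes = 60): BackstopIntervals {
  // The clock runs from launch (a detached run makes no API calls), so the stop
  // interval must exceed the runaway guard or Daytona would stop a live run.
  const guardMinutes = Math.ceil(piTimeoutMs / 60_000);
  return { autoStopInterval: guardMinutes + bufferMinutes, autoDeleteInterval: 0 };
}
```

For a 3-day guard (259,200,000 ms) this gives 4,320 + 60 = 4,380 minutes.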

Spawned boxes are labelled `arbe.house` / `arbe.thread` / `arbe.environment` on create, so
`arbe sandbox list --runtime daytona` shows which run owns a box. The broader lifecycle work
is tracked in the Daytona epic (arbe-7959).

Previously an open question, answered in 2026-06: the egress whitelist cannot include
`arbe.0sk.ar` at our org tier. Tier 1/2 restrictions are not overridable, and tier 3/4
(`networkAllowList`, max 10 IPv4 CIDRs) is out of budget. So all sandbox-to-thread traffic
flows through the edge function permanently, and the two proxies must be kept from drifting.

See [sprite runtime](./sandbox-sprite.md), [dispatch](./dispatch.md),
[environments](./environments.md), [secrets](./secrets.md), [runtime](./runtime.md).
