Inference build

Patrick Collison, paraphrased: “I want GNU Autotools × Notion. Input files + context, real-time collab, snapshots/VCS, managed inference workflows + stored prompts, general-purpose coding agents (not just chat), and compiled outputs/inference results worth saving and sharing.”

That is a description of arbe’s substrate from the outside. Worth recording because external pull validates the thesis — and because it sharpens which seam is actually missing.

The collapse

The honest reduction: it’s a git repo + multiplayer coding agents persisted in threads. Git already gives files, context, VCS, snapshots, and addressable artifacts (path + sha) for free. Coding agents write anything into the repo. Threads are the multiplayer inference log. Most of Patrick’s wishlist falls out of what arbe already has.

Patrick wants           arbe has it as
──────────────────────  ─────────────────────────────────────────
input files + context   repo files (in a bound environment)
real-time collab        threads (multiplayer, cloud, portable)
snapshots / VCS         git in the repo; threads as run history
coding agents not chat  delegate_task → pi on a sandbox
stored prompts          agent record (system_prompt) / team recipe
inference workflows     activation policy + tools + permissions
compiled outputs        ← the seam (see below)

Run it through the primitive test

The house rule (what-not-to-build.md): no seventh primitive, no workflow DSL. So nothing here is a new entity — it’s records, activation paths, and one provenance edge.

  • recipe = a record. Agent system_prompt today; a team bundle is the nearest reusable form. Prompt + inputs + tools, stored. Not a new primitive — “an agent is what it does.”
  • trigger = an activation policy. activation.md already covers it: cron, interval, webhook, mention, ambient, event — same category, configuration not ontology. dispatch(policy, event) → run.
  • run = a thread (child room). delegate_task already spawns one: child thread + fresh sandbox, parent_thread_id edge back (packages/core/dispatch/agent-tools.ts).
  • provenance = partly there. thread.config snapshots agent/model/tool config at creation; the parent_thread_id chain is the delegation tree.
  • artifact = the genuinely open seam.

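prompt → cronjob → delegate_task → artifact is therefore not four features. It’s one recipe record fired through one activation path producing one run that leaves one artifact. Of those four pieces, arbe already has the first three (recipe record, activation path, run); only the artifact is open.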
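The reduction above can be sketched in a few types. This is a minimal illustration, assuming hypothetical names throughout (Recipe, ActivationPolicy, ActivationEvent, Run, policyKey, and this dispatch signature are not arbe's actual schema), and it covers only a subset of the trigger kinds activation.md lists.

```typescript
// Hypothetical shapes for illustration only; not arbe's real records.
type ActivationPolicy =
  | { kind: "cron"; schedule: string }
  | { kind: "interval"; everyMs: number }
  | { kind: "webhook"; path: string }
  | { kind: "mention"; handle: string }
  | { kind: "event"; name: string };

// An incoming activation: which kind fired, the key it fired on, and when.
interface ActivationEvent {
  kind: ActivationPolicy["kind"];
  key: string;
  at: string; // ISO timestamp
}

// A recipe is a record: agent prompt + declared inputs + tools + one policy.
interface Recipe {
  agentId: string;
  systemPrompt: string;
  inputRefs: string[];
  tools: string[];
  policy: ActivationPolicy;
}

// A run is a thread, with an optional edge back to the parent thread.
interface Run {
  threadId: string;
  parentThreadId: string | null;
  recipeAgentId: string;
  startedAt: string;
}

// The value an event must carry to match a given policy.
function policyKey(policy: ActivationPolicy): string {
  switch (policy.kind) {
    case "cron": return policy.schedule;
    case "interval": return String(policy.everyMs);
    case "webhook": return policy.path;
    case "mention": return policy.handle;
    case "event": return policy.name;
  }
}

// dispatch(policy, event) → run: configuration in, one thread out (or nothing).
function dispatch(recipe: Recipe, event: ActivationEvent, parentThreadId: string | null = null): Run | null {
  if (event.kind !== recipe.policy.kind || event.key !== policyKey(recipe.policy)) return null;
  return {
    threadId: `thread-${recipe.agentId}-${event.at}`,
    parentThreadId,
    recipeAgentId: recipe.agentId,
    startedAt: event.at,
  };
}

const weeklyReport: Recipe = {
  agentId: "weekly-report",
  systemPrompt: "Summarize data/ into report.md",
  inputRefs: ["data/"],
  tools: ["delegate_task"],
  policy: { kind: "cron", schedule: "0 9 * * 1" },
};

const firedRun = dispatch(weeklyReport, { kind: "cron", key: "0 9 * * 1", at: "2026-01-05T09:00:00Z" });
const ignoredRun = dispatch(weeklyReport, { kind: "webhook", key: "/hooks/report", at: "2026-01-05T09:00:00Z" });
```

Nothing in the sketch is a workflow language: the recipe is plain data, the trigger is one field on it, and the only thing dispatch produces is a thread-shaped record.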

The missing seam: artifact + derivation

git tracks state, not what produced what. A commit says “this file changed”. It does not say “output Z = prompt D over inputs B, C, via run R”. That edge is the entire make/autotools half of Patrick’s ask, and it is what arbe lacks today.

input files (repo) ──┐
                     ├─▶ run (thread) ──▶ artifact (commit + ref)
stored prompt ───────┘        │                    │
                              └── derivation edge: artifact ← {inputs[], prompt, run}

What the edge unlocks — none of it free from git alone:

  • Provenance: which run + prompt built this file. Replay, diff, “what made this”.
  • Incremental rebuild: input sha changed → target stale → rerun that recipe only. make over inference.
  • Targets: make report.md knowing report depends on data/ + a prompt.

Design pull: an artifact is not a new content store. It is an addressable ref into the repo (commit sha + path) plus a derivation record. The repo is the artifact store (Patrick’s instinct, and ours). The substrate’s job is the edge, not the bytes. This stays inside the primitives: the artifact ref + derivation live as records/signals; the bytes live in git on the bound environment; the run is a thread.
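To make "ref plus derivation record, no bytes" concrete, here is a minimal sketch. Every name in it (ArtifactRef, InputRef, Derivation, whatMade, isStale) is a hypothetical illustration, and it assumes edges are appended in run order.

```typescript
// Hypothetical record shapes; the bytes themselves stay in git.
interface ArtifactRef {
  commit: string; // commit sha the artifact was written in
  path: string;   // path inside the repo
}

interface InputRef {
  path: string;
  sha: string; // blob sha of the input at the time of the run
}

// One derivation edge: artifact ← {inputs[], prompt, run}.
interface Derivation {
  target: ArtifactRef;
  inputs: InputRef[];
  promptSha: string;
  runThreadId: string;
}

// Provenance: find the most recent edge that produced a path.
function whatMade(edges: Derivation[], path: string): Derivation | undefined {
  for (let i = edges.length - 1; i >= 0; i--) {
    const edge = edges[i];
    if (edge && edge.target.path === path) return edge;
  }
  return undefined;
}

// Staleness: any recorded input sha that no longer matches the current one.
function isStale(edge: Derivation, currentSha: (path: string) => string | undefined): boolean {
  return edge.inputs.some((input) => currentSha(input.path) !== input.sha);
}

const recordedEdges: Derivation[] = [
  {
    target: { commit: "c0ffee1", path: "report.md" },
    inputs: [{ path: "data/sales.csv", sha: "a1" }],
    promptSha: "p1",
    runThreadId: "thread-42",
  },
];

// data/sales.csv has changed since thread-42 ran.
const headShas = new Map([["data/sales.csv", "a2"]]);
const reportEdge = whatMade(recordedEdges, "report.md");
```

The record holds only shas, paths, and a thread id, so it fits the "records/signals" side of the split while git keeps the content.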

activation.md already names “a result artifact with the remote message ID and timestamp” in passing — but only for the side-effect case (posted-to-Discord). It does not develop the re-derivable, dependency-tracked build artifact. That is the white space this doc marks.

A candidate substrate: Cloudflare Artifacts

Artifacts (Cloudflare, beta 2026) is the same bet as this doc — “git’s data model is good for anything where you need to track state, time-travel, and persist large amounts of small data” — shipped as a managed service. It is worth understanding precisely, because at first glance it reads as “git, but Cloudflare’s” and the interesting part is what makes that framing wrong.

What it is. A git server you never run, offered as millions of cheap programmatic repos. You mint a repo with an API call, hand out a short-lived read or write token, and talk to it with any standard git client. Under the hood each repo is a Durable Object (history in SQLite, snapshots in R2, tokens in KV); the git protocol engine is a ~100KB Zig→WASM binary. The unit you pay for is operations + stored GB, not servers.

Why over plain git — the honest answer is operational, not semantic. The data model is git; you don’t reach for it because it versions differently. You reach for it when you want git semantics (fork, diff, revert, time-travel) at a scale and cadence where running git yourself is the wrong shape:

plain git                              Artifacts
─────────────────────────────────────  ──────────────────────────────────────
a repo = a host + a disk you manage    a repo = an API call; mint millions
fork = clone (copy bytes, plan ahead)  fork = O(1) branch from a baseline
access = SSH keys / a forge's authz    per-repo scoped tokens, auto-expiring
cold start = full clone, then work     ArtifactFS: blobless mount, hydrate
                                       files on demand (manifests first)
time-travel = only what was committed  time-travel over uncommitted state too
                                       (file state + session/prompt state)

The two rows that matter for arbe: repo-per-anything is cheap (no host to provision per run/agent/session), and time-travel covers uncommitted state, so you can rewind a run’s filesystem and its prompt/session state even if nothing was ever committed. Plain git does not do that second one.

What it changes here. It removes the cost objection behind this doc’s central assumption. “git is the artifact store, refs not bytes” only holds if binding a repo to every run is cheap — and the open question below (“do env-less threads force an earlier repo binding?”) was really a cost question. Artifacts answers it: a repo per thread is an API call, so the artifact plane can be universal instead of opt-in. It maps onto primitives we already have:

  • run → fork. delegate_task forks a branch from a baseline, pi commits its work, we diff the run against the original. The “run = thread + sandbox” plane gains a durable, branchable file plane for near-free.
  • provenance → git-notes. Artifacts supports native git-notes (metadata attached to a commit without mutating it). The derivation record — artifact ← {inputs[], prompt, run, target} — hangs off the commit as a note. No new content store; the “records, not a bytes table” instinct survives intact.
  • cold start → ArtifactFS. Blobless on-demand hydration is the part that actually earns its keep for sandbox startup — the run mounts the repo and pulls file contents lazily instead of blocking on a full clone.
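The git-notes mapping can be sketched without touching any Artifacts-specific API, since notes are a standard git feature. The sketch below only builds argument lists for the stock `git notes` subcommand and round-trips the JSON payload; the notes ref name `derivations`, the DerivationNote shape, and the helper names are assumptions for illustration.

```typescript
// Hypothetical payload: artifact ← {inputs[], prompt, run, target}.
interface DerivationNote {
  inputs: { path: string; sha: string }[];
  prompt: string; // sha of the stored prompt
  run: string;    // thread id of the run
  target: string; // path of the artifact inside the repo
}

// Assumed notes ref; git stores it under refs/notes/derivations.
const NOTES_REF = "derivations";

// Arguments for: git notes --ref derivations add -f -m <json> <commit>
function notesAddArgs(commit: string, note: DerivationNote): string[] {
  return ["notes", "--ref", NOTES_REF, "add", "-f", "-m", JSON.stringify(note), commit];
}

// Arguments for: git notes --ref derivations show <commit>
function notesShowArgs(commit: string): string[] {
  return ["notes", "--ref", NOTES_REF, "show", commit];
}

// Parse what `git notes show` printed back into a record.
function parseNote(stdout: string): DerivationNote {
  return JSON.parse(stdout.trim()) as DerivationNote;
}

const exampleNote: DerivationNote = {
  inputs: [{ path: "data/sales.csv", sha: "a1" }],
  prompt: "p1",
  run: "thread-42",
  target: "report.md",
};

const addArgs = notesAddArgs("c0ffee1", exampleNote);
const showArgs = notesShowArgs("c0ffee1");
// The JSON message is the second-to-last argument of the add command.
const roundTripped = parseNote(addArgs[addArgs.length - 2] ?? "{}");
```

In a real run these argument arrays would be handed to something like Node's `execFile("git", args)` inside the sandbox. Because the note lives beside the commit rather than in it, attaching or rewriting a derivation never changes the artifact's sha.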

What it is NOT. It tracks state, exactly like git — it does not track what produced what. The missing seam above (the derivation edge, incremental rebuild, make report.md) is still entirely ours to build. Artifacts is a better-scaling place to attach that edge, not the edge. So: substrate, not seam. It makes the artifact-store assumption cheap enough to always hold and gives git-notes as the natural home for derivation — but make for inference is still the thing we write.

Open before committing. It’s beta (public May 2026), usage-priced ($0.15/1k ops, $0.50/GB-mo), and the whole value rests on ArtifactFS hydration being fast enough for cold-start — that is the first thing a spike should measure, not the git API.
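For a first sanity check on the usage pricing, a back-of-envelope estimate is enough. The two prices below are the ones quoted in this doc; the workload numbers (runs per month, operations per run, stored GB) are made-up assumptions, and what counts as a billable operation is not specified here.

```typescript
// Prices as quoted above for the beta; treat them as inputs that may change.
const USD_PER_1K_OPS = 0.15;
const USD_PER_GB_MONTH = 0.5;

// Rough monthly cost for a repo-per-run workload.
function monthlyCostUsd(runsPerMonth: number, opsPerRun: number, storedGb: number): number {
  const totalOps = runsPerMonth * opsPerRun;
  return (totalOps / 1000) * USD_PER_1K_OPS + storedGb * USD_PER_GB_MONTH;
}

// Assumed workload: 10,000 runs, 100 operations each, 10 GB retained.
const estimateUsd = monthlyCostUsd(10_000, 100, 10);
```

Under those assumptions the estimate comes to about $155 a month, almost all of it operations, which suggests operations per run is the number a spike should pin down alongside hydration latency.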

Idea to explore — working-tree reuse across sandbox runtimes. Possibly the most concrete payoff, ahead of the derivation edge. If a run’s working tree lived in Artifacts, three current pains soften at once: per-run upload (arbe-3743) becomes a blobless ArtifactFS mount that hydrates on demand; working-tree restore on resume (arbe-ba40) gets time-travel over uncommitted state, not just the last commit; and delegate_task forks the parent tree O(1) so a child box mounts its own branch with no byte-copying and a free diff back. It also rhymes with the runtime seam (arbe-7759): a tree in Artifacts is one both sprite and daytona mount identically, so the working tree stops being per-adapter. Not free, and not yet thought through — it needs (1) the tree to live in Artifacts as source of truth (the “earlier repo binding” question above), and (2) ArtifactFS to actually mount inside both runtimes (FUSE/permissions), which is the real risk and the thing to prove first.

New assumptions to test

  • git is the artifact store. No artifact table holding bytes — only refs + derivation. Holds as long as every run binds an environment with a repo. Env-less threads (the create_environment offer path) have no artifact plane; is that fine, or does the artifact concept force an earlier repo binding?
  • derivation is recordable without a DSL. The edge is {inputs[], prompt/recipe, run, target} — data, not a graph language. n8n draws the DAG; we record edges and let the graph be emergent (same move as records → scopes). Test: can incremental rebuild be computed by walking derivation edges + comparing input shas, with no declared build file?
  • a recipe is just an agent + bound inputs. No WorkflowRecipe table. A reusable inference recipe = agent record + activation policy + declared input refs. Test against the hard case: a multi-step pipeline where run B consumes run A’s artifact — does edge-walking suffice, or does that secretly need an ordered plan?
  • trigger taxonomy is already complete. cron + webhook + event + chat covers Patrick. Nothing new to invent there beyond what activation.md schedules.
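The second and third assumptions can be probed with a small sketch: compute the rebuild set by walking recorded edges and comparing input shas, with no declared build file, including the hard case where run B consumes run A's artifact. All names here (Edge, rebuildPlan) are hypothetical, and the sketch assumes one recorded edge per target path.

```typescript
// One recorded edge per derived target: what it was built from, at which shas.
interface Edge {
  target: string;
  inputs: { path: string; sha: string }[];
}

// Returns stale targets in an order where upstream targets come first.
// The order is derived from the edges themselves, not from a declared plan.
function rebuildPlan(edges: Edge[], currentSha: Map<string, string>): string[] {
  const byTarget = new Map<string, Edge>();
  for (const edge of edges) byTarget.set(edge.target, edge);

  const staleMemo = new Map<string, boolean>();
  const visiting = new Set<string>();
  const order: string[] = [];

  const visit = (path: string): boolean => {
    const edge = byTarget.get(path);
    if (!edge) return false; // a plain source file, never rebuilt
    const known = staleMemo.get(path);
    if (known !== undefined) return known;
    if (visiting.has(path)) throw new Error(`derivation cycle at ${path}`);
    visiting.add(path);

    let stale = false;
    for (const input of edge.inputs) {
      // Visit every input so upstream targets are queued before this one.
      const upstreamStale = visit(input.path);
      if (upstreamStale || currentSha.get(input.path) !== input.sha) stale = true;
    }

    visiting.delete(path);
    staleMemo.set(path, stale);
    if (stale) order.push(path);
    return stale;
  };

  for (const edge of edges) visit(edge.target);
  return order;
}

// Run A built summary.md from data; run B built report.md from summary.md + a prompt.
// Edges are listed downstream-first on purpose: the plan should not depend on that.
const pipelineEdges: Edge[] = [
  { target: "report.md", inputs: [{ path: "summary.md", sha: "s1" }, { path: "prompts/report.md", sha: "p1" }] },
  { target: "summary.md", inputs: [{ path: "data/sales.csv", sha: "a1" }] },
];

const unchanged = new Map([["data/sales.csv", "a1"], ["summary.md", "s1"], ["prompts/report.md", "p1"]]);
const dataChanged = new Map([["data/sales.csv", "a2"], ["summary.md", "s1"], ["prompts/report.md", "p1"]]);

const nothingToDo = rebuildPlan(pipelineEdges, unchanged);
const afterDataChange = rebuildPlan(pipelineEdges, dataChanged);
```

In this toy case edge-walking is enough: a change to data/sales.csv marks summary.md stale, that staleness propagates to report.md, and the post-order walk yields summary.md before report.md. Whether that holds for pipelines with side effects or fan-out is exactly what the tests above are meant to find out.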

What goal are we chasing

make for inference, over git, multiplayer. A place where stuff accumulates — files, prompts, context — and you compute over it in an iterated way, with the build artifacts worth saving and sharing, and the system knowing what produced what so it can rebuild only what went stale.

arbe already owns the rare, hard parts: shared identity/permissions, multiplayer durable threads, portable cloud runs, real coding agents (not chat). The thin missing layer is the derivation edge — artifact ← inputs + prompt + run — and the non-chat activation paths (cron, webhook) that activation.md has specced but not yet built. Ship those two and arbe is the tool Patrick is describing, without a seventh primitive and without a workflow DSL.

See: thinking/activation, thinking/capabilities, thinking/primitives, thinking/what-not-to-build, thinking/thesis, surfaces.