
Anthropic Managed Agents vs arbe

Anthropic’s Managed Agents (April 2026) and arbe converge on the same design move: separate the brain (the loop that calls Claude), the hands (the sandbox where code runs), and the session (the append-only log of what happened). The correspondences: their Session ↔ arbe’s stream; their execute(name, input) -> string interface ↔ arbe’s “the DO is the durable brain, the sandbox is the temporary body”; their “harness assumptions go stale as models improve” ↔ arbe’s “primitives outlast the surface.”

Session   = append-only log (replayable, lives outside the harness)
Harness   = the loop calling Claude; cattle, not pets — recoverable via wake(sessionId)→getSession(id)
Sandbox   = execute(name, input) -> string; container is cattle, provisioned via tool call
Vault     = credentials never live in the sandbox; MCP proxy fetches per call
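The decoupling above can be sketched in a few lines of TypeScript. This is an illustrative sketch, not Anthropic’s or arbe’s implementation: only execute, wake, and getSession come from the notes; SessionLog, Harness, runTool, the event shape, and the in-memory sessions map are hypothetical, and execute is kept synchronous for brevity.

```typescript
// Hypothetical event shape; the notes only say the session is an append-only log.
type SessionEvent = { seq: number; type: string; payload: string };

// Session: append-only log that lives outside the harness.
class SessionLog {
  private events: SessionEvent[] = [];
  constructor(readonly id: string) {}
  append(type: string, payload: string): SessionEvent {
    const event = { seq: this.events.length, type, payload };
    this.events.push(event);
    return event;
  }
  all(): readonly SessionEvent[] {
    return this.events;
  }
}

// Sandbox: the "hands", reduced to a single call.
interface Sandbox {
  execute(name: string, input: string): string;
}

// In-memory stand-in for wherever sessions are durably stored (an assumption).
const sessions = new Map<string, SessionLog>();

function getSession(id: string): SessionLog {
  const session = sessions.get(id);
  if (!session) throw new Error(`unknown session: ${id}`);
  return session;
}

// Harness: keeps no state of its own, so a replacement can resume from the log.
class Harness {
  constructor(private session: SessionLog, private sandbox: Sandbox) {}
  runTool(name: string, input: string): string {
    this.session.append("tool_use", `${name}:${input}`);
    const result = this.sandbox.execute(name, input);
    this.session.append("tool_result", result);
    return result;
  }
}

// wake(sessionId) rebuilds a harness around the existing log.
function wake(sessionId: string, sandbox: Sandbox): Harness {
  return new Harness(getSession(sessionId), sandbox);
}

const echo: Sandbox = { execute: (name, input) => `${name}(${input})` };
sessions.set("s1", new SessionLog("s1"));
wake("s1", echo).runTool("bash", "ls");
// The first harness is discarded; a fresh one continues the same log.
wake("s1", echo).runTool("bash", "pwd");
```

Because the harness holds nothing but references, losing it costs nothing: the second wake call appends to the same four-event log the first one started.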

Their evolution: they started with everything in one container (session, harness, and sandbox sharing an environment), which meant pet infrastructure, debugging blind spots, and no way for customers to bring their own VPC. Decoupling turned each piece into an interface that can fail independently. Once sandbox provisioning became lazy, p50 TTFT (time to first token) dropped ~60% and p95 dropped >90%. Security: tokens are stored in a vault, and an MCP proxy fetches credentials per call so the harness never sees them; git access tokens are used to clone repos at sandbox init, so push/pull work without the agent handling the token.
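Lazy provisioning is the part of this that most directly explains the latency numbers, so a minimal sketch may help. It assumes the same execute(name, input) -> string shape; LazySandbox and the provision callback are hypothetical names, and the counter stands in for an expensive container start.

```typescript
interface Sandbox {
  execute(name: string, input: string): string;
}

// Wraps a provisioning function and defers it until the first tool call.
class LazySandbox implements Sandbox {
  private inner: Sandbox | undefined;
  constructor(private provision: () => Sandbox) {}
  execute(name: string, input: string): string {
    if (!this.inner) {
      // First tool call: only now pay the container start cost.
      this.inner = this.provision();
    }
    return this.inner.execute(name, input);
  }
}

let boots = 0;
const sandbox = new LazySandbox(() => {
  boots += 1; // stands in for the expensive container start
  return { execute: (name, input) => `${name} ran: ${input}` };
});

// The session can start and the model can begin responding with no container yet.
const bootsBeforeFirstCall = boots;

const first = sandbox.execute("bash", "ls");
const second = sandbox.execute("bash", "pwd");
```

A session that never calls a tool never provisions a container at all, which is where the time-to-first-token savings would come from under this reading.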

API surface: four concepts (Agent = model + system prompt + tools + MCP + skills, versioned; Environment = container template; Session = running agent in an env; Events = messages exchanged). Endpoints: POST /v1/agents, POST /v1/environments, POST /v1/sessions, POST/GET /v1/sessions/:id/events, GET /v1/sessions/:id/stream (SSE). Event types include user.message, user.interrupt, user.tool_confirmation, user.custom_tool_result, and agent-side agent.message / agent.thinking / agent.tool_use / agent.custom_tool_use / agent.mcp_tool_use / session.status_idle. Built-in tools: bash, read, write, edit, glob, grep, web_fetch, web_search.
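For a client consuming the event stream, the event type names listed above can be captured as a TypeScript union. This sketch models only the type strings, since the notes do not describe payload shapes; the constant names, isEventType, and namespaceOf are hypothetical helpers, and grouping session.status_idle under its own "session" prefix is a choice made here.

```typescript
const USER_EVENT_TYPES = [
  "user.message",
  "user.interrupt",
  "user.tool_confirmation",
  "user.custom_tool_result",
] as const;

const AGENT_EVENT_TYPES = [
  "agent.message",
  "agent.thinking",
  "agent.tool_use",
  "agent.custom_tool_use",
  "agent.mcp_tool_use",
] as const;

const SESSION_EVENT_TYPES = ["session.status_idle"] as const;

type UserEventType = (typeof USER_EVENT_TYPES)[number];
type AgentEventType = (typeof AGENT_EVENT_TYPES)[number];
type SessionEventType = (typeof SESSION_EVENT_TYPES)[number];
type EventType = UserEventType | AgentEventType | SessionEventType;

const ALL_EVENT_TYPES: readonly string[] = [
  ...USER_EVENT_TYPES,
  ...AGENT_EVENT_TYPES,
  ...SESSION_EVENT_TYPES,
];

// Type guard for strings arriving off the wire.
function isEventType(value: string): value is EventType {
  return ALL_EVENT_TYPES.indexOf(value) !== -1;
}

// Splits on the dotted prefix: "agent.tool_use" -> "agent".
function namespaceOf(type: EventType): "user" | "agent" | "session" {
  return type.split(".")[0] as "user" | "agent" | "session";
}

// Keep only the known types from a raw list, dropping anything unrecognized.
const raw = ["user.message", "agent.thinking", "agent.unknown", "session.status_idle"];
const known: EventType[] = raw.filter(isEventType);
```

The list in the notes is introduced with "include", so a real client should probably tolerate unknown types rather than reject them; the filter above simply drops them.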

Where they diverge: Anthropic’s scope is hosted Claude-as-an-agent; arbe’s is a general substrate. Their permission model is tool-level allow/ask scoped to a session; arbe’s current model is house membership inherited by threads, environments, and configs. Their multi-agent support is a research preview; arbe’s threads are first-class and multi-writer. They expose four concepts (agent / environment / session / events); arbe has six primitives (records / streams / agents / permissions / mutations / signals). Their hosting is opaque; arbe’s storage is explicit (Postgres, Durable Streams, DO SQLite, sandbox FS). They’re single-vendor (Claude); arbe is model-agnostic.

What arbe has that Managed Agents doesn’t: shared scope membership (humans and bots participate through the same identity model); first-class activation policies (mention / ambient / cron / webhook / event, whereas Managed Agents activates only via an explicit API call); local-first plus remote operation (arbe works offline and syncs); domain-specific orchestration (task graphs, loop iterations, stuck detection).

Worth borrowing: the execute(name, input) interface as the universal hand abstraction (arbe’s sandbox package is heading there but could be more explicit); session-as-context-object with positional slicing (getEvents(from, to, filter) for harness-driven context engineering, which connects to memory.md’s working-vs-durable split); lazy sandbox provisioning (containers spin up via tool call only when needed); session interrupt as a first-class stream event; client-executed custom tools over the event stream (exposing browser-side capabilities to an agent); versioned agent configurations (created once, referenced by ID + version across sessions).
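Positional slicing is the least obvious item in that list, so here is a minimal sketch of what getEvents(from, to, filter) could look like. Only the method name and its three parameters come from the notes; the Session class, the event shape, the half-open [from, to) range, and the optional filter are assumptions made for illustration.

```typescript
// Hypothetical event shape for illustration.
type SessionEvent = { seq: number; type: string; text: string };

class Session {
  private log: SessionEvent[] = [];

  append(type: string, text: string): void {
    this.log.push({ seq: this.log.length, type, text });
  }

  get length(): number {
    return this.log.length;
  }

  // Returns events in [from, to) that pass the filter (range semantics assumed).
  getEvents(
    from: number,
    to: number,
    filter: (event: SessionEvent) => boolean = () => true,
  ): SessionEvent[] {
    return this.log.slice(from, to).filter(filter);
  }
}

const session = new Session();
session.append("user.message", "fix the failing test");
session.append("agent.thinking", "look at the stack trace first");
session.append("agent.tool_use", "bash: npm test");
session.append("agent.message", "the test passes now");

// The harness decides what goes into the next model call: here, everything
// after the first message, minus thinking events.
const context = session.getEvents(1, session.length, (e) => e.type !== "agent.thinking");
```

The log itself stays complete and durable; the slice is the working context. That split is what makes the harness free to change its context strategy without touching stored history.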

Don’t borrow: hosting opacity (arbe’s explicit storage boundaries are a feature), single-vendor lock, shallow per-tool permissions, branding restrictions.

See: thinking/layers, thinking/primitives, thinking/electric-agents (sister comparison — both converge on the same brain/hands/log decoupling), thinking/memory, thinking/capabilities.