# Anthropic Managed Agents vs arbe

Anthropic's Managed Agents (April 2026) and arbe converge on the same design move: decouple the brain (the loop that calls Claude) from the hands (the sandbox where code runs) from the session (the append-only log of what happened). Their `Session` ↔ arbe's stream. Their `execute(name, input) -> string` interface ↔ arbe's "the DO is the durable brain, the sandbox is the temporary body." Their "harness assumptions go stale as models improve" ↔ arbe's "primitives outlast the surface."

```
Session   = append-only log (replayable, lives outside the harness)
Harness   = the loop calling Claude; cattle, not pets — recoverable via wake(sessionId)→getSession(id)
Sandbox   = execute(name, input) -> string; container is cattle, provisioned via tool call
Vault     = credentials never live in the sandbox; MCP proxy fetches per call
```

Their evolution: started with everything in one container (session, harness, sandbox sharing an environment); pet infra, debugging blind spots, customers couldn't bring their own VPC. Decoupling made each an interface that can fail independently. p50 TTFT dropped ~60%, p95 dropped >90% once sandbox provisioning became lazy. Security: tokens stored in a vault, MCP proxy fetches credentials per call so the harness never sees them; git access tokens clone repos at sandbox init so push/pull work without the agent handling the token.

API surface: four concepts (`Agent` = model + system prompt + tools + MCP + skills, versioned; `Environment` = container template; `Session` = running agent in an env; `Events` = messages exchanged). Endpoints: `POST /v1/agents`, `POST /v1/environments`, `POST /v1/sessions`, `POST/GET /v1/sessions/:id/events`, `GET /v1/sessions/:id/stream` (SSE). Event types include `user.message`, `user.interrupt`, `user.tool_confirmation`, `user.custom_tool_result`, and agent-side `agent.message` / `agent.thinking` / `agent.tool_use` / `agent.custom_tool_use` / `agent.mcp_tool_use` / `session.status_idle`. Built-in tools: bash, read, write, edit, glob, grep, web_fetch, web_search.

Where they diverge: Anthropic's scope is hosted Claude-as-an-agent; arbe's is a general substrate. Their permission model is tool-level allow/ask scoped to a session; arbe's current model is house membership inherited by threads, environments, and configs. Their multi-agent is a research preview; arbe's threads are first-class multi-writer. They give you `agent / environment / session / events` (4 concepts); arbe has `records / streams / agents / permissions / mutations / signals` (6 primitives). Their hosting is opaque; arbe's storage is explicit (Postgres, Durable Streams, DO SQLite, sandbox FS). They're single-vendor (Claude); arbe is model-agnostic.

What arbe has that Managed Agents doesn't: shared scope membership (humans and bots participate through the same identity model); first-class activation policies (mention / ambient / cron / webhook / event — Managed Agents only activates via explicit API call); local-first + remote (arbe works offline and syncs); domain-specific orchestration (task graphs, loop iterations, stuck detection).

**Worth borrowing:** the `execute(name, input)` interface as the universal hand abstraction (arbe's sandbox package is heading there but could be more explicit); **session-as-context-object** with positional slicing (`getEvents(from, to, filter)` for harness-driven context engineering — connects to memory.md's working-vs-durable split); **lazy sandbox provisioning** (containers spin up via tool call only when needed); **session interrupt as a first-class stream event**; **client-executed custom tools** over the event stream (browser-side capabilities to an agent); **versioned agent configurations** (created once, referenced by ID + version across sessions).

**Don't borrow:** hosting opacity (arbe's explicit storage boundaries are a feature), single-vendor lock, shallow per-tool permissions, branding restrictions.

**See:** [thinking/layers](./layers.md), [thinking/primitives](./primitives.md), [thinking/electric-agents](./electric-agents.md) (sister comparison — both converge on the same brain/hands/log decoupling), [thinking/memory](./memory.md), [thinking/capabilities](./capabilities.md).