# LLM models and keys

Users pick **models**, never providers. A model ref is `provider/model-id` (pi format, e.g. `openrouter/anthropic/claude-haiku-4.5`, where the provider is `openrouter` and the model id is `anthropic/claude-haiku-4.5`). The provider prefix only links the ref to a key: it decides which secret to look up.

OpenRouter is the default and recommended provider: one key covers every model namespace. Direct-provider refs (`anthropic/<id>`, `google/<id>`) work only if a key for that provider resolves.
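The following is a minimal sketch of how a ref could map to the secret it needs. It assumes the provider prefix is simply the first `/`-separated segment. `parseModelRef`, `keyNameForRef`, and the provider-to-key table are hypothetical; only the key names come from this page.

```typescript
// Hypothetical shape: provider prefix plus the id the provider understands.
interface ModelRef {
  provider: string; // first path segment, e.g. "openrouter"
  modelId: string; // remainder, e.g. "anthropic/claude-haiku-4.5"
}

// Split on the first "/" only, so namespaced ids keep their own slashes.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`invalid model ref: ${ref}`);
  }
  return { provider: ref.slice(0, slash), modelId: ref.slice(slash + 1) };
}

// Illustrative provider -> secret-name table (key names taken from this page).
const KEY_NAME_BY_PROVIDER = new Map<string, string>([
  ["openrouter", "OPENROUTER_API_KEY"],
  ["anthropic", "ANTHROPIC_API_KEY"],
  ["google", "GEMINI_API_KEY"],
]);

// The provider prefix only decides which secret to look up.
function keyNameForRef(ref: string): string {
  const { provider } = parseModelRef(ref);
  const keyName = KEY_NAME_BY_PROVIDER.get(provider);
  if (keyName === undefined) {
    throw new Error(`no key mapping for provider: ${provider}`);
  }
  return keyName;
}
```

Under these assumptions, `openrouter/anthropic/claude-haiku-4.5` looks up `OPENROUTER_API_KEY`. A direct `anthropic/<id>` ref looks up `ANTHROPIC_API_KEY` and only works if that key resolves.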

## Model resolution

For every LLM turn (in-server reply, ambient gate, sandbox pi turn):

```
thread config `model` → agent `model` → `defaultModelRef` (cheap OpenRouter preset)
```

Thread = "try this model for this conversation". Agent = durable preference. The thread override is an optional lever and is set rarely. It wins because it is the most deliberate, most local choice, and in a multi-agent thread it pins every participant to the same model.
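The chain is a plain first-set-wins fallthrough. The sketch below assumes hypothetical `ThreadConfig`/`AgentConfig` shapes where an unset model is `null` or `undefined`. The `defaultModelRef` value is a placeholder, not the real preset.

```typescript
// Hypothetical config shapes; unset means null or undefined.
interface ThreadConfig {
  model?: string | null;
}
interface AgentConfig {
  model?: string | null;
}

// Placeholder: the real default is a cheap OpenRouter preset.
const defaultModelRef = "openrouter/<cheap-preset>";

// Most local, most deliberate choice wins: thread, then agent, then default.
function resolveModel(thread: ThreadConfig, agent: AgentConfig): string {
  return thread.model ?? agent.model ?? defaultModelRef;
}
```

Every participant in a multi-agent thread passes the same thread config. As a result, a thread-level `model` yields the same ref for all of them, regardless of each agent's own preference.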

## Key resolution

One precedence order, shared by the reply path and the sandbox path:

```
env-binding secret → house secret → worker env → fail loudly
```

- **App supplies** (worker env): `OPENROUTER_API_KEY` only. This is the base key that makes arbe work without BYOK.
- **House supplies** (house secrets, same names): `OPENROUTER_API_KEY` to replace the app key, or direct-provider keys (`ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, ...) to unlock direct refs.
- **Env binding** (sandbox environments): secrets bound to the environment win over house secrets.
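The precedence can be read as a loop over tiers. A lookup that throws is treated as "skip this tier" rather than "abort". This is a synchronous sketch built on a hypothetical `SecretTier` interface. `resolveTurnSecrets` itself is likely async and resolves many keys at once.

```typescript
type KeySource = "env" | "house" | "worker";

// Hypothetical tier: returns the secret, undefined if unset, or throws if
// the lookup itself failed.
interface SecretTier {
  source: KeySource;
  lookup(name: string): string | undefined;
}

interface ResolvedKey {
  value: string;
  source: KeySource; // what gets stamped as key_source later
}

// Tiers are passed in precedence order: env binding, house, worker env.
function resolveKey(name: string, tiers: SecretTier[]): ResolvedKey {
  for (const tier of tiers) {
    let value: string | undefined;
    try {
      value = tier.lookup(name);
    } catch (err) {
      // A failed lookup is logged and falls through to the next tier.
      console.warn(`secret lookup failed (${tier.source}/${name}):`, err);
      continue;
    }
    // Empty strings are treated as unset (an assumption of this sketch).
    if (value !== undefined && value !== "") {
      return { value, source: tier.source };
    }
  }
  // No tier supplied the key: fail loudly instead of sending a keyless call.
  throw new Error(`no key resolved for ${name}`);
}
```

With only the worker env holding `OPENROUTER_API_KEY`, an OpenRouter ref still resolves. A `GEMINI_API_KEY` lookup with no house or env-binding secret throws, which matches the exception noted at the end of this page.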

A failed secret lookup is logged and falls back to the next tier; failures are never cached. Resolved overlays are cached for ~60s per isolate (keyed by house + agent + env), so a rotated key takes effect within a minute.
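The following sketches the per-isolate overlay cache. It assumes a module-level `Map`, which can persist across requests within one isolate, and a hypothetical loader that reports whether any tier lookup failed. Overlays built around a failure are returned but not stored, so the next turn retries instead of pinning the fallback for a minute. `now` is a parameter only to make the TTL testable.

```typescript
type KeySource = "env" | "house" | "worker";

// Hypothetical overlay, shaped like the `{values, sources}` result
// that `resolveTurnSecrets` is described as returning.
interface Overlay {
  values: Record<string, string>;
  sources: Record<string, KeySource>;
}

// Hypothetical loader result: did any tier lookup fail (and get skipped)?
interface LoadResult {
  overlay: Overlay;
  anyTierFailed: boolean;
}

const TTL_MS = 60_000; // ~60s

// Module scope: one cache per isolate, not shared across isolates.
const cache = new Map<string, { overlay: Overlay; expiresAt: number }>();

function cachedOverlay(
  houseId: string,
  agentId: string,
  envId: string | null,
  load: () => LoadResult,
  now: number = Date.now(),
): Overlay {
  // JSON encoding keeps the composite key unambiguous.
  const key = JSON.stringify([houseId, agentId, envId]);
  const hit = cache.get(key);
  if (hit !== undefined && hit.expiresAt > now) return hit.overlay;
  const { overlay, anyTierFailed } = load();
  // Never cache a degraded overlay; the next turn retries the failed tier.
  if (!anyTierFailed) cache.set(key, { overlay, expiresAt: now + TTL_MS });
  return overlay;
}
```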

Code: `resolveTurnSecrets` in `packages/core/dispatch/turn-secrets.ts` is the single resolver behind in-server replies, the gate, `run_command`, and `delegate_task`. It returns `{values, sources}`, recording for each key which tier supplied it. The LLM-specific read on top of it is `replyKeysFromSecrets` (`dispatch.ts`).

## Spend and budget

Two separate concerns: **whose key** (resolution above) and **whether arbe subsidises** (house budget).

### Attribution (`key_source`)

Every paid seam calls `recordUsage()` (`packages/core/usage.ts`) after the spend, stamping `key_source` from the resolver:

| `key_source` | Who pays | Counts toward `spend_cap_usd`? |
|---|---|---|
| `worker` | arbe (app env keys) | yes |
| `house` | house (BYOK secret) | no |
| `env` | house (env-bound secret) | no |

The ledger is a single table with one row per event, per house; there is no second table. Split totals by filtering, e.g. `where key_source = 'worker'` (see `budget_check`). Details: [analytics → usage](./analytics.md#usage--money).
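In code terms, the table reduces to one predicate, and a worker-only total is a filter plus a sum. This sketch works over a hypothetical in-memory `UsageEvent` row shape. The document's own split is the SQL `key_source = 'worker'` filter.

```typescript
type KeySource = "worker" | "house" | "env";

// Hypothetical ledger row; the real usage table likely has more columns.
interface UsageEvent {
  houseId: string;
  keySource: KeySource;
  costUsd: number;
}

// Mirrors the attribution table: only arbe-funded (worker-key) spend counts.
function countsTowardCap(source: KeySource): boolean {
  return source === "worker";
}

// In-memory equivalent of summing one house's ledger where key_source = 'worker'.
function workerSpendUsd(events: UsageEvent[], houseId: string): number {
  return events
    .filter((e) => e.houseId === houseId && countsTowardCap(e.keySource))
    .reduce((sum, e) => sum + e.costUsd, 0);
}
```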

### House budget (arbe subsidy only)

Each house has `houses.spend_cap_usd` — a **lifetime** cap on arbe-funded spend (`key_source = 'worker'`). New houses default to **$5** (column default). `NULL` = uncapped (legacy only).

- **arbe sets the cap** — service role / SQL during alpha. The column is not in authenticated `houses` update grants; house owners cannot raise it.
- **BYOK is the owner escape hatch** — add provider keys under [house secrets](/system/secrets/); LLM spend on those keys is the house's bill and ignores the cap.
- **BYOK ≠ zero arbe spend** — some seams always use worker keys (file indexing, parts of sandbox). Those still count against the cap.

### Enforcement

Cap checks run **only before a worker-key spend** — gate, reply, sandbox, file index — not as a blanket "pause all dispatch" when over cap. A house over budget but on BYOK for LLM keeps getting LLM turns; worker-only seams refuse when the cap is exhausted. Code: `packages/core/worker-budget.ts`, wired from `dispatch.ts` and worker-key tools.

Over cap on a worker seam → `signal.dispatch.skipped { reason: 'budget_exceeded' }` for that path. Design detail: [house budget](../design/house-budget.md).
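The following sketches the pre-spend guard under stated assumptions. `checkBeforeSpend` and its return shape are hypothetical. The cap counts as exhausted once lifetime worker spend reaches it, and a `NULL` cap never blocks. BYOK spends (`house`, `env`) return early, which is why an over-cap house on BYOK keeps getting LLM turns.

```typescript
type KeySource = "worker" | "house" | "env";

// Hypothetical signal shape, modelled on `signal.dispatch.skipped`.
interface SkippedSignal {
  type: "signal.dispatch.skipped";
  reason: "budget_exceeded";
}

type Gate = { ok: true } | { ok: false; signal: SkippedSignal };

// Only a worker-key spend consults the cap; BYOK spends pass straight
// through even when the house is over its cap.
function checkBeforeSpend(
  source: KeySource,
  spendCapUsd: number | null, // houses.spend_cap_usd; null = uncapped
  workerSpendUsd: number, // lifetime worker-key spend for the house
): Gate {
  if (source !== "worker") return { ok: true };
  if (spendCapUsd === null || workerSpendUsd < spendCapUsd) return { ok: true };
  return {
    ok: false,
    signal: { type: "signal.dispatch.skipped", reason: "budget_exceeded" },
  };
}
```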

## Exceptions

- **ragthis file indexing** is single-tenant by design: worker env key only, not house-overridable (see `docs/thinking/files.md`).
- `GEMINI_API_KEY` is **not** a worker-env key. Google models require a house or env-binding secret.