# House budget design

**Status:** shipped — per worker-key seam enforcement (`worker-budget.ts`); migrations `20260611140000` + `20260611150000` + `20260612120000` (default/backfill) + `20260612130000` (usage split).
**Prerequisite:** arbe-d9b7 ledger (usage_events) + item-4 finally-block tally (errored/stalled turns now record spend)

> **Implementation gotcha (don't relearn this):** `budget_check` must be
> `SECURITY DEFINER`. The gate calls it through the request user's RLS-scoped
> client, and `usage_events` has RLS enabled with **no select policy** — an
> invoker-rights function sees zero rows there and the sum returns 0, so the cap
> never trips. Definer rights (with a pinned empty `search_path`) let the trusted
> gate read the whole ledger. `resolve_scope_context` is `SECURITY DEFINER` for
> the same reason. See migration `20260611150000`.

---

## Grounding: what the ledger actually looks like

`usage_events` is an append-only table (migration `20260611100000_usage_events.sql`):

```
house_id, agent_id, thread_id, trace_id,
capability (llm|sandbox|file_index|…),
seam (gate|reply|sandbox_provision|sandbox_exec|…),
key_source (worker|house|env),
model_ref, input_tokens, output_tokens,
cache_read_tokens, cache_write_tokens,
cost_usd,
created_at
```

Index: `(house_id, created_at desc)` — the gate query shape below hits it.

`recordUsage()` (`packages/core/usage.ts`) is fire-and-forget into Postgres + PostHog. It never throws, never blocks the caller. It is called at three seams:

- **`reply`** — `runBotTurn` in a `finally` block, from the caller-owned tally; fires whether the turn finished clean, threw, or returned an error. (`dispatch.ts` lines 806–831)
- **`gate`** — ambient relevance gate before a turn fires. (`dispatch.ts` lines 269–283)
- **`sandbox_provision` / `sandbox_exec`** — sandbox tools (`sandbox-tools.ts`)

`flushUsage()` drains all in-flight writes in the outermost dispatch `finally` so workerd doesn't drop them when the response returns.
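
The fire-and-forget plus drain pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in `packages/core/usage.ts`: `createUsageRecorder`, `UsageEvent`, and `UsageSink` are hypothetical names, and the sink stands in for the Postgres and PostHog writes.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the real exports.
type UsageEvent = { houseId: string; seam: string; costUsd: number };
type UsageSink = (event: UsageEvent) => Promise<void>;

function createUsageRecorder(sink: UsageSink) {
  // Writes that have started but not yet settled.
  const inFlight = new Set<Promise<void>>();

  function recordUsage(event: UsageEvent): void {
    // Start the write without awaiting it. Every failure is swallowed, so the
    // caller is never blocked or interrupted by a ledger problem.
    const write: Promise<void> = Promise.resolve()
      .then(() => sink(event))
      .catch(() => undefined)
      .then(() => {
        inFlight.delete(write);
      });
    inFlight.add(write);
  }

  async function flushUsage(): Promise<void> {
    // Loop in case another write is recorded while a batch is draining.
    while (inFlight.size > 0) {
      await Promise.all(Array.from(inFlight));
    }
  }

  return { recordUsage, flushUsage };
}
```

In this sketch a caller would `await flushUsage()` once, in the outermost `finally`, so that pending writes settle before the runtime tears the request down.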

The existing `usage_house_overview` RPC aggregates over `usage_events` in a single SQL pass (migration `20260611130000`). `by_capability`, `by_model`, and `by_thread` sections are already used by the usage UI. No cached counter table exists today.

`keySource` attribution matters: `worker` = arbe pays; `house`/`env` = the house is using its own BYOK. A cap on arbe-funded spend (`key_source = 'worker'`) means something different from a cap on total spend. **Policy must say which.**

---

## Decision 1 — Policy model

### Options

| Shape | Description |
|---|---|
| Monthly rolling | Cap resets on a calendar-month boundary. Most natural billing analogy. |
| One-time / lifetime | Bucket of credit that depletes and never refills without manual action. |
| Both — pick per house | A `cap_type` field + `cap_amount`; default null = uncapped. |

**Recommendation: one-time (lifetime) cap for v1, stored as a single nullable column on `houses`. Monthly is a later add-on.**

Rationale:
- One-time is the simplest correct thing and it *removes* a whole decision: with no reset window there is no calendar-vs-rolling question (OQ-2 dissolves), no `created_at >= window_start` clause in the gate query, and no reset cron/trigger ever. The gate is just "total arbe-funded spend so far vs the cap."
- A refill is a manual bump of the column. At alpha (no users but us) that is the right amount of machinery, not a gap.
- Monthly is a strict superset: when a real billing-cycle need appears, add a `window_start` back to the query (and a `cap_period` field) without changing the column's meaning. Nothing here forecloses it.
- A nullable column on `houses` is the smallest correct change: no new table, no new FK, no join in the enforcement query. The house row already has `id`, `name`, `created_at`; a `spend_cap_usd numeric(12,6)` column fits naturally.

**Decided (OQ-1) — cap arbe-funded spend only.** v1 enforces against `key_source = 'worker'` rows (turns arbe subsidises). BYOK spend (`key_source IN ('house','env')`) is the house's own provider bill and is never capped by default. The gate query therefore filters `key_source = 'worker'`.

Forward-compat: a house may later want to cap its *own* BYOK spend too. The single nullable `spend_cap_usd` column does not preclude this — when we add it, introduce a `cap_scope` field (`worker` default vs `all`) rather than changing the existing column's meaning. Out of scope for v1; noted so the column choice stays compatible.

**Resolved (OQ-2) — no reset window in v1.** The cap is one-time/lifetime, so there is no anchor to choose. If monthly is added later, the anchor question returns then (calendar-month `date_trunc('month', now())` vs rolling `now() - interval '30 days'`) — not now.
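
For when the anchor question does return, the two candidate window starts differ as sketched below. This is illustrative only, since v1 has no window. The helper names are hypothetical, and the sketch assumes UTC, whereas `date_trunc('month', now())` in Postgres follows the session time zone.

```typescript
// Hypothetical helpers; v1 does not compute any window start.

// Calendar anchor: first instant of the current month (UTC assumed).
function calendarMonthStart(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
}

// Rolling anchor: a fixed number of days before now.
function rollingWindowStart(now: Date, days: number = 30): Date {
  return new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
}

const now = new Date("2026-06-15T12:00:00Z");
console.log(calendarMonthStart(now).toISOString()); // 2026-06-01T00:00:00.000Z
console.log(rollingWindowStart(now).toISOString()); // 2026-05-16T12:00:00.000Z
```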

---

## Decision 2 — Enforcement seam

### Why exact pre-dispatch costing is impractical

A turn's cost is only known **after** the model call returns (usage tokens are in the response, not the request). The gate call itself also spends tokens. There is no way to know the exact cost before starting a turn.

### Options

| Seam | Description | Trade-off |
|---|---|---|
| Pre-dispatch gate | Refuse to start a turn if the running total already meets or exceeds the cap. | Once over, even a $0.001 turn is refused, however small the excess; the turn that crosses the cap still runs to completion. |
| Post-turn block | Finish the turn, block the **next** one. | Lets one turn overshoot the cap by its own cost. For a $1 cap and a $0.05 turn, that's at most a 5% overshoot. |

**Recommendation: pre-dispatch gate (check before starting a new turn, not after).**

Rationale: the check is "is the running total already over the cap?", not "will this turn put us over?". The answer is known from committed `usage_events` rows before the turn starts. A house that is over budget at turn-start gets a clear refusal. A house that goes over *during* a turn has already committed the spend — accepting that one overshoot is unavoidable (and is bounded by a single turn's cost). The pre-dispatch check is the right place to insert the guard: `createThreadDispatcher` loads `scope.houseId` before the bot loop, and a single RPC call there answers the question.
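
The overshoot bound can be checked with a small simulation. This is a sketch under simplifying assumptions: turns run one at a time, every turn's spend is committed before the next check, and amounts are integer micro-USD to avoid floating-point noise. `simulateGate` is a hypothetical name.

```typescript
// Hypothetical simulation of the pre-dispatch check; amounts in micro-USD.
function simulateGate(capMicroUsd: number, turnCostsMicroUsd: number[]) {
  let total = 0;
  let turnsRun = 0;
  for (const cost of turnCostsMicroUsd) {
    // The gate only asks "are we already at or over the cap?".
    if (total >= capMicroUsd) break;
    total += cost; // the turn runs and its spend is committed
    turnsRun += 1;
  }
  return { total, turnsRun, overshoot: Math.max(0, total - capMicroUsd) };
}

// $1.00 cap, forty turns of $0.03 each.
const result = simulateGate(1000000, new Array<number>(40).fill(30000));
console.log(result); // { total: 1020000, turnsRun: 34, overshoot: 20000 }
```

The 34th turn starts at $0.99 and finishes at $1.02, so the overshoot ($0.02) stays below that turn's own cost ($0.03); the 35th turn is refused.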

The enforcement seam is `dispatch.ts` just after `resolveScopeContext` returns `scope.houseId` (line ~165), before the bot classification loop. The gate is synchronous from dispatch's perspective (one `await`), and an over-cap result short-circuits dispatch with a `signal.dispatch.skipped` signal whose reason is `budget_exceeded`.
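
The decision the gate makes once it has the running total can be sketched as a pure function. `decideBudget` and `BudgetDecision` are hypothetical names, not the contents of `worker-budget.ts`; the comparison uses `>=`, and a `null` cap is treated as uncapped.

```typescript
// Hypothetical sketch of the gate decision.
type BudgetDecision =
  | { allowed: true }
  | {
      allowed: false;
      reason: "budget_exceeded";
      totalCostUsd: number;
      spendCapUsd: number;
    };

function decideBudget(
  spendCapUsd: number | null,
  totalCostUsd: number,
): BudgetDecision {
  // null cap = uncapped house: always allow.
  if (spendCapUsd === null) return { allowed: true };
  // Refuse once committed spend has reached the cap.
  if (totalCostUsd >= spendCapUsd) {
    return { allowed: false, reason: "budget_exceeded", totalCostUsd, spendCapUsd };
  }
  return { allowed: true };
}
```

Returning the two amounts alongside the refusal keeps the data a UI notice would need next to the reason, without a second query.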

---

## Decision 3 — Checking the cap cheaply

No cached counter exists. The right query is a live aggregate filtered by `house_id`, which can use the existing `(house_id, created_at desc)` index through its leading column.

### Query shape (RPC)

```sql
-- v1: lifetime arbe-funded spend total for a house (no time window).
-- Proposed as a new RPC: budget_check(p_house_id).
select coalesce(sum(cost_usd), 0) as total_cost_usd
from public.usage_events
where house_id = p_house_id
  and key_source = 'worker';  -- cap arbe-subsidised spend only (OQ-1)
```

No `created_at` bound in v1 — the cap is lifetime (OQ-2 resolved). When monthly is added later, a `p_window_start` param returns and the clause becomes `and created_at >= p_window_start`.

This is a single index scan on `(house_id, created_at desc)` keyed by `house_id`, and the index already exists; the `key_source` filter and `cost_usd` values are then read from the matching rows. For the alpha phase with a small event count per house, this is cheap enough to call on every dispatch turn. At scale a materialized running counter is the upgrade path; no need to add it now.
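
To make the query's semantics concrete, here is an in-memory mirror of the aggregate. It is a sketch for reasoning and unit tests, not a replacement for the RPC: `LedgerRow` and `lifetimeWorkerSpend` are hypothetical names, and treating `cost_usd` as nullable is an assumption (SQL `sum` skips NULLs, which `?? 0` imitates).

```typescript
// Hypothetical in-memory mirror of the budget_check aggregate.
type LedgerRow = {
  house_id: string;
  key_source: "worker" | "house" | "env";
  cost_usd: number | null; // nullability is an assumption
};

function lifetimeWorkerSpend(rows: LedgerRow[], houseId: string): number {
  return rows
    // Same predicate as the SQL: this house, arbe-funded rows only.
    .filter((r) => r.house_id === houseId && r.key_source === "worker")
    // Empty input yields 0, matching coalesce(sum(...), 0).
    .reduce((sum, r) => sum + (r.cost_usd ?? 0), 0);
}

const rows: LedgerRow[] = [
  { house_id: "h1", key_source: "worker", cost_usd: 0.5 },
  { house_id: "h1", key_source: "house", cost_usd: 2 },
  { house_id: "h1", key_source: "worker", cost_usd: 0.25 },
  { house_id: "h2", key_source: "worker", cost_usd: 1 },
];
console.log(lifetimeWorkerSpend(rows, "h1")); // 0.75
```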

### Alternative: cached counter

A `house_spend` table with a `current_month_usd` column, incremented by a Postgres trigger on `usage_events` insert. Pros: O(1) read. Cons: trigger complexity, drift risk if triggers misfire, and, for a monthly cap, a counter reset on month rollover (a cron or a conditional in the trigger); under the v1 lifetime cap the counter would simply never reset. The live aggregate is simpler and correct-by-construction. Defer the cache until the aggregate proves too slow.

---

## Decision 4 — Failure UX

When the gate query shows `total_cost_usd >= spend_cap_usd`:

**Agent sees (on the stream):**

```
signal.dispatch.skipped   reason: budget_exceeded
```

This is already the standard `skipped` shape; `budget_exceeded` is a new reason value. No new entry type needed.

**Human sees (in the UI):**

A thread status chip or inline notice: "Budget limit reached — this house has spent $X of its $Y cap. Turns paused until an owner raises the cap." (The cap is one-time/lifetime in v1, so there is no reset to wait for — only a manual bump.) The exact copy is a product decision; the data needed (`total_cost_usd`, `spend_cap_usd`) is available from the gate check result.
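
As a sketch of how the two values from the gate result could feed that notice, the helper below formats placeholder copy. `budgetNotice` is a hypothetical name, and the wording is only a stand-in for whatever copy product settles on.

```typescript
// Hypothetical formatter; the copy is a placeholder, not final product text.
function budgetNotice(totalCostUsd: number, spendCapUsd: number): string {
  const usd = (amount: number): string => `$${amount.toFixed(2)}`;
  return (
    `Budget limit reached. This house has spent ${usd(totalCostUsd)} ` +
    `of its ${usd(spendCapUsd)} cap. Turns paused until an owner raises the cap.`
  );
}
```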

**Agent that attempted to reply:**

The bot turn never starts. No `signal.dispatch.started` is emitted. The `skipped` signal carries the reason, so a client rendering dispatch signals can surface it to the bot's author without embedding budget details in the thread narrative.

**No new error code** is needed in `@arbe/errors` for MVP; the existing `signal.dispatch.skipped` reason field is the observable surface. An `ArbeError` subclass is the right extension if the HTTP API needs to return a structured error (e.g. `POST /api/threads/:id/entries` → 402 Payment Required), but that is a separate surface decision.

---

## Decision 5 — Worker-hang residual

### What item-4 fixed

Before item-4, an errored or stalled turn might not flush its token tally. The fix wraps `runBotTurn` in a `finally` that calls `recordUsage()` with the caller-owned tally, so the spend is recorded even if the turn threw or returned an error final.
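
The shape of that fix can be sketched as below. This is an illustration of the pattern rather than the code in `dispatch.ts`: `runTurnWithTally` and `Tally` are hypothetical names, and `record` stands in for a `recordUsage()` call that never throws.

```typescript
// Hypothetical sketch of the caller-owned tally plus finally-block record.
type Tally = { inputTokens: number; outputTokens: number; costUsd: number };

async function runTurnWithTally<T>(
  turn: (tally: Tally) => Promise<T>,
  record: (tally: Tally) => void,
): Promise<T> {
  // The caller owns the tally, so partial spend survives a throw inside turn().
  const tally: Tally = { inputTokens: 0, outputTokens: 0, costUsd: 0 };
  try {
    return await turn(tally);
  } finally {
    // Runs on clean finish, on throw, and on an error result alike.
    record(tally);
  }
}
```

Because the tally is created outside `turn`, whatever the turn accumulated before failing is still visible in the `finally`; the original error continues to propagate to the caller.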

### What remains: worker-process kill (no finally)

If the Cloudflare Worker process is killed mid-turn — OOM, platform-forced shutdown, `waitUntil` budget exhausted — the `finally` never runs. The tally is in memory and is lost. The model call may have already streamed tokens to the provider, but the ledger row is never written.

**Implication for budget accuracy:** The cap check reads `usage_events`. A killed-process turn's tokens are not in `usage_events`, so the house appears to have spent less than it actually did. A house whose ledger shows $0.98 against a $1.00 cap may really be at $1.03 if one killed turn's $0.05 went unrecorded, so the next turn passes the gate when it should not.

**Recommendation: accept this as a known residual with documented bounds, not a blocker.**

Rationale:
- Worker kills are rare on Cloudflare Workers (the platform is designed to avoid them; `waitUntil` allows up to 30 seconds of background work after the response is sent).
- The undercount is bounded by one turn's cost per killed turn. For modest caps ($5–$50) and typical turn costs ($0.01–$0.10), that is between 0.02% ($0.01 against a $50 cap) and 2% ($0.10 against a $5 cap) of the cap.
- A 10% safety buffer on the cap (enforce at 90% of the stated cap) absorbs several such losses, five even at the 2% worst case, without requiring infra changes.
- The right long-term fix is a provider-side webhook or usage API (Anthropic/OpenRouter reports actual charges, not what we recorded) — that is a separate workstream.

**Design note:** document the residual in the enforcement comment so the next engineer does not misread a missing event as a ledger bug.
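
The arithmetic behind the bound and the buffer can be sketched as two small helpers. Both names are hypothetical, and the 10% default simply mirrors the buffer suggested above; nothing here is shipped behaviour.

```typescript
// Hypothetical helpers for reasoning about the residual.

// Cap actually enforced when a safety buffer is held back.
function effectiveCapUsd(statedCapUsd: number, bufferFraction: number = 0.1): number {
  return statedCapUsd * (1 - bufferFraction);
}

// Share of the cap that one unrecorded turn represents, in percent.
function lostTurnPercent(turnCostUsd: number, capUsd: number): number {
  return (turnCostUsd / capUsd) * 100;
}
```

Against a $5 cap, a lost $0.10 turn is about 2% of the cap and a lost $0.01 turn about 0.2%, while a 10% buffer enforces at $4.50 and leaves $0.50 of headroom for such losses.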

---

## Schema change (shipped)

```sql
alter table public.houses
  add column spend_cap_usd numeric(12,6);
-- null = uncapped (legacy rows created before the default cap)
-- positive value = one-time/lifetime cap in USD
```

`createHouse` stamps `DEFAULT_HOUSE_SPEND_CAP_USD` (`5`, in `packages/core/houses.ts`) on every new house. The cap does not renew; bump it manually during alpha.

Plus the RPC `budget_check(p_house_id text) returns numeric` from Decision 3 —
no `p_window_start` in v1 (lifetime cap). It must be `SECURITY DEFINER` (see the
gotcha note at the top). Both live in migrations `20260611140000` (column + RPC)
and `20260611150000` (definer fix).

---

## Summary of recommendations

| Decision | Recommendation |
|---|---|
| Policy model | One-time/lifetime cap (v1), nullable `spend_cap_usd` column on `houses`; monthly is a later add-on |
| Cap scope | arbe-funded only (`key_source = 'worker'`); BYOK uncapped — decided OQ-1 |
| Reset window | None in v1 (lifetime cap) — OQ-2 dissolved |
| Enforcement seam | Pre-dispatch gate, after `resolveScopeContext`, before bot loop |
| Check method | Live aggregate RPC; no cached counter yet |
| Failure UX | `signal.dispatch.skipped { reason: 'budget_exceeded' }` + UI surface |
| Hang residual | Accept as known; add 10% safety buffer on cap if needed |

## Open questions for the human

**OQ-1 — Cap scope: DECIDED.** v1 caps `key_source = 'worker'` (arbe-subsidised) spend only; BYOK is the house's own bill and stays uncapped. A future opt-in BYOK cap is anticipated via a `cap_scope` field — see Decision 1.

**OQ-2 — Reset anchor: DISSOLVED.** v1 uses a one-time/lifetime cap, so there is no reset window to anchor. The calendar-vs-rolling question only returns if monthly budgets are added later.
