House budget design
Status: shipped — per worker-key seam enforcement (worker-budget.ts); migrations 20260611140000 + 20260611150000 + 20260612120000 (default/backfill) + 20260612130000 (usage split).
Prerequisite: arbe-d9b7 ledger (usage_events) + item-4 finally-block tally (errored/stalled turns now record spend)
Implementation gotcha (don’t relearn this):
budget_check must be SECURITY DEFINER. The gate calls it through the request user’s RLS-scoped client, and usage_events has RLS enabled with no select policy: an invoker-rights function sees zero rows there and the sum returns 0, so the cap never trips. Definer rights (with a pinned empty search_path) let the trusted gate read the whole ledger. resolve_scope_context is SECURITY DEFINER for the same reason. See migration 20260611150000.
Grounding: what the ledger actually looks like
usage_events is an append-only table (migration 20260611100000_usage_events.sql):

    house_id, agent_id, thread_id, trace_id,
    capability (llm|sandbox|file_index|…),
    seam (gate|reply|sandbox_provision|sandbox_exec|…),
    key_source (worker|house|env),
    model_ref, input_tokens, output_tokens,
    cache_read_tokens, cache_write_tokens,
    cost_usd,
    created_at

Index: (house_id, created_at desc); the gate query shape below hits it.
recordUsage() (packages/core/usage.ts) is fire-and-forget into Postgres + PostHog. It never throws, never blocks the caller. It is called at three seams:
- reply: runBotTurn in a finally block, from the caller-owned tally; fires whether the turn finished clean, threw, or returned an error. (dispatch.ts lines 806–831)
- gate: ambient relevance gate before a turn fires. (dispatch.ts lines 269–283)
- sandbox_provision / sandbox_exec: sandbox tools (sandbox-tools.ts)
flushUsage() drains all in-flight writes in the outermost dispatch finally so workerd doesn’t drop them when the response returns.
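The fire-and-forget-then-flush pattern described here can be sketched as follows. This is a minimal illustration, not the code in packages/core/usage.ts: createUsageRecorder, the Writer type, and pendingCount are hypothetical names, and the PostHog side is left out.

```typescript
// Hypothetical sketch of a fire-and-forget recorder with an explicit drain step.
type UsageRow = { houseId: string; costUsd: number };
type Writer = (row: UsageRow) => Promise<void>; // stand-in for the Postgres insert

function createUsageRecorder(write: Writer) {
  // Writes that have been started but have not settled yet.
  const inFlight = new Set<Promise<void>>();

  // Starts the write and returns immediately; never throws to the caller.
  function recordUsage(row: UsageRow): void {
    let started: Promise<void>;
    try {
      started = write(row).catch(() => undefined); // swallow async failures
    } catch {
      return; // the writer threw synchronously; drop the row
    }
    const forget = () => {
      inFlight.delete(tracked);
    };
    const tracked: Promise<void> = started.then(forget, forget);
    inFlight.add(tracked);
  }

  // Awaits every pending write; meant for the outermost finally of a request.
  async function flushUsage(): Promise<void> {
    while (inFlight.size > 0) {
      await Promise.all(Array.from(inFlight));
    }
  }

  return { recordUsage, flushUsage, pendingCount: () => inFlight.size };
}
```

In a request handler the shape would be a try block that calls recordUsage(row) wherever spend occurs, and a finally block that awaits flushUsage(), so the runtime is not released while writes are still pending.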
The existing usage_house_overview RPC aggregates over usage_events in a single SQL pass (migration 20260611130000). by_capability, by_model, and by_thread sections are already used by the usage UI. No cached counter table exists today.
keySource attribution matters: worker = arbe pays; house/env = the house is using its own BYOK. A cap on arbe-funded spend (key_source = 'worker') means something different from a cap on total spend. Policy must say which.
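To make the attribution concrete, here is a sketch of a ledger row as a TypeScript type, with a helper that splits spend into the two buckets a cap policy has to choose between. The column names come from usage_events; the TypeScript types, the nullability, and the helper names isArbeFunded and splitSpendByFunding are assumptions for illustration.

```typescript
// Column names come from usage_events; types and nullability are assumptions.
type KeySource = "worker" | "house" | "env";

interface UsageEvent {
  house_id: string;
  agent_id: string | null;
  thread_id: string | null;
  trace_id: string | null;
  capability: string; // llm | sandbox | file_index | …
  seam: string; // gate | reply | sandbox_provision | sandbox_exec | …
  key_source: KeySource;
  model_ref: string | null;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  cache_write_tokens: number;
  cost_usd: number;
  created_at: string; // ISO timestamp
}

// worker = arbe pays; house/env = the house's own key (BYOK).
function isArbeFunded(event: Pick<UsageEvent, "key_source">): boolean {
  return event.key_source === "worker";
}

// Sums spend separately for arbe-funded and BYOK rows.
function splitSpendByFunding(events: Pick<UsageEvent, "key_source" | "cost_usd">[]) {
  let arbeFundedUsd = 0;
  let byokUsd = 0;
  for (const e of events) {
    if (isArbeFunded(e)) arbeFundedUsd += e.cost_usd;
    else byokUsd += e.cost_usd;
  }
  return { arbeFundedUsd, byokUsd, totalUsd: arbeFundedUsd + byokUsd };
}
```

A cap on arbeFundedUsd and a cap on totalUsd can give different answers for the same house, which is why the policy has to name one.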
Decision 1 — Policy model
Options
| Shape | Description |
|---|---|
| Monthly rolling | Cap resets on a calendar-month boundary. Most natural billing analogy. |
| One-time / lifetime | Bucket of credit that depletes and never refills without manual action. |
| Both — pick per house | A cap_type field + cap_amount; default null = uncapped. |
Recommendation: one-time (lifetime) cap for v1, stored as a single nullable column on houses. Monthly is a later add-on.
Rationale:
- One-time is the simplest correct thing and it removes a whole decision: with no reset window there is no calendar-vs-rolling question (OQ-2 dissolves), no created_at >= window_start clause in the gate query, and no reset cron/trigger ever. The gate is just “total arbe-funded spend so far vs the cap.”
- A refill is a manual bump of the column. At alpha (no users but us) that is the right amount of machinery, not a gap.
- Monthly is a strict superset: when a real billing-cycle need appears, add a window_start back to the query (and a cap_period field) without changing the column’s meaning. Nothing here forecloses it.
- A nullable column on houses is the smallest correct change: no new table, no new FK, no join in the enforcement query. The house row already has id, name, created_at; a spend_cap_usd numeric(12,6) column fits naturally.
Decided (OQ-1) — cap arbe-funded spend only. v1 enforces against key_source = 'worker' rows (turns arbe subsidises). BYOK spend (key_source IN ('house','env')) is the house’s own provider bill and is never capped by default. The gate query therefore filters key_source = 'worker'.
Forward-compat: a house may later want to cap its own BYOK spend too. The single nullable spend_cap_usd column does not preclude this — when we add it, introduce a cap_scope field (worker default vs all) rather than changing the existing column’s meaning. Out of scope for v1; noted so the column choice stays compatible.
Resolved (OQ-2) — no reset window in v1. The cap is one-time/lifetime, so there is no anchor to choose. If monthly is added later, the anchor question returns then (calendar-month date_trunc('month', now()) vs rolling now() - interval '30 days') — not now.
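For reference only, the two anchors differ as sketched below. Nothing like this exists in v1; calendarMonthStart and rollingWindowStart are hypothetical helpers, and using UTC is an assumption, since date_trunc('month', now()) in Postgres follows the session time zone.

```typescript
// Hypothetical helpers; v1 has no window. UTC is an assumption.
const DAY_MS = 24 * 60 * 60 * 1000;

// Calendar anchor: the first instant of the current month (UTC).
function calendarMonthStart(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
}

// Rolling anchor: exactly `days` days before now.
function rollingWindowStart(now: Date, days: number = 30): Date {
  return new Date(now.getTime() - days * DAY_MS);
}
```

For now = 2026-06-20T12:00:00Z the calendar anchor is 2026-06-01T00:00:00Z and the rolling anchor is 2026-05-21T12:00:00Z. On the first day of a month the calendar window is almost empty, while the rolling window still counts most of the previous month's spend.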
Decision 2 — Enforcement seam
Why pre-dispatch is impractical
A turn’s cost is only known after the model call returns (usage tokens are in the response, not the request). The gate call itself also spends tokens. There is no way to know the exact cost before starting a turn.
Options
| Seam | Description | Trade-off |
|---|---|---|
| Pre-dispatch gate | Refuse to start a turn if running total already exceeds cap. | May refuse a turn that would have cost $0.001 while the house is $0.0001 over; still lets the current-in-flight turn complete first. |
| Post-turn block | Finish the turn, block the next one. | Lets one turn overshoot the cap by its own cost. For a $1/month cap and a $0.05 turn, that’s a 5% overshoot max. |
Recommendation: pre-dispatch gate (check before starting a new turn, not after).
Rationale: the check is “is the running total already over the cap?”, not “will this turn put us over?”. The answer is known from committed usage_events rows before the turn starts. A house that is over budget at turn-start gets a clear refusal. A house that goes over during a turn has already committed the spend — accepting that one overshoot is unavoidable (and is bounded by a single turn’s cost). The pre-dispatch check is the right place to insert the guard: createThreadDispatcher loads scope.houseId before the bot loop, and a single RPC call there answers the question.
The enforcement seam is dispatch.ts just after resolveScopeContext returns scope.houseId (line ~165), before the bot classification loop. The gate is synchronous from dispatch’s perspective (one await), and failure short-circuits with a signal.dispatch.skipped reason budget_exceeded.
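The decision the gate makes can be sketched as a pure function. The reason string and the "already at or over the cap" comparison come from this design; the BudgetDecision and SkippedSignal shapes and the name decideBudgetGate are hypothetical and are not taken from worker-budget.ts.

```typescript
// Hypothetical shapes; only the reason string and the >= comparison come from the design.
type SkippedSignal = { type: "signal.dispatch.skipped"; reason: "budget_exceeded" };

type BudgetDecision =
  | { allowed: true }
  | { allowed: false; signal: SkippedSignal; totalCostUsd: number; spendCapUsd: number };

// spendCapUsd: houses.spend_cap_usd (null = uncapped).
// totalCostUsd: committed arbe-funded spend, as returned by the budget check.
function decideBudgetGate(spendCapUsd: number | null, totalCostUsd: number): BudgetDecision {
  if (spendCapUsd === null) return { allowed: true }; // uncapped house
  if (totalCostUsd >= spendCapUsd) {
    return {
      allowed: false,
      signal: { type: "signal.dispatch.skipped", reason: "budget_exceeded" },
      totalCostUsd,
      spendCapUsd,
    };
  }
  return { allowed: true };
}
```

The function only looks at spend that is already committed, so a house at $4.99 of a $5 cap is still allowed to start a turn, and that turn's cost is the bounded overshoot discussed above.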
Decision 3 — Checking the cap cheaply
No cached counter exists. The right query is a live aggregate over (house_id, created_at), hitting the existing index.
Query shape (RPC)

    -- v1: lifetime arbe-funded spend total for a house (no time window).
    -- Proposed as a new RPC: budget_check(p_house_id).
    select coalesce(sum(cost_usd), 0) as total_cost_usd
    from public.usage_events
    where house_id = p_house_id
      and key_source = 'worker'; -- cap arbe-subsidised spend only (OQ-1)

No created_at bound in v1: the cap is lifetime (OQ-2 resolved). When monthly is added later, a p_window_start param returns and the query gains the clause and created_at >= p_window_start.
This is a single index scan on (house_id, created_at desc) keyed by house_id — the index already exists. For the alpha phase with a small event count per house, this is cheap enough to call on every dispatch turn. At scale a materialized running counter is the upgrade path; no need to add it now.
Alternative: cached counter
A house_spend table with a current_month_usd column, incremented by a Postgres trigger on usage_events insert. Pros: O(1) read. Cons: trigger complexity, drift risk if triggers misfire, and the counter needs to reset on month rollover (a cron or a conditional in the trigger). The live aggregate is simpler and correct-by-construction. Defer the cache until the aggregate proves too slow.
Decision 4 — Failure UX
When the gate query shows total_cost_usd >= spend_cap_usd:
Agent sees (on the stream):

    signal.dispatch.skipped reason: budget_exceeded

This is already the standard skipped shape; budget_exceeded is a new reason value. No new entry type needed.
Human sees (in the UI):
A thread status chip or inline notice: “Budget limit reached — this house has spent $X of its $Y cap. Turns paused until an owner raises the cap.” (The cap is one-time/lifetime in v1, so there is no reset to wait for — only a manual bump.) The exact copy is a product decision; the data needed (total_cost_usd, spend_cap_usd) is available from the gate check result.
Agent that attempted to reply:
The bot turn never starts. No signal.dispatch.started is emitted. The skipped signal carries the reason, so a client rendering dispatch signals can surface it to the bot’s author without embedding budget details in the thread narrative.
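A client that renders dispatch signals could turn the skipped signal into the human-facing notice roughly as below. This is a sketch under assumptions: the DispatchSkipped shape, the optional totalCostUsd and spendCapUsd fields on the signal, and the name budgetNotice are all hypothetical, and the wording is placeholder copy.

```typescript
// Hypothetical signal shape; whether the totals travel on the signal is an assumption.
type DispatchSkipped = {
  type: "signal.dispatch.skipped";
  reason: string;
  totalCostUsd?: number;
  spendCapUsd?: number;
};

// Returns notice text for a budget skip, or null for any other skip reason.
function budgetNotice(signal: DispatchSkipped): string | null {
  if (signal.reason !== "budget_exceeded") return null;
  const tail = "Turns paused until an owner raises the cap.";
  if (signal.totalCostUsd === undefined || signal.spendCapUsd === undefined) {
    return `Budget limit reached. ${tail}`;
  }
  const usd = (n: number) => `$${n.toFixed(2)}`;
  return (
    `Budget limit reached: this house has spent ${usd(signal.totalCostUsd)} ` +
    `of its ${usd(signal.spendCapUsd)} cap. ${tail}`
  );
}
```

Keeping the formatting on the client side leaves budget details out of the thread narrative, as described above.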
No new error code is needed in @arbe/errors for MVP; the existing signal.dispatch.skipped reason field is the observable surface. An ArbeError subclass is the right extension if the HTTP API needs to return a structured error (e.g. POST /api/threads/:id/entries → 402 Payment Required), but that is a separate surface decision.
Decision 5 — Worker-hang residual
What item-4 fixed
Before item-4, an errored or stalled turn might not flush its token tally. The fix wraps runBotTurn in a finally that calls recordUsage() with the caller-owned tally, so the spend is recorded even if the turn threw or returned an error final.
What remains: worker-process kill (no finally)
If the Cloudflare Worker process is killed mid-turn — OOM, platform-forced shutdown, waitUntil budget exhausted — the finally never runs. The tally is in memory and is lost. The model call may have already streamed tokens to the provider, but the ledger row is never written.
Implication for budget accuracy: The cap check reads usage_events. A killed-process turn’s tokens are not in usage_events. The house appears to have spent less than it actually did. A house at $0.98 of a $1.00 cap could have one killed turn whose $0.05 of spend is missing, making the next turn appear safe when it is not.
Recommendation: accept this as a known residual with documented bounds, not a blocker.
Rationale:
- Worker kills are rare on Cloudflare Workers (the platform is designed to avoid them; waitUntil gives 30 seconds for background work).
- The overshoot is bounded: one turn’s cost. For modest caps ($5–$50/month) and typical turn costs ($0.01–$0.10), the overshoot is at most 2% of the cap (a $0.10 turn against a $5 cap) and as little as 0.02% (a $0.01 turn against a $50 cap).
- A 10% safety buffer on the cap (enforce at 90% of stated cap) eliminates the practical risk entirely without requiring infra changes.
- The right long-term fix is a provider-side webhook or usage API (Anthropic/OpenRouter reports actual charges, not what we recorded) — that is a separate workstream.
Design note: document the residual in the enforcement comment so the next engineer does not misread a missing event as a ledger bug.
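The buffer arithmetic can be written out as a small sketch. The helper names effectiveCapUsd, overshootFraction, and tripsWithBuffer are hypothetical; the 10% default mirrors the figure in the rationale and is not a shipped setting.

```typescript
// Hypothetical helpers for the safety-buffer arithmetic; 10% mirrors the text.
function effectiveCapUsd(statedCapUsd: number, bufferFraction: number = 0.1): number {
  return statedCapUsd * (1 - bufferFraction);
}

// Worst-case overshoot from one unrecorded turn, as a fraction of the stated cap.
function overshootFraction(turnCostUsd: number, statedCapUsd: number): number {
  return turnCostUsd / statedCapUsd;
}

// True when the recorded total has reached the buffered threshold.
function tripsWithBuffer(
  recordedUsd: number,
  statedCapUsd: number,
  bufferFraction: number = 0.1,
): boolean {
  return recordedUsd >= effectiveCapUsd(statedCapUsd, bufferFraction);
}
```

Take the scenario above: a $1.00 cap, $0.98 recorded, and a killed turn whose $0.05 never reached the ledger. Without a buffer the gate compares 0.98 against 1.00 and lets the next turn start. With a 10% buffer the threshold is $0.90, so the gate already refuses. The buffer only absorbs lost spend that is smaller than the buffer itself.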
Schema change (shipped)

    alter table public.houses
      add column spend_cap_usd numeric(12,6);
    -- null = uncapped (legacy rows created before the default cap)
    -- positive value = one-time/lifetime cap in USD

createHouse stamps DEFAULT_HOUSE_SPEND_CAP_USD (5, in packages/core/houses.ts) on every new house. The cap does not renew; bump it manually during alpha.
Plus the RPC budget_check(p_house_id text) returns numeric from Decision 3 —
no p_window_start in v1 (lifetime cap). It must be SECURITY DEFINER (see the
gotcha note at the top). Both live in migrations 20260611140000 (column + RPC)
and 20260611150000 (definer fix).
Summary of recommendations
| Decision | Recommendation |
|---|---|
| Policy model | One-time/lifetime cap (v1), nullable spend_cap_usd column on houses; monthly is a later add-on |
| Cap scope | arbe-funded only (key_source = 'worker'); BYOK uncapped — decided OQ-1 |
| Reset window | None in v1 (lifetime cap) — OQ-2 dissolved |
| Enforcement seam | Pre-dispatch gate, after resolveScopeContext, before bot loop |
| Check method | Live aggregate RPC; no cached counter yet |
| Failure UX | signal.dispatch.skipped { reason: 'budget_exceeded' } + UI surface |
| Hang residual | Accept as known; add 10% safety buffer on cap if needed |
Open questions for the human
OQ-1 — Cap scope: DECIDED. v1 caps key_source = 'worker' (arbe-subsidised) spend only; BYOK is the house’s own bill and stays uncapped. A future opt-in BYOK cap is anticipated via a cap_scope field — see Decision 1.
OQ-2 — Reset anchor: DISSOLVED. v1 uses a one-time/lifetime cap, so there is no reset window to anchor. The calendar-vs-rolling question only returns if monthly budgets are added later.