# Observability

How the system witnesses its own operation: latencies, decisions, spend, failures. Not content (that's streams), not structural change (that's mutations) — see [primitives](../thinking/primitives.md) for where signals sit in the formal layer.

Four layers, picked by the question you're asking:

| Layer | Answers | Storage | Retention |
|---|---|---|---|
| Run state | what ran, what's running, what happened last | Postgres + thread streams, via HTTP API | durable |
| Usage | who spent what, on whose key, at what cost | `usage_events` + PostHog | durable (ledger) / plan limits (PostHog) |
| Lifecycle | aggregate trends: signups, reply rates, activity | signals on threads + PostHog mirror | durable (threads) / plan limits (PostHog) |
| Cloudflare logs | live debugging | CF dashboard, `wrangler tail` | ~72h (free tier) |

Run state is remote-only: the CLI has no local database; Postgres and the thread's durable stream are the sole sources of truth.

Usage: every seam that spends money calls `recordUsage()` after the spend. Each call lands one event in the `usage_events` ledger and in PostHog, joined by `trace_id`. Dispatch mints one `traceId` per turn and stamps it on the turn's `signal.dispatch.*` payloads and on every `recordUsage` event the turn produces, so "activation started → spend → result" joins across the thread stream, the ledger, and PostHog. Call shape and columns: [analytics → usage](./analytics.md#usage--money); whose money it is: [llm-keys](./llm-keys.md).
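
A minimal sketch of that join, not the real implementation. Every name here (`Sinks`, `runTurn`, `joinByTrace`, `costCents`, the `signal.dispatch.started` / `signal.dispatch.completed` type strings, the counter-based id) is a hypothetical stand-in; the actual `recordUsage()` call shape is in [analytics](./analytics.md).

```typescript
// Sketch only: types, helpers, and field names are illustrative assumptions.
type ThreadSignal = { type: string; payload: { trace_id: string } };
type UsageEvent = { trace_id: string; provider: string; costCents: number };
type Spend = { provider: string; costCents: number };

interface Sinks {
  thread: ThreadSignal[]; // stands in for the thread's durable stream
  ledger: UsageEvent[];   // stands in for the usage_events ledger
}

// Stand-in id generator; a real system would more likely mint a UUID.
let traceCounter = 0;
const mintTraceId = (): string => `trace-${++traceCounter}`;

// One turn: mint a single trace id, stamp it on the dispatch signals
// and on every usage event the turn produces.
function runTurn(sinks: Sinks, spends: Spend[]): string {
  const traceId = mintTraceId();
  sinks.thread.push({ type: "signal.dispatch.started", payload: { trace_id: traceId } });
  for (const spend of spends) {
    // Recorded after the spend has happened.
    sinks.ledger.push({ trace_id: traceId, provider: spend.provider, costCents: spend.costCents });
  }
  sinks.thread.push({ type: "signal.dispatch.completed", payload: { trace_id: traceId } });
  return traceId;
}

// Join the thread stream and the ledger on the shared trace id.
function joinByTrace(sinks: Sinks, traceId: string): { signals: string[]; totalCostCents: number } {
  const signals = sinks.thread
    .filter((s) => s.payload.trace_id === traceId)
    .map((s) => s.type);
  const totalCostCents = sinks.ledger
    .filter((e) => e.trace_id === traceId)
    .reduce((sum, e) => sum + e.costCents, 0);
  return { signals, totalCostCents };
}
```

Because the id is minted once per turn rather than once per spend, a single filter on `trace_id` recovers both the activation's signals and its total cost.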

Lifecycle: typed `signal.<entity>.<verb>` entries on threads, mirrored to PostHog by `track()` when the agent has opted in. Call shape and vocabulary: [analytics](./analytics.md).

Cloudflare logs: console output streams to the CF dashboard and `wrangler tail` (`observability.logs.enabled` in wrangler config). Console statements use bracket prefixes: `[dispatch.gate]`, `[dispatch.turn]`, `[usage]`.
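
The bracket-prefix convention could be enforced with a small helper like the one below. This is an illustrative sketch: `LogScope`, `formatLog`, and `log` are hypothetical names, and the `key=value` suffix format is an assumption, not something the codebase is known to use.

```typescript
// Sketch only: helper names and the key=value suffix are assumptions.
type LogScope = "dispatch.gate" | "dispatch.turn" | "usage";
type LogFields = Record<string, string | number>;

// Build a single log line with the scope in brackets, so lines can be
// filtered by prefix when tailing logs.
function formatLog(scope: LogScope, message: string, fields: LogFields = {}): string {
  const suffix = Object.keys(fields)
    .map((key) => `${key}=${fields[key]}`)
    .join(" ");
  return suffix ? `[${scope}] ${message} ${suffix}` : `[${scope}] ${message}`;
}

// Thin wrapper over console.log, which is what the CF dashboard and
// `wrangler tail` pick up.
function log(scope: LogScope, message: string, fields?: LogFields): void {
  console.log(formatLog(scope, message, fields));
}
```

Restricting `scope` to a union type keeps prefixes from drifting (`[dispatch-gate]` vs `[dispatch.gate]`), which matters when the only query tool is a text match over a short retention window.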

Dispatch activations are visible twice: durably as `signal.dispatch.*` on the thread, and in PostHog as `dispatch.started/completed/skipped/failed` (mirrored by `publishDispatchSignal`, always-on). Both carry the turn's `trace_id`.

PostHog receives operational data only: latencies, model usage, failure rates, token/cost counts, and app ids (UUIDs, sent raw; they're meaningless without DB access and keep cross-referencing easy). Server capture disables GeoIP enrichment and drops content/name/path fields. Never send content, identity (emails, names), or session data.
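
One way to apply the rule at the capture boundary is a property filter, sketched below. `BLOCKED_KEYS` and `sanitizeForCapture` are hypothetical names, the exact-key matching is a simplification, and setting `$geoip_disable` as an event property is an assumption about how GeoIP enrichment gets switched off; the real server capture code may do this differently.

```typescript
// Sketch only: the denylist and function name are illustrative assumptions.
const BLOCKED_KEYS: string[] = ["content", "name", "path"];

type CaptureProps = Record<string, string | number | boolean>;

// Drop blocked fields and mark the event so GeoIP enrichment is skipped.
function sanitizeForCapture(props: CaptureProps): CaptureProps {
  const out: CaptureProps = {};
  for (const key of Object.keys(props)) {
    if (BLOCKED_KEYS.indexOf(key.toLowerCase()) === -1) {
      out[key] = props[key];
    }
  }
  // Set last so a caller-supplied value cannot re-enable enrichment.
  // Assumption: this property is how enrichment is disabled.
  out["$geoip_disable"] = true;
  return out;
}
```

A denylist like this is a backstop, not the primary control: the safer habit is to never put content or identity into the properties object in the first place.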

See [analytics](./analytics.md), [debugging](./debugging.md), [system/dispatch](./dispatch.md).
