Track E.1 — Claude Code session ingestion watcher #27

Open
opened 2026-05-25 16:45:25 +00:00 by mathias · 0 comments
Owner

Part of mathias/infra Track E umbrella. Captures the missing data source the volume gate exposed.

Goal

Scan ~/.claude/projects/*/<uuid>.jsonl on each host, parse turns, scrub secrets, classify, ingest into brain. Per-host instance: koala, flamingo, iguana.

Scope

New package ingestion/internal/claudewatcher/:

  1. Watcher loopclaudewatcher.Watch(ctx, cfg) polls Claude Code projects dir on configurable interval (default 60s). New or modified .jsonl files trigger parse.
  2. Parser — reads JSONL incrementally, tracks last-ingested offset per file in postgres (new table claude_session_cursors).
  3. Scrubber — runs each turn's content through the secret-hygiene rules (regex set: bearer tokens, postgres URIs, API keys, SOPS-encrypted blocks, BEGIN PRIVATE KEY). Fail-closed: if any rule matches, drop the turn + log a warning (don't try to redact + ingest).
  4. Classifier — uses existing brain_classify tool. Filter: ingest only entries where type ∈ {decision, lesson, correction, surprise}. Skip routine code / note / unknown to keep brain signal high.
  5. Ingestor — emits to brain_ingest_raw with structured pages: {title, content, frontmatter: {wing: 'claude-sessions', hall: hall-by-type, session_id, project_root, host}}.
  6. Wiringcmd/server/main.go starts the watcher when CLAUDE_SESSIONS_DIR env var is set. Defaults to $HOME/.claude/projects. CLAUDE_INGEST_INTERVAL controls poll (default 60s).
  7. Tests — table-driven: parser handles partial JSONL, dedup cursor advances, scrubber catches 10 known-bad fixtures, classifier filter rules.

Per-host deployment

  • koala: ingestion pod already runs here. Mount /root/.claude/projects (or wherever the act_runner / Claude Code session writes) as read-only into pod, set CLAUDE_SESSIONS_DIR to that mount.
  • flamingo: needs a small sidecar — either run a second ingestion-mode pod, or run claudewatcher as a launchd job that POSTs to brain HTTP API.
  • iguana: same shape as flamingo, launchd job.

Flamingo + iguana wiring is out of scope for this issue — land koala first as proof-of-life, file follow-ups for the other two hosts.

Acceptance

  • New skill_session, decision, correction entries appear in brain/wiki/claude-sessions/<hall>/*.md within 60s of a Claude session producing them on koala
  • Re-measured volume gate (wc -l brain/sessions/*.jsonl + classified-entries count) is ≥500/week within one month
  • Scrubber catches all 10 poisoned-fixture tests; zero credential strings reach the brain in spot-check of 100 ingested entries
  • task check green; trunk-based commit chain to main

Branch policy

If Track E.2 is being worked on simultaneously, use branch agent/track-e1-claudewatcher per CLAUDE.md TBD parallel-agent rule. Merge to main when complete + task check green.

  • infra Track E umbrella
  • hyperguild#NN E.2 (parallel: brain_context tool)
Part of mathias/infra Track E umbrella. Captures the missing data source the volume gate exposed. ## Goal Scan `~/.claude/projects/*/<uuid>.jsonl` on each host, parse turns, scrub secrets, classify, ingest into brain. Per-host instance: koala, flamingo, iguana. ## Scope New package `ingestion/internal/claudewatcher/`: 1. **Watcher loop** — `claudewatcher.Watch(ctx, cfg)` polls Claude Code projects dir on configurable interval (default 60s). New or modified `.jsonl` files trigger parse. 2. **Parser** — reads JSONL incrementally, tracks last-ingested offset per file in postgres (new table `claude_session_cursors`). 3. **Scrubber** — runs each turn's content through the secret-hygiene rules (regex set: bearer tokens, postgres URIs, API keys, SOPS-encrypted blocks, `BEGIN PRIVATE KEY`). Fail-closed: if any rule matches, drop the turn + log a warning (don't try to redact + ingest). 4. **Classifier** — uses existing `brain_classify` tool. Filter: ingest only entries where `type` ∈ {`decision`, `lesson`, `correction`, `surprise`}. Skip routine `code` / `note` / `unknown` to keep brain signal high. 5. **Ingestor** — emits to `brain_ingest_raw` with structured pages: `{title, content, frontmatter: {wing: 'claude-sessions', hall: hall-by-type, session_id, project_root, host}}`. 6. **Wiring** — `cmd/server/main.go` starts the watcher when `CLAUDE_SESSIONS_DIR` env var is set. Defaults to `$HOME/.claude/projects`. `CLAUDE_INGEST_INTERVAL` controls poll (default 60s). 7. **Tests** — table-driven: parser handles partial JSONL, dedup cursor advances, scrubber catches 10 known-bad fixtures, classifier filter rules. ## Per-host deployment - **koala**: ingestion pod already runs here. Mount `/root/.claude/projects` (or wherever the act_runner / Claude Code session writes) as read-only into pod, set `CLAUDE_SESSIONS_DIR` to that mount. - **flamingo**: needs a small sidecar — either run a second ingestion-mode pod, or run `claudewatcher` as a launchd job that POSTs to brain HTTP API. - **iguana**: same shape as flamingo, launchd job. Flamingo + iguana wiring is **out of scope for this issue** — land koala first as proof-of-life, file follow-ups for the other two hosts. ## Acceptance - New skill_session, decision, correction entries appear in `brain/wiki/claude-sessions/<hall>/*.md` within 60s of a Claude session producing them on koala - Re-measured volume gate (`wc -l brain/sessions/*.jsonl` + classified-entries count) is ≥500/week within one month - Scrubber catches all 10 poisoned-fixture tests; zero credential strings reach the brain in spot-check of 100 ingested entries - `task check` green; trunk-based commit chain to main ## Branch policy If Track E.2 is being worked on simultaneously, use branch `agent/track-e1-claudewatcher` per CLAUDE.md TBD parallel-agent rule. Merge to main when complete + task check green. ## Related - infra Track E umbrella - hyperguild#NN E.2 (parallel: brain_context tool)
Sign in to join this conversation.
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: mathias/hyperguild#27