feat: brain tunnels — cross-wing concept links and embedding-based retrieval #2
Context
This is the follow-up to #1 (Hall taxonomy). Once the Wing/Hall layout is stable, two retrieval gaps remain:
- Cross-wing blindness: the same concept can appear in multiple Wings (e.g. `pex-copper` in both `bathroom-plumbing` and `koala-plumbing`, or `val-vol-r2` in both `jepa-fx` and `hyperguild`). Current retrieval treats Wings as isolated silos. There is no way to ask "what do I know about X across all Wings?" and get a ranked, deduplicated result.
- Term-frequency scoring is brittle: `search.Query` scores by raw term count. Synonyms, paraphrasing, and concept drift across sessions mean semantically relevant notes score zero and irrelevant notes score high. As the brain grows this degrades faster than linearly.
This issue addresses both: Tunnels for explicit cross-wing links, and embedding-based retrieval as an opt-in replacement for term-frequency scoring.
Design
Tunnels
A Tunnel is a bidirectional wikilink between two notes in different Wings that share a concept. Tunnels are created in two ways:
- Automatic: when `brain_write` writes a note with `wing=A`, it runs a lightweight concept-match pass over the new note's content against an in-memory index of existing Wing names and note titles. If a match is found in Wing B, it appends a `## See also` section with a wikilink to the matching note in Wing B, and appends a reciprocal link to that note.
- Manual: a new MCP tool, `brain_tunnel`, takes `source` (wing/hall/slug) and `target` (wing/hall/slug) and writes the bidirectional link explicitly.
Tunnels are plain Obsidian wikilinks (`[[wing-b/hall/slug]]`): no special syntax, no database. Obsidian's graph view will show the cross-wing edges naturally.
Embedding-based retrieval
Replace (or augment) term-frequency scoring in `search.Query` with cosine similarity over note embeddings, using the existing LiteLLM embedding endpoint on piguard.
- Storage: embeddings are stored as a sidecar index at `brain/.embeddings/index.json`, a flat map of `relative_path → []float32`. This file is gitignored (it's a derived artifact). Obsidian never sees it.
- Index maintenance: the watcher (already running at `INGEST_WATCH_INTERVAL`) detects new/modified `.md` files under `brain/wiki/` and re-embeds them in the background. The initial full index is built on first start if `brain/.embeddings/index.json` is absent.
- Query path: when `INGEST_EMBED_URL` is set (pointing at piguard's embedding API), `search.Query` embeds the query string and returns results ranked by cosine similarity. When unset, it falls back to term-frequency (current behaviour). This makes embeddings opt-in with zero breaking changes.
- Hybrid scoring: when both term-frequency and embedding scores are available, combine them with a configurable weight: `score = α * tf_score + (1-α) * embed_score`. Default `α=0.3` (favour semantic). Configurable via `INGEST_HYBRID_ALPHA`.
Implementation
Tunnels
`ingestion/internal/brain/tunnel.go`
Auto-tunnel runs after every `brain_write` with `wing` + `hall`. Candidates are written automatically only when confidence is high (exact title match). Fuzzy matches are written to `brain/raw/tunnel-candidates-<date>.md` for human review.
New MCP tool: `brain_tunnel`
Embedding retrieval
`ingestion/internal/embed/embed.go`
Watcher integration (`ingestion/internal/watcher/watcher.go`)
On each tick, diff the file list against the embedding index. For new/modified files, call `Index.Upsert`. Call `Index.Save` after each batch.
`search.Query` extension
New env vars

| Variable | Default |
| --- | --- |
| `INGEST_EMBED_URL` | `""` |
| `INGEST_EMBED_MODEL` | `text-embedding-3-small` |
| `INGEST_EMBED_DIM` | `1536` |
| `INGEST_HYBRID_ALPHA` | `0.3` |

Acceptance criteria
Tunnels
- `brain_tunnel source=jepa-fx/decisions/val-vol-r2 target=hyperguild/decisions/routing-floor` writes wikilinks in both files and is idempotent on second call
- `brain_write` creates a link when an exact Wing/note title match is found in content
- Fuzzy matches go to `brain/raw/tunnel-candidates-<date>.md`, not written automatically
Embeddings
- When `INGEST_EMBED_URL` is unset, behaviour is identical to pre-issue (term-frequency, no regressions)
- When `INGEST_EMBED_URL` is set, `brain_query` results are ranked by hybrid score
- `brain/.embeddings/index.json` is populated on watcher tick for new notes
- `brain_query` with `wing` filter still scopes embedding search to that Wing's paths only
General
Dependencies
`llm.Client` pattern
Branch
`feat/brain-tunnels-and-embeddings` from `feat/brain-halls` (rebase onto main after #1 merges)
Out of scope
Created via git-mcp on behalf of @mathiasbq
Restructuring this. Two reasons:
- Embedding section conflicts with #8. This issue specifies a flat-JSON sidecar (`brain/.embeddings/index.json`) as the vector store. DECISIONS.md (2026-04-08) commits to Qdrant for vectors, and #8 plans the hybrid BM25+Qdrant+nomic-embed-text path properly. Two parallel embedding designs in flight is worse than one.
- Tunnels are orthogonal to retrieval. Cross-wing wikilinks are a structural / navigation feature; embedding retrieval is a scoring feature. Bundling them was a packaging accident.
Plan:
Closing this one as restructured.