chore(brain): backfill pgvector from current wiki/ #18

Closed
opened 2026-05-19 11:10:30 +00:00 by mathias · 1 comment
Owner

Context

Hybrid retrieval (hyperguild#8) shipped and is active in the live pod (image 4af10364). The brain DB is bootstrapped, the embedder is wired, the sync loop runs every 300s — but the existing wiki/ corpus has never been embedded. Until backfill runs, brain_query cosine results stay empty and Reciprocal Rank Fusion silently reduces to BM25-only.
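The fallback described above can be illustrated with a small sketch of Reciprocal Rank Fusion. This is not hyperguild's implementation: the function name `rrf_fuse`, the one-id-per-line file format, and the constant `k=60` (the value commonly used in the RRF literature) are assumptions made for illustration only.

```shell
#!/bin/sh
# rrf_fuse FILE...
# Hypothetical helper: each FILE is one ranked list, one document id per line,
# best match first. Prints document ids ordered by fused score, highest first.
# Fused score per document = sum over lists of 1 / (k + rank), with k=60 assumed.
rrf_fuse() {
  LC_ALL=C awk -v k=60 '
    NF { score[$1] += 1 / (k + FNR) }   # FNR restarts at 1 for every input file
    END { for (doc in score) printf "%.6f %s\n", score[doc], doc }
  ' "$@" | LC_ALL=C sort -k1,1nr -k2,2 | awk '{ print $2 }'
}

# Usage sketch with made-up ids:
#   printf 'a\nb\nc\n' > bm25.txt
#   : > cosine.txt                 # empty list, as before any embeddings exist
#   rrf_fuse bm25.txt cosine.txt
```

With an empty cosine list, every document receives a score from the BM25 list alone, so the fused order is simply the BM25 order. Nothing fails or warns, which is why the degradation is silent.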

Action

curl -s -X POST \
  -H "Authorization: Bearer $BRAIN_MCP_TOKEN" \
  https://brain-mcp.d-ma.be/backfill-embeddings | jq

Expected response:

{"added": N, "deleted": 0, "errors": []}

Run from any host that can reach the endpoint; it is protected by the static Bearer token.
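To turn the expected response into a pass/fail check, something like the following could be piped after the curl call. It is a sketch that assumes the response has exactly the shape shown above; `check_backfill` is a hypothetical helper name, and `jq` must be installed.

```shell
#!/bin/sh
# check_backfill: hypothetical helper, reads the endpoint's JSON response on stdin.
# Succeeds only when "added" is greater than 0 and "errors" is an empty array.
# jq -e maps a final result of false or null to a non-zero exit status.
check_backfill() {
  jq -e '(.added > 0) and ((.errors | length) == 0)' >/dev/null
}

# Usage sketch (performs the real backfill call, so it is left commented out):
#   curl -s -X POST \
#     -H "Authorization: Bearer $BRAIN_MCP_TOKEN" \
#     https://brain-mcp.d-ma.be/backfill-embeddings | check_backfill \
#     && echo "backfill ok" || echo "backfill needs a look"
```

The check discards the response body, so keep the plain `| jq` form from the Action step when the errors themselves need to be read.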

Verification

# Row count should match `find brain/wiki -name '*.md' -not -name '_index.md' | wc -l`
kubectl exec -n databases postgres18-0 -- \
  psql -U postgres -d brain -c "SELECT count(*) FROM brain_embeddings;"

# Smoke a hybrid query
curl -s -X POST -H "Authorization: Bearer $BRAIN_MCP_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"static Bearer","limit":5}' \
  https://brain-mcp.d-ma.be/query | jq
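The row-count comparison above can be scripted rather than checked by eye. The sketch below assumes one embedding row per markdown file, as this issue does; `count_wiki_notes` and `compare_counts` are hypothetical helper names.

```shell
#!/bin/sh
# Count markdown notes under a directory, skipping _index.md files.
# "!" is the POSIX spelling of find's -not.
count_wiki_notes() {
  find "$1" -type f -name '*.md' ! -name '_index.md' | wc -l | tr -d ' '
}

# Compare the on-disk file count with the row count reported by the database.
compare_counts() {
  files=$1
  rows=$2
  if [ "$files" -eq "$rows" ]; then
    echo "ok: $rows rows for $files files"
  else
    echo "mismatch: $rows rows for $files files"
    return 1
  fi
}

# Usage sketch (needs cluster access, so it is left commented out).
# psql -At prints the bare number without header or alignment.
#   rows=$(kubectl exec -n databases postgres18-0 -- \
#     psql -U postgres -d brain -At -c "SELECT count(*) FROM brain_embeddings;")
#   compare_counts "$(count_wiki_notes brain/wiki)" "$rows"
```

If files are ever split into several chunks before embedding, the row count will exceed the file count and this equality check no longer applies as written.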

Acceptance criteria

  • /backfill-embeddings returns {added>0, errors:[]} (or any errors returned are documented)
  • brain_embeddings row count matches the wiki/ markdown file count (minus _index.md)
  • A query whose phrasing only matches semantically (no shared keywords) surfaces the right note

Out of scope

Re-embedding on file edit — tracked separately.

Author
Owner

Superseded. Backfill is no longer needed:

  • Initial state at issue filing: 0 wiki/* rows in brain_embeddings.
  • After infra#37 part 1 (commit 078ec02, hyperguild v0.7.0): sync walks wiki/ + knowledge/ → 141 rows embedded.
  • After infra#38 (commit 37fdd33, hyperguild v0.8.0): chunk-before-embed covers the remaining 3 oversized files → 173 rows total, errors=0.

The /backfill-embeddings endpoint still exists for emergencies; not currently needed.

$ kubectl exec -n databases statefulset/postgres18 -- \
    psql -U brain_app -d brain -c "SELECT count(*) FROM brain_embeddings;"
 count
-------
   173

Closes.

Reference: mathias/hyperguild#18