feat(brain): re-embed on file edit (Sync should respect mtime) #23

Closed
opened 2026-05-19 11:11:25 +00:00 by mathias · 1 comment
Owner

Context

vectorstore.Sync currently only embeds files it has never seen — once a path appears in brain_embeddings, edits to the underlying .md are invisible to the vector store. The code has a TODO marker for this:

// internal/vectorstore/sync.go
if _, ok := known[relSlash]; ok {
    // Already embedded — TODO: compare mtime once Store exposes
    // updated_at so we re-embed on edit. For now, skip.
    return nil
}

Result: any note revised after its first sync silently drifts from its embedding.

Proposed change

  1. Extend Store to expose KnownPathsWithMtime() map[string]time.Time — returns each path's updated_at from brain_embeddings. (Or KnownPaths returns a richer struct; small API choice.)
  2. In Sync, compare each file's mtime against the store's updated_at. If newer → re-embed + upsert.
  3. Cover with a unit test that mutates a file's mtime after first sync and asserts a second Upsert is invoked.

Acceptance criteria

  • Sync re-embeds files whose mtime > updated_at
  • Adds/deletes paths still behave as today
  • Unit test added (probably via stubStore with controllable known-mtime map)
  • No new round-trips when nothing changed (the index pass should be cheap)

Out of scope

Content-hash invalidation (mtime is enough for the brain's edit pattern — Syncthing preserves it).

Author
Owner

Shipped in commit 8157397.

Approach

Store interface evolved from KnownPaths() map[string]struct{} → KnownPathsWithTime() map[string]time.Time. PGStore: SELECT path, updated_at FROM brain_embeddings. Sync groups chunks by parent and tracks the earliest updated_at per parent; if a file's mtime is after that, at least one chunk is stale, so the file is re-embedded.

The re-embed path deletes every old chunk for the parent first, then re-chunks, re-embeds, and re-upserts. This handles shrunk files cleanly (no orphan #NNNN rows at higher indexes).
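To illustrate why deleting first matters, here is a small in-memory sketch. `memStore`, `reembed`, and the `parent#NNNN` key format are assumptions for the example. Real embedding is left out, so the chunk text stands in for the vector.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// memStore is a hypothetical in-memory stand-in for the vector store.
type memStore struct{ rows map[string]string }

// deleteParent removes every chunk row that belongs to parent.
func (s *memStore) deleteParent(parent string) {
	for key := range s.rows {
		if strings.HasPrefix(key, parent+"#") {
			delete(s.rows, key)
		}
	}
}

// reembed clears the parent's old chunks, then writes the new ones.
func reembed(s *memStore, parent string, chunks []string) {
	s.deleteParent(parent)
	for i, chunk := range chunks {
		s.rows[fmt.Sprintf("%s#%04d", parent, i)] = chunk
	}
}

// keys returns the stored chunk keys in sorted order.
func keys(s *memStore) []string {
	out := make([]string, 0, len(s.rows))
	for key := range s.rows {
		out = append(out, key)
	}
	sort.Strings(out)
	return out
}

func main() {
	s := &memStore{rows: map[string]string{}}
	reembed(s, "notes/a.md", []string{"one", "two", "three"})
	reembed(s, "notes/a.md", []string{"one"}) // file shrank to a single chunk
	fmt.Println(keys(s))                      // [notes/a.md#0000]
}
```

With a plain upsert and no delete, the second call would leave `#0001` and `#0002` behind as orphans pointing at text that no longer exists.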

Tests

TestSync_ReembedsFileWhenMtimeNewer  // mtime > store updated_at → re-embed + old chunks deleted
TestSync_SkipsFileWhenMtimeOlder     // mtime < store updated_at → no-op

The 15 existing tests were updated for the new stub signature (stubStore.known is now map[string]time.Time; zero values default to a far-future sentinel so "skip if already known" tests keep passing without per-test setup).

Backward compatibility

brain_embeddings rows pre-dating this change carry valid updated_at values: the column was always populated via DEFAULT now() + ON CONFLICT ... updated_at = now() on every upsert. No schema migration is needed. The live pod will start re-embedding any file whose source has been edited since its chunks were originally written.

Acceptance

  • Sync re-embeds files whose mtime > updated_at
  • Adds/deletes paths still behave as before
  • Unit tests added (stubStore with controllable known-mtime map)
  • No new round-trips when nothing changed (single SELECT path, updated_at per cycle replaces single SELECT path)

Closes.

Reference: mathias/hyperguild#23