v0.8.0 — chunk-before-embed for arbitrarily long markdown (#38)
Released 2026-05-19 20:00:05 +00:00

ChunkMarkdown splits at H1/H2 boundaries and sub-splits oversized sections
at paragraph boundaries, greedily packing paragraphs into chunks under
maxChunkBytes=4000 (≈1000 nomic tokens, well under the 2048-token ceiling).

Storage: each chunk lives at a "#NNNN" path in brain_embeddings, numbered
from 1 and zero-padded to 4 digits so chunks sort in a stable order. No
schema change.

Retrieval: hybridMerge collapses chunk-path vector hits to their parent via
ParentPath before the scope check, RRF accumulation, and hydration, so three
chunk hits from the same file become one result row.

Backward compatibility: pre-existing bare-path rows in brain_embeddings
keep working, since ParentPath is a no-op for them. No migration needed.

The first production sync after deploy reported added=32 deleted=0 errors=0,
the first errors=0 cycle in days. The three previously failing files now
have 9, 11, and 12 chunks respectively, all retrievable via brain_query.

Closes infra#38.
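The chunking rule described in these notes can be sketched in Go as follows. This is a minimal, hypothetical sketch under stated assumptions; it is not the project's ChunkMarkdown. It assumes ATX-style headings ("# " and "## ") and treats blank lines as paragraph breaks. The names chunkMarkdown, splitSections and packParagraphs are illustrative. The sketch does not track fenced code blocks. When a single paragraph exceeds the limit, it is emitted as its own oversized chunk rather than being cut.

```go
package main

import "strings"

// Hypothetical sketch of header-then-paragraph chunking; not the project's
// ChunkMarkdown. maxChunkBytes mirrors the limit quoted in the notes.
const maxChunkBytes = 4000

// isSplitHeading reports whether a line opens an H1 or H2 section.
// "###" and deeper headings stay inside their parent section.
func isSplitHeading(line string) bool {
	return strings.HasPrefix(line, "# ") || strings.HasPrefix(line, "## ")
}

// splitSections cuts the document before every H1/H2 line. Text before the
// first heading becomes its own section. Fenced code blocks are NOT tracked
// (assumption), so a "# " line inside one would also start a section.
func splitSections(doc string) []string {
	var sections, cur []string
	for _, line := range strings.Split(doc, "\n") {
		if isSplitHeading(line) && len(cur) > 0 {
			sections = append(sections, strings.Join(cur, "\n"))
			cur = nil
		}
		cur = append(cur, line)
	}
	if len(cur) > 0 {
		sections = append(sections, strings.Join(cur, "\n"))
	}
	return sections
}

// packParagraphs greedily fills chunks with blank-line-separated paragraphs,
// starting a new chunk when the next paragraph would push it past maxBytes.
// A single paragraph larger than maxBytes becomes its own oversized chunk.
func packParagraphs(section string, maxBytes int) []string {
	var chunks []string
	cur := ""
	for _, p := range strings.Split(section, "\n\n") {
		p = strings.Trim(p, "\n")
		if strings.TrimSpace(p) == "" {
			continue
		}
		switch {
		case cur == "":
			cur = p
		case len(cur)+len("\n\n")+len(p) <= maxBytes:
			cur += "\n\n" + p
		default:
			chunks = append(chunks, cur)
			cur = p
		}
	}
	if cur != "" {
		chunks = append(chunks, cur)
	}
	return chunks
}

// chunkMarkdown keeps sections that fit whole and sub-splits oversized ones.
func chunkMarkdown(doc string, maxBytes int) []string {
	var chunks []string
	for _, sec := range splitSections(doc) {
		sec = strings.Trim(sec, "\n")
		if strings.TrimSpace(sec) == "" {
			continue
		}
		if len(sec) <= maxBytes {
			chunks = append(chunks, sec)
			continue
		}
		chunks = append(chunks, packParagraphs(sec, maxBytes)...)
	}
	return chunks
}
```

Calling chunkMarkdown(doc, maxChunkBytes) returns chunk texts in document order, one per embedding call. Because len counts bytes, the limit is in bytes, matching the maxChunkBytes name.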
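The storage naming and its inverse can be sketched as a pair of helpers. This sketch assumes a chunk path is the file path followed by "#" and the 4-digit index (for example notes/a.md#0003). chunkPath and parentPath are hypothetical stand-ins, not the project's ParentPath.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkPath names the i-th chunk (1-based) of a file. Hypothetical format:
// parent + "#" + 4-digit zero-padded index, so lexicographic order equals
// chunk order for up to 9999 chunks.
func chunkPath(parent string, i int) string {
	return fmt.Sprintf("%s#%04d", parent, i)
}

// parentPath maps a chunk path back to its file path. Anything without a
// trailing "#" plus exactly four digits is returned unchanged, which is the
// no-op that keeps pre-existing bare-path rows working.
func parentPath(p string) string {
	i := strings.LastIndexByte(p, '#')
	if i < 0 || len(p)-i != 5 {
		return p
	}
	for _, c := range p[i+1:] {
		if c < '0' || c > '9' {
			return p
		}
	}
	return p[:i]
}
```

For example, chunkPath("notes/a.md", 12) gives "notes/a.md#0012" and parentPath maps it back to "notes/a.md". A bare path such as "notes/a.md", or one with a "#" elsewhere in the name, passes through untouched.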
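The retrieval change can be illustrated with a small reciprocal rank fusion (RRF) merge that collapses chunk paths before the scope check and scoring. This is a hedged sketch, not hybridMerge itself, and it rests on four assumptions:

- k = 60, the common RRF default. The notes do not give hybridMerge's value.
- Each input is a best-first list of paths.
- Only the best-ranked chunk of a file counts within each list.
- Hydration is omitted.

mergeRRF, collapse and the inScope callback are hypothetical names.

```go
package main

import (
	"sort"
	"strings"
)

// rrfK is an assumed RRF constant; the notes do not state hybridMerge's value.
const rrfK = 60.0

type hit struct {
	Path  string
	Score float64
}

// parentPath strips a trailing "#NNNN" chunk suffix (hypothetical format)
// and leaves bare paths unchanged.
func parentPath(p string) string {
	i := strings.LastIndexByte(p, '#')
	if i < 0 || len(p)-i != 5 {
		return p
	}
	for _, c := range p[i+1:] {
		if c < '0' || c > '9' {
			return p
		}
	}
	return p[:i]
}

// collapse maps each path to its parent and keeps only the first (best
// ranked) occurrence, so later chunks of the same file drop out of the list.
func collapse(ranked []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, p := range ranked {
		pp := parentPath(p)
		if seen[pp] {
			continue
		}
		seen[pp] = true
		out = append(out, pp)
	}
	return out
}

// mergeRRF collapses chunk paths, drops out-of-scope parents, then adds
// 1/(k+rank) per list. Results are sorted by score, ties broken by path.
func mergeRRF(lists [][]string, inScope func(string) bool) []hit {
	scores := make(map[string]float64)
	for _, list := range lists {
		rank := 0
		for _, p := range collapse(list) {
			if !inScope(p) {
				continue
			}
			rank++
			scores[p] += 1.0 / (rrfK + float64(rank))
		}
	}
	out := make([]hit, 0, len(scores))
	for p, s := range scores {
		out = append(out, hit{p, s})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Path < out[j].Path
	})
	return out
}
```

Keeping only the best chunk per list means a file is not rewarded merely for being split into more chunks. Summing every chunk's contribution would be an equally plausible reading of the notes. Either way, three chunk hits for one file yield a single row.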