Tier-weighted retrieval against the qa-2026-05.md 20-question set:
| run | top-1 | top-3 |
|--------------------------------|-------|-------|
| baseline (pre-phase-1) | 20% | 65% |
| post phase 1 (parser+content) | 20% | 70% |
| post M4 (tier weighting) | 30% | 75% |
| post M4b (entities → K tier) | 35% | 80% |
Net Phase 2 lift: +15pt top-1, +15pt top-3 — comfortably above the
≥10pt close-gate set in infra#72.
Three remaining misses are content-keyword issues, not structure
issues (the questions don't share enough lexical surface with the
target entries to surface via BM25 alone). Vector search would
help here but the iguana embedder is off-mesh (see infra#64).
Top-1 stayed at 20% (4/20), top-3 +5pt (65→70%) after:
- extract.go wing/topic parser fix (commit 3084c41)
- qwen35-9b-fast entity pad (was 239-byte stub → full entity)
- grafana entry: add "pod restart" synonym to lesson body
- dangling refs stripped from index.md + entities/k3s.md
The only retrieval move: qwen35-9b-fast climbed from rank 0 (off top-5)
to rank 2 — the entity pad worked. Other 5 misses are ranker behaviour
on already-keyword-overlapping entries; BM25 doesn't weight the right
slugs to the top.
Per the proposal's gate (≥10pt lift = stop, <10pt = Phase 2 justified),
the DIKW tier redesign earns its cost. Next session: tier column +
file moves + tier-weighted retrieval, then re-measure against this
same eval set.
classifyByPath had a hole: paths like wiki/index.md or wiki/<slug>.md
(direct children of wiki/, no subdirectory) hit the default branch and
wrote Wing=parts[1] — which IS the filename, not a wing. Symptom in
brain_entities: rows like (slug=index, wing=index.md) and
(slug=autobe-..., wing=autobe-evaluation-pattern-....md).
Fix: when len(parts) < 3 (no subdirectory at all), fall through to
Type=knowledge and let frontmatter set wing/hall if present.
Add brain/eval/ artifacts at the same time:
- qa-2026-05.md — 20 hand-authored Q→expected-slug pairs covering the
homelab knowledge corpus across mcp, dex, gitops, postgres, go,
models, methodology
- score.py — calls brain_query for each pair, scores top-1 + top-3,
emits per-question detail. BRAIN_MCP_TOKEN via env.
Pre-fix baseline against the live brain: top-1 = 20% (4/20),
top-3 = 65% (13/20). Six hard misses where the expected slug doesn't
even land in the top-5.
Used to gate the phase 2 DIKW redesign (infra#62 follow-up): if
phase 1 fixes (this parser fix + 20 backlink authoring on top
orphans) lift top-1 by <10 absolute points, structure is the
bottleneck and the tier redesign is justified.