Tier-weighted retrieval against the qa-2026-05.md 20-question set:
| run | top-1 | top-3 |
|--------------------------------|-------|-------|
| baseline (pre-phase-1) | 20% | 65% |
| post phase 1 (parser+content) | 20% | 70% |
| post M4 (tier weighting) | 30% | 75% |
| post M4b (entities → K tier) | 35% | 80% |
Net Phase 2 lift over post phase 1: +15pt top-1, +10pt top-3 (+15pt on
both against the pre-phase-1 baseline). The +15pt top-1 lift is
comfortably above the ≥10pt close-gate set in infra#72.
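A minimal sketch of what tier weighting can look like on top of BM25, assuming each hit carries its DIKW tier and the final score is the BM25 score multiplied by a per-tier weight. The Tier/Hit types, the D/I/K/W labels, the weights, and rerank itself are all hypothetical; the real weights for M4/M4b are not recorded here.

```go
package main

import (
	"fmt"
	"sort"
)

// Tier is a hypothetical encoding of the DIKW tier column.
type Tier string

const (
	TierData        Tier = "D"
	TierInformation Tier = "I"
	TierKnowledge   Tier = "K"
	TierWisdom      Tier = "W"
)

// tierWeight holds illustrative multipliers, not the values actually used.
var tierWeight = map[Tier]float64{
	TierData:        0.8,
	TierInformation: 1.0,
	TierKnowledge:   1.3,
	TierWisdom:      1.5,
}

// Hit is one BM25 result before re-ranking (hypothetical shape).
type Hit struct {
	Slug string
	BM25 float64
	Tier Tier
}

// rerank multiplies each BM25 score by its tier weight and sorts descending.
// Unknown tiers fall back to a neutral weight of 1.0. The input is not mutated.
func rerank(hits []Hit) []Hit {
	out := make([]Hit, len(hits))
	copy(out, hits)
	score := func(h Hit) float64 {
		w, ok := tierWeight[h.Tier]
		if !ok {
			w = 1.0
		}
		return h.BM25 * w
	}
	sort.SliceStable(out, func(i, j int) bool {
		return score(out[i]) > score(out[j])
	})
	return out
}

func main() {
	hits := []Hit{
		{Slug: "raw-note", BM25: 10.0, Tier: TierData},
		{Slug: "k3s", BM25: 7.0, Tier: TierKnowledge},
	}
	for _, h := range rerank(hits) {
		fmt.Println(h.Slug)
	}
}
```

With these made-up weights a K-tier entity with a lower raw BM25 score (7.0 × 1.3) overtakes a D-tier note (10.0 × 0.8), which is the kind of reordering that moving entities into the K tier is meant to produce.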
The three remaining misses are content-keyword issues, not structure
issues: the questions don't share enough lexical surface with their
target entries for BM25 alone to surface them. Vector search would
help here, but the iguana embedder is off-mesh (see infra#64).
Top-1 stayed at 20% (4/20) and top-3 rose 5pt (65% → 70%) after:
- extract.go wing/topic parser fix (commit 3084c41)
- qwen35-9b-fast entity pad (was 239-byte stub → full entity)
- grafana entry: add "pod restart" synonym to lesson body
- dangling refs stripped from index.md + entities/k3s.md
The only ranking change: qwen35-9b-fast climbed from rank 0 (not in the
top 5) to rank 2, so the entity pad worked. The other 5 hard misses come
down to ranker behaviour on entries that already overlap the question's
keywords; BM25 doesn't weight the right slugs to the top.
Phase 1 lifted top-1 by 0pt, so per the proposal's gate (≥10pt lift =
stop, <10pt = Phase 2 justified) the DIKW tier redesign earns its cost.
Next session: tier column + file moves + tier-weighted retrieval, then
re-measure against this same eval set.
classifyByPath had a hole: paths like wiki/index.md or wiki/<slug>.md
(direct children of wiki/, no subdirectory) hit the default branch and
wrote Wing=parts[1] — which IS the filename, not a wing. Symptom in
brain_entities: rows like (slug=index, wing=index.md) and
(slug=autobe-..., wing=autobe-evaluation-pattern-....md).
Fix: when len(parts) < 3 (no subdirectory at all), fall through to
Type=knowledge and let frontmatter set wing/hall if present.
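A sketch of the guard, assuming paths are split on "/" with "wiki" as parts[0]. The Classification type, the "entities" case, and its "entity" type value are hypothetical stand-ins; only the len(parts) < 3 check and the Type=knowledge fallthrough come from the fix described above.

```go
package main

import (
	"fmt"
	"strings"
)

// Classification is a hypothetical stand-in for the fields classifyByPath sets.
type Classification struct {
	Type string
	Wing string
}

// classifyByPath sketches the fixed logic. "wiki/index.md" splits into
// ["wiki", "index.md"], so parts[1] is a filename and must not become a wing.
func classifyByPath(path string) Classification {
	parts := strings.Split(path, "/")
	if len(parts) < 3 {
		// Direct child of wiki/: leave Wing empty so frontmatter can set it.
		return Classification{Type: "knowledge"}
	}
	switch parts[1] {
	case "entities": // hypothetical subdirectory case
		return Classification{Type: "entity"}
	default:
		// Only reached when a subdirectory exists, so parts[1] is a real wing.
		return Classification{Type: "knowledge", Wing: parts[1]}
	}
}

func main() {
	for _, p := range []string{"wiki/index.md", "wiki/gitops/flux.md"} {
		fmt.Printf("%s -> %+v\n", p, classifyByPath(p))
	}
}
```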
Add brain/eval/ artifacts at the same time:
- qa-2026-05.md — 20 hand-authored Q→expected-slug pairs covering the
homelab knowledge corpus across mcp, dex, gitops, postgres, go,
models, methodology
- score.py — calls brain_query for each pair, scores top-1 + top-3,
emits per-question detail. BRAIN_MCP_TOKEN via env.
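The top-1/top-3 scoring that score.py performs can be sketched as below. This is a Go rendering of the scoring arithmetic only, not the script itself: it assumes the ranked slugs for each question are already in hand (score.py gets them from brain_query), and the Result type, rankOf, and the 0-for-absent convention are assumptions.

```go
package main

import "fmt"

// Result pairs an expected slug with the ranked slugs returned for its question.
type Result struct {
	Expected string
	Ranked   []string
}

// rankOf returns the 1-based rank of slug in ranked, or 0 if it is absent.
func rankOf(slug string, ranked []string) int {
	for i, s := range ranked {
		if s == slug {
			return i + 1
		}
	}
	return 0
}

// score returns the top-1 and top-3 hit rates as percentages.
func score(results []Result) (top1, top3 float64) {
	if len(results) == 0 {
		return 0, 0
	}
	var h1, h3 int
	for _, r := range results {
		rank := rankOf(r.Expected, r.Ranked)
		if rank == 1 {
			h1++
		}
		if rank >= 1 && rank <= 3 {
			h3++
		}
	}
	n := float64(len(results))
	return 100 * float64(h1) / n, 100 * float64(h3) / n
}

func main() {
	results := []Result{
		{Expected: "a", Ranked: []string{"a", "b"}},
		{Expected: "b", Ranked: []string{"x", "y", "b"}},
		{Expected: "c", Ranked: []string{"x"}},
		{Expected: "d", Ranked: []string{"x", "d"}},
	}
	t1, t3 := score(results)
	fmt.Printf("top-1 = %.0f%%, top-3 = %.0f%%\n", t1, t3)
}
```

On a 20-question set each question is worth 5pt, so 4 top-1 hits and 13 top-3 hits give the 20% / 65% baseline reported here.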
Pre-fix baseline against the live brain: top-1 = 20% (4/20),
top-3 = 65% (13/20). Six hard misses where the expected slug doesn't
even land in the top-5.
The eval gates the phase 2 DIKW redesign (infra#62 follow-up): if the
phase 1 fixes (this parser fix plus authoring 20 backlinks on the top
orphans) lift top-1 by <10 absolute points, structure is the
bottleneck and the tier redesign is justified.
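The gate reduces to a small piece of arithmetic. A sketch, assuming lift is measured in absolute percentage points of top-1 on the 20-question set; liftPoints and phase2Justified are hypothetical names, and the inputs are the phase 1 numbers reported in this log (top-1 at 4/20 before and after).

```go
package main

import "fmt"

// liftPoints converts before/after hit counts on an n-question set into an
// absolute percentage-point lift.
func liftPoints(before, after, n int) float64 {
	return 100 * float64(after-before) / float64(n)
}

// phase2Justified applies the gate: a top-1 lift under 10pt means structure
// is the bottleneck, so the tier redesign goes ahead.
func phase2Justified(top1Lift float64) bool {
	return top1Lift < 10
}

func main() {
	lift := liftPoints(4, 4, 20) // phase 1: top-1 stayed at 4/20
	fmt.Printf("phase 1 top-1 lift: %.0fpt, phase 2 justified: %v\n",
		lift, phase2Justified(lift))
}
```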
- Change prompt to reflect new output format: title, type, subtype, domain, content
- Remove slug/path generation responsibility from LLM — pipeline now handles it
- Wikilinks change from [[slug|Display Name]] to [[Display Name]] only
- LLM no longer includes frontmatter or paths in output
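With the LLM writing only [[Display Name]], the pipeline has to derive slugs itself. A minimal sketch, assuming a simple slug rule (lowercase letters and digits kept, every other run collapsed to one hyphen); slugify, linkTargets, and that rule are hypothetical, not the pipeline's actual code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// wikilink matches [[Display Name]]. Links containing a pipe (the old
// [[slug|Display Name]] form) are deliberately not matched.
var wikilink = regexp.MustCompile(`\[\[([^\[\]|]+)\]\]`)

// slugify is a hypothetical slug rule: keep lowercase letters and digits,
// collapse every other run of characters into a single hyphen.
func slugify(title string) string {
	var b strings.Builder
	pendingHyphen := false
	for _, r := range strings.ToLower(title) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			if pendingHyphen && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingHyphen = false
			b.WriteRune(r)
		} else {
			pendingHyphen = true
		}
	}
	return b.String()
}

// linkTargets returns the derived slug for each [[Display Name]] in content.
func linkTargets(content string) []string {
	var slugs []string
	for _, m := range wikilink.FindAllStringSubmatch(content, -1) {
		slugs = append(slugs, slugify(m[1]))
	}
	return slugs
}

func main() {
	fmt.Println(linkTargets("See [[Flux CD]] and [[K3s Cluster]]."))
}
```

Deriving the slug deterministically from the display name means the model cannot emit a slug that disagrees with the file the pipeline actually writes.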
docs(schema): update LLM output format and wikilink convention for Level 3
- Specify JSON schema: title, type, subtype, domain, content fields
- Remove frontmatter requirements from schema output (handled by pipeline)
- Simplify wikilink format to [[Display Name]] — no slug or pipe
- Pipeline now responsible for slug generation and frontmatter construction
These changes shift slug/frontmatter generation from LLM to pipeline,
reducing cognitive load on the model and improving control over output.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CLAUDE.md has a specific meaning in the Claude Code ecosystem (agent
instructions). The wiki schema for the ingestion pipeline should live
in schema.md to avoid confusion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All 8 MCP tools verified (tdd_red, tdd_green, tdd_refactor, brain_query,
brain_write, tier, session_log, retrospective). Ingestion write/query,
brain_query, tier, and session_log all return correct responses end-to-end.
The brain note written during the smoke test is committed to raw/ and wiki/concepts/.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>