hyperguild

mathias/hyperguild

Commit Graph

Author SHA1 Message Date


Mathias

2b7bbe38c7

docs(eval): record M4 + M4b scorer runs — phase 2 gate cleared (infra#72)

CI / Lint / Test / Vet (push) Successful in 11s

CI / Mirror to GitHub (push) Successful in 4s

Tier-weighted retrieval against the qa-2026-05.md 20-question set:

| run                            | top-1 | top-3 |
|--------------------------------|-------|-------|
| baseline (pre-phase-1)         | 20%   | 65%   |
| post phase 1 (parser+content)  | 20%   | 70%   |
| post M4 (tier weighting)       | 30%   | 75%   |
| post M4b (entities → K tier)   | 35%   | 80%   |

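Net Phase 2 lift: +15pt top-1 (20% → 35%), comfortably above the
≥10pt close-gate set in infra#72. Top-3 rose +10pt over the post
phase 1 run (70% → 80%), or +15pt over the pre-phase-1 baseline
(65% → 80%).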
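
The lift figures can be re-derived from the table. The sketch below is only an illustration: the run labels and percentages are copied from the table, while the `lift` helper is a hypothetical name and not part of the repository. It reports the change against both the pre-phase-1 baseline and the post phase 1 run, because the two reference points give different top-3 deltas.

```python
# Accuracy figures copied from the table above, in percent.
RUNS = [
    ("baseline (pre-phase-1)", 20, 65),
    ("post phase 1 (parser+content)", 20, 70),
    ("post M4 (tier weighting)", 30, 75),
    ("post M4b (entities -> K tier)", 35, 80),
]


def lift(runs, start, end):
    """Return the (top-1, top-3) change in percentage points between two runs.

    Hypothetical helper for illustration; not taken from score.py.
    """
    by_name = {name: (top1, top3) for name, top1, top3 in runs}
    start_top1, start_top3 = by_name[start]
    end_top1, end_top3 = by_name[end]
    return end_top1 - start_top1, end_top3 - start_top3


final = "post M4b (entities -> K tier)"
vs_baseline = lift(RUNS, "baseline (pre-phase-1)", final)
vs_phase1 = lift(RUNS, "post phase 1 (parser+content)", final)

print(vs_baseline)  # (15, 15)
print(vs_phase1)    # (15, 10)
```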

Three remaining misses are content-keyword issues, not structure
issues (the questions don't share enough lexical surface with the
target entries to surface via BM25 alone). Vector search would
help here but the iguana embedder is off-mesh (see infra#64).

2026-05-25 18:51:29 +02:00

Mathias

e34cd6c12b

docs(eval): record post-fix scorer run — phase 1 lift insufficient

CI / Lint / Test / Vet (push) Successful in 12s

CI / Mirror to GitHub (push) Successful in 4s

Top-1 stayed at 20% (4/20), top-3 +5pt (65→70%) after:
- extract.go wing/topic parser fix (commit 3084c41)
- qwen35-9b-fast entity pad (was 239-byte stub → full entity)
- grafana entry: add "pod restart" synonym to lesson body
- dangling refs stripped from index.md + entities/k3s.md

The only retrieval move: qwen35-9b-fast climbed from rank 0 (outside
the top-5) to rank 2, so the entity pad worked. The other 5 misses are
ranker behaviour on entries that already overlap on keywords; BM25
does not weight the right slugs to the top.

Per the proposal's gate (≥10pt lift = stop, <10pt = Phase 2 justified),
the DIKW tier redesign earns its cost. Next session: tier column +
file moves + tier-weighted retrieval, then re-measure against this
same eval set.
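
The gate rule can be written down as a one-line check. This is a minimal sketch, assuming the gate is applied to the top-1 lift in absolute percentage points; the names `GATE_POINTS` and `phase2_justified` are hypothetical and do not come from the proposal or the repository.

```python
# Threshold from the proposal's gate: >=10pt lift means stop,
# <10pt means Phase 2 is justified.
GATE_POINTS = 10


def phase2_justified(top1_before, top1_after, gate=GATE_POINTS):
    """Return True when the top-1 lift (in points) falls short of the gate."""
    return (top1_after - top1_before) < gate


# Top-1 stayed at 20% after the phase 1 fixes, so the lift is 0pt.
print(phase2_justified(20, 20))  # True
```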

2026-05-24 22:48:48 +02:00

Mathias

3084c4173d

fix(graph): route wiki/<flat>.md to Type=knowledge, not Type=hall with filename-as-wing

CI / Lint / Test / Vet (push) Successful in 12s

CI / Mirror to GitHub (push) Successful in 4s

classifyByPath had a hole: paths like wiki/index.md or wiki/<slug>.md
(direct children of wiki/, no subdirectory) hit the default branch and
wrote Wing=parts[1] — which IS the filename, not a wing. Symptom in
brain_entities: rows like (slug=index, wing=index.md) and
(slug=autobe-..., wing=autobe-evaluation-pattern-....md).

Fix: when len(parts) < 3 (no subdirectory at all), fall through to
Type=knowledge and let frontmatter set wing/hall if present.
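
The shape of the fix can be sketched as follows. The real `classifyByPath` lives in the Go code and is not shown in this commit message, so this is only an illustrative Python rendering of the described logic: the `Classification` type, the `classify_by_path` name, the example path `wiki/gitops/some-entry.md`, and the assumption that the default branch handles every path with a subdirectory are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    type: str
    wing: str = ""  # empty means "left for frontmatter to set"


def classify_by_path(path):
    """Illustrative sketch of the described fix, not the actual Go code."""
    parts = path.split("/")
    if len(parts) < 3:
        # wiki/index.md or wiki/<slug>.md: parts[1] is the filename,
        # not a wing, so route to knowledge and leave wing unset.
        return Classification(type="knowledge")
    # Assumed default branch: the first subdirectory names the wing.
    return Classification(type="hall", wing=parts[1])


print(classify_by_path("wiki/index.md"))
print(classify_by_path("wiki/gitops/some-entry.md"))
```

Before the fix, the first call would have produced a row with `wing=index.md`, which is the symptom described above.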

Add brain/eval/ artifacts at the same time:
- qa-2026-05.md — 20 hand-authored Q→expected-slug pairs covering the
  homelab knowledge corpus across mcp, dex, gitops, postgres, go,
  models, methodology
- score.py — calls brain_query for each pair, scores top-1 + top-3,
  emits per-question detail. BRAIN_MCP_TOKEN via env.
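
The top-1 / top-3 scoring can be sketched roughly as below. This is not the contents of score.py: the `score` function, the stubbed `fake_query` lookup, and the sample questions and slugs are made up for illustration, and the real script calls brain_query with BRAIN_MCP_TOKEN instead of a local dictionary. Rank 0 is used here to mean "expected slug not in the returned results".

```python
def score(pairs, query):
    """Score (question, expected_slug) pairs against a ranked-slug query.

    `query` is any callable returning a ranked list of slugs.
    Hypothetical sketch; the real score.py may differ.
    """
    top1 = 0
    top3 = 0
    details = []
    for question, expected in pairs:
        results = list(query(question))
        # 1-based rank of the expected slug, or 0 when it is absent.
        rank = results.index(expected) + 1 if expected in results else 0
        if rank == 1:
            top1 += 1
        if 1 <= rank <= 3:
            top3 += 1
        details.append((question, expected, rank))
    total = len(pairs)
    return {"top1": top1 / total, "top3": top3 / total, "details": details}


# Made-up data standing in for brain_query results.
FAKE_RESULTS = {
    "q1": ["a", "b", "c"],
    "q2": ["x", "y", "z"],
    "q3": ["m", "n", "o"],
    "q4": ["p", "q", "r"],
}


def fake_query(question):
    return FAKE_RESULTS[question]


report = score(
    [("q1", "a"), ("q2", "z"), ("q3", "missing"), ("q4", "q")],
    fake_query,
)
print(report["top1"], report["top3"])  # 0.25 0.75
```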

Pre-fix baseline against the live brain: top-1 = 20% (4/20),
top-3 = 65% (13/20). Six hard misses where the expected slug doesn't
even land in the top-5.

Used to gate the phase 2 DIKW redesign (infra#62 follow-up): if the
phase 1 fixes (this parser fix + authoring 20 backlinks on the top
orphans) lift top-1 by <10 absolute points, structure is the
bottleneck and the tier redesign is justified.

2026-05-24 22:33:04 +02:00

3 Commits