docs(eval): record post-fix scorer run — phase 1 lift insufficient
Top-1 stayed at 20% (4/20); top-3 rose 5pt (65→70%) after:
- extract.go wing/topic parser fix (commit 3084c41)
- qwen35-9b-fast entity pad (was 239-byte stub → full entity)
- grafana entry: add "pod restart" synonym to lesson body
- dangling refs stripped from index.md + entities/k3s.md

The only retrieval move: qwen35-9b-fast climbed from rank 0 (outside
the top 5) to rank 2, so the entity pad worked. The other 5 misses are
ranker behaviour on entries that already overlap the query keywords;
BM25 doesn't rank the right slugs first.

Per the proposal's gate (≥10pt lift = stop, <10pt = Phase 2 justified),
the DIKW tier redesign earns its cost. Next session: tier column +
file moves + tier-weighted retrieval, then re-measure against this
same eval set.
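
For reference, the hit-rate arithmetic and the gate above can be reproduced with a minimal Go sketch. It is not the scorer from this repo: the names parseRanks, hitRate and phase2Justified are hypothetical, the regexp assumes the "rank=N expected=slug" header lines seen in post-fix.txt, and it assumes the gate is checked against both the top-1 and the top-3 lift (the proposal may name only one metric).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// rankRe matches per-question header lines such as "rank=3 expected=some-slug".
// rank=0 means the expected slug was not in the top k results.
var rankRe = regexp.MustCompile(`rank=(\d+) expected=(\S+)`)

// parseRanks extracts the rank from every header line in a report.
func parseRanks(report string) []int {
	var ranks []int
	for _, m := range rankRe.FindAllStringSubmatch(report, -1) {
		r, err := strconv.Atoi(m[1])
		if err != nil {
			continue // skip malformed ranks instead of failing the whole run
		}
		ranks = append(ranks, r)
	}
	return ranks
}

// hitRate returns the percentage of questions whose expected slug ranked in 1..k.
func hitRate(ranks []int, k int) float64 {
	if len(ranks) == 0 {
		return 0
	}
	hits := 0
	for _, r := range ranks {
		if r >= 1 && r <= k {
			hits++
		}
	}
	return 100 * float64(hits) / float64(len(ranks))
}

// phase2Justified applies the gate: a lift below 10 points justifies Phase 2.
func phase2Justified(beforePct, afterPct float64) bool {
	return afterPct-beforePct < 10
}

func main() {
	// Ranks from the post-fix run, in report order.
	ranks := []int{3, 1, 1, 3, 3, 0, 2, 1, 0, 0, 2, 3, 2, 2, 2, 1, 2, 0, 0, 4}
	top1 := hitRate(ranks, 1)
	top3 := hitRate(ranks, 3)
	fmt.Printf("top-1 %.0f%% top-3 %.0f%%\n", top1, top3)
	// Pre-fix baselines (20% top-1, 65% top-3) are taken from the message above.
	fmt.Println("phase 2 justified:", phase2Justified(20, top1) && phase2Justified(65, top3))
}
```

Treating rank 0 as a miss at every k keeps the tally consistent with the report, where the five rank=0 questions count against both top-1 and top-3.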
167
brain/eval/post-fix.txt
Normal file
@@ -0,0 +1,167 @@
# post-fix — 20 questions, k=5

top-1 hit rate: 4/20 = 20%
top-3 hit rate: 14/20 = 70%

## per-question detail

· rank=3 expected=dex-in-memory-storage-wipes-oauth-tokens-on-every-pod-restart
q: how do I stop dex from logging users out on every pod restart?
1. homelab-network-perimeter-model
2. 2026-05-12-koala-machine-state
3. dex-in-memory-storage-wipes-oauth-tokens-on-every-pod-restart <-- expected
4. infra-litellm-absorption-2026-05-16
5. Financial Sentiment Analysis on Stock Market Headlines With FinBERT & HuggingFace

★ rank=1 expected=postgres-least-privilege-migration-tenant-grant-bypass-2026-05
q: my postgres-exporter broke after revoking PUBLIC CONNECT — why?
1. postgres-least-privilege-migration-tenant-grant-bypass-2026-05 <-- expected
2. infra-litellm-absorption-2026-05-16
3. brain-mcp-activation-runbook
4. extension-version-lags-platform-major-upgrade
5. ntfy-deny-all-rollout-ordering-keep-alert-pipeline-live-during-auth-flip

★ rank=1 expected=homelab-network-perimeter-model
q: when is a NodePort acceptable vs needing a public ingress with bearer gate?
1. homelab-network-perimeter-model <-- expected
2. qwen3-thinking-model-empty-content-trap
3. mcpclient-empty-token-silent-401-envfrom-missing-key
4. 2026-05-12-koala-machine-state
5. koala-llama-swap-native-tool-calls-survey-2026-05

· rank=3 expected=exit-255-unknown-reason-not-oom
q: what does container exit code 255 with reason Unknown mean?
1. qwen3-thinking-model-empty-content-trap
2. infra-litellm-absorption-2026-05-16
3. exit-255-unknown-reason-not-oom <-- expected
4. mcpclient-empty-token-silent-401-envfrom-missing-key
5. koala-llama-swap-native-tool-calls-survey-2026-05

· rank=3 expected=gitea-push-mirror-cannot-create-remote-repo-needs-pre-existing-github-repo
q: can gitea push-mirror create the github repo automatically?
1. infra-litellm-absorption-2026-05-16
2. Autoresearch
3. gitea-push-mirror-cannot-create-remote-repo-needs-pre-existing-github-repo <-- expected
4. adr-new-project-gitea-first-github-mirror
5. adr-github-as-primary-remote

✗ rank=0 expected=flux-healthcheck-stale-on-resource-removal
q: a flux kustomization is stuck after I removed a resource — why?
1. qwen3-thinking-model-empty-content-trap
2. 2026-05-12-koala-machine-state
3. homelab-architecture-principles-2026-05
4. gitea-mcp: full stack shipped end-to-end (2026-05-05)
5. k8s-configmap-mount-no-reload-needs-pod-restart

· rank=2 expected=go-bytes-buffer-bytes-reset-aliasing-trap
q: the bytes buffer aliasing trap with Reset in a loop — what's the bug?
1. Financial Sentiment Analysis on Stock Market Headlines With FinBERT & HuggingFace
2. go-bytes-buffer-bytes-reset-aliasing-trap <-- expected
3. homelab-security-chains-not-bugs
4. training-on-rtx-5070-pretraining-vs-finetuning
5. Hash Encoding

★ rank=1 expected=homelab-architecture-principles-2026-05
q: what are the homelab architecture principles from may 2026?
1. homelab-architecture-principles-2026-05 <-- expected
2. homelab-network-perimeter-model
3. Claude Managed Agents — architecture notes relevant to homelab agent platform
4. homelab-core-glossary
5. 2026-05-12-koala-machine-state

✗ rank=0 expected=2026-05-04-sops-age-key-from-flux-cluster
q: where does the sops age private key live in the cluster?
1. 2026-05-12-koala-machine-state
2. homelab-network-perimeter-model
3. postgres-least-privilege-migration-tenant-grant-bypass-2026-05
4. brain-mcp-activation-runbook
5. dex-in-memory-storage-wipes-oauth-tokens-on-every-pod-restart

✗ rank=0 expected=grafana-dashboards-as-code-not-ui-state
q: why do my grafana dashboards disappear after a pod restart?
1. infra-litellm-absorption-2026-05-16
2. 2026-05-12-koala-machine-state
3. Financial Sentiment Analysis on Stock Market Headlines With FinBERT & HuggingFace
4. brain-mcp-activation-runbook
5. dex-in-memory-storage-wipes-oauth-tokens-on-every-pod-restart

· rank=2 expected=double-diamond-methodology
q: what is the double diamond methodology?
1. Harnessing the Power of Hash Encoding for Categorical Data in Data Science
2. double-diamond-methodology <-- expected
3. unified-methodology-diamond-futures-autoresearch
4. futures-thinking-extended-double-diamond
5. insight-exploration-as-diamond-1

· rank=3 expected=2026-05-04-mcp-transport-version-claude-ai-strict
q: my MCP server works from claude code but fails on claude.ai — what's different?
1. qwen3-thinking-model-empty-content-trap
2. mcp-resource-url-empty-breaks-claude-ai-discovery-silently
3. 2026-05-04-mcp-transport-version-claude-ai-strict <-- expected
4. 2026-05-04-claude-ai-custom-mcp-connectors
5. finding-github-mcp-claudeai-vs-claudecode

· rank=2 expected=homelab-security-chains-not-bugs
q: how should I rate security findings — isolated bugs or exploit chains?
1. homelab-network-perimeter-model
2. homelab-security-chains-not-bugs <-- expected
3. Financial Sentiment Analysis on Stock Market Headlines With FinBERT & HuggingFace
4. policy-audit-mode-blocks-nothing
5. homelab-document-accepted-risk-to-break-audit-cycle

· rank=2 expected=2026-05-03-canonical-vs-derived-context-flow
q: how should canonical context files relate to derived adapter files?
1. qwen3-thinking-model-empty-content-trap
2. 2026-05-03-canonical-vs-derived-context-flow <-- expected
3. 2026-05-12-koala-machine-state
4. 2026-05-04-claude-ai-custom-mcp-connectors
5. koala-llama-swap-native-tool-calls-survey-2026-05

· rank=2 expected=homelab-core-glossary
q: what is the homelab core vocabulary glossary?
1. homelab-architecture-principles-2026-05
2. homelab-core-glossary <-- expected
3. Claude Managed Agents — architecture notes relevant to homelab agent platform
4. 2026-05-12-koala-machine-state
5. Autoresearch

★ rank=1 expected=koala-llama-swap-native-tool-calls-survey-2026-05
q: which models on koala llama-swap actually emit native tool_calls correctly?
1. koala-llama-swap-native-tool-calls-survey-2026-05 <-- expected
2. 2026-05-12-koala-machine-state
3. infra-litellm-absorption-2026-05-16
4. training-on-rtx-5070-pretraining-vs-finetuning
5. qwen3-thinking-model-empty-content-trap

· rank=2 expected=qwen35-9b-fast
q: what is qwen35-9b-fast and what's it used for?
1. koala-llama-swap-native-tool-calls-survey-2026-05
2. qwen35-9b-fast <-- expected
3. qwen3-thinking-model-empty-content-trap
4. infra-litellm-absorption-2026-05-16
5. 2026-05-12-koala-machine-state

✗ rank=0 expected=go-defer-errcheck-body-close
q: in go, how do I prevent defer body close from silently dropping errors?
1. infra-litellm-absorption-2026-05-16
2. homelab-network-perimeter-model
3. go-bytes-buffer-bytes-reset-aliasing-trap
4. mcpclient-empty-token-silent-401-envfrom-missing-key
5. brain-mcp-activation-runbook

✗ rank=0 expected=hyperguild-level3-pipeline-rewrite
q: what was the level 3 rewrite of hyperguild's ingestion pipeline?
1. 2026-05-12-koala-machine-state
2. homelab-core-glossary
3. brain-mcp-activation-runbook
4. koala-llama-swap-native-tool-calls-survey-2026-05
5. infra-litellm-absorption-2026-05-16

? rank=4 expected=adr-new-project-gitea-first-github-mirror
q: what's the new-project ADR — is it gitea-first or github-first?
1. gitea-push-mirror-cannot-create-remote-repo-needs-pre-existing-github-repo
2. gitea-mcp: full stack shipped end-to-end (2026-05-05)
3. mcp-tool-design-get-needs-list-partner
4. adr-new-project-gitea-first-github-mirror <-- expected
5. 2026-05-04-gitea-mcp-build-session