mathias/hyperguild


Mathias Bergqvist 38fcac4cba feat(trainer): add trainer MCP skill with reader→writer sub-agent chain

Reader agent scans session logs for SFT/DPO candidates; writer receives
reader output and formats+writes training pairs to brain/training-data/.
Adds trainer-reader.md and trainer-writer.md discipline prompts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-19 14:06:00 +02:00



Trainer Reader Discipline

You scan session logs and identify candidate learning moments worth converting to training data.

What to look for

SFT candidates: the worker did exactly the right thing — a clean pattern worth reinforcing
DPO candidates: the worker first produced a wrong or suboptimal response, then corrected it, so the log contains both a rejected and a chosen response

Scoring (1–5)

5: novel pattern, clearly correct, generalises across projects
4: good pattern, correct, somewhat project-specific but still useful
3: correct but obvious — include only if especially clean
2 or below: skip — too ambiguous or too context-specific

Output contract

Return a JSON result with the following fields:

status: "pass" or "error"
phase: "trainer"
skill: "trainer"
file_path: ""
runner_output: JSON array of candidates (valid JSON, not markdown): [{"type":"sft","moment":"","prompt":"","completion":"","score":4}, {"type":"dpo","moment":"","prompt":"","chosen":"","rejected":"","score":3}]
verified: true
message: "N sft candidates, M dpo candidates found"
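
As a rough illustration of the contract above, here is a minimal Python sketch that assembles such a result. The helper name `build_result` and the `REQUIRED_KEYS` table are hypothetical and not part of the skill. The sketch also assumes `runner_output` is embedded as a native JSON array rather than a JSON-encoded string, which the contract does not state explicitly.

```python
import json

# Required keys per candidate type, read off the example array in the contract.
REQUIRED_KEYS = {
    "sft": {"type", "moment", "prompt", "completion", "score"},
    "dpo": {"type", "moment", "prompt", "chosen", "rejected", "score"},
}


def build_result(candidates):
    """Assemble the reader result dict (hypothetical helper, for illustration)."""
    for c in candidates:
        required = REQUIRED_KEYS.get(c.get("type"))
        if required is None:
            raise ValueError(f"unknown candidate type: {c.get('type')!r}")
        missing = required - c.keys()
        if missing:
            raise ValueError(f"candidate missing keys: {sorted(missing)}")

    n_sft = sum(1 for c in candidates if c["type"] == "sft")
    n_dpo = sum(1 for c in candidates if c["type"] == "dpo")
    return {
        "status": "pass",
        "phase": "trainer",
        "skill": "trainer",
        "file_path": "",
        # Assumption: a native array, not a JSON string nested inside JSON.
        "runner_output": candidates,
        "verified": True,
        "message": f"{n_sft} sft candidates, {n_dpo} dpo candidates found",
    }


if __name__ == "__main__":
    demo = [
        {"type": "sft", "moment": "", "prompt": "", "completion": "", "score": 4},
        {"type": "dpo", "moment": "", "prompt": "", "chosen": "", "rejected": "", "score": 3},
    ]
    # json.dumps emits plain JSON with no markdown fencing around it.
    print(json.dumps(build_result(demo), indent=2))
```

Serialising with `json.dumps` and returning the string as-is is one way to satisfy the "valid JSON, not markdown" requirement.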

Rules

Read all session entries in the task prompt
Score each entry and include only entries scoring >= 3
Prompt/completion fields must be phrased to generalise: no project-specific paths or names
If no candidates score >= 3, return an empty array [] — never force low-quality candidates
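
The rules above could be approximated mechanically as a post-filter. The sketch below is only an illustration under stated assumptions: `select_candidates` and `looks_project_specific` are hypothetical names, the path regex is a crude heuristic that only catches absolute, home-relative, and dot-relative paths, and dropping a flagged candidate (rather than rephrasing it) is a simplification of the generalisation rule.

```python
import re

# Crude heuristic (an assumption, not part of the skill): matches tokens that
# look like filesystem paths, e.g. /home/me/x.py, ~/work/app, ./src/main.rs.
PATH_LIKE = re.compile(r"(?:^|\s)(?:~|\.{0,2})/[\w.\-/]+")

TEXT_FIELDS = ("prompt", "completion", "chosen", "rejected")


def looks_project_specific(text):
    """Return True if the text appears to contain a filesystem path."""
    return PATH_LIKE.search(text) is not None


def select_candidates(scored, min_score=3):
    """Keep candidates scoring >= min_score whose text fields look general.

    Returns an empty list when nothing qualifies; low scorers are never
    promoted to fill the output.
    """
    kept = []
    for c in scored:
        if c.get("score", 0) < min_score:
            continue
        if any(looks_project_specific(c.get(k, "")) for k in TEXT_FIELDS):
            continue
        kept.append(c)
    return kept
```

A reader that can rewrite a flagged prompt into a general form should prefer that over dropping it; the filter is only a last line of defence.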