feat(trainer): add trainer MCP skill with reader→writer sub-agent chain

Reader agent scans session logs for SFT/DPO candidates; writer receives reader output and formats+writes training pairs to brain/training-data/. Adds trainer-reader.md and trainer-writer.md discipline prompts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 14:06:00 +02:00
parent 7697e901d2
commit 38fcac4cba
7 changed files with 303 additions and 0 deletions
--- a/config/models.yaml
+++ b/config/models.yaml
@@ -9,3 +9,5 @@ skills:
  review:        ollama/devstral-tuned
  debug:         ollama/deepseek-r1-tuned
  retrospective: ollama/qwen3-coder-30b-tuned
+  spec:          ollama/qwen3-coder-30b-tuned
+  trainer:       ollama/qwen3-coder-30b-tuned
--- a/config/supervisor/trainer-reader.md
+++ b/config/supervisor/trainer-reader.md
@@ -0,0 +1,31 @@
+# Trainer Reader Discipline
+
+You scan session logs and identify candidate learning moments worth converting to training data.
+
+## What to look for
+- **SFT candidates**: the worker did exactly the right thing — a clean pattern worth reinforcing
+- **DPO candidates**: the worker first produced a wrong or suboptimal response, then corrected — you have both rejected and chosen
+
+## Scoring (1–5)
+- 5: novel pattern, clearly correct, generalises across projects
+- 4: good pattern, correct, somewhat project-specific but still useful
+- 3: correct but obvious — include only if especially clean
+- 2 or below: skip — too ambiguous or too context-specific
+
+## Output contract
+Return a JSON result with:
+- `status`: "pass" or "error"
+- `phase`: "trainer"
+- `skill`: "trainer"
+- `file_path`: ""
+- `runner_output`: JSON array of candidates (valid JSON, not markdown):
+  [{"type":"sft","moment":"<what happened>","prompt":"<what was asked>","completion":"<what was done right>","score":4},
+   {"type":"dpo","moment":"<what happened>","prompt":"<what was asked>","chosen":"<correct>","rejected":"<incorrect>","score":3}]
+- `verified`: true
+- `message`: "N sft candidates, M dpo candidates found"
+
+## Rules
+1. Read all session entries in the task prompt
+2. Score each entry — only include entries scoring >= 3
+3. Prompt/completion fields must be phrased to generalise: no project-specific paths or names
+4. If no candidates score >= 3, return an empty array `[]` — never force low-quality candidates
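
Reviewer note: the reader's scoring rules and output contract can be checked mechanically before the result is handed to the writer. The sketch below is illustrative only and not part of this commit. The helper names `filter_candidates` and `build_reader_result` are hypothetical, and it assumes `runner_output` is carried as a parsed list rather than a serialized string, which the prompt does not specify.

```python
# Hypothetical helper, not part of the commit: enforces the reader contract.
REQUIRED_FIELDS = {
    "sft": {"moment", "prompt", "completion", "score"},
    "dpo": {"moment", "prompt", "chosen", "rejected", "score"},
}
MIN_SCORE = 3  # rule 2: only entries scoring >= 3 are included


def filter_candidates(candidates):
    """Keep well-formed candidates whose score meets the threshold."""
    kept = []
    for cand in candidates:
        required = REQUIRED_FIELDS.get(cand.get("type"))
        if required is None or not required.issubset(cand):
            continue  # unknown type or a missing field
        if cand["score"] >= MIN_SCORE:
            kept.append(cand)
    return kept


def build_reader_result(candidates):
    """Assemble the result object described in the output contract."""
    kept = filter_candidates(candidates)
    n_sft = sum(1 for c in kept if c["type"] == "sft")
    n_dpo = len(kept) - n_sft
    return {
        "status": "pass",
        "phase": "trainer",
        "skill": "trainer",
        "file_path": "",
        # Assumption: a parsed list; an empty list covers rule 4.
        "runner_output": kept,
        "verified": True,
        "message": f"{n_sft} sft candidates, {n_dpo} dpo candidates found",
    }
```

A check like this would only cover structure and the score threshold; whether a prompt generalises across projects (rule 3) still depends on the model's judgment.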
--- a/config/supervisor/trainer-writer.md
+++ b/config/supervisor/trainer-writer.md
@@ -0,0 +1,35 @@
+# Trainer Writer Discipline
+
+You receive candidate learning moments from the reader and write clean SFT/DPO training pairs.
+
+## Quality gate (apply before writing)
+- SFT: prompt must be phrased so it could come from any project, not just this one
+- DPO: chosen and rejected must be clearly distinguishable; skip the pair if a human reviewer could not tell which is better
+- Never include project-specific paths, variable names, or identifiers in any pair
+
+## Output contract
+Return a JSON result with:
+- `status`: "pass" (pairs written or skipped due to quality) or "error" (candidates JSON was malformed)
+- `phase`: "trainer"
+- `skill`: "trainer"
+- `file_path`: path of the last file written (empty if nothing passed quality gate)
+- `runner_output`: "N SFT pairs written to brain/training-data/sft/, M DPO pairs to brain/training-data/dpo/" or "0 pairs passed quality gate"
+- `verified`: true if files were written; false if nothing passed
+- `message`: "N sft + M dpo pairs for session <id>" or "no pairs passed quality gate"
+
+## File format
+JSONL — one JSON object per line.
+
+SFT: `{"prompt": "...", "completion": "..."}`
+DPO: `{"prompt": "...", "chosen": "...", "rejected": "..."}`
+
+Write SFT to: `<brain_dir>/training-data/sft/<session_id>.jsonl`
+Write DPO to: `<brain_dir>/training-data/dpo/<session_id>.jsonl`
+
+Append to existing files if they exist (don't overwrite).
+
+## Rules
+1. Parse the `reader_candidates` JSON from the task prompt
+2. For each candidate: apply quality gate
+3. Write passing SFT candidates to sft JSONL, DPO candidates to dpo JSONL
+4. If nothing passes, return status "pass" with verified: false and message "no pairs passed quality gate"
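
Reviewer note: the file format and append rule in trainer-writer.md could be implemented roughly as follows. This is a minimal sketch, not code from this commit. The function name `append_pairs` is hypothetical, and it assumes the candidates passed in have already cleared the quality gate and carry a `type` of `"sft"` or `"dpo"`.

```python
import json
from pathlib import Path

# Fields kept per pair type, matching the "File format" section.
PAIR_FIELDS = {
    "sft": ("prompt", "completion"),
    "dpo": ("prompt", "chosen", "rejected"),
}


def append_pairs(brain_dir, session_id, candidates):
    """Append pairs to per-session JSONL files; return counts and last path."""
    counts = {"sft": 0, "dpo": 0}
    last_path = ""
    for cand in candidates:
        kind = cand["type"]
        path = Path(brain_dir) / "training-data" / kind / f"{session_id}.jsonl"
        path.parent.mkdir(parents=True, exist_ok=True)
        # Drop reader-only fields such as "moment" and "score".
        record = {key: cand[key] for key in PAIR_FIELDS[kind]}
        # Mode "a" appends, so existing files are never overwritten.
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
        counts[kind] += 1
        last_path = str(path)
    return counts, last_path
```

With this shape, `last_path` maps onto `file_path` in the output contract, and an empty `last_path` corresponds to the `verified: false` case in rule 4.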