Reader agent scans session logs for SFT/DPO candidates; writer receives the reader output, then formats and writes training pairs to brain/training-data/. Adds trainer-reader.md and trainer-writer.md discipline prompts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trainer Reader Discipline
You scan session logs and identify candidate learning moments worth converting to training data.
What to look for
- SFT candidates: the worker did exactly the right thing — a clean pattern worth reinforcing
- DPO candidates: the worker first produced a wrong or suboptimal response, then corrected it, so the log supplies both a rejected and a chosen response
Scoring (1–5)
- 5: novel pattern, clearly correct, generalises across projects
- 4: good pattern, correct, somewhat project-specific but still useful
- 3: correct but obvious — include only if especially clean
- 2 or below: skip — too ambiguous or too context-specific
Output contract
Return a JSON result with the following fields:
- status: "pass" or "error"
- phase: "trainer"
- skill: "trainer"
- file_path: ""
- runner_output: JSON array of candidates (valid JSON, not markdown): [{"type":"sft","moment":"","prompt":"","completion":"","score":4}, {"type":"dpo","moment":"","prompt":"","chosen":"","rejected":"","score":3}]
- verified: true
- message: "N sft candidates, M dpo candidates found"
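As a rough illustration of the contract above, the result object could be assembled like this. The helper name `build_result` is hypothetical, and serialising `runner_output` as a JSON string (rather than nesting the array directly) is an assumption; the contract only says it must be valid JSON and not markdown.

```python
import json


def build_result(candidates):
    """Wrap a list of candidate dicts in the result envelope (hypothetical helper)."""
    n_sft = sum(1 for c in candidates if c["type"] == "sft")
    n_dpo = sum(1 for c in candidates if c["type"] == "dpo")
    return {
        "status": "pass",
        "phase": "trainer",
        "skill": "trainer",
        "file_path": "",
        # Assumption: the array is embedded as a JSON-encoded string.
        "runner_output": json.dumps(candidates),
        "verified": True,
        "message": f"{n_sft} sft candidates, {n_dpo} dpo candidates found",
    }


example = build_result([
    {"type": "sft", "moment": "", "prompt": "", "completion": "", "score": 4},
    {"type": "dpo", "moment": "", "prompt": "", "chosen": "", "rejected": "", "score": 3},
])
print(json.dumps(example, indent=2))
```

Counting the candidates by type when building the message keeps the summary line consistent with the array it describes.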
Rules
- Read all session entries in the task prompt
- Score each entry — only include entries scoring >= 3
- Prompt/completion fields must be phrased to generalise: no project-specific paths or names
- If no candidates score >= 3, return an empty array []; never force low-quality candidates
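The score threshold in these rules can be sketched as a small filter. The names `select_candidates`, `REQUIRED_FIELDS`, and `MIN_SCORE` are hypothetical, and silently dropping entries with missing fields is an assumption that the rules do not state.

```python
MIN_SCORE = 3  # rule: only include entries scoring >= 3

# Field sets taken from the candidate shapes in the output contract.
REQUIRED_FIELDS = {
    "sft": {"type", "moment", "prompt", "completion", "score"},
    "dpo": {"type", "moment", "prompt", "chosen", "rejected", "score"},
}


def select_candidates(scored):
    """Keep well-formed candidates that meet the threshold; may return []."""
    kept = []
    for cand in scored:
        required = REQUIRED_FIELDS.get(cand.get("type"))
        # Assumption: unknown types or incomplete entries are skipped.
        if required is None or not required.issubset(cand):
            continue
        if cand["score"] >= MIN_SCORE:
            kept.append(cand)
    return kept
```

When nothing qualifies, the function returns an empty list, which serialises to `[]` as the rules require.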