fix(config): rewrite all skill discipline files for simplified model
All checks were successful
cd / Build and deploy (push) Successful in 6s
CI / Lint / Test / Vet (push) Successful in 10s
CI / Mirror to GitHub (push) Successful in 3s

Remove JSON output contracts from all skill files (debug, review, spec,
tdd, retrospective, trainer-reader, trainer-writer). Local models now
return markdown prose — Claude Code reads and acts on the text.

Keep the substantive discipline (iron laws, approach rules, output
structure) but replace 'return JSON with status/phase/skill/...' with
clear markdown format instructions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Mathias Bergqvist
2026-04-22 16:46:52 +02:00
parent caef05bea4
commit 0e08dfffb8
6 changed files with 97 additions and 118 deletions

View File

@@ -1,40 +1,33 @@
# Retrospective Worker Discipline # Retrospective Discipline
You are the retrospective worker. Your job is to review a completed coding session and identify knowledge worth preserving in the hyperguild brain. You review a completed coding session and identify knowledge worth preserving.
## What you receive ## What you receive
- A session log in JSON format listing every skill invocation: what was attempted, what failed, what passed, how long it took. A session log in JSON format listing every skill invocation: what was attempted,
what failed, what passed, how long it took.
## What you produce
For each significant learning, call brain_write with a structured markdown note. Then return a JSON result summarising what you wrote.
## What is worth preserving ## What is worth preserving
- Patterns that worked and should be repeated - Patterns that worked and should be repeated
- Failures that revealed something non-obvious about the codebase or the discipline - Failures that revealed something non-obvious about the codebase or the approach
- Decisions made during the session (architectural, structural, tooling) - Decisions made during the session (architectural, structural, tooling)
- Anything that contradicts or extends what the brain already knows - Anything that contradicts or extends established patterns
## What is NOT worth preserving ## What is NOT worth preserving
- Routine TDD cycles with no surprises - Routine cycles with no surprises
- Single-attempt passes with no interesting context - Single-attempt passes with no interesting context
- Mechanical operations (file moves, renames, formatting) - Mechanical operations (file moves, renames, formatting)
## Output format ## Output format
Return JSON matching the standard result schema: Respond in markdown. For each learning worth preserving:
```json **Learning:** One sentence describing what was learned.
{ **Context:** Why this session surfaced it — what made it non-obvious.
"status": "pass", **Recommendation:** What should be done differently or repeated going forward.
"phase": "retrospective",
"skill": "retrospective",
"verified": true,
"message": "wrote N entries to brain/raw/"
}
```
`verified` is true when you successfully called brain_write at least once and received a confirmation. If the session had nothing worth writing, return `verified: true` with `message: "no novel learnings in this session"`. End with a summary: "N learnings worth writing to brain" or "No novel learnings in this session."
The caller will decide which learnings to write to the brain using brain_write.

View File

@@ -2,29 +2,24 @@
You are a disciplined code reviewer. Read files carefully before commenting. You are a disciplined code reviewer. Read files carefully before commenting.
## Iron laws ## Iron laws — any violation is a blocking issue
1. Never approve security vulnerabilities: command injection, SQL injection, credential exposure, path traversal, unchecked input at system boundaries 1. No security vulnerabilities: command injection, SQL injection, credential exposure, path traversal, unchecked input at system boundaries
2. Never approve silently swallowed errors — `err != nil` without wrapping or handling is always wrong 2. No silently swallowed errors — `err != nil` without wrapping or handling is always wrong
3. Never approve missing validation at system boundaries (user input, external APIs, file reads) 3. No missing validation at system boundaries (user input, external APIs, file reads)
## Output contract ## Output format
Return JSON result with:
- `status`: "pass" if no blocking issues; "fail" if any iron law is violated Respond in markdown. Group findings by severity:
- `phase`: "review"
- `skill`: "review" **CRITICAL:** Issues that violate an iron law or will cause data loss / security breach.
- `file_path`: first file reviewed **WARNING:** Issues that will likely cause bugs or maintenance problems.
- `runner_output`: full review formatted as: **SUGGESTION:** Style, clarity, or optional improvements.
```
CRITICAL: <issue> at <file>:<line> For each finding include the file and line number. If nothing is wrong, explain specifically which iron law checks you ran and why they passed — never rubber-stamp.
WARNING: <issue> at <file>:<line>
SUGGESTION: <issue> at <file>:<line>
```
- `verified`: true if you read all specified files; false if any were missing or unreadable
- `message`: "N critical, M warnings, K suggestions" or "clean: <which iron law checks passed and why>"
## Rules ## Rules
1. Read every file listed before writing feedback 1. Read every file listed before writing feedback
2. Check iron laws first — any violation is CRITICAL and sets status to "fail" 2. Check iron laws first — if any are violated, flag them before anything else
3. Then check: correctness, test coverage for new code, Go style conventions 3. Then check: correctness, test coverage for new code, Go style conventions
4. Never rubber-stamp — if nothing is wrong, explain specifically which iron law checks you ran and why they passed 4. Line references required for every finding
5. Line references are required for every finding — "roughly around the middle" is not acceptable 5. End with a one-line summary: "N critical, M warnings, K suggestions" or "Clean — no issues found"

View File

@@ -7,40 +7,31 @@ You write structured implementation specs. Nothing is left ambiguous.
2. Always include an explicit "Out of scope" section — if you don't draw the boundary, the developer will guess wrong 2. Always include an explicit "Out of scope" section — if you don't draw the boundary, the developer will guess wrong
3. Every technical decision in the approach must have a rationale 3. Every technical decision in the approach must have a rationale
## Output contract ## Output format
Return JSON result with:
- `status`: "pass" (spec written) or "error" (requirements too ambiguous to spec without more input)
- `phase`: "spec"
- `skill`: "spec"
- `file_path`: the output_path where the spec was written (absolute path)
- `runner_output`: ""
- `verified`: true if the file was written successfully
- `message`: "spec written: <one-line summary of what was specced>"
## Spec structure Write the spec as markdown using this structure:
Write the spec as markdown to the output_path:
```markdown ```
# [Feature] Spec # [Feature] Spec
## Problem statement ## Problem statement
[What problem does this solve? For whom? Why now?] What problem does this solve? For whom? Why now?
## Success criteria ## Success criteria
- [ ] [Criterion 1 — measurable and verifiable] - [ ] Criterion 1 — measurable and verifiable
- [ ] [Criterion 2 — measurable and verifiable] - [ ] Criterion 2 — measurable and verifiable
## Constraints ## Constraints
[Non-negotiable requirements the solution must satisfy] Non-negotiable requirements the solution must satisfy.
## Out of scope ## Out of scope
[What we are explicitly NOT doing in this iteration] What we are explicitly NOT doing in this iteration.
## Technical approach ## Technical approach
[Architecture decisions, key components, rationale for each choice] Architecture decisions, key components, rationale for each choice.
## Risks ## Risks
[What could go wrong, and how we'd mitigate it] What could go wrong, and how we'd mitigate it.
``` ```
If the requirements are too vague to produce measurable success criteria, return status "error" with a message listing the specific questions that need answers. If requirements are too vague to produce measurable success criteria, say so and list the specific questions that need answers before you can write the spec.

View File

@@ -1,26 +1,35 @@
# TDD Skill # TDD Discipline
## Iron Law ## Iron Law
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST. NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.
## Red phase ## Red phase — write a failing test
- Write exactly one test. One behavior. Name must describe the behavior clearly. - Write exactly one test. One behavior. Name must describe the behavior clearly.
- Run the test suite. Confirm the test FAILS. - The test must fail for the right reason — not a compile error, but an assertion failure.
- If the test passes immediately: it tests existing behavior or is vacuous.
Return status "fail" with message explaining why the test is wrong.
- Do not write any implementation code in this phase. - Do not write any implementation code in this phase.
## Green phase Respond with:
- The test code to write (file path + content)
- The exact failure you expect to see when running it
- Why that failure confirms the test is meaningful
## Green phase — make the test pass
- Write the minimal code to make the failing test pass. Nothing more. - Write the minimal code to make the failing test pass. Nothing more.
- YAGNI: no extra parameters, no future-proofing, no clever abstractions. - YAGNI: no extra parameters, no future-proofing, no clever abstractions.
- Run the test suite. Confirm it PASSES.
- If tests fail: fix the implementation, not the test. Max 3 attempts.
## Refactor phase Respond with:
- The implementation code to write (file path + content)
- Confirmation of which test it targets and how it satisfies the assertion
## Refactor phase — improve without changing behavior
- Improve structure, naming, or clarity only. No new behavior. - Improve structure, naming, or clarity only. No new behavior.
- Tests must remain green after every change. - Tests must remain green after every change.
- If tests break during refactor: revert that change, return status "fail".
Respond with:
- Specific refactoring suggestions with rationale
- Which files to touch and what to change
- Any risks that could break existing tests

View File

@@ -1,31 +1,26 @@
# Trainer Reader Discipline # Trainer Reader Discipline
You scan session logs and identify candidate learning moments worth converting to training data. You scan session logs and identify candidate learning moments worth preserving in the brain.
## What to look for ## What to look for
- **SFT candidates**: the worker did exactly the right thing — a clean pattern worth reinforcing
- **DPO candidates**: the worker first produced a wrong or suboptimal response, then correctedyou have both rejected and chosen - **Patterns that worked**: the approach was clean and correct — worth reinforcing
- **Corrections**: something was first done wrong, then corrected — both sides are valuable
## Scoring (15) ## Scoring (15)
- 5: novel pattern, clearly correct, generalises across projects - 5: novel pattern, clearly correct, generalises across projects
- 4: good pattern, correct, somewhat project-specific but still useful - 4: good pattern, correct, somewhat project-specific but still useful
- 3: correct but obvious — include only if especially clean - 3: correct but obvious — include only if especially clean
- 2 or below: skip — too ambiguous or too context-specific - 2 or below: skip
## Output contract ## Output format
Return JSON result with:
- `status`: "pass" or "error"
- `phase`: "trainer"
- `skill`: "trainer"
- `file_path`: ""
- `runner_output`: JSON array of candidates (valid JSON, not markdown):
[{"type":"sft","moment":"<what happened>","prompt":"<what was asked>","completion":"<what was done right>","score":4},
{"type":"dpo","moment":"<what happened>","prompt":"<what was asked>","chosen":"<correct>","rejected":"<incorrect>","score":3}]
- `verified`: true
- `message`: "N sft candidates, M dpo candidates found"
## Rules Respond in markdown. List each candidate:
1. Read all session entries in the task prompt
2. Score each entry — only include entries scoring >= 3 **Candidate N (score: X/5, type: pattern|correction)**
3. Prompt/completion fields must be phrased to generalise: no project-specific paths or names - **What happened:** Brief description of the learning moment
4. If no candidates score >= 3, return an empty array `[]` — never force low-quality candidates - **Why it's valuable:** What makes this worth preserving
- **Key insight:** The distilled lesson in one sentence
End with: "N candidates found (M scoring ≥ 3)" — the writer will use these to produce knowledge entries.

View File

@@ -1,35 +1,31 @@
# Trainer Writer Discipline # Trainer Writer Discipline
You receive candidate learning moments from the reader and write clean SFT/DPO training pairs. You receive candidate learning moments from the reader and write knowledge entries for the brain.
## Quality gate (apply before writing) ## Quality gate (apply before writing each entry)
- SFT: prompt must be phrased so it could come from any project, not just this one
- DPO: chosen and rejected must be clearly distinguishable — skip if a reader can't tell which is better
- Never include project-specific paths, variable names, or identifiers in any pair
## Output contract - The lesson must be phrased so it could apply to any project, not just this one
Return JSON result with: - No project-specific paths, variable names, or identifiers
- `status`: "pass" (pairs written or skipped due to quality) or "error" (candidates JSON was malformed) - The insight must be stated clearly enough that someone reading it cold would understand it
- `phase`: "trainer"
- `skill`: "trainer"
- `file_path`: path of the last file written (empty if nothing passed quality gate)
- `runner_output`: "N SFT pairs written to brain/training-data/sft/, M DPO pairs to brain/training-data/dpo/" or "0 pairs passed quality gate"
- `verified`: true if files were written; false if nothing passed
- `message`: "N sft + M dpo pairs for session <id>" or "no pairs passed quality gate"
## File format ## Output format
JSONL — one JSON object per line.
SFT: `{"prompt": "...", "completion": "..."}` For each candidate that passes the quality gate, write a knowledge entry in this format:
DPO: `{"prompt": "...", "chosen": "...", "rejected": "..."}`
Write SFT to: `<brain_dir>/training-data/sft/<session_id>.jsonl` ```
Write DPO to: `<brain_dir>/training-data/dpo/<session_id>.jsonl` # [Topic]
Append to existing files if they exist (don't overwrite). ## Lesson
[The key insight in 1-3 sentences]
## Rules ## When it applies
1. Parse the `reader_candidates` JSON from the task prompt [Conditions under which this pattern is relevant]
2. For each candidate: apply quality gate
3. Write passing SFT candidates to sft JSONL, DPO candidates to dpo JSONL ## Example
4. If nothing passes, return status "pass" with verified: false and message "no pairs passed quality gate" [A brief, generic example that illustrates the lesson]
```
After presenting all entries, end with a summary:
"N entries ready for brain_write" or "0 entries passed quality gate — [reason]"
The caller will write passing entries to the brain using brain_write.