mathias/hyperguild

Fork 0

Files

Mathias Bergqvist c9310b1079

cd / Build and deploy (push) Successful in 9s

Details

CI / Lint / Test / Vet (push) Successful in 10s

Details

CI / Mirror to GitHub (push) Successful in 4s

Details

fix(ingestion): always append .md extension to written filenames

brain_write with a custom filename omitted the .md extension, causing
search to skip the file (search.go filters on HasSuffix .md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-22 19:23:07 +02:00

55 KiB

Raw Permalink Blame History

Hyperguild Phase 2 Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add four new MCP skills (review, debug, spec, trainer) to the hyperguild supervisor, with automatic session history injection into all multi-phase skill workers.

Architecture: The supervisor reads prior session entries and injects them into each worker's task prompt when session_id is provided — the orchestrator no longer re-summarises history. The trainer skill runs a two-step sub-agent chain: a reader agent identifies learning moments from the session log, then a writer agent formats them as SFT/DPO pairs and writes to brain/training-data/. All other new skills are single-worker.

Tech Stack: Go 1.26, internal/session (JSONL log), internal/exec (claude subprocess executor), internal/registry (MCP tool registry), config/supervisor/*.md (discipline files), config/models.yaml (model routing)

File Map

New files

File	Responsibility
`internal/session/history.go`	`FormatHistory(entries, excludePhase)` — formats prior session entries as a prompt block
`internal/session/history_test.go`	Unit tests for FormatHistory
`internal/skills/review/skill.go`	review skill: Config, Skill, New, Name, Tools
`internal/skills/review/handlers.go`	`Handle`, `handleReview` — single-phase code review worker
`internal/skills/review/handlers_test.go`	review handler unit tests
`internal/skills/debug/skill.go`	debug skill
`internal/skills/debug/handlers.go`	`handleDebug` — hypothesis generation worker
`internal/skills/debug/handlers_test.go`	debug handler unit tests
`internal/skills/spec/skill.go`	spec skill
`internal/skills/spec/handlers.go`	`handleSpec` — spec writing worker
`internal/skills/spec/handlers_test.go`	spec handler unit tests
`internal/skills/trainer/skill.go`	trainer skill: Config with ReaderPrompt + WriterPrompt + BrainDir
`internal/skills/trainer/handlers.go`	`handleTrain` — calls ExecutorFn twice: reader then writer
`internal/skills/trainer/handlers_test.go`	trainer handler unit tests (verifies two-call chain)
`config/supervisor/review.md`	Review worker discipline file
`config/supervisor/debug.md`	Debug worker discipline file
`config/supervisor/spec.md`	Spec worker discipline file
`config/supervisor/trainer-reader.md`	Trainer reader agent discipline
`config/supervisor/trainer-writer.md`	Trainer writer agent discipline

Modified files

File	Change
`internal/exec/result.go`	Add review/debug/spec/trainer to `validPhases` and Schema enum
`internal/skills/tdd/skill.go`	Add `SessionsDir string` to Config; add `session_id` to green/refactor tool schemas
`internal/skills/tdd/handlers.go`	Add `session_id` to `greenArgs` and `refactorArgs`; inject history when `session_id` non-empty
`internal/skills/tdd/handlers_test.go`	Add tests for history injection in green and refactor
`cmd/supervisor/main.go`	Pass `SessionsDir` to tdd.Config; read discipline files and register 4 new skills
`config/models.yaml`	Add `trainer` model entry

Task 1: Session history utility

Files:

Create: internal/session/history.go
Create: internal/session/history_test.go
Step 1: Write the failing test

// internal/session/history_test.go
package session_test

import (
	"testing"
	"time"

	"github.com/mathiasbq/supervisor/internal/session"
	"github.com/stretchr/testify/assert"
)

func TestFormatHistoryEmpty(t *testing.T) {
	result := session.FormatHistory(nil, "")
	assert.Equal(t, "", result)
}

func TestFormatHistoryFormatsEntries(t *testing.T) {
	entries := []session.Entry{
		{
			Skill: "tdd", Phase: "red", FinalStatus: "pass",
			FilePath: "internal/foo/foo_test.go",
			Message:  "wrote failing test for Foo",
			Timestamp: time.Now(),
		},
	}
	result := session.FormatHistory(entries, "")
	assert.Contains(t, result, "## Session history")
	assert.Contains(t, result, "Phase: red")
	assert.Contains(t, result, "wrote failing test for Foo")
	assert.Contains(t, result, "internal/foo/foo_test.go")
}

func TestFormatHistoryExcludesCurrentPhase(t *testing.T) {
	entries := []session.Entry{
		{Skill: "tdd", Phase: "red", Message: "red done", FinalStatus: "pass"},
		{Skill: "tdd", Phase: "green", Message: "green done", FinalStatus: "pass"},
	}
	result := session.FormatHistory(entries, "green")
	assert.Contains(t, result, "red done")
	assert.NotContains(t, result, "green done")
}

Step 2: Run test to confirm it fails

cd /path/to/supervisor
go test ./internal/session/... -run TestFormatHistory -v

Expected: FAIL — session.FormatHistory undefined

Step 3: Implement FormatHistory

// internal/session/history.go
package session

import (
	"fmt"
	"strings"
)

// FormatHistory formats prior session entries as a structured block for
// injection into a worker task prompt. Entries matching excludePhase are
// omitted (pass the current phase to avoid circular injection).
func FormatHistory(entries []Entry, excludePhase string) string {
	var filtered []Entry
	for _, e := range entries {
		if e.Phase != excludePhase {
			filtered = append(filtered, e)
		}
	}
	if len(filtered) == 0 {
		return ""
	}

	var b strings.Builder
	b.WriteString("## Session history\n\n")
	for _, e := range filtered {
		fmt.Fprintf(&b, "### Phase: %s\n", e.Phase)
		fmt.Fprintf(&b, "- Skill: %s\n", e.Skill)
		fmt.Fprintf(&b, "- Status: %s\n", e.FinalStatus)
		if e.FilePath != "" {
			fmt.Fprintf(&b, "- File: %s\n", e.FilePath)
		}
		if e.Message != "" {
			fmt.Fprintf(&b, "- Summary: %s\n", e.Message)
		}
		b.WriteString("\n")
	}
	return b.String()
}

Step 4: Run tests to confirm they pass

go test ./internal/session/... -v

Expected: all TestFormatHistory* tests PASS

Step 5: Commit

git add internal/session/history.go internal/session/history_test.go
git commit -m "feat(session): add FormatHistory for worker context injection"

Task 2: Fix Schema enum and validPhases

Files:

Modify: internal/exec/result.go

The Schema's phase enum currently only lists ["red","green","refactor"]. Workers using any other phase name get forced to pick from that list (the retrospective worker was returning "refactor" as a result). Fix by removing the enum constraint from the schema (let it be any string) and rely solely on validPhases for server-side validation.

Step 1: Write the failing test

// Add to internal/exec/result_test.go (check existing file first for test helpers)
func TestValidateAcceptsAllPhases(t *testing.T) {
	phases := []string{"red", "green", "refactor", "retrospective", "review", "debug", "spec", "trainer"}
	for _, phase := range phases {
		r := exec.Result{Status: "pass", Phase: phase, Skill: "test", ModelUsed: "self", Message: "ok"}
		assert.NoError(t, r.Validate(), "phase %q should be valid", phase)
	}
}

Step 2: Run test to confirm it fails

go test ./internal/exec/... -run TestValidateAcceptsAllPhases -v

Expected: FAIL — "review", "debug", "spec", "trainer" fail validation

Step 3: Update result.go

In internal/exec/result.go, make these two changes:

Change 1 — update validPhases:

var validPhases = map[string]bool{
	"red":           true,
	"green":         true,
	"refactor":      true,
	"retrospective": true,
	"review":        true,
	"debug":         true,
	"spec":          true,
	"trainer":       true,
}

Change 2 — remove the enum constraint from the Schema phase property (replace the "phase" line):

// Before:
"phase":         {"type": "string", "enum": ["red","green","refactor"]},

// After:
"phase":         {"type": "string"},

Also fix the error message in Validate():

// Before:
errs = append(errs, "phase must be red|green|refactor, got: "+r.Phase)

// After:
errs = append(errs, "phase must be one of red|green|refactor|retrospective|review|debug|spec|trainer, got: "+r.Phase)

Step 4: Run tests to confirm they pass

go test ./internal/exec/... -v

Expected: all tests PASS, including TestValidateAcceptsAllPhases

Step 5: Commit

git add internal/exec/result.go
git commit -m "fix(exec): expand validPhases and remove schema enum constraint for phase"

Task 3: Session history injection in TDD green and refactor

Files:

Modify: internal/skills/tdd/skill.go
Modify: internal/skills/tdd/handlers.go
Modify: internal/skills/tdd/handlers_test.go
Modify: cmd/supervisor/main.go
Step 1: Write the failing tests

Add to internal/skills/tdd/handlers_test.go:

func TestTDDGreenInjectsSessionHistory(t *testing.T) {
	sessDir := t.TempDir()
	require.NoError(t, session.Append(sessDir, "sess-1", session.Entry{
		SessionID: "sess-1", Skill: "tdd", Phase: "red", FinalStatus: "pass",
		FilePath: "internal/foo/foo_test.go",
		Message:  "wrote failing test for Foo",
	}))

	var capturedPrompt string
	fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
		capturedPrompt = req.TaskPrompt
		return iexec.Result{Status: "pass", Phase: "green", Skill: "tdd", Verified: true, ModelUsed: "self", Message: "ok"}, nil
	}

	sk := tdd.New(tdd.Config{SkillPrompt: "tdd", ExecutorFn: fakeFn, SessionsDir: sessDir})
	_, err := sk.Handle(context.Background(), "tdd_green", json.RawMessage(
		`{"project_root":"/tmp","test_path":"internal/foo/foo_test.go","test_cmd":"go test ./...","session_id":"sess-1"}`,
	))
	require.NoError(t, err)
	assert.Contains(t, capturedPrompt, "## Session history")
	assert.Contains(t, capturedPrompt, "wrote failing test for Foo")
}

func TestTDDGreenNoHistoryWhenSessionIDEmpty(t *testing.T) {
	var capturedPrompt string
	fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
		capturedPrompt = req.TaskPrompt
		return iexec.Result{Status: "pass", Phase: "green", Skill: "tdd", Verified: true, ModelUsed: "self", Message: "ok"}, nil
	}

	sk := tdd.New(tdd.Config{SkillPrompt: "tdd", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
	_, err := sk.Handle(context.Background(), "tdd_green", json.RawMessage(
		`{"project_root":"/tmp","test_path":"internal/foo/foo_test.go"}`,
	))
	require.NoError(t, err)
	assert.NotContains(t, capturedPrompt, "## Session history")
}

You will need these imports in the test file:

import (
    "github.com/mathiasbq/supervisor/internal/session"
    iexec "github.com/mathiasbq/supervisor/internal/exec"
)

Step 2: Run tests to confirm they fail

go test ./internal/skills/tdd/... -run TestTDDGreen -v

Expected: FAIL — tdd.Config has no SessionsDir field, session_id not in args

Step 3: Add SessionsDir to Config and session_id to tool schemas

In internal/skills/tdd/skill.go, add SessionsDir to Config:

type Config struct {
	SystemPrompt string
	SkillPrompt  string
	ExecutorFn   ExecutorFn
	DefaultModel string
	SessionsDir  string // optional: path to brain/sessions/ for history injection
}

In Tools(), add "session_id" as an optional property to tdd_green and tdd_refactor input schemas:

// tdd_green InputSchema:
InputSchema: schema(
    []string{"project_root", "test_path"},
    map[string]any{
        "project_root": strProp,
        "test_path":    strProp,
        "model":        strProp,
        "test_cmd":     strProp,
        "session_id":   strProp,
    },
),
// tdd_refactor InputSchema (same addition):
InputSchema: schema(
    []string{"project_root", "test_path", "impl_path"},
    map[string]any{
        "project_root": strProp,
        "test_path":    strProp,
        "impl_path":    strProp,
        "model":        strProp,
        "test_cmd":     strProp,
        "session_id":   strProp,
    },
),

Step 4: Update greenArgs, refactorArgs, and inject history

In internal/skills/tdd/handlers.go, add imports and update structs + handlers:

import (
    // existing imports...
    "github.com/mathiasbq/supervisor/internal/session"
)

type greenArgs struct {
	ProjectRoot string `json:"project_root"`
	TestPath    string `json:"test_path"`
	Model       string `json:"model"`
	TestCmd     string `json:"test_cmd"`
	SessionID   string `json:"session_id"`
}

func (s *Skill) handleGreen(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
	var args greenArgs
	if err := json.Unmarshal(raw, &args); err != nil {
		return nil, fmt.Errorf("parse args: %w", err)
	}
	if args.ProjectRoot == "" {
		return nil, fmt.Errorf("project_root is required")
	}
	if args.TestPath == "" {
		return nil, fmt.Errorf("test_path is required")
	}

	task := fmt.Sprintf(
		"phase: green\nproject_root: %s\ntest_path: %s\nmodel: %s\ntest_cmd: %s",
		args.ProjectRoot, args.TestPath, s.resolveModel(args.Model), args.TestCmd,
	)
	task = s.prependHistory(args.SessionID, "green", task)
	return s.execute(ctx, task)
}

type refactorArgs struct {
	ProjectRoot string `json:"project_root"`
	TestPath    string `json:"test_path"`
	ImplPath    string `json:"impl_path"`
	Model       string `json:"model"`
	TestCmd     string `json:"test_cmd"`
	SessionID   string `json:"session_id"`
}

func (s *Skill) handleRefactor(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
	var args refactorArgs
	if err := json.Unmarshal(raw, &args); err != nil {
		return nil, fmt.Errorf("parse args: %w", err)
	}
	if args.ProjectRoot == "" {
		return nil, fmt.Errorf("project_root is required")
	}
	if args.TestPath == "" {
		return nil, fmt.Errorf("test_path is required")
	}
	if args.ImplPath == "" {
		return nil, fmt.Errorf("impl_path is required")
	}

	task := fmt.Sprintf(
		"phase: refactor\nproject_root: %s\ntest_path: %s\nimpl_path: %s\nmodel: %s\ntest_cmd: %s",
		args.ProjectRoot, args.TestPath, args.ImplPath, s.resolveModel(args.Model), args.TestCmd,
	)
	task = s.prependHistory(args.SessionID, "refactor", task)
	return s.execute(ctx, task)
}

// prependHistory reads the session log and prepends prior phase entries to the task prompt.
// If sessionID is empty or SessionsDir is not configured, task is returned unchanged.
func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
	if sessionID == "" || s.cfg.SessionsDir == "" {
		return task
	}
	entries, err := session.Read(s.cfg.SessionsDir, sessionID)
	if err != nil || len(entries) == 0 {
		return task
	}
	history := session.FormatHistory(entries, currentPhase)
	if history == "" {
		return task
	}
	return history + "\n---\n\n" + task
}

Step 5: Update main.go to pass SessionsDir to tdd.Config

In cmd/supervisor/main.go, find the tdd.New(tdd.Config{...}) call and add SessionsDir:

reg.Register(tdd.New(tdd.Config{
	SystemPrompt: string(systemPrompt),
	SkillPrompt:  string(tddPrompt),
	DefaultModel: models.Resolve("tdd", ""),
	ExecutorFn:   executor.Run,
	SessionsDir:  cfg.SessionsDir,  // ← add this line
}))

Step 6: Run all tests

go test ./... -race -count=1

Expected: all tests PASS

Step 7: Commit

git add internal/skills/tdd/skill.go internal/skills/tdd/handlers.go internal/skills/tdd/handlers_test.go cmd/supervisor/main.go
git commit -m "feat(tdd): inject session history into green and refactor worker prompts"

Task 4: review skill

Files:

Create: internal/skills/review/skill.go
Create: internal/skills/review/handlers.go
Create: internal/skills/review/handlers_test.go
Create: config/supervisor/review.md
Modify: cmd/supervisor/main.go
Step 1: Write the failing test

// internal/skills/review/handlers_test.go
package review_test

import (
	"context"
	"encoding/json"
	"testing"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/skills/review"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestReviewToolRegistered(t *testing.T) {
	sk := review.New(review.Config{SkillPrompt: "review rules"})
	names := make([]string, 0)
	for _, tool := range sk.Tools() {
		names = append(names, tool.Name)
	}
	assert.Contains(t, names, "review")
}

func TestReviewRequiresProjectRoot(t *testing.T) {
	sk := review.New(review.Config{SkillPrompt: "r"})
	_, err := sk.Handle(context.Background(), "review", json.RawMessage(`{"files":["main.go"]}`))
	assert.ErrorContains(t, err, "project_root")
}

func TestReviewRequiresFiles(t *testing.T) {
	sk := review.New(review.Config{SkillPrompt: "r"})
	_, err := sk.Handle(context.Background(), "review", json.RawMessage(`{"project_root":"/tmp"}`))
	assert.ErrorContains(t, err, "files")
}

func TestReviewCallsExecutor(t *testing.T) {
	called := false
	var capturedTask string
	fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
		called = true
		capturedTask = req.TaskPrompt
		return iexec.Result{
			Status: "pass", Phase: "review", Skill: "review",
			Verified: true, ModelUsed: "self", Message: "2 warnings found",
		}, nil
	}

	sk := review.New(review.Config{SkillPrompt: "review rules", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
	out, err := sk.Handle(context.Background(), "review", json.RawMessage(
		`{"project_root":"/tmp/proj","files":["internal/foo/foo.go"],"context":"PR: add Foo helper"}`,
	))
	require.NoError(t, err)
	assert.True(t, called)
	assert.Contains(t, capturedTask, "internal/foo/foo.go")
	assert.Contains(t, capturedTask, "PR: add Foo helper")

	var result iexec.Result
	require.NoError(t, json.Unmarshal(out, &result))
	assert.Equal(t, "pass", result.Status)
	assert.Equal(t, "review", result.Phase)
}

var _ = require.New

Step 2: Run test to confirm it fails

go test ./internal/skills/review/... -v

Expected: FAIL — package review does not exist

Step 3: Create skill.go

// internal/skills/review/skill.go
package review

import (
	"context"
	"encoding/json"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/registry"
)

type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)

type Config struct {
	SkillPrompt  string
	DefaultModel string
	ExecutorFn   ExecutorFn
	SessionsDir  string
}

type Skill struct{ cfg Config }

func New(cfg Config) *Skill { return &Skill{cfg: cfg} }

func (s *Skill) Name() string { return "review" }

func (s *Skill) Tools() []registry.ToolDef {
	schema := func(required []string, props map[string]any) json.RawMessage {
		b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
		return b
	}
	return []registry.ToolDef{
		{
			Name:        "review",
			Description: "Perform a structured code review of the specified files. Returns findings with severity levels.",
			InputSchema: schema(
				[]string{"project_root", "files"},
				map[string]any{
					"project_root": map[string]any{"type": "string"},
					"files":        map[string]any{"type": "array", "items": map[string]any{"type": "string"}},
					"context":      map[string]any{"type": "string"},
					"model":        map[string]any{"type": "string"},
					"session_id":   map[string]any{"type": "string"},
				},
			),
		},
	}
}

Step 4: Create handlers.go

// internal/skills/review/handlers.go
package review

import (
	"context"
	"encoding/json"
	"fmt"
	"strings"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/session"
)

type reviewArgs struct {
	ProjectRoot string   `json:"project_root"`
	Files       []string `json:"files"`
	Context     string   `json:"context"`
	Model       string   `json:"model"`
	SessionID   string   `json:"session_id"`
}

func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
	if tool != "review" {
		return nil, fmt.Errorf("unknown tool: %s", tool)
	}
	var a reviewArgs
	if err := json.Unmarshal(args, &a); err != nil {
		return nil, fmt.Errorf("parse args: %w", err)
	}
	if a.ProjectRoot == "" {
		return nil, fmt.Errorf("project_root is required")
	}
	if len(a.Files) == 0 {
		return nil, fmt.Errorf("files is required")
	}

	model := a.Model
	if model == "" {
		model = s.cfg.DefaultModel
	}

	task := fmt.Sprintf(
		"phase: review\nproject_root: %s\nfiles: %s\ncontext: %s\nmodel: %s",
		a.ProjectRoot, strings.Join(a.Files, ", "), a.Context, model,
	)
	task = s.prependHistory(a.SessionID, "review", task)

	if s.cfg.ExecutorFn == nil {
		return nil, fmt.Errorf("no executor configured")
	}
	result, err := s.cfg.ExecutorFn(ctx, iexec.Request{
		SkillPrompt: s.cfg.SkillPrompt,
		TaskPrompt:  task,
		Model:       model,
		Tools:       "Read,Bash",
	})
	if err != nil {
		return nil, err
	}
	b, err := json.Marshal(result)
	if err != nil {
		return nil, fmt.Errorf("marshal result: %w", err)
	}
	return b, nil
}

func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
	if sessionID == "" || s.cfg.SessionsDir == "" {
		return task
	}
	entries, err := session.Read(s.cfg.SessionsDir, sessionID)
	if err != nil || len(entries) == 0 {
		return task
	}
	history := session.FormatHistory(entries, currentPhase)
	if history == "" {
		return task
	}
	return history + "\n---\n\n" + task
}

Step 5: Create config/supervisor/review.md

# Code Review Discipline

You are a disciplined code reviewer. Read files carefully before commenting.

## Iron laws
1. Never approve security vulnerabilities: command injection, SQL injection, credential exposure, path traversal, unchecked input at system boundaries
2. Never approve silently swallowed errors — `err != nil` without wrapping or handling is always wrong
3. Never approve missing validation at system boundaries (user input, external APIs, file reads)

## Output contract
Return JSON result with:
- `status`: "pass" if no blocking issues; "fail" if any iron law is violated
- `phase`: "review"
- `skill`: "review"
- `file_path`: first file reviewed
- `runner_output`: full review formatted as:

CRITICAL: at : WARNING: at : SUGGESTION: at :

- `verified`: true if you read all specified files; false if any were missing or unreadable
- `message`: "N critical, M warnings, K suggestions" or "clean: <which iron law checks passed and why>"

## Rules
1. Read every file listed before writing feedback
2. Check iron laws first — any violation is CRITICAL and sets status to "fail"
3. Then check: correctness, test coverage for new code, Go style conventions
4. Never rubber-stamp — if nothing is wrong, explain specifically which iron law checks you ran and why they passed
5. Line references are required for every finding — "roughly around the middle" is not acceptable

Step 6: Wire into main.go

In cmd/supervisor/main.go, add after existing os.ReadFile calls for discipline files:

reviewPrompt, err := os.ReadFile(cfg.ConfigDir + "/review.md")
if err != nil {
    logger.Error("read review.md", "path", cfg.ConfigDir+"/review.md", "err", err)
    os.Exit(1)
}

Add import: "github.com/mathiasbq/supervisor/internal/skills/review"

reg.Register(review.New(review.Config{
    SkillPrompt:  string(reviewPrompt),
    DefaultModel: models.Resolve("review", ""),
    ExecutorFn:   executor.Run,
    SessionsDir:  cfg.SessionsDir,
}))

Step 7: Run all tests

go test ./... -race -count=1

Expected: all tests PASS

Step 8: Commit

git add internal/skills/review/ config/supervisor/review.md cmd/supervisor/main.go
git commit -m "feat(review): add code review MCP skill with session history injection"

Task 5: debug skill

Files:

Create: internal/skills/debug/skill.go
Create: internal/skills/debug/handlers.go
Create: internal/skills/debug/handlers_test.go
Create: config/supervisor/debug.md
Modify: cmd/supervisor/main.go
Step 1: Write the failing test

// internal/skills/debug/handlers_test.go
package debug_test

import (
	"context"
	"encoding/json"
	"testing"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/skills/debug"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestDebugToolRegistered(t *testing.T) {
	sk := debug.New(debug.Config{SkillPrompt: "debug rules"})
	names := make([]string, 0)
	for _, tool := range sk.Tools() {
		names = append(names, tool.Name)
	}
	assert.Contains(t, names, "debug")
}

func TestDebugRequiresProjectRoot(t *testing.T) {
	sk := debug.New(debug.Config{SkillPrompt: "d"})
	_, err := sk.Handle(context.Background(), "debug", json.RawMessage(`{"error":"panic: nil pointer"}`))
	assert.ErrorContains(t, err, "project_root")
}

func TestDebugRequiresError(t *testing.T) {
	sk := debug.New(debug.Config{SkillPrompt: "d"})
	_, err := sk.Handle(context.Background(), "debug", json.RawMessage(`{"project_root":"/tmp"}`))
	assert.ErrorContains(t, err, "error")
}

func TestDebugCallsExecutor(t *testing.T) {
	called := false
	var capturedTask string
	fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
		called = true
		capturedTask = req.TaskPrompt
		return iexec.Result{
			Status: "pass", Phase: "debug", Skill: "debug",
			RunnerOutput: "HYPOTHESIS 1 (likelihood: high): nil map access\nVERIFY: go test ./... → expected: panic line reference",
			Verified:     false, ModelUsed: "self", Message: "3 hypotheses for: panic nil pointer at foo.go:42",
		}, nil
	}

	sk := debug.New(debug.Config{SkillPrompt: "debug rules", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
	out, err := sk.Handle(context.Background(), "debug", json.RawMessage(
		`{"project_root":"/tmp/proj","error":"panic: nil pointer dereference at foo.go:42","context":"occurs on startup"}`,
	))
	require.NoError(t, err)
	assert.True(t, called)
	assert.Contains(t, capturedTask, "panic: nil pointer dereference")
	assert.Contains(t, capturedTask, "occurs on startup")

	var result iexec.Result
	require.NoError(t, json.Unmarshal(out, &result))
	assert.Equal(t, "debug", result.Phase)
}

var _ = require.New

Step 2: Run test to confirm it fails

go test ./internal/skills/debug/... -v

Expected: FAIL — package does not exist

Step 3: Create skill.go

// internal/skills/debug/skill.go
package debug

import (
	"context"
	"encoding/json"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/registry"
)

type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)

type Config struct {
	SkillPrompt  string
	DefaultModel string
	ExecutorFn   ExecutorFn
	SessionsDir  string
}

type Skill struct{ cfg Config }

func New(cfg Config) *Skill { return &Skill{cfg: cfg} }

func (s *Skill) Name() string { return "debug" }

func (s *Skill) Tools() []registry.ToolDef {
	schema := func(required []string, props map[string]any) json.RawMessage {
		b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
		return b
	}
	str := map[string]any{"type": "string"}
	return []registry.ToolDef{
		{
			Name:        "debug",
			Description: "Analyse an error and return 3-5 hypotheses ordered by likelihood, each with a concrete verification step.",
			InputSchema: schema(
				[]string{"project_root", "error"},
				map[string]any{
					"project_root": str,
					"error":        str,
					"context":      str,
					"model":        str,
					"session_id":   str,
				},
			),
		},
	}
}

Step 4: Create handlers.go

// internal/skills/debug/handlers.go
package debug

import (
	"context"
	"encoding/json"
	"fmt"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/session"
)

type debugArgs struct {
	ProjectRoot string `json:"project_root"`
	Error       string `json:"error"`
	Context     string `json:"context"`
	Model       string `json:"model"`
	SessionID   string `json:"session_id"`
}

func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
	if tool != "debug" {
		return nil, fmt.Errorf("unknown tool: %s", tool)
	}
	var a debugArgs
	if err := json.Unmarshal(args, &a); err != nil {
		return nil, fmt.Errorf("parse args: %w", err)
	}
	if a.ProjectRoot == "" {
		return nil, fmt.Errorf("project_root is required")
	}
	if a.Error == "" {
		return nil, fmt.Errorf("error is required")
	}

	model := a.Model
	if model == "" {
		model = s.cfg.DefaultModel
	}

	task := fmt.Sprintf(
		"phase: debug\nproject_root: %s\nerror: %s\ncontext: %s\nmodel: %s",
		a.ProjectRoot, a.Error, a.Context, model,
	)
	task = s.prependHistory(a.SessionID, "debug", task)

	if s.cfg.ExecutorFn == nil {
		return nil, fmt.Errorf("no executor configured")
	}
	result, err := s.cfg.ExecutorFn(ctx, iexec.Request{
		SkillPrompt: s.cfg.SkillPrompt,
		TaskPrompt:  task,
		Model:       model,
		Tools:       "Read,Bash",
	})
	if err != nil {
		return nil, err
	}
	b, err := json.Marshal(result)
	if err != nil {
		return nil, fmt.Errorf("marshal result: %w", err)
	}
	return b, nil
}

func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
	if sessionID == "" || s.cfg.SessionsDir == "" {
		return task
	}
	entries, err := session.Read(s.cfg.SessionsDir, sessionID)
	if err != nil || len(entries) == 0 {
		return task
	}
	history := session.FormatHistory(entries, currentPhase)
	if history == "" {
		return task
	}
	return history + "\n---\n\n" + task
}

Step 5: Create config/supervisor/debug.md

# Debug Discipline

You are a systematic debugger. Form hypotheses before suggesting fixes.

## Iron laws
1. Never suggest "try X and see what happens" — every hypothesis must have a specific expected outcome if correct
2. Generate exactly 3-5 hypotheses, ordered by likelihood (most likely first)
3. Never fix the bug — diagnose only; the caller decides what to do with the hypotheses

## Output contract
Return JSON result with:
- `status`: "pass" (hypotheses generated) or "error" (error too ambiguous to analyse)
- `phase`: "debug"
- `skill`: "debug"
- `file_path`: the most relevant file to the error (read it)
- `runner_output`: your hypotheses, formatted as:

HYPOTHESIS 1 (likelihood: high): VERIFY: → expected if correct:

HYPOTHESIS 2 (likelihood: medium): VERIFY: → expected if correct:

- `verified`: false — verification is the caller's job
- `message`: "N hypotheses for: <one-line error summary>"

## Rules
1. Read the error and any context files provided before forming hypotheses
2. Identify the failure mode first — what actually went wrong, not just what the error says
3. For each hypothesis: name the mechanism, explain why it would produce this exact error, give a concrete verification command with expected output
4. If the error is clearly a typo or trivial mistake, still form 3 hypotheses — surface the most likely cause as #1

Step 6: Wire into main.go

Add file read:

debugPrompt, err := os.ReadFile(cfg.ConfigDir + "/debug.md")
if err != nil {
    logger.Error("read debug.md", "path", cfg.ConfigDir+"/debug.md", "err", err)
    os.Exit(1)
}

Add import: "github.com/mathiasbq/supervisor/internal/skills/debug"

reg.Register(debug.New(debug.Config{
    SkillPrompt:  string(debugPrompt),
    DefaultModel: models.Resolve("debug", ""),
    ExecutorFn:   executor.Run,
    SessionsDir:  cfg.SessionsDir,
}))

Step 7: Run all tests

go test ./... -race -count=1

Expected: all tests PASS

Step 8: Commit

git add internal/skills/debug/ config/supervisor/debug.md cmd/supervisor/main.go
git commit -m "feat(debug): add debug MCP skill with hypothesis generation"

Task 6: spec skill

Files:

Create: internal/skills/spec/skill.go
Create: internal/skills/spec/handlers.go
Create: internal/skills/spec/handlers_test.go
Create: config/supervisor/spec.md
Modify: cmd/supervisor/main.go
Step 1: Write the failing test

// internal/skills/spec/handlers_test.go
package spec_test

import (
	"context"
	"encoding/json"
	"testing"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/skills/spec"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestSpecToolRegistered(t *testing.T) {
	sk := spec.New(spec.Config{SkillPrompt: "spec rules"})
	names := make([]string, 0)
	for _, tool := range sk.Tools() {
		names = append(names, tool.Name)
	}
	assert.Contains(t, names, "spec")
}

func TestSpecRequiresProjectRoot(t *testing.T) {
	sk := spec.New(spec.Config{SkillPrompt: "s"})
	_, err := sk.Handle(context.Background(), "spec", json.RawMessage(`{"requirements":"add login"}`))
	assert.ErrorContains(t, err, "project_root")
}

func TestSpecRequiresRequirements(t *testing.T) {
	sk := spec.New(spec.Config{SkillPrompt: "s"})
	_, err := sk.Handle(context.Background(), "spec", json.RawMessage(`{"project_root":"/tmp"}`))
	assert.ErrorContains(t, err, "requirements")
}

func TestSpecCallsExecutor(t *testing.T) {
	called := false
	var capturedTask string
	fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
		called = true
		capturedTask = req.TaskPrompt
		return iexec.Result{
			Status: "pass", Phase: "spec", Skill: "spec",
			FilePath: "/tmp/proj/docs/login-spec.md",
			Verified: true, ModelUsed: "self", Message: "spec written: login feature",
		}, nil
	}

	sk := spec.New(spec.Config{SkillPrompt: "spec rules", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
	out, err := sk.Handle(context.Background(), "spec", json.RawMessage(
		`{"project_root":"/tmp/proj","requirements":"add OAuth2 login","output_path":"docs/login-spec.md"}`,
	))
	require.NoError(t, err)
	assert.True(t, called)
	assert.Contains(t, capturedTask, "OAuth2 login")
	assert.Contains(t, capturedTask, "docs/login-spec.md")

	var result iexec.Result
	require.NoError(t, json.Unmarshal(out, &result))
	assert.Equal(t, "spec", result.Phase)
}

var _ = require.New

Step 2: Run test to confirm it fails

go test ./internal/skills/spec/... -v

Expected: FAIL — package does not exist

Step 3: Create skill.go

// internal/skills/spec/skill.go
package spec

import (
	"context"
	"encoding/json"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/registry"
)

type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)

type Config struct {
	SkillPrompt  string
	DefaultModel string
	ExecutorFn   ExecutorFn
	SessionsDir  string
}

type Skill struct{ cfg Config }

func New(cfg Config) *Skill { return &Skill{cfg: cfg} }

func (s *Skill) Name() string { return "spec" }

func (s *Skill) Tools() []registry.ToolDef {
	schema := func(required []string, props map[string]any) json.RawMessage {
		b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
		return b
	}
	str := map[string]any{"type": "string"}
	return []registry.ToolDef{
		{
			Name:        "spec",
			Description: "Generate a structured implementation spec from requirements. Writes the spec to output_path in the project.",
			InputSchema: schema(
				[]string{"project_root", "requirements"},
				map[string]any{
					"project_root": str,
					"requirements": str,
					"output_path":  str,
					"context":      str,
					"model":        str,
					"session_id":   str,
				},
			),
		},
	}
}

Step 4: Create handlers.go

// internal/skills/spec/handlers.go
package spec

import (
	"context"
	"encoding/json"
	"fmt"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/session"
)

type specArgs struct {
	ProjectRoot  string `json:"project_root"`
	Requirements string `json:"requirements"`
	OutputPath   string `json:"output_path"`
	Context      string `json:"context"`
	Model        string `json:"model"`
	SessionID    string `json:"session_id"`
}

func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
	if tool != "spec" {
		return nil, fmt.Errorf("unknown tool: %s", tool)
	}
	var a specArgs
	if err := json.Unmarshal(args, &a); err != nil {
		return nil, fmt.Errorf("parse args: %w", err)
	}
	if a.ProjectRoot == "" {
		return nil, fmt.Errorf("project_root is required")
	}
	if a.Requirements == "" {
		return nil, fmt.Errorf("requirements is required")
	}
	outputPath := a.OutputPath
	if outputPath == "" {
		outputPath = "docs/spec.md"
	}

	model := a.Model
	if model == "" {
		model = s.cfg.DefaultModel
	}

	task := fmt.Sprintf(
		"phase: spec\nproject_root: %s\nrequirements: %s\noutput_path: %s\ncontext: %s\nmodel: %s",
		a.ProjectRoot, a.Requirements, outputPath, a.Context, model,
	)
	task = s.prependHistory(a.SessionID, "spec", task)

	if s.cfg.ExecutorFn == nil {
		return nil, fmt.Errorf("no executor configured")
	}
	result, err := s.cfg.ExecutorFn(ctx, iexec.Request{
		SkillPrompt: s.cfg.SkillPrompt,
		TaskPrompt:  task,
		Model:       model,
		Tools:       "Read,Write",
	})
	if err != nil {
		return nil, err
	}
	b, err := json.Marshal(result)
	if err != nil {
		return nil, fmt.Errorf("marshal result: %w", err)
	}
	return b, nil
}

func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
	if sessionID == "" || s.cfg.SessionsDir == "" {
		return task
	}
	entries, err := session.Read(s.cfg.SessionsDir, sessionID)
	if err != nil || len(entries) == 0 {
		return task
	}
	history := session.FormatHistory(entries, currentPhase)
	if history == "" {
		return task
	}
	return history + "\n---\n\n" + task
}

Step 5: Create config/supervisor/spec.md

# Spec Writing Discipline

You write structured implementation specs. Nothing is left ambiguous.

## Iron laws
1. Success criteria must be measurable — "the system is fast" is banned; "p99 < 200ms under 100 RPS" is valid
2. Always include an explicit "Out of scope" section — if you don't draw the boundary, the developer will guess wrong
3. Every technical decision in the approach must have a rationale

## Output contract
Return JSON result with:
- `status`: "pass" (spec written) or "error" (requirements too ambiguous to spec without more input)
- `phase`: "spec"
- `skill`: "spec"
- `file_path`: the output_path where the spec was written (absolute path)
- `runner_output`: ""
- `verified`: true if the file was written successfully
- `message`: "spec written: <one-line summary of what was specced>"

## Spec structure
Write the spec as markdown to the output_path:

```markdown
# [Feature] Spec

## Problem statement
[What problem does this solve? For whom? Why now?]

## Success criteria
- [ ] [Criterion 1 — measurable and verifiable]
- [ ] [Criterion 2 — measurable and verifiable]

## Constraints
[Non-negotiable requirements the solution must satisfy]

## Out of scope
[What we are explicitly NOT doing in this iteration]

## Technical approach
[Architecture decisions, key components, rationale for each choice]

## Risks
[What could go wrong, and how we'd mitigate it]

If the requirements are too vague to produce measurable success criteria, return status "error" with a message listing the specific questions that need answers.


- [ ] **Step 6: Wire into main.go**

Add file read:
```go
specPrompt, err := os.ReadFile(cfg.ConfigDir + "/spec.md")
if err != nil {
    logger.Error("read spec.md", "path", cfg.ConfigDir+"/spec.md", "err", err)
    os.Exit(1)
}

Add import: "github.com/mathiasbq/supervisor/internal/skills/spec"

reg.Register(spec.New(spec.Config{
    SkillPrompt:  string(specPrompt),
    DefaultModel: models.Resolve("spec", ""),
    ExecutorFn:   executor.Run,
    SessionsDir:  cfg.SessionsDir,
}))

Step 7: Run all tests

go test ./... -race -count=1

Expected: all tests PASS

Step 8: Commit

git add internal/skills/spec/ config/supervisor/spec.md cmd/supervisor/main.go
git commit -m "feat(spec): add spec writing MCP skill"

Task 7: trainer skill

Files:

Create: internal/skills/trainer/skill.go
Create: internal/skills/trainer/handlers.go
Create: internal/skills/trainer/handlers_test.go
Create: config/supervisor/trainer-reader.md
Create: config/supervisor/trainer-writer.md
Modify: cmd/supervisor/main.go
Modify: config/models.yaml
Step 1: Write the failing test

// internal/skills/trainer/handlers_test.go
package trainer_test

import (
	"context"
	"encoding/json"
	"testing"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/session"
	"github.com/mathiasbq/supervisor/internal/skills/trainer"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestTrainerToolRegistered(t *testing.T) {
	sk := trainer.New(trainer.Config{ReaderPrompt: "r", WriterPrompt: "w"})
	names := make([]string, 0)
	for _, tool := range sk.Tools() {
		names = append(names, tool.Name)
	}
	assert.Contains(t, names, "trainer")
}

func TestTrainerRequiresSessionID(t *testing.T) {
	sk := trainer.New(trainer.Config{ReaderPrompt: "r", WriterPrompt: "w"})
	_, err := sk.Handle(context.Background(), "trainer", json.RawMessage(`{}`))
	assert.ErrorContains(t, err, "session_id")
}

func TestTrainerCallsReaderThenWriter(t *testing.T) {
	sessDir := t.TempDir()
	require.NoError(t, session.Append(sessDir, "sess-1", session.Entry{
		SessionID: "sess-1", Skill: "tdd", Phase: "red", FinalStatus: "pass",
		Message: "wrote failing test", FilePath: "internal/foo/foo_test.go",
	}))

	callCount := 0
	var readerTask, writerTask string

	fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
		callCount++
		if callCount == 1 {
			// reader call
			readerTask = req.TaskPrompt
			return iexec.Result{
				Status: "pass", Phase: "trainer", Skill: "trainer",
				RunnerOutput: `[{"type":"sft","moment":"first-pass clean TDD","score":4}]`,
				Verified: true, ModelUsed: "self", Message: "1 sft candidate found",
			}, nil
		}
		// writer call
		writerTask = req.TaskPrompt
		return iexec.Result{
			Status: "pass", Phase: "trainer", Skill: "trainer",
			FilePath: sessDir + "/training-data/sft/sess-1.jsonl",
			Verified: true, ModelUsed: "self", Message: "1 sft pair written",
		}, nil
	}

	sk := trainer.New(trainer.Config{
		ReaderPrompt: "reader rules",
		WriterPrompt: "writer rules",
		ExecutorFn:   fakeFn,
		SessionsDir:  sessDir,
		BrainDir:     t.TempDir(),
	})
	out, err := sk.Handle(context.Background(), "trainer", json.RawMessage(`{"session_id":"sess-1"}`))
	require.NoError(t, err)

	assert.Equal(t, 2, callCount, "executor must be called exactly twice: reader then writer")
	assert.Contains(t, readerTask, "role: reader")
	assert.Contains(t, readerTask, "sess-1")
	assert.Contains(t, readerTask, "wrote failing test") // session history in reader prompt
	assert.Contains(t, writerTask, "role: writer")
	assert.Contains(t, writerTask, "sft candidate") // reader output passed to writer

	var result iexec.Result
	require.NoError(t, json.Unmarshal(out, &result))
	assert.Equal(t, "trainer", result.Phase)
	assert.Equal(t, "pass", result.Status)
}

var _ = require.New

Step 2: Run test to confirm it fails

go test ./internal/skills/trainer/... -v

Expected: FAIL — package does not exist

Step 3: Create skill.go

// internal/skills/trainer/skill.go
package trainer

import (
	"context"
	"encoding/json"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/registry"
)

type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)

type Config struct {
	ReaderPrompt string
	WriterPrompt string
	DefaultModel string
	ExecutorFn   ExecutorFn
	SessionsDir  string
	BrainDir     string // root of brain/ directory; writer writes to BrainDir/training-data/
}

type Skill struct{ cfg Config }

func New(cfg Config) *Skill { return &Skill{cfg: cfg} }

func (s *Skill) Name() string { return "trainer" }

func (s *Skill) Tools() []registry.ToolDef {
	schema := func(required []string, props map[string]any) json.RawMessage {
		b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
		return b
	}
	return []registry.ToolDef{
		{
			Name:        "trainer",
			Description: "Extract SFT and DPO training pairs from a session log. Runs a reader→writer chain: reader identifies learning moments, writer formats and writes pairs to brain/training-data/.",
			InputSchema: schema(
				[]string{"session_id"},
				map[string]any{
					"session_id": map[string]any{"type": "string"},
					"model":      map[string]any{"type": "string"},
				},
			),
		},
	}
}

Step 4: Create handlers.go

// internal/skills/trainer/handlers.go
package trainer

import (
	"context"
	"encoding/json"
	"fmt"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/session"
)

type trainArgs struct {
	SessionID string `json:"session_id"`
	Model     string `json:"model"`
}

func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
	if tool != "trainer" {
		return nil, fmt.Errorf("unknown tool: %s", tool)
	}
	var a trainArgs
	if err := json.Unmarshal(args, &a); err != nil {
		return nil, fmt.Errorf("parse args: %w", err)
	}
	if a.SessionID == "" {
		return nil, fmt.Errorf("session_id is required")
	}
	if s.cfg.ExecutorFn == nil {
		return nil, fmt.Errorf("no executor configured")
	}

	model := a.Model
	if model == "" {
		model = s.cfg.DefaultModel
	}

	entries, err := session.Read(s.cfg.SessionsDir, a.SessionID)
	if err != nil {
		return nil, fmt.Errorf("read session log: %w", err)
	}

	// ── Step 1: Reader agent ─────────────────────────────────────────────────
	history := session.FormatHistory(entries, "")
	readerTask := fmt.Sprintf(
		"role: reader\nsession_id: %s\nbrain_dir: %s\n\n%s",
		a.SessionID, s.cfg.BrainDir, history,
	)
	readerResult, err := s.cfg.ExecutorFn(ctx, iexec.Request{
		SkillPrompt: s.cfg.ReaderPrompt,
		TaskPrompt:  readerTask,
		Model:       model,
		Tools:       "Read",
	})
	if err != nil {
		return nil, fmt.Errorf("reader agent: %w", err)
	}

	// ── Step 2: Writer agent (receives reader candidates) ────────────────────
	writerTask := fmt.Sprintf(
		"role: writer\nsession_id: %s\nbrain_dir: %s\n\nreader_candidates:\n%s",
		a.SessionID, s.cfg.BrainDir, readerResult.RunnerOutput,
	)
	writerResult, err := s.cfg.ExecutorFn(ctx, iexec.Request{
		SkillPrompt: s.cfg.WriterPrompt,
		TaskPrompt:  writerTask,
		Model:       model,
		Tools:       "Read,Write",
	})
	if err != nil {
		return nil, fmt.Errorf("writer agent: %w", err)
	}

	b, err := json.Marshal(writerResult)
	if err != nil {
		return nil, fmt.Errorf("marshal result: %w", err)
	}
	return b, nil
}

Step 5: Create config/supervisor/trainer-reader.md

# Trainer Reader Discipline

You scan session logs and identify candidate learning moments worth converting to training data.

## What to look for
- **SFT candidates**: the worker did exactly the right thing — a clean pattern worth reinforcing
- **DPO candidates**: the worker first produced a wrong or suboptimal response, then corrected — you have both rejected and chosen

## Scoring (1–5)
- 5: novel pattern, clearly correct, generalises across projects
- 4: good pattern, correct, somewhat project-specific but still useful
- 3: correct but obvious — include only if especially clean
- 2 or below: skip — too ambiguous or too context-specific

## Output contract
Return JSON result with:
- `status`: "pass" or "error"
- `phase`: "trainer"
- `skill`: "trainer"
- `file_path`: ""
- `runner_output`: JSON array of candidates (valid JSON, not markdown):
  [{"type":"sft","moment":"<what happened>","prompt":"<what was asked>","completion":"<what was done right>","score":4},
   {"type":"dpo","moment":"<what happened>","prompt":"<what was asked>","chosen":"<correct>","rejected":"<incorrect>","score":3}]
- `verified`: true
- `message`: "N sft candidates, M dpo candidates found"

## Rules
1. Read all session entries in the task prompt
2. Score each entry — only include entries scoring >= 3
3. Prompt/completion fields must be phrased to generalise: no project-specific paths or names
4. If no candidates score >= 3, return an empty array `[]` — never force low-quality candidates

Step 6: Create config/supervisor/trainer-writer.md

# Trainer Writer Discipline

You receive candidate learning moments from the reader and write clean SFT/DPO training pairs.

## Quality gate (apply before writing)
- SFT: prompt must be phrased so it could come from any project, not just this one
- DPO: chosen and rejected must be clearly distinguishable — skip if a reader can't tell which is better
- Never include project-specific paths, variable names, or identifiers in any pair

## Output contract
Return JSON result with:
- `status`: "pass" (pairs written or skipped due to quality) or "error" (candidates JSON was malformed)
- `phase`: "trainer"
- `skill`: "trainer"
- `file_path`: path of the last file written (empty if nothing passed quality gate)
- `runner_output`: "N SFT pairs written to brain/training-data/sft/, M DPO pairs to brain/training-data/dpo/" or "0 pairs passed quality gate"
- `verified`: true if files were written; false if nothing passed
- `message`: "N sft + M dpo pairs for session <id>" or "no pairs passed quality gate"

## File format
JSONL — one JSON object per line.

SFT: `{"prompt": "...", "completion": "..."}`
DPO: `{"prompt": "...", "chosen": "...", "rejected": "..."}`

Write SFT to: `<brain_dir>/training-data/sft/<session_id>.jsonl`
Write DPO to: `<brain_dir>/training-data/dpo/<session_id>.jsonl`

Append to existing files if they exist (don't overwrite).

## Rules
1. Parse the `reader_candidates` JSON from the task prompt
2. For each candidate: apply quality gate
3. Write passing SFT candidates to sft JSONL, DPO candidates to dpo JSONL
4. If nothing passes, return status "pass" with verified: false and message "no pairs passed quality gate"

Step 7: Wire into main.go

Add file reads:

trainerReaderPrompt, err := os.ReadFile(cfg.ConfigDir + "/trainer-reader.md")
if err != nil {
    logger.Error("read trainer-reader.md", "path", cfg.ConfigDir+"/trainer-reader.md", "err", err)
    os.Exit(1)
}
trainerWriterPrompt, err := os.ReadFile(cfg.ConfigDir + "/trainer-writer.md")
if err != nil {
    logger.Error("read trainer-writer.md", "path", cfg.ConfigDir+"/trainer-writer.md", "err", err)
    os.Exit(1)
}

Add import: "github.com/mathiasbq/supervisor/internal/skills/trainer"

reg.Register(trainer.New(trainer.Config{
    ReaderPrompt: string(trainerReaderPrompt),
    WriterPrompt: string(trainerWriterPrompt),
    DefaultModel: models.Resolve("trainer", ""),
    ExecutorFn:   executor.Run,
    SessionsDir:  cfg.SessionsDir,
    BrainDir:     cfg.BrainDir,
}))

Step 8: Update config/models.yaml

Add trainer entry following existing format:

skills:
  tdd:           ollama/qwen3-coder-30b-tuned
  review:        ollama/devstral-tuned
  debug:         ollama/deepseek-r1-tuned
  retrospective: ollama/qwen3-coder-30b-tuned
  spec:          ollama/qwen3-coder-30b-tuned
  trainer:       ollama/qwen3-coder-30b-tuned

Step 9: Run all tests

go test ./... -race -count=1

Expected: all tests PASS, including TestTrainerCallsReaderThenWriter

Step 10: Commit

git add internal/skills/trainer/ config/supervisor/trainer-reader.md config/supervisor/trainer-writer.md config/models.yaml cmd/supervisor/main.go
git commit -m "feat(trainer): add trainer MCP skill with reader→writer sub-agent chain"

Task 8: Integration smoke test and CI push

Files: none new — validates the full system

Step 1: Run task check (full quality gate)

task check

Expected: lint, test, and vet all pass for both modules

Step 2: Verify all 12 MCP tools are registered

With servers running (task start in another terminal):

curl -s -X POST http://localhost:3200/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' \
  | python3 -c "import json,sys; tools=json.load(sys.stdin)['result']['tools']; [print(t['name']) for t in tools]"

Expected output (12 tools):

tdd_red
tdd_green
tdd_refactor
brain_query
brain_write
tier
session_log
retrospective
review
debug
spec
trainer

Step 3: Smoke test each new skill

# review
curl -s --max-time 10 -X POST http://localhost:3200/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"review","arguments":{"project_root":"/tmp","files":["nonexistent.go"],"context":"test"}}}'

# debug
curl -s --max-time 10 -X POST http://localhost:3200/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"debug","arguments":{"project_root":"/tmp","error":"test error"}}}'

# spec
curl -s --max-time 10 -X POST http://localhost:3200/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":4,"method":"tools/call","params":{"name":"spec","arguments":{"project_root":"/tmp","requirements":"add login button"}}}'

# trainer (requires a session log entry)
curl -s --max-time 10 -X POST http://localhost:3200/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":5,"method":"tools/call","params":{"name":"trainer","arguments":{"session_id":"2026-04-17-validate-hyperguild"}}}'

Each should return a valid JSON-RPC result (not -32000 error). The actual worker call will take longer — --max-time 10 just tests that the MCP dispatch layer works.

Step 4: Push and verify CI

git push origin main

Watch the CI run complete on Gitea. Check for:

check job: green (lint + test + vet)
mirror job: green (pushed to GitHub)

curl -s "https://gitea.d-ma.be/api/v1/repos/mathias/hyperguild/actions/runs?limit=1" \
  -H "Authorization: token $GITEA_TOKEN" | python3 -c "
import json,sys
d=json.load(sys.stdin)
for r in d.get('workflow_runs',[]):
    print(f'#{r[\"id\"]} {r[\"status\"]:12} {r.get(\"conclusion\",\"\"):8} {r[\"display_title\"][:50]}')
"

Step 5: Tag v0.2.0

task tag version=v0.2.0

Self-review

Spec coverage check:

Session history injection into tdd_green and tdd_refactor → Task 3 ✓
All new skills accept session_id for history injection → Tasks 4–7 ✓
Trainer reader→writer chain → Task 7 ✓
Schema enum fixed (was causing retrospective to return wrong phase) → Task 2 ✓
Phase 2 skills registered in main.go → Tasks 4–7 each include main.go wiring ✓
CI passes → Task 8 ✓

Placeholder scan: None found — all steps include complete code.

Type consistency:

ExecutorFn is defined in each skill package as func(ctx context.Context, req iexec.Request) (iexec.Result, error) — consistent across tdd, review, debug, spec, trainer ✓
Config.SessionsDir present in all new skills ✓
trainer.Config.BrainDir used in handlers.go Task 7 writer task prompt ✓
session.FormatHistory signature (entries []Entry, excludePhase string) string used consistently ✓
prependHistory method is defined identically in review, debug, spec handlers — this is intentional duplication (YAGNI: not enough skills to justify extracting a shared mixin) ✓

Note on tasks 4–7: These are independent of each other and can be executed in parallel by separate subagents after Tasks 1–3 are complete.

55 KiB Raw Permalink Blame History Unescape Escape

Hyperguild Phase 2 Implementation Plan

File Map

New files

Modified files

Task 1: Session history utility

Task 2: Fix Schema enum and validPhases

Task 3: Session history injection in TDD green and refactor

Task 4: review skill

Task 5: debug skill

Task 6: spec skill

Task 7: trainer skill

Task 8: Integration smoke test and CI push

Self-review

55 KiB

Raw Permalink Blame History