skills/tdd/SKILL.md

---
name: tdd
description: Write failing tests first, then minimal code to pass. Non-negotiable for all new features and bug fixes.
---

# Test-Driven Development (TDD)

## Overview

Write the test first. Watch it fail. Write minimal code to pass.

**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.

**Violating the letter of the rules is violating the spirit of the rules.**

## When to Use

**Always:**
- New features
- Bug fixes
- Refactoring
- Behavior changes

**Exceptions (ask Mathias):**
- Throwaway prototypes
- Generated code (sqlc output, templ output)
- Configuration files

Thinking "skip TDD just this once"? Stop. That's rationalization.

## The Iron Law

```
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```

Write code before the test? Delete it. Start over.

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete

Implement fresh from tests. Period.

## Red-Green-Refactor

```
RED → verify fails correctly → GREEN → verify all pass → REFACTOR → stay green → RED (next)
```

### RED - Write Failing Test

Write one minimal test showing what should happen.

**Go example (good):**
```go
func TestRetryOperation_RetriesThreeTimes(t *testing.T) {
    attempts := 0
    op := func() error {
        attempts++
        if attempts < 3 {
            return errors.New("fail")
        }
        return nil
    }

    err := RetryOperation(op, 3)

    require.NoError(t, err)
    assert.Equal(t, 3, attempts)
}
```
Clear name, tests real behavior, one thing.

**Go example (bad):**
```go
func TestRetry(t *testing.T) {
    // Vague name, tests nothing meaningful
    err := RetryOperation(nil, 0)
    assert.NoError(t, err)
}
```

**Requirements:**
- One behavior per test
- Clear name describing the behavior
- Real code (no mocks unless crossing a system boundary)

### Verify RED - Watch It Fail

**MANDATORY. Never skip.**

```bash
go test -run TestRetryOperation_RetriesThreeTimes ./...
```

Confirm:
- Test fails (not compilation errors)
- Failure message is expected
- Fails because feature is missing (not typos)

**Test passes?** You're testing existing behavior. Fix test.

**Compilation errors?** Fix errors, re-run until it fails correctly.

### GREEN - Minimal Code

Write simplest code to pass the test.

**Good:**
```go
func RetryOperation(op func() error, maxRetries int) error {
    var err error
    for i := 0; i < maxRetries; i++ {
        if err = op(); err == nil {
            return nil
        }
    }
    return err // last error; op is never called more than maxRetries times
}
```
Just enough to pass.

**Bad:**
```go
func RetryOperation(op func() error, maxRetries int, opts ...RetryOption) error {
    // YAGNI — don't add options, backoff, jitter before the test demands it
}
```
Over-engineered.

Don't add features, refactor other code, or "improve" beyond what the test demands.

### Verify GREEN - Watch It Pass

**MANDATORY.**

```bash
go test ./...
```

Confirm:
- Test passes
- All other tests still pass
- No race conditions: `go test -race ./...`

**Test fails?** Fix code, not test.

**Other tests fail?** Fix now.

### REFACTOR - Clean Up

After green only:
- Remove duplication
- Improve names
- Extract helpers
- Apply clean code principles (load `clean-code` skill)

Keep tests green. Don't add behavior.

### Repeat

Next failing test for next behavior.

## Go-Specific TDD Notes

### Table-Driven Tests (Preferred Pattern)

```go
func TestValidateEmail(t *testing.T) {
    tests := []struct {
        name    string
        email   string
        wantErr bool
    }{
        {"valid email", "user@example.com", false},
        {"empty email", "", true},
        {"no at sign", "userexample.com", true},
        {"no domain", "user@", true},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            err := ValidateEmail(tt.email)
            if tt.wantErr {
                assert.Error(t, err)
            } else {
                assert.NoError(t, err)
            }
        })
    }
}
```

Table-driven tests are the Go idiom. Use them for behavior that varies across inputs.
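
For illustration, a GREEN step for the table above could look like the sketch below. It is only one possible implementation: the `ErrInvalidEmail` name is hypothetical, and the function checks nothing beyond what the four table cases demand.

```go
package main

import (
    "errors"
    "fmt"
    "strings"
)

// ErrInvalidEmail is a hypothetical sentinel error for this sketch.
var ErrInvalidEmail = errors.New("invalid email")

// ValidateEmail rejects input with no "@" or with nothing after it.
// It deliberately does no more than the table cases require.
func ValidateEmail(email string) error {
    _, domain, found := strings.Cut(email, "@")
    if !found || domain == "" {
        return ErrInvalidEmail
    }
    return nil
}

func main() {
    // Same inputs as the table-driven test above.
    for _, email := range []string{"user@example.com", "", "userexample.com", "user@"} {
        fmt.Printf("%q -> %v\n", email, ValidateEmail(email))
    }
}
```

A stricter rule (for example, rejecting an empty local part) would start with a new row in the table, not with a change to this function.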

### Subtests with t.Run

Use `t.Run` to name subtests clearly. Failure messages include the subtest name.

```go
t.Run("rejects empty input", func(t *testing.T) { ... })
t.Run("accepts valid UUID", func(t *testing.T) { ... })
```

### Test File Conventions

- File: `<name>_test.go` in the same directory as the code it tests
- Package: `package foo_test` for black-box testing (preferred), `package foo` for white-box
- Helpers: call `t.Helper()` so a failure is reported at the caller's file and line, not the helper's

```go
func assertNoError(t *testing.T, err error) {
    t.Helper()
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
}
```

### Running Tests

```bash
go test ./...                        # all packages
go test -run TestFoo ./pkg/...       # tests whose name matches the regexp TestFoo
go test -run TestFoo/subtest ./...   # specific subtest
go test -race ./...                  # race detector (always run before commit)
go test -cover ./...                 # coverage
go test -v ./...                     # verbose
```

### testify

Pre-approved. Use `assert` (continues on failure) and `require` (stops on failure):

```go
require.NoError(t, err)          // fatal if error
assert.Equal(t, expected, got)   // non-fatal comparison
assert.ErrorIs(t, err, ErrFoo)   // error chain check
```

## Good Tests

| Quality | Good | Bad |
|---------|------|-----|
| **Minimal** | One thing. "and" in name? Split it. | `TestValidatesEmailAndDomainAndWhitespace` |
| **Clear** | Name describes behavior | `TestFoo`, `Test1` |
| **Shows intent** | Demonstrates desired API | Obscures what code should do |
| **Table-driven** | Multiple cases, one test function | Copy-pasted test functions |

## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Existing code has no tests" | You're improving it. Add tests for existing code. |

## Red Flags - STOP and Start Over

- Code before test
- Test after implementation
- Test passes immediately without seeing it fail
- Can't explain why test failed
- Tests added "later"
- Rationalizing "just this once"
- "Already manually tested it"
- "Tests after achieve the same purpose"
- "Keep as reference" or "adapt existing code"
- "Already spent X hours, deleting is wasteful"
- "TDD is dogmatic, I'm being pragmatic"
- "This is different because..."

**All of these mean: Delete code. Start over with TDD.**

## Example: Bug Fix

**Bug:** Empty email accepted

**RED**
```go
func TestSubmitForm_RejectsEmptyEmail(t *testing.T) {
    result := submitForm(FormData{Email: ""})
    assert.Equal(t, "email required", result.Error)
}
```

**Verify RED**
```bash
$ go test -run TestSubmitForm_RejectsEmptyEmail ./...
FAIL: expected "email required", got ""
```

**GREEN**
```go
func submitForm(data FormData) FormResult {
    if strings.TrimSpace(data.Email) == "" {
        return FormResult{Error: "email required"}
    }
    // ...
}
```

**Verify GREEN**
```bash
$ go test ./...
ok  	example.com/myapp	0.003s
```

**REFACTOR**
Extract validation for multiple fields if needed.
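
A minimal sketch of that refactor, assuming `FormData` and `FormResult` have only the fields shown in the example; the `validate` helper is a hypothetical name. Behavior is unchanged: the check moves into one place where later fields can be added, each driven by its own failing test.

```go
package main

import (
    "fmt"
    "strings"
)

// FormData and FormResult mirror the shapes implied by the example above.
type FormData struct {
    Email string
}

type FormResult struct {
    Error string
}

// validate returns the first validation message, or "" if the form is valid.
// Today it only knows about Email, exactly as before the refactor.
func validate(data FormData) string {
    if strings.TrimSpace(data.Email) == "" {
        return "email required"
    }
    return ""
}

func submitForm(data FormData) FormResult {
    if msg := validate(data); msg != "" {
        return FormResult{Error: msg}
    }
    // The rest of the submission logic is elided in the original example.
    return FormResult{}
}

func main() {
    fmt.Printf("%q\n", submitForm(FormData{Email: ""}).Error)
    fmt.Printf("%q\n", submitForm(FormData{Email: "user@example.com"}).Error)
}
```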

## Verification Checklist

Before marking work complete:

- [ ] Every new function/method has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for expected reason (feature missing, not typo)
- [ ] Wrote minimal code to pass each test
- [ ] All tests pass: `go test ./...`
- [ ] Race detector clean: `go test -race ./...`
- [ ] Tests use real code (mocks only if crossing a system boundary)
- [ ] Edge cases and errors covered

Can't check all boxes? You skipped TDD. Start over.

## When Stuck

| Problem | Solution |
|---------|----------|
| Don't know how to test | Write wished-for API. Write assertion first. Ask Mathias. |
| Test too complicated | Design too complicated. Simplify interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers with `t.Helper()`. Still complex? Simplify design. |

## Debugging Integration

Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.

Never fix bugs without a test.

## Testing Anti-Patterns

When adding test utilities or mocks, load `references/testing-anti-patterns.md` to avoid:
- Testing mock behavior instead of real behavior
- Adding test-only methods to production types
- Mocking without understanding what the dependency does

## Brain MCP Integration

The brain MCP exposes session context across machines. Use it to make TDD cycles cumulative rather than one-shot.

**At session start:**
- Run `brain_query` with the feature name + "tdd" to surface prior cycles, anti-patterns, or testing decisions for this code area. Skip if the feature is brand new.

**Never:**
- Embed brain content in test code or assertions. The brain is context for you, not state for the system under test.

### Logging

Call `session_log` once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
treats `pass` as success, `fail` as failure, `skip` as neither.
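
To make the status rule concrete, here is a rough sketch of how such a pass-rate could be computed. The formula (passes divided by passes plus fails, with skips ignored) is an assumption drawn from the sentence above, not the actual `/pass-rate` implementation, and `PassRate` is a hypothetical name.

```go
package main

import "fmt"

// PassRate returns passes / (passes + fails), ignoring "skip" entries.
// ok is false when there is no pass or fail entry to compute a rate from.
// Assumption: this mirrors the described rule, not the real endpoint.
func PassRate(statuses []string) (rate float64, ok bool) {
    var pass, fail int
    for _, s := range statuses {
        switch s {
        case "pass":
            pass++
        case "fail":
            fail++
        }
    }
    if pass+fail == 0 {
        return 0, false
    }
    return float64(pass) / float64(pass+fail), true
}

func main() {
    rate, ok := PassRate([]string{"pass", "skip", "fail", "pass"})
    fmt.Printf("%.2f %v\n", rate, ok) // prints: 0.67 true
}
```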

**At end of `red` phase:**
- `session_log` with `{skill: "tdd", phase: "red", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`

**At end of `green` phase:**
- `session_log` with `{skill: "tdd", phase: "green", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`

**At end of `refactor` phase:**
- `session_log` with `{skill: "tdd", phase: "refactor", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`

**Status semantics:**
- `pass`: the phase's intended outcome was reached (red: test fails as expected; green: test passes; refactor: tests still pass after refactor).
- `fail`: the phase's intended outcome was NOT reached (red: test passed immediately or failed for the wrong reason, such as a compilation error; green: test still fails after the attempt; refactor: refactor broke tests).
- `skip`: the phase was skipped intentionally (e.g. refactor not warranted).

**Why this matters:** the planned routing pod (Plan 6) will read pass-rate to decide whether to route a future `tdd` call to a local model. If this skill never logs, the routing pod will have no data, and whatever default routing it falls back to will not reflect real performance.

## Final Rule

```
Production code → test exists and failed first
Otherwise → not TDD
```

No exceptions without Mathias's permission.

## Mode 2 Routing Note

This skill is invoked identically whether the agent is running in Mode 1 (cloud Claude, no routing) or Mode 2 (client-local, supervisor routing layer). The routing pod (Plan 6) does not exist yet; until it does, treat this skill as Mode 1 only. The discipline does not change between modes — only the model behind the call.