Phase 1 of mathias/skills extraction (infra#62 Track D — homelab next-step plan addendum). Imports ~/dev/.skills/ verbatim (19 skill dirs + SKILLS_INDEX.md) and adds the installation surface: - Taskfile.yml — install / update / list / release / check targets - install.sh — bootstrap installer for hosts without Task. Idempotent symlink wirer; default checkout at ~/.local/share/skills/ on every host; SKILLS_REF env var pins a tag (default: main). - .gitea/workflows/release.yml — auto-tag every push to main by Bump-Type footer (major/minor/patch, default patch). Skipped when commit contains [skip-release]. - README — usage, versioning, contribution flow, secret-hygiene rule. Phase 1 wires Claude Code only (~/.claude/skills/<name> global + <repo>/.claude/skills/<name> per-repo). Phase 2 adds Crush, opencode, antigravity, and gitea-resident agents (cobalt-dingo, agentsquad) once their skill conventions are researched. Public repo, markdown-only — no secrets, no client names. Verified via pre-push grep before initial push. [skip-release]
12 KiB
name, description
| name | description |
|---|---|
| tdd | Write failing tests first, then minimal code to pass. Non-negotiable for all new features and bug fixes. |
Test-Driven Development (TDD)
Overview
Write the test first. Watch it fail. Write minimal code to pass.
Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.
Violating the letter of the rules is violating the spirit of the rules.
When to Use
Always:
- New features
- Bug fixes
- Refactoring
- Behavior changes
Exceptions (ask Mathias):
- Throwaway prototypes
- Generated code (sqlc output, templ output)
- Configuration files
Thinking "skip TDD just this once"? Stop. That's rationalization.
The Iron Law
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
Write code before the test? Delete it. Start over.
No exceptions:
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
Implement fresh from tests. Period.
Red-Green-Refactor
RED → verify fails correctly → GREEN → verify all pass → REFACTOR → stay green → RED (next)
RED - Write Failing Test
Write one minimal test showing what should happen.
Go example (good):
func TestRetryOperation_RetriesThreeTimes(t *testing.T) {
attempts := 0
op := func() error {
attempts++
if attempts < 3 {
return errors.New("fail")
}
return nil
}
err := RetryOperation(op, 3)
require.NoError(t, err)
assert.Equal(t, 3, attempts)
}
Clear name, tests real behavior, one thing.
Go example (bad):
func TestRetry(t *testing.T) {
// Vague name, tests nothing meaningful
err := RetryOperation(nil, 0)
assert.NoError(t, err)
}
Requirements:
- One behavior per test
- Clear name describing the behavior
- Real code (no mocks unless crossing a system boundary)
Verify RED - Watch It Fail
MANDATORY. Never skip.
go test -run TestRetryOperation_RetriesThreeTimes ./...
Confirm:
- Test fails (not compilation errors)
- Failure message is expected
- Fails because feature is missing (not typos)
Test passes? You're testing existing behavior. Fix test.
Compilation errors? Fix errors, re-run until it fails correctly.
GREEN - Minimal Code
Write simplest code to pass the test.
Good:
func RetryOperation(op func() error, maxRetries int) error {
for i := 0; i < maxRetries; i++ {
if err := op(); err == nil {
return nil
}
}
return op()
}
Just enough to pass.
Bad:
func RetryOperation(op func() error, maxRetries int, opts ...RetryOption) error {
// YAGNI — don't add options, backoff, jitter before the test demands it
}
Over-engineered.
Don't add features, refactor other code, or "improve" beyond what the test demands.
Verify GREEN - Watch It Pass
MANDATORY.
go test ./...
Confirm:
- Test passes
- All other tests still pass
- No race conditions:
go test -race ./...
Test fails? Fix code, not test.
Other tests fail? Fix now.
REFACTOR - Clean Up
After green only:
- Remove duplication
- Improve names
- Extract helpers
- Apply clean code principles (load
clean-codeskill)
Keep tests green. Don't add behavior.
Repeat
Next failing test for next behavior.
Go-Specific TDD Notes
Table-Driven Tests (Preferred Pattern)
func TestValidateEmail(t *testing.T) {
tests := []struct {
name string
email string
wantErr bool
}{
{"valid email", "user@example.com", false},
{"empty email", "", true},
{"no at sign", "userexample.com", true},
{"no domain", "user@", true},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
err := ValidateEmail(tt.email)
if tt.wantErr {
assert.Error(t, err)
} else {
assert.NoError(t, err)
}
})
}
}
Table-driven tests are the Go idiom. Use them for behavior that varies across inputs.
Subtests with t.Run
Use t.Run to name subtests clearly. Failure messages include the subtest name.
t.Run("rejects empty input", func(t *testing.T) { ... })
t.Run("accepts valid UUID", func(t *testing.T) { ... })
Test File Conventions
- File:
<package>_test.goin the same directory - Package:
package foo_testfor black-box testing (preferred),package foofor white-box - Helpers: use
t.Helper()so stack traces point to the caller, not the helper
func assertNoError(t *testing.T, err error) {
t.Helper()
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
}
Running Tests
go test ./... # all packages
go test -run TestFoo ./pkg/... # specific test
go test -run TestFoo/subtest ./... # specific subtest
go test -race ./... # race detector (always run before commit)
go test -cover ./... # coverage
go test -v ./... # verbose
testify
Pre-approved. Use assert (continues on failure) and require (stops on failure):
require.NoError(t, err) // fatal if error
assert.Equal(t, expected, got) // non-fatal comparison
assert.ErrorIs(t, err, ErrFoo) // error chain check
Good Tests
| Quality | Good | Bad |
|---|---|---|
| Minimal | One thing. "and" in name? Split it. | TestValidatesEmailAndDomainAndWhitespace |
| Clear | Name describes behavior | TestFoo, Test1 |
| Shows intent | Demonstrates desired API | Obscures what code should do |
| Table-driven | Multiple cases, one test function | Copy-pasted test functions |
Common Rationalizations
| Excuse | Reality |
|---|---|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Existing code has no tests" | You're improving it. Add tests for existing code. |
Red Flags - STOP and Start Over
- Code before test
- Test after implementation
- Test passes immediately without seeing it fail
- Can't explain why test failed
- Tests added "later"
- Rationalizing "just this once"
- "Already manually tested it"
- "Tests after achieve the same purpose"
- "Keep as reference" or "adapt existing code"
- "Already spent X hours, deleting is wasteful"
- "TDD is dogmatic, I'm being pragmatic"
- "This is different because..."
All of these mean: Delete code. Start over with TDD.
Example: Bug Fix
Bug: Empty email accepted
RED
func TestSubmitForm_RejectsEmptyEmail(t *testing.T) {
result := submitForm(FormData{Email: ""})
assert.Equal(t, "email required", result.Error)
}
Verify RED
$ go test -run TestSubmitForm_RejectsEmptyEmail ./...
FAIL: expected "email required", got ""
GREEN
func submitForm(data FormData) FormResult {
if strings.TrimSpace(data.Email) == "" {
return FormResult{Error: "email required"}
}
// ...
}
Verify GREEN
$ go test ./...
ok example.com/myapp 0.003s
REFACTOR Extract validation for multiple fields if needed.
Verification Checklist
Before marking work complete:
- Every new function/method has a test
- Watched each test fail before implementing
- Each test failed for expected reason (feature missing, not typo)
- Wrote minimal code to pass each test
- All tests pass:
go test ./... - Race detector clean:
go test -race ./... - Tests use real code (mocks only if crossing a system boundary)
- Edge cases and errors covered
Can't check all boxes? You skipped TDD. Start over.
When Stuck
| Problem | Solution |
|---|---|
| Don't know how to test | Write wished-for API. Write assertion first. Ask Mathias. |
| Test too complicated | Design too complicated. Simplify interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers with t.Helper(). Still complex? Simplify design. |
Debugging Integration
Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.
Never fix bugs without a test.
Testing Anti-Patterns
When adding test utilities or mocks, load the references/testing-anti-patterns.md to avoid:
- Testing mock behavior instead of real behavior
- Adding test-only methods to production types
- Mocking without understanding what the dependency does
Brain MCP Integration
The brain MCP exposes session context across machines. Use it to make TDD cycles cumulative rather than one-shot.
At session start:
- Run
brain_querywith the feature name + "tdd" to surface prior cycles, anti-patterns, or testing decisions for this code area. Skip if the feature is brand new.
Never:
- Embed brain content in test code or assertions. The brain is context for you, not state for the system under test.
Logging
Call session_log once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the /pass-rate HTTP endpoint, which
treats pass as success, fail as failure, skip as neither.
At end of red phase:
session_logwith{skill: "tdd", phase: "red", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}
At end of green phase:
session_logwith{skill: "tdd", phase: "green", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}
At end of refactor phase:
session_logwith{skill: "tdd", phase: "refactor", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}
Status semantics:
pass— the phase's intended outcome was reached (red: test fails as expected; green: test passes; refactor: tests still pass after refactor).fail— the phase's intended outcome was NOT reached (test compiled when it shouldn't, test still fails after green attempt, refactor broke tests).skip— phase was skipped intentionally (e.g. refactor not warranted).
Why this matters: the routing pod (Plan 6) reads pass-rate to decide whether to route a future tdd call to a local model. If your skill never logs, the routing pod sees no data and may default-route or default-not-route in a way that doesn't reflect real performance.
Final Rule
Production code → test exists and failed first
Otherwise → not TDD
No exceptions without Mathias's permission.
Mode 2 Routing Note
This skill is invoked identically whether the agent is running in Mode 1 (cloud Claude, no routing) or Mode 2 (client-local, supervisor routing layer). The routing pod (Plan 6) does not exist yet; until it does, treat this skill as Mode 1 only. The discipline does not change between modes — only the model behind the call.