chore: bootstrap skills library — 19 skills + installer + CI auto-tag

Phase 1 of mathias/skills extraction (infra#62 Track D, homelab next-step plan addendum).

Imports ~/dev/.skills/ verbatim (19 skill dirs + SKILLS_INDEX.md) and adds the installation surface:

- Taskfile.yml: install / update / list / release / check targets
- install.sh: bootstrap installer for hosts without Task. Idempotent symlink wirer; default checkout at ~/.local/share/skills/ on every host; SKILLS_REF env var pins a tag (default: main).
- .gitea/workflows/release.yml: auto-tag every push to main by Bump-Type footer (major/minor/patch, default patch). Skipped when commit contains [skip-release].
- README: usage, versioning, contribution flow, secret-hygiene rule.

Phase 1 wires Claude Code only (~/.claude/skills/<name> global + <repo>/.claude/skills/<name> per-repo). Phase 2 adds Crush, opencode, antigravity, and gitea-resident agents (cobalt-dingo, agentsquad) once their skill conventions are researched.

Public repo, markdown-only: no secrets, no client names. Verified via pre-push grep before initial push.

[skip-release]
2026-05-24 14:59:54 +02:00
commit d6a71e370e
33 changed files with 8688 additions and 0 deletions
--- a/tdd/SKILL.md
+++ b/tdd/SKILL.md
@@ -0,0 +1,401 @@
+---
+name: tdd
+description: Write failing tests first, then minimal code to pass. Non-negotiable for all new features and bug fixes.
+---
+
+# Test-Driven Development (TDD)
+
+## Overview
+
+Write the test first. Watch it fail. Write minimal code to pass.
+
+**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
+
+**Violating the letter of the rules is violating the spirit of the rules.**
+
+## When to Use
+
+**Always:**
+- New features
+- Bug fixes
+- Refactoring
+- Behavior changes
+
+**Exceptions (ask Mathias):**
+- Throwaway prototypes
+- Generated code (sqlc output, templ output)
+- Configuration files
+
+Thinking "skip TDD just this once"? Stop. That's rationalization.
+
+## The Iron Law
+
+```
+NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
+```
+
+Write code before the test? Delete it. Start over.
+
+**No exceptions:**
+- Don't keep it as "reference"
+- Don't "adapt" it while writing tests
+- Don't look at it
+- Delete means delete
+
+Implement fresh from tests. Period.
+
+## Red-Green-Refactor
+
+```
+RED → verify fails correctly → GREEN → verify all pass → REFACTOR → stay green → RED (next)
+```
+
+### RED - Write Failing Test
+
+Write one minimal test showing what should happen.
+
+**Go example (good):**
+```go
+func TestRetryOperation_RetriesThreeTimes(t *testing.T) {
+    attempts := 0
+    op := func() error {
+        attempts++
+        if attempts < 3 {
+            return errors.New("fail")
+        }
+        return nil
+    }
+
+    err := RetryOperation(op, 3)
+
+    require.NoError(t, err)
+    assert.Equal(t, 3, attempts)
+}
+```
+Clear name, tests real behavior, one thing.
+
+**Go example (bad):**
+```go
+func TestRetry(t *testing.T) {
+    // Vague name, tests nothing meaningful
+    err := RetryOperation(nil, 0)
+    assert.NoError(t, err)
+}
+```
+
+**Requirements:**
+- One behavior per test
+- Clear name describing the behavior
+- Real code (no mocks unless crossing a system boundary)
+
+### Verify RED - Watch It Fail
+
+**MANDATORY. Never skip.**
+
+```bash
+go test -run TestRetryOperation_RetriesThreeTimes ./...
+```
+
+Confirm:
+- Test fails (not compilation errors)
+- Failure message is expected
+- Fails because feature is missing (not typos)
+
+**Test passes?** You're testing existing behavior. Fix test.
+
+**Compilation errors?** Fix errors, re-run until it fails correctly.
+
+### GREEN - Minimal Code
+
+Write simplest code to pass the test.
+
+**Good:**
+```go
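+func RetryOperation(op func() error, maxRetries int) error {
+    var err error
+    for i := 0; i < maxRetries; i++ {
+        // Stop at the first success; otherwise remember the last failure.
+        if err = op(); err == nil {
+            return nil
+        }
+    }
+    return err // no extra op() call once the attempts are used up
+}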
+```
+Just enough to pass.
+
+**Bad:**
+```go
+func RetryOperation(op func() error, maxRetries int, opts ...RetryOption) error {
+    // YAGNI — don't add options, backoff, jitter before the test demands it
+}
+```
+Over-engineered.
+
+Don't add features, refactor other code, or "improve" beyond what the test demands.
+
+### Verify GREEN - Watch It Pass
+
+**MANDATORY.**
+
+```bash
+go test ./...
+```
+
+Confirm:
+- Test passes
+- All other tests still pass
+- No race conditions: `go test -race ./...`
+
+**Test fails?** Fix code, not test.
+
+**Other tests fail?** Fix now.
+
+### REFACTOR - Clean Up
+
+After green only:
+- Remove duplication
+- Improve names
+- Extract helpers
+- Apply clean code principles (load `clean-code` skill)
+
+Keep tests green. Don't add behavior.
+
+### Repeat
+
+Next failing test for next behavior.
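As an illustration of a next cycle for the retry example, here is a sketch of one more behavior: giving up after the attempt budget and returning the last error. The test name and the `errBoom` sentinel are hypothetical, and the test uses only the standard `testing` package so the sketch stays dependency-free. Test and implementation share one file purely for brevity.

```go
package main

import (
	"errors"
	"testing"
)

// errBoom is a hypothetical sentinel used only by this sketch.
var errBoom = errors.New("boom")

// Next RED: the operation never succeeds, so RetryOperation should give up
// after exactly three calls and hand back the last error.
func TestRetryOperation_ReturnsLastErrorWhenExhausted(t *testing.T) {
	attempts := 0
	err := RetryOperation(func() error { attempts++; return errBoom }, 3)
	if !errors.Is(err, errBoom) {
		t.Fatalf("want errBoom, got %v", err)
	}
	if attempts != 3 {
		t.Fatalf("want 3 attempts, got %d", attempts)
	}
}

// GREEN: return the last error once the attempts are used up.
func RetryOperation(op func() error, maxRetries int) error {
	var err error
	for i := 0; i < maxRetries; i++ {
		if err = op(); err == nil {
			return nil
		}
	}
	return err
}
```

Backoff, jitter, or options would each need their own failing test before they earn a place in the implementation.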
+
+## Go-Specific TDD Notes
+
+### Table-Driven Tests (Preferred Pattern)
+
+```go
+func TestValidateEmail(t *testing.T) {
+    tests := []struct {
+        name    string
+        email   string
+        wantErr bool
+    }{
+        {"valid email", "user@example.com", false},
+        {"empty email", "", true},
+        {"no at sign", "userexample.com", true},
+        {"no domain", "user@", true},
+    }
+
+    for _, tt := range tests {
+        t.Run(tt.name, func(t *testing.T) {
+            err := ValidateEmail(tt.email)
+            if tt.wantErr {
+                assert.Error(t, err)
+            } else {
+                assert.NoError(t, err)
+            }
+        })
+    }
+}
+```
+
+Table-driven tests are the Go idiom. Use them for behavior that varies across inputs.
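For reference, a minimal sketch of a `ValidateEmail` that would turn the four table rows green. It is deliberately not a full address validator, and `ErrInvalidEmail` is a hypothetical name introduced for this example.

```go
package main

import (
	"errors"
	"strings"
)

// ErrInvalidEmail is a hypothetical sentinel error for this sketch.
var ErrInvalidEmail = errors.New("invalid email")

// ValidateEmail enforces only what the table rows demand: a non-empty local
// part and a non-empty domain on either side of the first "@".
func ValidateEmail(email string) error {
	local, domain, found := strings.Cut(email, "@")
	if !found || local == "" || domain == "" {
		return ErrInvalidEmail
	}
	return nil
}
```

Stricter rules (one "@" only, a dot in the domain, length limits) would each arrive as a new table row that fails first.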
+
+### Subtests with t.Run
+
+Use `t.Run` to name subtests clearly. Failure messages include the subtest name.
+
+```go
+t.Run("rejects empty input", func(t *testing.T) { ... })
+t.Run("accepts valid UUID", func(t *testing.T) { ... })
+```
+
+### Test File Conventions
+
+- File: `<name>_test.go` in the same directory as the code under test (Go only picks up files ending in `_test.go`)
+- Package: `package foo_test` for black-box testing (preferred), `package foo` for white-box
+- Helpers: use `t.Helper()` so stack traces point to the caller, not the helper
+
+```go
+func assertNoError(t *testing.T, err error) {
+    t.Helper()
+    if err != nil {
+        t.Fatalf("unexpected error: %v", err)
+    }
+}
+```
+
+### Running Tests
+
+```bash
+go test ./...                        # all packages
+go test -run TestFoo ./pkg/...       # specific test
+go test -run TestFoo/subtest ./...   # specific subtest
+go test -race ./...                  # race detector (always run before commit)
+go test -cover ./...                 # coverage
+go test -v ./...                     # verbose
+```
+
+### testify
+
+Pre-approved. Use `assert` (continues on failure) and `require` (stops on failure):
+
+```go
+require.NoError(t, err)          // fatal if error
+assert.Equal(t, expected, got)   // non-fatal comparison
+assert.ErrorIs(t, err, ErrFoo)   // error chain check
+```
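The "error chain check" is worth spelling out. The sketch below uses only the standard library to show the difference between matching through the chain with `errors.Is` and a plain `==` comparison. `loadConfig` and `matchesChain` are hypothetical helpers, and `ErrFoo` is defined here only so the block compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFoo stands in for the sentinel from the snippet above.
var ErrFoo = errors.New("foo")

// loadConfig (hypothetical) wraps the sentinel with %w so callers can still
// match it after context has been added.
func loadConfig(path string) error {
	return fmt.Errorf("load %s: %w", path, ErrFoo)
}

// matchesChain reports both comparisons: errors.Is walks the wrapped errors,
// while == only looks at the outermost value.
func matchesChain(err error) (viaIs bool, viaEquals bool) {
	return errors.Is(err, ErrFoo), err == ErrFoo
}
```

A wrapped error matches through `errors.Is` but not through `==`, which is why an equality assertion on a wrapped error fails even when the right sentinel is in the chain.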
+
+## Good Tests
+
+| Quality | Good | Bad |
+|---------|------|-----|
+| **Minimal** | One thing. "and" in name? Split it. | `TestValidatesEmailAndDomainAndWhitespace` |
+| **Clear** | Name describes behavior | `TestFoo`, `Test1` |
+| **Shows intent** | Demonstrates desired API | Obscures what code should do |
+| **Table-driven** | Multiple cases, one test function | Copy-pasted test functions |
+
+## Common Rationalizations
+
+| Excuse | Reality |
+|--------|---------|
+| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
+| "I'll test after" | Tests passing immediately prove nothing. |
+| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
+| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
+| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
+| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
+| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
+| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
+| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
+| "Existing code has no tests" | You're improving it. Add tests for existing code. |
+
+## Red Flags - STOP and Start Over
+
+- Code before test
+- Test after implementation
+- Test passes immediately without seeing it fail
+- Can't explain why test failed
+- Tests added "later"
+- Rationalizing "just this once"
+- "Already manually tested it"
+- "Tests after achieve the same purpose"
+- "Keep as reference" or "adapt existing code"
+- "Already spent X hours, deleting is wasteful"
+- "TDD is dogmatic, I'm being pragmatic"
+- "This is different because..."
+
+**All of these mean: Delete code. Start over with TDD.**
+
+## Example: Bug Fix
+
+**Bug:** Empty email accepted
+
+**RED**
+```go
+func TestSubmitForm_RejectsEmptyEmail(t *testing.T) {
+    result := submitForm(FormData{Email: ""})
+    assert.Equal(t, "email required", result.Error)
+}
+```
+
+**Verify RED**
+```bash
+$ go test -run TestSubmitForm_RejectsEmptyEmail ./...
+FAIL: expected "email required", got ""
+```
+
+**GREEN**
+```go
+func submitForm(data FormData) FormResult {
+    if strings.TrimSpace(data.Email) == "" {
+        return FormResult{Error: "email required"}
+    }
+    // ...
+}
+```
+
+**Verify GREEN**
+```bash
+$ go test ./...
+ok  	example.com/myapp	0.003s
+```
+
+**REFACTOR**
+Extract validation for multiple fields if needed.
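One possible shape for that extraction, as a sketch only. The `Name` field and the `requireField` helper are hypothetical, and the second check is assumed to have arrived through its own RED test rather than during the refactor itself.

```go
package main

import "strings"

// FormData and FormResult mirror the types used in the example above; the
// Name field is a hypothetical second field.
type FormData struct {
	Email string
	Name  string
}

type FormResult struct {
	Error string
}

// requireField returns "<label> required" when value is blank, else "".
func requireField(label, value string) string {
	if strings.TrimSpace(value) == "" {
		return label + " required"
	}
	return ""
}

func submitForm(data FormData) FormResult {
	checks := []struct{ label, value string }{
		{"email", data.Email},
		{"name", data.Name},
	}
	for _, c := range checks {
		if msg := requireField(c.label, c.value); msg != "" {
			return FormResult{Error: msg}
		}
	}
	return FormResult{} // real submission logic elided in this sketch
}
```

The existing empty-email test keeps passing unchanged, which is the signal that the refactor did not alter behavior.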
+
+## Verification Checklist
+
+Before marking work complete:
+
+- [ ] Every new function/method has a test
+- [ ] Watched each test fail before implementing
+- [ ] Each test failed for expected reason (feature missing, not typo)
+- [ ] Wrote minimal code to pass each test
+- [ ] All tests pass: `go test ./...`
+- [ ] Race detector clean: `go test -race ./...`
+- [ ] Tests use real code (mocks only if crossing a system boundary)
+- [ ] Edge cases and errors covered
+
+Can't check all boxes? You skipped TDD. Start over.
+
+## When Stuck
+
+| Problem | Solution |
+|---------|----------|
+| Don't know how to test | Write wished-for API. Write assertion first. Ask Mathias. |
+| Test too complicated | Design too complicated. Simplify interface. |
+| Must mock everything | Code too coupled. Use dependency injection. |
+| Test setup huge | Extract helpers with `t.Helper()`. Still complex? Simplify design. |
+
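To make the dependency-injection row concrete, here is a small sketch. All names (`Mailer`, `Signup`, `recordingMailer`) are hypothetical: the only thing replaced in tests is the system boundary, passed in through an interface, while the logic under test stays real.

```go
package main

import "errors"

// Mailer is the system boundary. Production code would pass an SMTP-backed
// implementation; tests pass an in-memory one.
type Mailer interface {
	Send(to, body string) error
}

// Signup receives its dependency instead of constructing it.
type Signup struct {
	mailer Mailer
}

// Register validates input itself and delegates only the delivery.
func (s Signup) Register(email string) error {
	if email == "" {
		return errors.New("email required")
	}
	return s.mailer.Send(email, "welcome")
}

// recordingMailer is a test double that records recipients in memory.
type recordingMailer struct {
	sent []string
}

func (m *recordingMailer) Send(to, body string) error {
	m.sent = append(m.sent, to)
	return nil
}
```

With this shape a test builds `Signup{mailer: &recordingMailer{}}` and asserts on what `Register` actually did, with no mocking framework involved.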
+## Debugging Integration
+
+Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.
+
+Never fix bugs without a test.
+
+## Testing Anti-Patterns
+
+When adding test utilities or mocks, load `references/testing-anti-patterns.md` to avoid:
+- Testing mock behavior instead of real behavior
+- Adding test-only methods to production types
+- Mocking without understanding what the dependency does
+
+## Brain MCP Integration
+
+The brain MCP exposes session context across machines. Use it to make TDD cycles cumulative rather than one-shot.
+
+**At session start:**
+- Run `brain_query` with the feature name + "tdd" to surface prior cycles, anti-patterns, or testing decisions for this code area. Skip if the feature is brand new.
+
+**Never:**
+- Embed brain content in test code or assertions. The brain is context for you, not state for the system under test.
+
+### Logging
+
+Call `session_log` once at the end of every phase to record the outcome.
+Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
+treats `pass` as success, `fail` as failure, `skip` as neither.
+
+**At end of `red` phase:**
+- `session_log` with `{skill: "tdd", phase: "red", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
+
+**At end of `green` phase:**
+- `session_log` with `{skill: "tdd", phase: "green", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
+
+**At end of `refactor` phase:**
+- `session_log` with `{skill: "tdd", phase: "refactor", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
+
+**Status semantics:**
+- `pass`: the phase's intended outcome was reached (red: test fails as expected; green: test passes; refactor: tests still pass after refactor).
+- `fail`: the phase's intended outcome was NOT reached (red: test passed or failed for the wrong reason; green: test still fails after the attempt; refactor: the refactor broke tests).
+- `skip`: phase was skipped intentionally (e.g. refactor not warranted).
+
+**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future `tdd` call to a local model. If your skill never logs, the routing pod sees no data and may default-route or default-not-route in a way that doesn't reflect real performance.
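The payload shape can be sketched as a Go struct. Only the JSON keys and the three status values come from this document. The struct, the helper names, and the idea of building the payload in Go are assumptions for illustration, and how `session_log` is actually invoked is not shown here.

```go
package main

import (
	"encoding/json"
	"time"
)

// SessionLog mirrors the fields listed above (hypothetical type name).
type SessionLog struct {
	Skill       string `json:"skill"`
	Phase       string `json:"phase"`        // "red", "green", or "refactor"
	FinalStatus string `json:"final_status"` // "pass", "fail", or "skip"
	Message     string `json:"message"`
	DurationMS  int64  `json:"duration_ms"`
	ProjectRoot string `json:"project_root"`
}

// newTDDLog builds the payload for one finished phase.
func newTDDLog(phase, status, message, projectRoot string, elapsed time.Duration) SessionLog {
	return SessionLog{
		Skill: "tdd", Phase: phase, FinalStatus: status, Message: message,
		DurationMS: elapsed.Milliseconds(), ProjectRoot: projectRoot,
	}
}

// encode renders the payload as compact JSON.
func (l SessionLog) encode() (string, error) {
	b, err := json.Marshal(l)
	return string(b), err
}
```

Measuring `elapsed` with `time.Since(start)` around each phase would give the wall-clock duration the field asks for.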
+
+## Final Rule
+
+```
+Production code → test exists and failed first
+Otherwise → not TDD
+```
+
+No exceptions without Mathias's permission.
+
+## Mode 2 Routing Note
+
+This skill is invoked identically whether the agent is running in Mode 1 (cloud Claude, no routing) or Mode 2 (client-local, supervisor routing layer). The routing pod (Plan 6) does not exist yet; until it does, treat this skill as Mode 1 only. The discipline does not change between modes — only the model behind the call.
--- a/tdd/references/testing-anti-patterns.md
+++ b/tdd/references/testing-anti-patterns.md
@@ -0,0 +1,299 @@
+# Testing Anti-Patterns
+
+**Load this reference when:** writing or changing tests, adding mocks, or tempted to add test-only methods to production code.
+
+## Overview
+
+Tests must verify real behavior, not mock behavior. Mocks are a means to isolate, not the thing being tested.
+
+**Core principle:** Test what the code does, not what the mocks do.
+
+**Following strict TDD prevents these anti-patterns.**
+
+## The Iron Laws
+
+```
+1. NEVER test mock behavior
+2. NEVER add test-only methods to production classes
+3. NEVER mock without understanding dependencies
+```
+
+## Anti-Pattern 1: Testing Mock Behavior
+
+**The violation:**
+```typescript
+// ❌ BAD: Testing that the mock exists
+test('renders sidebar', () => {
+  render(<Page />);
+  expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument();
+});
+```
+
+**Why this is wrong:**
+- You're verifying the mock works, not that the component works
+- Test passes when mock is present, fails when it's not
+- Tells you nothing about real behavior
+
+**Your human partner's correction:** "Are we testing the behavior of a mock?"
+
+**The fix:**
+```typescript
+// ✅ GOOD: Test real component or don't mock it
+test('renders sidebar', () => {
+  render(<Page />);  // Don't mock sidebar
+  expect(screen.getByRole('navigation')).toBeInTheDocument();
+});
+
+// OR if sidebar must be mocked for isolation:
+// Don't assert on the mock - test Page's behavior with sidebar present
+```
+
+### Gate Function
+
+```
+BEFORE asserting on any mock element:
+  Ask: "Am I testing real component behavior or just mock existence?"
+
+  IF testing mock existence:
+    STOP - Delete the assertion or unmock the component
+
+  Test real behavior instead
+```
+
+## Anti-Pattern 2: Test-Only Methods in Production
+
+**The violation:**
+```typescript
+// ❌ BAD: destroy() only used in tests
+class Session {
+  async destroy() {  // Looks like production API!
+    await this._workspaceManager?.destroyWorkspace(this.id);
+    // ... cleanup
+  }
+}
+
+// In tests
+afterEach(() => session.destroy());
+```
+
+**Why this is wrong:**
+- Production class polluted with test-only code
+- Dangerous if accidentally called in production
+- Violates YAGNI and separation of concerns
+- Confuses object lifecycle with entity lifecycle
+
+**The fix:**
+```typescript
+// ✅ GOOD: Test utilities handle test cleanup
+// Session has no destroy() - it's stateless in production
+
+// In test-utils/
+export async function cleanupSession(session: Session) {
+  const workspace = session.getWorkspaceInfo();
+  if (workspace) {
+    await workspaceManager.destroyWorkspace(workspace.id);
+  }
+}
+
+// In tests
+afterEach(() => cleanupSession(session));
+```
+
+### Gate Function
+
+```
+BEFORE adding any method to production class:
+  Ask: "Is this only used by tests?"
+
+  IF yes:
+    STOP - Don't add it
+    Put it in test utilities instead
+
+  Ask: "Does this class own this resource's lifecycle?"
+
+  IF no:
+    STOP - Wrong class for this method
+```
+
+## Anti-Pattern 3: Mocking Without Understanding
+
+**The violation:**
+```typescript
+// ❌ BAD: Mock breaks test logic
+test('detects duplicate server', async () => {
+  // Mock prevents config write that test depends on!
+  vi.mock('ToolCatalog', () => ({
+    discoverAndCacheTools: vi.fn().mockResolvedValue(undefined)
+  }));
+
+  await addServer(config);
+  await addServer(config);  // Should throw - but won't!
+});
+```
+
+**Why this is wrong:**
+- Mocked method had side effect test depended on (writing config)
+- Over-mocking to "be safe" breaks actual behavior
+- Test passes for wrong reason or fails mysteriously
+
+**The fix:**
+```typescript
+// ✅ GOOD: Mock at correct level
+test('detects duplicate server', async () => {
+  // Mock the slow part, preserve behavior test needs
+  vi.mock('MCPServerManager'); // Just mock slow server startup
+
+  await addServer(config);  // Config written
+  await addServer(config);  // Duplicate detected ✓
+});
+```
+
+### Gate Function
+
+```
+BEFORE mocking any method:
+  STOP - Don't mock yet
+
+  1. Ask: "What side effects does the real method have?"
+  2. Ask: "Does this test depend on any of those side effects?"
+  3. Ask: "Do I fully understand what this test needs?"
+
+  IF depends on side effects:
+    Mock at lower level (the actual slow/external operation)
+    OR use test doubles that preserve necessary behavior
+    NOT the high-level method the test depends on
+
+  IF unsure what test depends on:
+    Run test with real implementation FIRST
+    Observe what actually needs to happen
+    THEN add minimal mocking at the right level
+
+  Red flags:
+    - "I'll mock this to be safe"
+    - "This might be slow, better mock it"
+    - Mocking without understanding the dependency chain
+```
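The same idea in Go, the language of the TDD skill, as a sketch with hypothetical names throughout (`Registry`, `Starter`, `ErrDuplicateServer`). Only the slow startup boundary is faked; the config write that duplicate detection depends on stays real.

```go
package main

import "errors"

// ErrDuplicateServer is a hypothetical sentinel for this sketch.
var ErrDuplicateServer = errors.New("server already configured")

// Starter is the slow external boundary (process startup), and the only
// thing a test replaces.
type Starter interface {
	Start(name string) error
}

// noopStarter is the test double: it skips startup and nothing else.
type noopStarter struct{}

func (noopStarter) Start(string) error { return nil }

// Registry keeps the real config write in place.
type Registry struct {
	starter Starter
	config  map[string]bool
}

func NewRegistry(s Starter) *Registry {
	return &Registry{starter: s, config: make(map[string]bool)}
}

// AddServer rejects names that were already written to the config.
func (r *Registry) AddServer(name string) error {
	if r.config[name] {
		return ErrDuplicateServer
	}
	if err := r.starter.Start(name); err != nil {
		return err
	}
	r.config[name] = true // the side effect the duplicate check relies on
	return nil
}
```

Had the test stubbed `AddServer` itself, or whatever performs the config write, the second call would succeed and the test would pass or fail for the wrong reason.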
+
+## Anti-Pattern 4: Incomplete Mocks
+
+**The violation:**
+```typescript
+// ❌ BAD: Partial mock - only fields you think you need
+const mockResponse = {
+  status: 'success',
+  data: { userId: '123', name: 'Alice' }
+  // Missing: metadata that downstream code uses
+};
+
+// Later: breaks when code accesses response.metadata.requestId
+```
+
+**Why this is wrong:**
+- **Partial mocks hide structural assumptions** - You only mocked fields you know about
+- **Downstream code may depend on fields you didn't include** - Silent failures
+- **Tests pass but integration fails** - Mock incomplete, real API complete
+- **False confidence** - Test proves nothing about real behavior
+
+**The Iron Rule:** Mock the COMPLETE data structure as it exists in reality, not just fields your immediate test uses.
+
+**The fix:**
+```typescript
+// ✅ GOOD: Mirror real API completeness
+const mockResponse = {
+  status: 'success',
+  data: { userId: '123', name: 'Alice' },
+  metadata: { requestId: 'req-789', timestamp: 1234567890 }
+  // All fields real API returns
+};
+```
+
+### Gate Function
+
+```
+BEFORE creating mock responses:
+  Check: "What fields does the real API response contain?"
+
+  Actions:
+    1. Examine actual API response from docs/examples
+    2. Include ALL fields system might consume downstream
+    3. Verify mock matches real response schema completely
+
+  Critical:
+    If you're creating a mock, you must understand the ENTIRE structure
+    Partial mocks fail silently when code depends on omitted fields
+
+  If uncertain: Include all documented fields
+```
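One way to enforce this gate mechanically, sketched in Go with hypothetical names: build test data by decoding a fixture into a type that models every field, and fail loudly when the fixture drifts. The fixture here is assumed to stand in for a response captured from the real API.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// Response models every field from the example above (hypothetical type).
type Response struct {
	Status string `json:"status"`
	Data   struct {
		UserID string `json:"userId"`
		Name   string `json:"name"`
	} `json:"data"`
	Metadata struct {
		RequestID string `json:"requestId"`
		Timestamp int64  `json:"timestamp"`
	} `json:"metadata"`
}

// fullFixture stands in for a response captured from the real API.
const fullFixture = `{
  "status": "success",
  "data": {"userId": "123", "name": "Alice"},
  "metadata": {"requestId": "req-789", "timestamp": 1234567890}
}`

// decodeFixture rejects fixtures that drift from the modeled schema:
// unknown fields fail decoding, and a missing requestId fails the check.
func decodeFixture(raw string) (Response, error) {
	var r Response
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&r); err != nil {
		return Response{}, err
	}
	if r.Metadata.RequestID == "" {
		return Response{}, errors.New("fixture is missing metadata.requestId")
	}
	return r, nil
}
```

A partial fixture without `metadata` is rejected at test setup instead of surfacing later as a silent zero value.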
+
+## Anti-Pattern 5: Integration Tests as Afterthought
+
+**The violation:**
+```
+✅ Implementation complete
+❌ No tests written
+"Ready for testing"
+```
+
+**Why this is wrong:**
+- Testing is part of implementation, not optional follow-up
+- TDD would have caught this
+- Can't claim complete without tests
+
+**The fix:**
+```
+TDD cycle:
+1. Write failing test
+2. Implement to pass
+3. Refactor
+4. THEN claim complete
+```
+
+## When Mocks Become Too Complex
+
+**Warning signs:**
+- Mock setup longer than test logic
+- Mocking everything to make test pass
+- Mocks missing methods real components have
+- Test breaks when mock changes
+
+**Your human partner's question:** "Do we need to be using a mock here?"
+
+**Consider:** Integration tests with real components are often simpler than complex mocks.
+
+## TDD Prevents These Anti-Patterns
+
+**Why TDD helps:**
+1. **Write test first** → Forces you to think about what you're actually testing
+2. **Watch it fail** → Confirms test tests real behavior, not mocks
+3. **Minimal implementation** → No test-only methods creep in
+4. **Real dependencies** → You see what the test actually needs before mocking
+
+**If you're testing mock behavior, you violated TDD** - you added mocks without watching test fail against real code first.
+
+## Quick Reference
+
+| Anti-Pattern | Fix |
+|--------------|-----|
+| Assert on mock elements | Test real component or unmock it |
+| Test-only methods in production | Move to test utilities |
+| Mock without understanding | Understand dependencies first, mock minimally |
+| Incomplete mocks | Mirror real API completely |
+| Tests as afterthought | TDD - tests first |
+| Over-complex mocks | Consider integration tests |
+
+## Red Flags
+
+- Assertion checks for `*-mock` test IDs
+- Methods only called in test files
+- Mock setup is >50% of test
+- Test fails when you remove mock
+- Can't explain why mock is needed
+- Mocking "just to be safe"
+
+## The Bottom Line
+
+**Mocks are tools to isolate, not things to test.**
+
+If TDD reveals you're testing mock behavior, you've gone wrong.
+
+Fix: Test real behavior or question why you're mocking at all.