chore: bootstrap skills library — 19 skills + installer + CI auto-tag

Phase 1 of mathias/skills extraction (infra#62 Track D: homelab next-step plan addendum).

Imports ~/dev/.skills/ verbatim (19 skill dirs + SKILLS_INDEX.md) and adds the installation surface:

- Taskfile.yml: install / update / list / release / check targets
- install.sh: bootstrap installer for hosts without Task. Idempotent symlink wirer; default checkout at ~/.local/share/skills/ on every host; SKILLS_REF env var pins a tag (default: main).
- .gitea/workflows/release.yml: auto-tag every push to main by Bump-Type footer (major/minor/patch, default patch). Skipped when commit contains [skip-release].
- README: usage, versioning, contribution flow, secret-hygiene rule.

Phase 1 wires Claude Code only (~/.claude/skills/<name> global + <repo>/.claude/skills/<name> per-repo). Phase 2 adds Crush, opencode, antigravity, and gitea-resident agents (cobalt-dingo, agentsquad) once their skill conventions are researched.

Public repo, markdown-only: no secrets, no client names. Verified via pre-push grep before initial push.

[skip-release]
2026-05-24 14:59:54 +02:00
commit d6a71e370e
33 changed files with 8688 additions and 0 deletions
--- a/test-design/SKILL.md
+++ b/test-design/SKILL.md
@@ -0,0 +1,308 @@
+---
+name: test-design
+description: Evaluate test quality using Dave Farley's 8 Properties of Good Tests. Use when reviewing or writing tests to ensure they provide genuine verification.
+---
+
+# Test Design
+
+## Overview
+
+Good tests are investments. Bad tests are liabilities — they pass when they shouldn't, fail when code is correct, or verify nothing meaningful.
+
+This skill uses Dave Farley's 8 Properties of Good Tests to assess and improve test quality. The **Farley Index** (0–10) provides a scored summary.
+
+Reference: [Dave Farley's Properties of Good Tests](https://www.linkedin.com/pulse/tdd-properties-good-tests-dave-farley-iexge/)
+
+## The 8 Properties
+
+| Property | Weight | What it measures |
+|----------|--------|-----------------|
+| **Understandable** | 1.5x | Can a reader understand what behavior is being tested? |
+| **Maintainable** | 1.5x | Do tests keep passing when implementation changes but behavior does not? |
+| **Repeatable** | 1.25x | Same result every time, regardless of environment or order |
+| **Atomic** | 1.0x | One behavior per test; tests are independent |
+| **Necessary** | 1.0x | Tests real behavior, not mock internals or framework behavior |
+| **Granular** | 1.0x | Each test covers one specific case |
+| **Fast** | 0.75x | Tests run quickly enough to support rapid TDD cycles |
+| **First (TDD)** | 1.0x | Tests were written before implementation |
+
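+**Farley Index formula:** `(U×1.5 + M×1.5 + R×1.25 + A×1.0 + N×1.0 + G×1.0 + F×0.75 + T×1.0) / 9.0`. Each letter is that property's score on a 0–10 scale; 9.0 is the sum of the weights, so the index is a weighted average on the same 0–10 scale.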
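
The formula can be sketched as a small Go program. This is a minimal illustration, assuming each property is scored from 0 to 10; the `Scores` type, the `FarleyIndex` function, and the sample values are hypothetical names and numbers chosen for this example.

```go
package main

import "fmt"

// Scores holds one score per property. The 0-10 range is an assumption
// inferred from the index being reported on a 0-10 scale.
type Scores struct {
	Understandable, Maintainable, Repeatable, Atomic float64
	Necessary, Granular, Fast, First                 float64
}

// FarleyIndex applies the weights from the properties table and divides by
// their sum (9.0), giving a weighted average on the same scale as the inputs.
func FarleyIndex(s Scores) float64 {
	weighted := s.Understandable*1.5 + s.Maintainable*1.5 + s.Repeatable*1.25 +
		s.Atomic*1.0 + s.Necessary*1.0 + s.Granular*1.0 + s.Fast*0.75 + s.First*1.0
	return weighted / 9.0
}

func main() {
	// Hypothetical suite: readable and isolated, but slow and written after the code.
	s := Scores{
		Understandable: 8, Maintainable: 8, Repeatable: 8, Atomic: 8,
		Necessary: 8, Granular: 8, Fast: 4, First: 4,
	}
	fmt.Printf("Farley Index: %.2f\n", FarleyIndex(s)) // weighted sum 65 / 9.0
}
```

Because Fast carries the lowest weight, halving it costs less than halving Understandable or Maintainable would.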
+
+## Rating Scale
+
+| Score | Rating | Interpretation |
+|-------|--------|----------------|
+| 9.0–10.0 | Exemplary | Model quality; tests serve as living documentation |
+| 7.5–8.9 | Excellent | High quality with minor improvement opportunities |
+| 6.0–7.4 | Good | Solid foundation with clear improvement areas |
+| 4.5–5.9 | Fair | Functional but needs significant attention |
+| 3.0–4.4 | Poor | Tests provide limited value; refactoring needed |
+| 0.0–2.9 | Critical | Tests may be harmful; consider rewriting |
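
One way to turn the table into code is a threshold lookup. The sketch below is an assumption about how to handle values that fall between the printed bands (for example 8.95): it uses each band's lower bound as the cut-off. The `Rating` function name is hypothetical.

```go
package main

import "fmt"

// Rating maps a Farley Index to the band names in the rating table.
// Each band's lower bound is used as the threshold, so a value between two
// printed bands (such as 8.95) falls into the lower band. That tie-break is
// an assumption of this sketch, not something the table specifies.
func Rating(index float64) string {
	switch {
	case index >= 9.0:
		return "Exemplary"
	case index >= 7.5:
		return "Excellent"
	case index >= 6.0:
		return "Good"
	case index >= 4.5:
		return "Fair"
	case index >= 3.0:
		return "Poor"
	default:
		return "Critical"
	}
}

func main() {
	for _, idx := range []float64{9.4, 7.2, 2.5} {
		fmt.Printf("%.1f -> %s\n", idx, Rating(idx))
	}
}
```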
+
+## Property Deep Dives
+
+### Understandable (U)
+
+A test should tell a story: what behavior, under what conditions, produces what result.
+
+**Go patterns that help:**
+- Subtest names in `t.Run`: `t.Run("returns error when email is empty", ...)`
+- Table-driven tests with descriptive `name` fields
+- Arrange-Act-Assert structure with blank lines separating sections
+
+```go
+// Good: clear behavior name, clear structure
+func TestValidateUser_RejectsEmptyEmail(t *testing.T) {
+    // Arrange
+    user := User{Name: "Alice", Email: ""}
+
+    // Act
+    err := ValidateUser(user)
+
+    // Assert
+    require.Error(t, err)
+    assert.ErrorIs(t, err, ErrInvalidEmail)
+}
+
+// Bad: cryptic name, no structure
+func TestUser1(t *testing.T) {
+    u := User{}
+    assert.NotNil(t, ValidateUser(u))
+}
+```
+
+**Negative signals:** cryptic names (`test_1`, `TestFoo`), no AAA structure, multiple behaviors in one test.
+
+### Maintainable (M)
+
+Tests that break when implementation changes (but behavior doesn't) create noise and slow down development.
+
+**Negative signals:**
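+- Over-specified mock interactions (`mockStore.AssertCalled(t, "MethodX", args...)` when the outcome is all that matters)
+- Deep inspection of captured call arguments (Mockito-style `ArgumentCaptor`)
+- Strict "no other calls" checks (Mockito's `verifyNoMoreInteractions`, testify's `AssertNotCalled`) that break when you add a logging call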
+- Tests coupled to internal field names
+
+**Go patterns that help:**
+- Test behavior via public API, not internal state
+- Avoid asserting on exact call counts unless the count IS the behavior
+
+```go
+// Bad: breaks when you add an audit log call
+mockStore.AssertCalled(t, "Save", user)
+mockStore.AssertNumberOfCalls(t, "Save", 1)
+mockStore.AssertNotCalled(t, "Log") // Breaks if you add logging later
+
+// Good: test the outcome
+result, err := service.CreateUser(ctx, req)
+require.NoError(t, err)
+assert.Equal(t, req.Email, result.Email)
+```
+
+### Repeatable (R)
+
+Tests must produce the same result regardless of when, where, or in what order they run.
+
+**Negative signals (Go):**
+- `time.Now()` in test logic without injection
+- `os.ReadFile` for fixtures that aren't hermetic
+- Shared global state between tests
+- Tests that depend on network availability
+- `time.Sleep` for synchronization
+
+**Go fixes:**
+```go
+// Bad: time-dependent
+func TestTokenExpiry(t *testing.T) {
+    token := generateToken()
+    time.Sleep(2 * time.Second)
+    assert.True(t, token.IsExpired())
+}
+
+// Good: inject clock
+type Clock interface {
+    Now() time.Time
+}
+
+type FixedClock struct{ t time.Time }
+func (c FixedClock) Now() time.Time { return c.t }
+
+func TestTokenExpiry(t *testing.T) {
+    clock := FixedClock{t: time.Unix(0, 0)}
+    token := generateTokenWithClock(clock)
+    futureClk := FixedClock{t: time.Unix(3600, 0)}
+    assert.True(t, token.IsExpiredAt(futureClk.Now()))
+}
+```
+
+Use `t.TempDir()` for filesystem fixtures — cleaned up automatically.
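
A minimal sketch of the `t.TempDir()` advice, meant to live in a `_test.go` file. `LoadGreeting` is a hypothetical function under test, and the example uses only the standard library to stay dependency-free.

```go
package main

import (
	"os"
	"path/filepath"
	"testing"
)

// LoadGreeting is a hypothetical function under test: it returns a file's contents.
func LoadGreeting(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func TestLoadGreeting_ReadsFixtureFromTempDir(t *testing.T) {
	// Arrange: the fixture is written into a directory unique to this test,
	// which the testing package removes when the test finishes.
	path := filepath.Join(t.TempDir(), "greeting.txt")
	if err := os.WriteFile(path, []byte("hello"), 0o600); err != nil {
		t.Fatalf("write fixture: %v", err)
	}

	// Act
	got, err := LoadGreeting(path)

	// Assert
	if err != nil {
		t.Fatalf("LoadGreeting returned error: %v", err)
	}
	if got != "hello" {
		t.Errorf("LoadGreeting = %q, want %q", got, "hello")
	}
}
```

Writing the fixture inside the test keeps it hermetic: nothing depends on the working directory or on files left behind by an earlier run.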
+
+### Atomic (A)
+
+One test = one behavior. Tests must be independent — running in any order must produce the same result.
+
+**Go patterns:**
+- `t.Parallel()` on subtests exposes hidden shared state, especially under `go test -race`
+- Fresh state in each `t.Run`
+- No `init()` or package-level setup that leaks between tests
+
+```go
+func TestUserService(t *testing.T) {
+    tests := []struct {
+        name    string
+        input   CreateUserReq
+        wantErr bool
+    }{
+        {"valid user", validReq, false},
+        {"duplicate email", dupEmailReq, true},
+    }
+
+    for _, tt := range tests {
+        tt := tt // Capture the loop variable; required before Go 1.22 when subtests call t.Parallel()
+        t.Run(tt.name, func(t *testing.T) {
+            t.Parallel() // Each subtest runs independently
+            store := NewInMemoryStore() // Fresh state per test
+            svc := NewUserService(store)
+            _, err := svc.Create(context.Background(), tt.input)
+            if tt.wantErr {
+                assert.Error(t, err)
+            } else {
+                assert.NoError(t, err)
+            }
+        })
+    }
+}
+```
+
+### Necessary (N)
+
+Tests must verify real behavior. Tautology Theatre — tests whose outcome is predetermined regardless of production code — provides false confidence.
+
+**Types of Tautology Theatre:**
+
+1. **Mock tautology:** Configure mock return, then assert that mock returns it.
+   ```go
+   // Bad: this passes even if production code is deleted
+   mockStore.On("GetUser", id).Return(user, nil)
+   result, _ := mockStore.GetUser(id)
+   assert.Equal(t, user, result) // Testing the mock, not production code
+   ```
+
+2. **Mock-only test:** Every collaborator is a mock; no real type is instantiated.
+
+3. **Trivial tautology:** `assert.True(t, true)` or `assert.NotNil(t, new(User))`
+
+4. **Framework test:** Verifying that Go's `make(map[string]int)` returns non-nil.
+
+**Fix:** Test real behavior through real implementations. Use mocks only to isolate from external systems (DB, HTTP, filesystem).
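
The fix might look like the following sketch, meant to live in a `_test.go` file. All names here (`UserStore`, `InMemoryStore`, `UserService`, `ErrDuplicateEmail`) are hypothetical, and the standard `testing` package is used instead of testify to keep the example dependency-free.

```go
package main

import (
	"errors"
	"testing"
)

// ErrDuplicateEmail is a hypothetical sentinel error for this sketch.
var ErrDuplicateEmail = errors.New("duplicate email")

// UserStore is the boundary to an external system (a database, say).
type UserStore interface {
	Exists(email string) bool
	Save(email string)
}

// InMemoryStore is a real, working implementation that stands in for the database.
type InMemoryStore struct{ emails map[string]bool }

func NewInMemoryStore() *InMemoryStore { return &InMemoryStore{emails: map[string]bool{}} }

func (s *InMemoryStore) Exists(email string) bool { return s.emails[email] }
func (s *InMemoryStore) Save(email string)        { s.emails[email] = true }

// UserService holds the production logic the test is meant to verify.
type UserService struct{ store UserStore }

func (svc UserService) Create(email string) error {
	if svc.store.Exists(email) {
		return ErrDuplicateEmail
	}
	svc.store.Save(email)
	return nil
}

func TestCreate_RejectsDuplicateEmail(t *testing.T) {
	// Arrange: real service, store seeded through the public API.
	svc := UserService{store: NewInMemoryStore()}
	if err := svc.Create("alice@example.com"); err != nil {
		t.Fatalf("seeding first user: %v", err)
	}

	// Act
	err := svc.Create("alice@example.com")

	// Assert: this fails if the duplicate check is removed from Create.
	if !errors.Is(err, ErrDuplicateEmail) {
		t.Errorf("Create = %v, want ErrDuplicateEmail", err)
	}
}
```

Unlike the mock tautology, this test exercises `Create` itself, so deleting or breaking the production logic changes the outcome.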
+
+### Granular (G)
+
+Each test covers one specific case. Table-driven tests in Go are the natural expression of granularity.
+
+```go
+// Good: each row is one case, each can fail independently
+func TestParseAmount(t *testing.T) {
+    tests := []struct {
+        name    string
+        input   string
+        want    Amount
+        wantErr bool
+    }{
+        {"integer", "100", Amount{Value: 100}, false},
+        {"decimal", "10.50", Amount{Value: 1050, Scale: 2}, false},
+        {"negative", "-5", Amount{}, true},
+        {"empty", "", Amount{}, true},
+        {"non-numeric", "abc", Amount{}, true},
+    }
+
+    for _, tt := range tests {
+        t.Run(tt.name, func(t *testing.T) {
+            got, err := ParseAmount(tt.input)
+            if tt.wantErr {
+                require.Error(t, err)
+                return
+            }
+            require.NoError(t, err)
+            assert.Equal(t, tt.want, got)
+        })
+    }
+}
+```
+
+### Fast (F)
+
+Tests must run fast enough to support TDD cycles. Target: the full test suite in < 30 seconds for most projects.
+
+**Go fixes:**
+- Mark slow integration tests with build tags: `//go:build integration`
+- Use `t.Parallel()` to parallelize safe tests
+- Use `InMemoryStore` implementations instead of real DB for unit tests
+- Use `httptest.NewServer` for HTTP tests instead of real servers
+
+```bash
+# Unit tests only (fast, default)
+go test ./...
+
+# Unit and integration tests together (slower, explicit)
+go test -tags=integration ./...
+```
+
+**Negative signals:** `time.Sleep`, network calls without build tags, database calls in unit tests.
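
As a sketch of the `httptest.NewServer` suggestion, the test below talks to an in-process server on a loopback port rather than a real remote service. `FetchBody` and `newStubServer` are hypothetical names for this example, which belongs in a `_test.go` file.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
)

// FetchBody is a hypothetical client function under test.
func FetchBody(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// newStubServer starts an in-process HTTP server that always answers "pong".
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "pong")
	}))
}

func TestFetchBody_ReturnsResponseBody(t *testing.T) {
	t.Parallel() // Safe: each test owns its own server.

	// Arrange
	srv := newStubServer()
	defer srv.Close()

	// Act
	got, err := FetchBody(srv.Client(), srv.URL)

	// Assert
	if err != nil {
		t.Fatalf("FetchBody returned error: %v", err)
	}
	if got != "pong" {
		t.Errorf("FetchBody = %q, want %q", got, "pong")
	}
}
```

The real HTTP client code still runs end to end, but the round trip stays on the local machine, so the test needs no build tag to remain fast and repeatable.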
+
+### First / TDD (T)
+
+Evidence that tests were written before implementation. This is the hardest property to verify statically.
+
+**Positive signals:**
+- Commit history shows test commit before implementation commit
+- Tests test behavior, not implementation details (tests-first forces API design)
+- Tests are simpler than the implementation (tests-first keeps tests focused)
+
+**Negative signals:**
+- Tests that exactly mirror the implementation structure
+- Tests that only cover happy paths (implementation-first misses edge cases)
+- Tests added in the same commit as a large implementation
+
+## Go-Specific Test Design Notes
+
+### t.Helper()
+
+Use `t.Helper()` in helper functions so failure messages report the file and line of the call site, not the helper:
+
+```go
+func assertValidUser(t *testing.T, u User) {
+    t.Helper()
+    assert.NotEmpty(t, u.ID)
+    assert.NotEmpty(t, u.Email)
+}
+```
+
+### Table-Driven Tests Are Preferred
+
+Go convention is table-driven tests. They're granular, readable, and easy to extend:
+- Add a new case by adding a row to the table — no new test function
+- Each case can be run independently: `go test -run TestFoo/case_name` (spaces in subtest names become underscores)
+
+### Subtests Enable Targeted Runs
+
+```bash
+go test -run TestValidateUser/rejects_empty_email ./...
+```
+
+## When Writing Tests
+
+Apply this checklist to every new test:
+
+- [ ] Name describes the behavior being tested (not the function name)
+- [ ] Structure follows Arrange-Act-Assert
+- [ ] Tests one behavior (no "and" in the name)
+- [ ] Uses real implementations where feasible
+- [ ] Runs in < 100ms (or tagged for integration)
+- [ ] Uses `t.Helper()` in helper functions
+- [ ] Table-driven if testing multiple similar inputs
+
+## Cross-References
+
+- Load `tdd` skill for the full TDD workflow
+- Load `code-review` skill for test quality review during pre-merge review
+- See `clean-code/references/code-smells.md` for testing-specific smells