chore: bootstrap skills library — 19 skills + installer + CI auto-tag

Phase 1 of mathias/skills extraction (infra#62 Track D — homelab
next-step plan addendum). Imports ~/dev/.skills/ verbatim (19 skill
dirs + SKILLS_INDEX.md) and adds the installation surface:

- Taskfile.yml — install / update / list / release / check targets
- install.sh — bootstrap installer for hosts without Task. Idempotent
  symlink wirer; default checkout at ~/.local/share/skills/ on every
  host; SKILLS_REF env var pins a tag (default: main).
- .gitea/workflows/release.yml — auto-tag every push to main by
  Bump-Type footer (major/minor/patch, default patch). Skipped when
  commit contains [skip-release].
- README — usage, versioning, contribution flow, secret-hygiene rule.

Phase 1 wires Claude Code only (~/.claude/skills/<name> global +
<repo>/.claude/skills/<name> per-repo). Phase 2 adds Crush, opencode,
antigravity, and gitea-resident agents (cobalt-dingo, agentsquad)
once their skill conventions are researched.

Public repo, markdown-only — no secrets, no client names. Verified
via pre-push grep before initial push.

[skip-release]

2026-05-24 14:59:54 +02:00

name	description
test-design	Evaluate test quality using Dave Farley's 8 Properties of Good Tests. Use when reviewing or writing tests to ensure they provide genuine verification.

Test Design

Overview

Good tests are investments. Bad tests are liabilities — they pass when they shouldn't, fail when code is correct, or verify nothing meaningful.

This skill uses Dave Farley's 8 Properties of Good Tests to assess and improve test quality. The Farley Index (0–10) provides a scored summary.

Reference: Dave Farley's Properties of Good Tests

The 8 Properties

Property	Weight	What it measures
Understandable	1.5x	Can a reader understand what behavior is being tested?
Maintainable	1.5x	Do tests survive implementation changes that leave behavior intact?
Repeatable	1.25x	Same result every time, regardless of environment or order
Atomic	1.0x	One behavior per test; tests are independent
Necessary	1.0x	Tests real behavior, not mock internals or framework behavior
Granular	1.0x	Each test covers one specific case
Fast	0.75x	Tests run quickly enough to support rapid TDD cycles
First (TDD)	1.0x	Tests were written before implementation

Farley Index formula: (U×1.5 + M×1.5 + R×1.25 + A×1.0 + N×1.0 + G×1.0 + F×0.75 + T×1.0) / 9.0, where each letter is that property's score on a 0–10 scale and 9.0 is the sum of the weights.

Rating Scale

Score	Rating	Interpretation
9.0–10.0	Exemplary	Model quality; tests serve as living documentation
7.5–8.9	Excellent	High quality with minor improvement opportunities
6.0–7.4	Good	Solid foundation with clear improvement areas
4.5–5.9	Fair	Functional but needs significant attention
3.0–4.4	Poor	Tests provide limited value; refactoring needed
0.0–2.9	Critical	Tests may be harmful; consider rewriting
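
The formula and rating scale above can be sketched in a few lines of Go. This is only an illustration: the `Scores` type and the `FarleyIndex` and `Rating` functions are hypothetical names, not part of the skill, and the sketch assumes each band starts at its lower bound, so a value such as 8.95 counts as Excellent.

```go
package main

import "fmt"

// Scores holds one 0-10 score per property (hypothetical type for illustration).
type Scores struct {
	Understandable, Maintainable, Repeatable, Atomic float64
	Necessary, Granular, Fast, First                 float64
}

// FarleyIndex applies the weighted formula; the weights sum to 9.0.
func FarleyIndex(s Scores) float64 {
	weighted := s.Understandable*1.5 + s.Maintainable*1.5 + s.Repeatable*1.25 +
		s.Atomic*1.0 + s.Necessary*1.0 + s.Granular*1.0 + s.Fast*0.75 + s.First*1.0
	return weighted / 9.0
}

// Rating maps an index to the rating scale using each band's lower bound.
func Rating(index float64) string {
	switch {
	case index >= 9.0:
		return "Exemplary"
	case index >= 7.5:
		return "Excellent"
	case index >= 6.0:
		return "Good"
	case index >= 4.5:
		return "Fair"
	case index >= 3.0:
		return "Poor"
	default:
		return "Critical"
	}
}

func main() {
	s := Scores{
		Understandable: 8, Maintainable: 6, Repeatable: 8, Atomic: 8,
		Necessary: 8, Granular: 8, Fast: 4, First: 6,
	}
	idx := FarleyIndex(s) // weighted sum is 64, so 64 / 9.0
	fmt.Printf("%.2f %s\n", idx, Rating(idx)) // prints "7.11 Good"
}
```

Note how the weights shape the result: the low Fast score (4) costs less than the same score on Understandable or Maintainable would.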

Property Deep Dives

Understandable (U)

A test should tell a story: what behavior, under what conditions, produces what result.

Go patterns that help:

Subtest names in t.Run: t.Run("returns error when email is empty", ...)
Table-driven tests with descriptive name fields
Arrange-Act-Assert structure with blank lines separating sections

// Good: clear behavior name, clear structure
func TestValidateUser_RejectsEmptyEmail(t *testing.T) {
    // Arrange
    user := User{Name: "Alice", Email: ""}

    // Act
    err := ValidateUser(user)

    // Assert
    require.Error(t, err)
    assert.ErrorIs(t, err, ErrInvalidEmail)
}

// Bad: cryptic name, no structure
func TestUser1(t *testing.T) {
    u := User{}
    assert.NotNil(t, ValidateUser(u))
}

Negative signals: cryptic names (test_1, TestFoo), no AAA structure, multiple behaviors in one test.

Maintainable (M)

Tests that break when implementation changes (but behavior doesn't) create noise and slow down development.

Negative signals:

Over-specified mock interactions (mock.AssertCalled(t, "MethodX", args...) when the outcome is all that matters)
Deep inspection of captured arguments (ArgumentCaptor in Mockito terms)
Strict "no other calls" checks (verifyNoMoreInteractions in Mockito terms) that break when you add a logging call
Tests coupled to internal field names

Go patterns that help:

Test behavior via public API, not internal state
Avoid asserting on exact call counts unless the count IS the behavior

// Bad: breaks when you add an audit log call
mock.AssertCalled(t, "Save", user)
mock.AssertNumberOfCalls(t, "Save", 1)
mock.AssertNotCalled(t, "Log") // Breaks if you add logging later

// Good: test the outcome
result, err := service.CreateUser(ctx, req)
require.NoError(t, err)
assert.Equal(t, user.Email, result.Email)

Repeatable (R)

Tests must produce the same result regardless of when, where, or in what order they run.

Negative signals (Go):

time.Now() in test logic without injection
os.ReadFile for fixtures that aren't hermetic
Shared global state between tests
Tests that depend on network availability
time.Sleep for synchronization

Go fixes:

// Bad: time-dependent
func TestTokenExpiry(t *testing.T) {
    token := generateToken()
    time.Sleep(2 * time.Second)
    assert.True(t, token.IsExpired())
}

// Good: inject clock
type Clock interface {
    Now() time.Time
}

type FixedClock struct{ t time.Time }
func (c FixedClock) Now() time.Time { return c.t }

func TestTokenExpiry(t *testing.T) {
    clock := FixedClock{t: time.Unix(0, 0)}
    token := generateTokenWithClock(clock)
    futureClk := FixedClock{t: time.Unix(3600, 0)}
    assert.True(t, token.IsExpiredAt(futureClk.Now()))
}

Use t.TempDir() for filesystem fixtures — cleaned up automatically.

Atomic (A)

One test = one behavior. Tests must be independent — running in any order must produce the same result.

Go patterns:

t.Parallel() on subtests exposes hidden shared state (run with go test -race to catch data races)
Fresh state in each t.Run
No init() or package-level setup that leaks between tests

func TestUserService(t *testing.T) {
    tests := []struct {
        name    string
        input   CreateUserReq
        wantErr bool
    }{
        {"valid user", validReq, false},
        {"duplicate email", dupEmailReq, true},
    }

    for _, tt := range tests {
        tt := tt // Capture the range variable; required before Go 1.22 when subtests run in parallel
        t.Run(tt.name, func(t *testing.T) {
            t.Parallel() // Each subtest runs independently
            store := NewInMemoryStore() // Fresh state per test
            svc := NewUserService(store)
            _, err := svc.Create(context.Background(), tt.input)
            if tt.wantErr {
                assert.Error(t, err)
            } else {
                assert.NoError(t, err)
            }
        })
    }
}

Necessary (N)

Tests must verify real behavior. Tautology Theatre — tests whose outcome is predetermined regardless of production code — provides false confidence.

Types of Tautology Theatre:

Mock tautology: Configure mock return, then assert that mock returns it.

// Bad: this passes even if production code is deleted
mockStore.On("GetUser", id).Return(user, nil)
result, _ := mockStore.GetUser(id)
assert.Equal(t, user, result) // Testing the mock, not production code

Mock-only test: Every collaborator is a mock; no real type is exercised.
Trivial tautology: assert.True(t, true) or assert.NotNil(t, new(User))
Framework test: Verifying that Go's make(map[string]int) returns non-nil.

Fix: Test real behavior through real implementations. Use mocks only to isolate from external systems (DB, HTTP, filesystem).

Granular (G)

Each test covers one specific case. Table-driven tests in Go are the natural expression of granularity.

// Good: each row is one case, each can fail independently
func TestParseAmount(t *testing.T) {
    tests := []struct {
        name    string
        input   string
        want    Amount
        wantErr bool
    }{
        {"integer", "100", Amount{Value: 100}, false},
        {"decimal", "10.50", Amount{Value: 1050, Scale: 2}, false},
        {"negative", "-5", Amount{}, true},
        {"empty", "", Amount{}, true},
        {"non-numeric", "abc", Amount{}, true},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got, err := ParseAmount(tt.input)
            if tt.wantErr {
                require.Error(t, err)
                return
            }
            require.NoError(t, err)
            assert.Equal(t, tt.want, got)
        })
    }
}

Fast (F)

Tests must run fast enough to support TDD cycles. Target: the full test suite in < 30 seconds for most projects.

Go fixes:

Mark slow integration tests with build tags: //go:build integration
Use t.Parallel() to parallelize safe tests
Use InMemoryStore implementations instead of real DB for unit tests
Use httptest.NewServer for HTTP tests instead of real servers

# Unit tests only (fast, default)
go test ./...

# Integration tests (slower, explicit)
go test -tags=integration ./...

Negative signals: time.Sleep, network calls without build tags, database calls in unit tests.

First / TDD (T)

Evidence that tests were written before implementation. This is the hardest property to verify statically.

Positive signals:

Commit history shows test commit before implementation commit
Tests test behavior, not implementation details (tests-first forces API design)
Tests are simpler than the implementation (tests-first keeps tests focused)

Negative signals:

Tests that exactly mirror the implementation structure
Tests that only cover happy paths (implementation-first misses edge cases)
Tests added in the same commit as a large implementation

Go-Specific Test Design Notes

t.Helper()

Use t.Helper() in helper functions so failure messages report the file and line of the call site, not the helper:

func assertValidUser(t *testing.T, u User) {
    t.Helper()
    assert.NotEmpty(t, u.ID)
    assert.NotEmpty(t, u.Email)
}

Table-Driven Tests Are Preferred

Go convention is table-driven tests. They're granular, readable, and easy to extend:

Add a new case by adding a row to the table — no new test function
Each case can be run independently: go test -run TestFoo/case_name (spaces in a subtest name become underscores)

Subtests Enable Targeted Runs

go test -run TestValidateUser/rejects_empty_email ./...

When Writing Tests

Apply this checklist to every new test:

Name describes the behavior being tested (not the function name)
Structure follows Arrange-Act-Assert
Tests one behavior (no "and" in the name)
Uses real implementations where feasible
Runs in < 100ms (or tagged for integration)
Uses t.Helper() in helper functions
Table-driven if testing multiple similar inputs

Cross-References

Load tdd skill for the full TDD workflow
Load code-review skill for test quality review during pre-merge review
See clean-code/references/code-smells.md for testing-specific smells
