skills/test-design/SKILL.md
Mathias d6a71e370e
2026-05-24 14:59:54 +02:00

---
name: test-design
description: Evaluate test quality using Dave Farley's 8 Properties of Good Tests. Use when reviewing or writing tests to ensure they provide genuine verification.
---

Test Design

Overview

Good tests are investments. Bad tests are liabilities — they pass when they shouldn't, fail when code is correct, or verify nothing meaningful.

This skill uses Dave Farley's 8 Properties of Good Tests to assess and improve test quality. The Farley Index (0-10) provides a scored summary.

Reference: Dave Farley's Properties of Good Tests

The 8 Properties

| Property | Weight | What it measures |
|---|---|---|
| Understandable | 1.5x | Can a reader understand what behavior is being tested? |
| Maintainable | 1.5x | Will small code changes cause test failures unrelated to behavior? |
| Repeatable | 1.25x | Same result every time, regardless of environment or order |
| Atomic | 1.0x | One behavior per test; tests are independent |
| Necessary | 1.0x | Tests real behavior, not mock internals or framework behavior |
| Granular | 1.0x | Each test covers one specific case |
| Fast | 0.75x | Tests run quickly enough to support rapid TDD cycles |
| First (TDD) | 1.0x | Tests were written before implementation |

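Farley Index formula, with each property scored 0-10: (U×1.5 + M×1.5 + R×1.25 + A×1.0 + N×1.0 + G×1.0 + F×0.75 + T×1.0) / 9.0. The weights sum to 9.0, so the index is a weighted average on the same 0-10 scale.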

Rating Scale

| Score | Rating | Interpretation |
|---|---|---|
| 9.0-10.0 | Exemplary | Model quality; tests serve as living documentation |
| 7.5-8.9 | Excellent | High quality with minor improvement opportunities |
| 6.0-7.4 | Good | Solid foundation with clear improvement areas |
| 4.5-5.9 | Fair | Functional but needs significant attention |
| 3.0-4.4 | Poor | Tests provide limited value; refactoring needed |
| 0.0-2.9 | Critical | Tests may be harmful; consider rewriting |
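As a worked check of the formula and the rating bands, here is a minimal Go sketch. The Scores type, FarleyIndex, and Rating are hypothetical names for illustration, not part of any existing tool. The band lookup uses lower bounds, which is an assumption: a score that falls between two printed bands (such as 8.95) lands in the lower one.

```go
package main

import "fmt"

// Scores holds one 0-10 score per property (hypothetical type).
type Scores struct {
	Understandable, Maintainable, Repeatable, Atomic float64
	Necessary, Granular, Fast, First                 float64
}

// FarleyIndex applies the weights from the table. They sum to 9.0,
// so dividing by 9.0 yields a weighted mean on the same 0-10 scale.
func FarleyIndex(s Scores) float64 {
	sum := s.Understandable*1.5 + s.Maintainable*1.5 + s.Repeatable*1.25 +
		s.Atomic*1.0 + s.Necessary*1.0 + s.Granular*1.0 +
		s.Fast*0.75 + s.First*1.0
	return sum / 9.0
}

// Rating maps an index to the rating scale using each band's lower bound.
func Rating(index float64) string {
	switch {
	case index >= 9.0:
		return "Exemplary"
	case index >= 7.5:
		return "Excellent"
	case index >= 6.0:
		return "Good"
	case index >= 4.5:
		return "Fair"
	case index >= 3.0:
		return "Poor"
	default:
		return "Critical"
	}
}

func main() {
	s := Scores{Understandable: 9, Maintainable: 6, Repeatable: 8, Atomic: 7,
		Necessary: 9, Granular: 8, Fast: 10, First: 5}
	idx := FarleyIndex(s)
	fmt.Printf("%.2f %s\n", idx, Rating(idx)) // 7.67 Excellent
}
```

The weighted sum in the example is 69, and 69 / 9.0 ≈ 7.67, so strong readability and speed partly offset the weak Maintainable and First scores.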

Property Deep Dives

Understandable (U)

A test should tell a story: what behavior, under what conditions, produces what result.

Go patterns that help:

  • Subtest names in t.Run: t.Run("returns error when email is empty", ...)
  • Table-driven tests with descriptive name fields
  • Arrange-Act-Assert structure with blank lines separating sections
// Good: clear behavior name, clear structure
func TestValidateUser_RejectsEmptyEmail(t *testing.T) {
    // Arrange
    user := User{Name: "Alice", Email: ""}

    // Act
    err := ValidateUser(user)

    // Assert
    require.Error(t, err)
    assert.ErrorIs(t, err, ErrInvalidEmail)
}

// Bad: cryptic name, no structure
func TestUser1(t *testing.T) {
    u := User{}
    assert.NotNil(t, ValidateUser(u))
}

Negative signals: cryptic names (test_1, TestFoo), no AAA structure, multiple behaviors in one test.

Maintainable (M)

Tests that break when implementation changes (but behavior doesn't) create noise and slow down development.

Negative signals:

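  • Over-specified mock interactions (mockRepo.AssertCalled(t, "MethodX", args...) when only the outcome matters)
  • Deep inspection of captured arguments (Mockito's ArgumentCaptor style, or elaborate mock.MatchedBy matchers in testify)
  • Blanket "no other calls" checks (Mockito's verifyNoMoreInteractions, or AssertNotCalled on unrelated methods) that break when you add a logging call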
  • Tests coupled to internal field names

Go patterns that help:

  • Test behavior via public API, not internal state
  • Avoid asserting on exact call counts unless the count IS the behavior
// Bad: coupled to interactions; breaks when you add an audit log call
mockRepo.AssertCalled(t, "Save", user)
mockRepo.AssertNumberOfCalls(t, "Save", 1)
mockRepo.AssertNotCalled(t, "Log") // Breaks if you add logging later

// Good: assert on the observable outcome
result, err := service.CreateUser(ctx, req)
require.NoError(t, err)
assert.Equal(t, req.Email, result.Email)

Repeatable (R)

Tests must produce the same result regardless of when, where, or in what order they run.

Negative signals (Go):

  • time.Now() in test logic without injection
  • Fixtures read with os.ReadFile from outside the package's testdata/ directory (absolute or machine-specific paths)
  • Shared global state between tests
  • Tests that depend on network availability
  • time.Sleep for synchronization

Go fixes:

// Bad: depends on the wall clock and sleeps
func TestTokenExpiry(t *testing.T) {
    token := generateToken()
    time.Sleep(2 * time.Second)
    assert.True(t, token.IsExpired())
}

// Good: inject the clock and check against an explicit instant
type Clock interface {
    Now() time.Time
}

type FixedClock struct{ now time.Time }

func (c FixedClock) Now() time.Time { return c.now }

func TestTokenExpiry(t *testing.T) {
    issuedAt := time.Unix(0, 0)
    token := generateTokenWithClock(FixedClock{now: issuedAt})

    // One hour later; assumes the token lifetime is shorter than that
    assert.True(t, token.IsExpiredAt(issuedAt.Add(time.Hour)))
}

Use t.TempDir() for files a test writes: each call returns a unique directory that is removed automatically when the test and its subtests finish.

Atomic (A)

One test = one behavior. Tests must be independent — running in any order must produce the same result.

Go patterns:

  • t.Parallel() on subtests surfaces hidden shared state, especially when run with go test -race
  • Fresh state in each t.Run
  • No init() or package-level setup that leaks between tests
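func TestUserService(t *testing.T) {
    // validReq and dupEmailReq are fixtures that share the same email address
    tests := []struct {
        name     string
        existing []CreateUserReq // users created before the call under test
        input    CreateUserReq
        wantErr  bool
    }{
        {"valid user", nil, validReq, false},
        {"duplicate email", []CreateUserReq{validReq}, dupEmailReq, true},
    }

    for _, tt := range tests {
        tt := tt // capture the loop variable (needed before Go 1.22)
        t.Run(tt.name, func(t *testing.T) {
            t.Parallel()                // Each subtest runs independently
            store := NewInMemoryStore() // Fresh state per test
            svc := NewUserService(store)
            for _, req := range tt.existing {
                _, err := svc.Create(context.Background(), req)
                require.NoError(t, err)
            }

            _, err := svc.Create(context.Background(), tt.input)
            if tt.wantErr {
                assert.Error(t, err)
            } else {
                assert.NoError(t, err)
            }
        })
    }
}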

Necessary (N)

Tests must verify real behavior. Tautology Theatre — tests whose outcome is predetermined regardless of production code — provides false confidence.

Types of Tautology Theatre:

  1. Mock tautology: Configure mock return, then assert that mock returns it.

    // Bad: this passes even if production code is deleted
    mockStore.On("GetUser", id).Return(user, nil)
    result, _ := mockStore.GetUser(id)
    assert.Equal(t, user, result) // Testing the mock, not production code
    
  2. Mock-only test: Every object is a mock; no real class instantiated.

  3. Trivial tautology: assert.True(t, true) or assert.NotNil(t, new(User))

  4. Framework test: Verifying that Go's make(map[string]int) returns non-nil.

Fix: Test real behavior through real implementations. Use mocks only to isolate from external systems (DB, HTTP, filesystem).

Granular (G)

Each test covers one specific case. Table-driven tests in Go are the natural expression of granularity.

// Good: each row is one case, each can fail independently
func TestParseAmount(t *testing.T) {
    tests := []struct {
        name    string
        input   string
        want    Amount
        wantErr bool
    }{
        {"integer", "100", Amount{Value: 100}, false},
        {"decimal", "10.50", Amount{Value: 1050, Scale: 2}, false},
        {"negative", "-5", Amount{}, true},
        {"empty", "", Amount{}, true},
        {"non-numeric", "abc", Amount{}, true},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got, err := ParseAmount(tt.input)
            if tt.wantErr {
                require.Error(t, err)
                return
            }
            require.NoError(t, err)
            assert.Equal(t, tt.want, got)
        })
    }
}
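For completeness, here is one hypothetical ParseAmount that satisfies every row of the table above. The Amount fields and ErrInvalidAmount are assumptions inferred from the test cases, not a specified API. It stores the digits as an integer and records how many came after the decimal point, so "10.50" becomes Value 1050 with Scale 2; negative, empty, and non-numeric input is rejected, as the table expects.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Amount is a fixed-point value: Value / 10^Scale (hypothetical type).
type Amount struct {
	Value int64
	Scale int // number of digits after the decimal point
}

var ErrInvalidAmount = errors.New("invalid amount")

func ParseAmount(s string) (Amount, error) {
	intPart, fracPart, hasDot := strings.Cut(s, ".")
	if intPart == "" || (hasDot && fracPart == "") {
		return Amount{}, ErrInvalidAmount
	}
	// Only ASCII digits are allowed, which also rejects a leading '-'.
	for _, part := range []string{intPart, fracPart} {
		for _, r := range part {
			if r < '0' || r > '9' {
				return Amount{}, ErrInvalidAmount
			}
		}
	}
	v, err := strconv.ParseInt(intPart+fracPart, 10, 64)
	if err != nil { // e.g. overflow
		return Amount{}, ErrInvalidAmount
	}
	return Amount{Value: v, Scale: len(fracPart)}, nil
}

func main() {
	for _, in := range []string{"100", "10.50", "-5", "", "abc"} {
		got, err := ParseAmount(in)
		fmt.Printf("%q -> %+v, err=%v\n", in, got, err)
	}
}
```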

Fast (F)

Tests must run fast enough to support TDD cycles. Target: the full test suite in < 30 seconds for most projects.

Go fixes:

  • Mark slow integration tests with build tags: //go:build integration
  • Use t.Parallel() to parallelize safe tests
  • Use InMemoryStore implementations instead of real DB for unit tests
  • Use httptest.NewServer for HTTP tests instead of real servers
# Unit tests only (fast, default)
go test ./...

# Integration tests (slower, explicit)
go test -tags=integration ./...

Negative signals: time.Sleep, network calls without build tags, database calls in unit tests.

First / TDD (T)

Evidence that tests were written before implementation. This is the hardest property to verify statically.

Positive signals:

  • Commit history shows test commit before implementation commit
  • Tests test behavior, not implementation details (tests-first forces API design)
  • Tests are simpler than the implementation (tests-first keeps tests focused)

Negative signals:

  • Tests that exactly mirror the implementation structure
  • Tests that only cover happy paths (implementation-first misses edge cases)
  • Tests added in the same commit as a large implementation

Go-Specific Test Design Notes

t.Helper()

Use t.Helper() in helper functions so stack traces point to the call site, not the helper:

func assertValidUser(t *testing.T, u User) {
    t.Helper()
    assert.NotEmpty(t, u.ID)
    assert.NotEmpty(t, u.Email)
}

Table-Driven Tests Are Preferred

Go convention is table-driven tests. They're granular, readable, and easy to extend:

  • Add a new case by adding a row to the table — no new test function
  • Each case can be run on its own: go test -run TestFoo/case_name (spaces in subtest names become underscores)

Subtests Enable Targeted Runs

go test -run TestValidateUser/rejects_empty_email ./...

When Writing Tests

Apply this checklist to every new test:

  • Name describes the behavior being tested (not the function name)
  • Structure follows Arrange-Act-Assert
  • Tests one behavior (no "and" in the name)
  • Uses real implementations where feasible
  • Runs in < 100ms (or tagged for integration)
  • Uses t.Helper() in helper functions
  • Table-driven if testing multiple similar inputs

Cross-References

  • Load tdd skill for the full TDD workflow
  • Load code-review skill for test quality review during pre-merge review
  • See clean-code/references/code-smells.md for testing-specific smells