chore: bootstrap skills library — 19 skills + installer + CI auto-tag
Some checks failed
release / tag (push) Has been cancelled

Phase 1 of mathias/skills extraction (infra#62 Track D — homelab
next-step plan addendum). Imports ~/dev/.skills/ verbatim (19 skill
dirs + SKILLS_INDEX.md) and adds the installation surface:

- Taskfile.yml — install / update / list / release / check targets
- install.sh — bootstrap installer for hosts without Task. Idempotent
  symlink wirer; default checkout at ~/.local/share/skills/ on every
  host; SKILLS_REF env var pins a tag (default: main).
- .gitea/workflows/release.yml — auto-tag every push to main by
  Bump-Type footer (major/minor/patch, default patch). Skipped when
  commit contains [skip-release].
- README — usage, versioning, contribution flow, secret-hygiene rule.

Phase 1 wires Claude Code only (~/.claude/skills/<name> global +
<repo>/.claude/skills/<name> per-repo). Phase 2 adds Crush, opencode,
antigravity, and gitea-resident agents (cobalt-dingo, agentsquad)
once their skill conventions are researched.

Public repo, markdown-only — no secrets, no client names. Verified
via pre-push grep before initial push.

[skip-release]
This commit is contained in:
Mathias
2026-05-24 14:59:54 +02:00
commit d6a71e370e
33 changed files with 8688 additions and 0 deletions

401
tdd/SKILL.md Normal file
View File

@@ -0,0 +1,401 @@
---
name: tdd
description: Write failing tests first, then minimal code to pass. Non-negotiable for all new features and bug fixes.
---
# Test-Driven Development (TDD)
## Overview
Write the test first. Watch it fail. Write minimal code to pass.
**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
**Violating the letter of the rules is violating the spirit of the rules.**
## When to Use
**Always:**
- New features
- Bug fixes
- Refactoring
- Behavior changes
**Exceptions (ask Mathias):**
- Throwaway prototypes
- Generated code (sqlc output, templ output)
- Configuration files
Thinking "skip TDD just this once"? Stop. That's rationalization.
## The Iron Law
```
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```
Write code before the test? Delete it. Start over.
**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
Implement fresh from tests. Period.
## Red-Green-Refactor
```
RED → verify fails correctly → GREEN → verify all pass → REFACTOR → stay green → RED (next)
```
### RED - Write Failing Test
Write one minimal test showing what should happen.
**Go example (good):**
```go
func TestRetryOperation_RetriesThreeTimes(t *testing.T) {
attempts := 0
op := func() error {
attempts++
if attempts < 3 {
return errors.New("fail")
}
return nil
}
err := RetryOperation(op, 3)
require.NoError(t, err)
assert.Equal(t, 3, attempts)
}
```
Clear name, tests real behavior, one thing.
**Go example (bad):**
```go
func TestRetry(t *testing.T) {
// Vague name, tests nothing meaningful
err := RetryOperation(nil, 0)
assert.NoError(t, err)
}
```
**Requirements:**
- One behavior per test
- Clear name describing the behavior
- Real code (no mocks unless crossing a system boundary)
### Verify RED - Watch It Fail
**MANDATORY. Never skip.**
```bash
go test -run TestRetryOperation_RetriesThreeTimes ./...
```
Confirm:
- Test fails (not compilation errors)
- Failure message is expected
- Fails because feature is missing (not typos)
**Test passes?** You're testing existing behavior. Fix test.
**Compilation errors?** Fix errors, re-run until it fails correctly.
### GREEN - Minimal Code
Write simplest code to pass the test.
**Good:**
```go
func RetryOperation(op func() error, maxRetries int) error {
for i := 0; i < maxRetries; i++ {
if err := op(); err == nil {
return nil
}
}
return op()
}
```
Just enough to pass.
**Bad:**
```go
func RetryOperation(op func() error, maxRetries int, opts ...RetryOption) error {
// YAGNI — don't add options, backoff, jitter before the test demands it
}
```
Over-engineered.
Don't add features, refactor other code, or "improve" beyond what the test demands.
### Verify GREEN - Watch It Pass
**MANDATORY.**
```bash
go test ./...
```
Confirm:
- Test passes
- All other tests still pass
- No race conditions: `go test -race ./...`
**Test fails?** Fix code, not test.
**Other tests fail?** Fix now.
### REFACTOR - Clean Up
After green only:
- Remove duplication
- Improve names
- Extract helpers
- Apply clean code principles (load `clean-code` skill)
Keep tests green. Don't add behavior.
### Repeat
Next failing test for next behavior.
## Go-Specific TDD Notes
### Table-Driven Tests (Preferred Pattern)
```go
func TestValidateEmail(t *testing.T) {
tests := []struct {
name string
email string
wantErr bool
}{
{"valid email", "user@example.com", false},
{"empty email", "", true},
{"no at sign", "userexample.com", true},
{"no domain", "user@", true},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
err := ValidateEmail(tt.email)
if tt.wantErr {
assert.Error(t, err)
} else {
assert.NoError(t, err)
}
})
}
}
```
Table-driven tests are the Go idiom. Use them for behavior that varies across inputs.
### Subtests with t.Run
Use `t.Run` to name subtests clearly. Failure messages include the subtest name.
```go
t.Run("rejects empty input", func(t *testing.T) { ... })
t.Run("accepts valid UUID", func(t *testing.T) { ... })
```
### Test File Conventions
- File: `<package>_test.go` in the same directory
- Package: `package foo_test` for black-box testing (preferred), `package foo` for white-box
- Helpers: use `t.Helper()` so stack traces point to the caller, not the helper
```go
func assertNoError(t *testing.T, err error) {
t.Helper()
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
}
```
### Running Tests
```bash
go test ./... # all packages
go test -run TestFoo ./pkg/... # specific test
go test -run TestFoo/subtest ./... # specific subtest
go test -race ./... # race detector (always run before commit)
go test -cover ./... # coverage
go test -v ./... # verbose
```
### testify
Pre-approved. Use `assert` (continues on failure) and `require` (stops on failure):
```go
require.NoError(t, err) // fatal if error
assert.Equal(t, expected, got) // non-fatal comparison
assert.ErrorIs(t, err, ErrFoo) // error chain check
```
## Good Tests
| Quality | Good | Bad |
|---------|------|-----|
| **Minimal** | One thing. "and" in name? Split it. | `TestValidatesEmailAndDomainAndWhitespace` |
| **Clear** | Name describes behavior | `TestFoo`, `Test1` |
| **Shows intent** | Demonstrates desired API | Obscures what code should do |
| **Table-driven** | Multiple cases, one test function | Copy-pasted test functions |
## Common Rationalizations
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Existing code has no tests" | You're improving it. Add tests for existing code. |
## Red Flags - STOP and Start Over
- Code before test
- Test after implementation
- Test passes immediately without seeing it fail
- Can't explain why test failed
- Tests added "later"
- Rationalizing "just this once"
- "Already manually tested it"
- "Tests after achieve the same purpose"
- "Keep as reference" or "adapt existing code"
- "Already spent X hours, deleting is wasteful"
- "TDD is dogmatic, I'm being pragmatic"
- "This is different because..."
**All of these mean: Delete code. Start over with TDD.**
## Example: Bug Fix
**Bug:** Empty email accepted
**RED**
```go
func TestSubmitForm_RejectsEmptyEmail(t *testing.T) {
result := submitForm(FormData{Email: ""})
assert.Equal(t, "email required", result.Error)
}
```
**Verify RED**
```bash
$ go test -run TestSubmitForm_RejectsEmptyEmail ./...
FAIL: expected "email required", got ""
```
**GREEN**
```go
func submitForm(data FormData) FormResult {
if strings.TrimSpace(data.Email) == "" {
return FormResult{Error: "email required"}
}
// ...
}
```
**Verify GREEN**
```bash
$ go test ./...
ok example.com/myapp 0.003s
```
**REFACTOR**
Extract validation for multiple fields if needed.
## Verification Checklist
Before marking work complete:
- [ ] Every new function/method has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for expected reason (feature missing, not typo)
- [ ] Wrote minimal code to pass each test
- [ ] All tests pass: `go test ./...`
- [ ] Race detector clean: `go test -race ./...`
- [ ] Tests use real code (mocks only if crossing a system boundary)
- [ ] Edge cases and errors covered
Can't check all boxes? You skipped TDD. Start over.
## When Stuck
| Problem | Solution |
|---------|----------|
| Don't know how to test | Write wished-for API. Write assertion first. Ask Mathias. |
| Test too complicated | Design too complicated. Simplify interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers with `t.Helper()`. Still complex? Simplify design. |
## Debugging Integration
Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.
Never fix bugs without a test.
## Testing Anti-Patterns
When adding test utilities or mocks, load the `references/testing-anti-patterns.md` to avoid:
- Testing mock behavior instead of real behavior
- Adding test-only methods to production types
- Mocking without understanding what the dependency does
## Brain MCP Integration
The brain MCP exposes session context across machines. Use it to make TDD cycles cumulative rather than one-shot.
**At session start:**
- Run `brain_query` with the feature name + "tdd" to surface prior cycles, anti-patterns, or testing decisions for this code area. Skip if the feature is brand new.
**Never:**
- Embed brain content in test code or assertions. The brain is context for you, not state for the system under test.
### Logging
Call `session_log` once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
treats `pass` as success, `fail` as failure, `skip` as neither.
**At end of `red` phase:**
- `session_log` with `{skill: "tdd", phase: "red", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**At end of `green` phase:**
- `session_log` with `{skill: "tdd", phase: "green", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**At end of `refactor` phase:**
- `session_log` with `{skill: "tdd", phase: "refactor", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**Status semantics:**
- `pass` — the phase's intended outcome was reached (red: test fails as expected; green: test passes; refactor: tests still pass after refactor).
- `fail` — the phase's intended outcome was NOT reached (test compiled when it shouldn't, test still fails after green attempt, refactor broke tests).
- `skip` — phase was skipped intentionally (e.g. refactor not warranted).
**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future `tdd` call to a local model. If your skill never logs, the routing pod sees no data and may default-route or default-not-route in a way that doesn't reflect real performance.
## Final Rule
```
Production code → test exists and failed first
Otherwise → not TDD
```
No exceptions without Mathias's permission.
## Mode 2 Routing Note
This skill is invoked identically whether the agent is running in Mode 1 (cloud Claude, no routing) or Mode 2 (client-local, supervisor routing layer). The routing pod (Plan 6) does not exist yet; until it does, treat this skill as Mode 1 only. The discipline does not change between modes — only the model behind the call.

View File

@@ -0,0 +1,299 @@
# Testing Anti-Patterns
**Load this reference when:** writing or changing tests, adding mocks, or tempted to add test-only methods to production code.
## Overview
Tests must verify real behavior, not mock behavior. Mocks are a means to isolate, not the thing being tested.
**Core principle:** Test what the code does, not what the mocks do.
**Following strict TDD prevents these anti-patterns.**
## The Iron Laws
```
1. NEVER test mock behavior
2. NEVER add test-only methods to production classes
3. NEVER mock without understanding dependencies
```
## Anti-Pattern 1: Testing Mock Behavior
**The violation:**
```typescript
// ❌ BAD: Testing that the mock exists
test('renders sidebar', () => {
render(<Page />);
expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument();
});
```
**Why this is wrong:**
- You're verifying the mock works, not that the component works
- Test passes when mock is present, fails when it's not
- Tells you nothing about real behavior
**your human partner's correction:** "Are we testing the behavior of a mock?"
**The fix:**
```typescript
// ✅ GOOD: Test real component or don't mock it
test('renders sidebar', () => {
render(<Page />); // Don't mock sidebar
expect(screen.getByRole('navigation')).toBeInTheDocument();
});
// OR if sidebar must be mocked for isolation:
// Don't assert on the mock - test Page's behavior with sidebar present
```
### Gate Function
```
BEFORE asserting on any mock element:
Ask: "Am I testing real component behavior or just mock existence?"
IF testing mock existence:
STOP - Delete the assertion or unmock the component
Test real behavior instead
```
## Anti-Pattern 2: Test-Only Methods in Production
**The violation:**
```typescript
// ❌ BAD: destroy() only used in tests
class Session {
async destroy() { // Looks like production API!
await this._workspaceManager?.destroyWorkspace(this.id);
// ... cleanup
}
}
// In tests
afterEach(() => session.destroy());
```
**Why this is wrong:**
- Production class polluted with test-only code
- Dangerous if accidentally called in production
- Violates YAGNI and separation of concerns
- Confuses object lifecycle with entity lifecycle
**The fix:**
```typescript
// ✅ GOOD: Test utilities handle test cleanup
// Session has no destroy() - it's stateless in production
// In test-utils/
export async function cleanupSession(session: Session) {
const workspace = session.getWorkspaceInfo();
if (workspace) {
await workspaceManager.destroyWorkspace(workspace.id);
}
}
// In tests
afterEach(() => cleanupSession(session));
```
### Gate Function
```
BEFORE adding any method to production class:
Ask: "Is this only used by tests?"
IF yes:
STOP - Don't add it
Put it in test utilities instead
Ask: "Does this class own this resource's lifecycle?"
IF no:
STOP - Wrong class for this method
```
## Anti-Pattern 3: Mocking Without Understanding
**The violation:**
```typescript
// ❌ BAD: Mock breaks test logic
test('detects duplicate server', () => {
// Mock prevents config write that test depends on!
vi.mock('ToolCatalog', () => ({
discoverAndCacheTools: vi.fn().mockResolvedValue(undefined)
}));
await addServer(config);
await addServer(config); // Should throw - but won't!
});
```
**Why this is wrong:**
- Mocked method had side effect test depended on (writing config)
- Over-mocking to "be safe" breaks actual behavior
- Test passes for wrong reason or fails mysteriously
**The fix:**
```typescript
// ✅ GOOD: Mock at correct level
test('detects duplicate server', () => {
// Mock the slow part, preserve behavior test needs
vi.mock('MCPServerManager'); // Just mock slow server startup
await addServer(config); // Config written
await addServer(config); // Duplicate detected ✓
});
```
### Gate Function
```
BEFORE mocking any method:
STOP - Don't mock yet
1. Ask: "What side effects does the real method have?"
2. Ask: "Does this test depend on any of those side effects?"
3. Ask: "Do I fully understand what this test needs?"
IF depends on side effects:
Mock at lower level (the actual slow/external operation)
OR use test doubles that preserve necessary behavior
NOT the high-level method the test depends on
IF unsure what test depends on:
Run test with real implementation FIRST
Observe what actually needs to happen
THEN add minimal mocking at the right level
Red flags:
- "I'll mock this to be safe"
- "This might be slow, better mock it"
- Mocking without understanding the dependency chain
```
## Anti-Pattern 4: Incomplete Mocks
**The violation:**
```typescript
// ❌ BAD: Partial mock - only fields you think you need
const mockResponse = {
status: 'success',
data: { userId: '123', name: 'Alice' }
// Missing: metadata that downstream code uses
};
// Later: breaks when code accesses response.metadata.requestId
```
**Why this is wrong:**
- **Partial mocks hide structural assumptions** - You only mocked fields you know about
- **Downstream code may depend on fields you didn't include** - Silent failures
- **Tests pass but integration fails** - Mock incomplete, real API complete
- **False confidence** - Test proves nothing about real behavior
**The Iron Rule:** Mock the COMPLETE data structure as it exists in reality, not just fields your immediate test uses.
**The fix:**
```typescript
// ✅ GOOD: Mirror real API completeness
const mockResponse = {
status: 'success',
data: { userId: '123', name: 'Alice' },
metadata: { requestId: 'req-789', timestamp: 1234567890 }
// All fields real API returns
};
```
### Gate Function
```
BEFORE creating mock responses:
Check: "What fields does the real API response contain?"
Actions:
1. Examine actual API response from docs/examples
2. Include ALL fields system might consume downstream
3. Verify mock matches real response schema completely
Critical:
If you're creating a mock, you must understand the ENTIRE structure
Partial mocks fail silently when code depends on omitted fields
If uncertain: Include all documented fields
```
## Anti-Pattern 5: Integration Tests as Afterthought
**The violation:**
```
✅ Implementation complete
❌ No tests written
"Ready for testing"
```
**Why this is wrong:**
- Testing is part of implementation, not optional follow-up
- TDD would have caught this
- Can't claim complete without tests
**The fix:**
```
TDD cycle:
1. Write failing test
2. Implement to pass
3. Refactor
4. THEN claim complete
```
## When Mocks Become Too Complex
**Warning signs:**
- Mock setup longer than test logic
- Mocking everything to make test pass
- Mocks missing methods real components have
- Test breaks when mock changes
**your human partner's question:** "Do we need to be using a mock here?"
**Consider:** Integration tests with real components often simpler than complex mocks
## TDD Prevents These Anti-Patterns
**Why TDD helps:**
1. **Write test first** → Forces you to think about what you're actually testing
2. **Watch it fail** → Confirms test tests real behavior, not mocks
3. **Minimal implementation** → No test-only methods creep in
4. **Real dependencies** → You see what the test actually needs before mocking
**If you're testing mock behavior, you violated TDD** - you added mocks without watching test fail against real code first.
## Quick Reference
| Anti-Pattern | Fix |
|--------------|-----|
| Assert on mock elements | Test real component or unmock it |
| Test-only methods in production | Move to test utilities |
| Mock without understanding | Understand dependencies first, mock minimally |
| Incomplete mocks | Mirror real API completely |
| Tests as afterthought | TDD - tests first |
| Over-complex mocks | Consider integration tests |
## Red Flags
- Assertion checks for `*-mock` test IDs
- Methods only called in test files
- Mock setup is >50% of test
- Test fails when you remove mock
- Can't explain why mock is needed
- Mocking "just to be safe"
## The Bottom Line
**Mocks are tools to isolate, not things to test.**
If TDD reveals you're testing mock behavior, you've gone wrong.
Fix: Test real behavior or question why you're mocking at all.