Compare commits
115 Commits
v0.1.0...a34c66d7cd
| Author | SHA1 | Date | |
|---|---|---|---|
| | a34c66d7cd | | |
| | cc401d92d6 | | |
| | 9bdf00f51f | | |
| | 7f7524c859 | | |
| | 0a70d9e972 | | |
| | 3e9a648115 | | |
| | 923a665365 | | |
| | 537aebc302 | | |
| | de35d4dbb0 | | |
| | 26855f69b0 | | |
| | a7b363d589 | | |
| | 7b57051af8 | | |
| | a620f6cb01 | | |
| | 26b5636b43 | | |
| | 989f375aec | | |
| | 6403d5e444 | | |
| | ab19968ae2 | | |
| | 1605624668 | | |
| | 55fa0b503a | | |
| | 3c2bd9268c | | |
| | 29727ec2a5 | | |
| | 0a075088b2 | | |
| | 1bfe501d09 | | |
| | 3607920601 | | |
| | a6c39e8691 | | |
| | a37d18bf7a | | |
| | 2975eadc87 | | |
| | 53e46781b1 | | |
| | e9b5cc401c | | |
| | bf6f497d9d | | |
| | 9cc6c2d053 | | |
| | 43a46d07e5 | | |
| | 820d1c93a7 | | |
| | 6928907d79 | | |
| | e74320a8e8 | | |
| | 1b0706f270 | | |
| | 2ae6bfe81e | | |
| | a6dce972d6 | | |
| | 2f4b577131 | | |
| | a25bb18c54 | | |
| | 78531bb238 | | |
| | 04fefe8e9c | | |
| | 103f4d90bf | | |
| | 9b11719481 | | |
| | d405346f07 | | |
| | bf8a3fc11c | | |
| | ae5a4d04f0 | | |
| | 3a0424a6b4 | | |
| | 08dd7b9365 | | |
| | 91e02b930c | | |
| | c7341a2607 | | |
| | b5a0085c0a | | |
| | d6daa37c71 | | |
| | 62fc3989f2 | | |
| | c9310b1079 | | |
| | ca8a691241 | | |
| | 214f607007 | | |
| | 0e08dfffb8 | | |
| | caef05bea4 | | |
| | ca1a16873c | | |
| | 63c238c650 | | |
| | ce45592730 | | |
| | 823de23213 | | |
| | 78d3939caa | | |
| | f2bc39b500 | | |
| | 3625e1268d | | |
| | 47df642836 | | |
| | 235d70ad0b | | |
| | 7d5289ac54 | | |
| | 3d8fc9dacd | | |
| | f9f804cd49 | | |
| | 85f142ade0 | | |
| | 0dfad02513 | | |
| | c44eb680b2 | | |
| | 38ada998a2 | | |
| | 74547c2bdf | | |
| | 587c0d3b1c | | |
| | bb61f2992b | | |
| | 3ba72d9b28 | | |
| | b4f0fbc3ea | | |
| | 12943ee6f4 | | |
| | 9af95ebd96 | | |
| | f0b567f3e6 | | |
| | e3d6cf4cf5 | | |
| | df59bd010c | | |
| | e5152151d6 | | |
| | aa2d57e619 | | |
| | 6b53706987 | | |
| | a0cfc866df | | |
| | 7bf19b6a7b | | |
| | 19b019a8d8 | | |
| | 4ef6a22e28 | | |
| | 3796cfca87 | | |
| | 7ce544a051 | | |
| | 391720155e | | |
| | ae6600b8d2 | | |
| | 6328766c7f | | |
| | f1deedd39d | | |
| | 5cb272a869 | | |
| | e96b39a812 | | |
| | 5db5b33cd7 | | |
| | a32457b5bc | | |
| | e0be5f0f98 | | |
| | 6d410b810b | | |
| | 76f195de2a | | |
| | f901d4e67d | | |
| | 509c04b6e4 | | |
| | 738275252c | | |
| | 38fcac4cba | | |
| | 7697e901d2 | | |
| | 8cff57009a | | |
| | 8fb44affef | | |
| | 582ca5019b | | |
| | 858a9ba1a1 | | |
| | cbef2da8de | | |
240
.aider.conventions.md
Normal file
@@ -0,0 +1,240 @@
|
||||
# Agent context — Mathias workspace
|
||||
|
||||
<!-- Canonical root context for all AI coding agents.
Lives at: ~/dev/.context/AGENT.md
Applies to every project under ~/dev/ unless overridden.

Run `task context:sync` from ~/dev/ to regenerate harness-specific files.
Project-level context in .context/PROJECT.md layers on top of this. -->
|
||||
|
||||
## Who I am
|
||||
|
||||
I'm Mathias, a digital product manager and technology consultant based in Sweden.
|
||||
I build software, research emerging tech, and deliver consulting engagements
|
||||
for clients under NDA. I work across AI/ML, financial automation, web applications,
|
||||
and climate/sustainability tech.
|
||||
|
||||
## How I work with agents
|
||||
|
||||
- I think like a product manager — I care about *why* before *how*
|
||||
- I want agents to be opinionated and push back, not just execute blindly
|
||||
- I prefer concise responses; skip ceremony and get to the point
|
||||
- When I say "build this", I mean production-quality with tests, not a demo
|
||||
- Ask me before making irreversible changes or adding heavy dependencies
|
||||
- I work with confidential client data — never send it to cloud APIs unless I explicitly say it's OK
|
||||
|
||||
## Behavior rules
|
||||
|
||||
These rules apply to every task across every project, regardless of harness.
|
||||
|
||||
1. **No assumptions.** Don't hide confusion — surface it. Surface tradeoffs explicitly.
|
||||
Think before coding; if the problem is unclear, ask or state assumptions before acting.
|
||||
2. **Minimum viable code.** Solve with the smallest change that works. Nothing
|
||||
speculative, no "while we're here" cleanups, no premature abstractions. Simplicity first.
|
||||
3. **Surgical changes.** Touch only what the task requires. Leave unrelated code,
|
||||
files, and formatting alone. Diffs should be small and reviewable.
|
||||
4. **Goal-driven execution.** Define clear success criteria up front for every task.
|
||||
Loop — implement, verify, refine — until those criteria are met. Don't claim
|
||||
completion without evidence (tests pass, command output, observed behavior).
|
||||
|
||||
## Default stack
|
||||
|
||||
| Layer | Default | Fallback | Last resort |
|-------|---------|----------|-------------|
| Language | Go | Python | TypeScript, Java, C |
| UI | HTMX + Templ | Server-rendered HTML | React (only if SPA is justified) |
| Build | Task (taskfile.dev) | Make | — |
| Containers | Docker Compose (dev), k3s (prod) | — | — |
| DB | PostgreSQL + sqlc | SQLite | — |
| Search | Qdrant (vector), BM25 | — | — |
| Logging | slog (structured) | — | — |
| Testing | Table-driven, testify | — | — |
|
||||
|
||||
Exploratory: Rust, Zig — I'll tell you when I want these.
|
||||
|
||||
## Code conventions
|
||||
|
||||
- **Go style**: golines, gofumpt, golangci-lint
|
||||
- **Errors**: `fmt.Errorf("operation: %w", err)` — never naked, never log-and-return
|
||||
- **Naming**: stdlib conventions, no stuttering
|
||||
- **Architecture**: prefer stdlib over frameworks, constructor injection, env-var config parsed into typed structs
|
||||
- **Git**: conventional commits (`feat:`, `fix:`, `chore:`), one concern per PR, PR describes *why* not *what*
|
||||
- **Security**: no secrets in code, govulncheck before adding deps, SOPS for encrypted config
|
||||
- **Dependencies**: prefer stdlib. testify, slog, templ, sqlc are pre-approved; anything else needs justification in the commit message
|
||||
|
||||
## Infrastructure
|
||||
|
||||
Three machines on Tailscale:
|
||||
|
||||
| Machine | Role | Key specs |
|---------|------|-----------|
| koala | GPU inference, heavy compute | RTX 5070, runs llama-swap, Qdrant |
| iguana | Services, builds | M2 Ultra Mac |
| flamingo | Daily driver, edge | Mac mini, ~/dev is here |
|
||||
|
||||
- **Model routing**: LiteLLM in front of llama-swap (local) + cloud APIs (when permitted)
|
||||
- **Orchestration**: k3s cluster across all three machines
|
||||
- **Networking**: Tailscale mesh
|
||||
|
||||
## Project landscape
|
||||
|
||||
All development repos live at `~/dev/` (softlink from `~/Documents/local-dev/`).
|
||||
|
||||
Organized in thematic folders:
|
||||
|
||||
| Folder | Focus | Count |
|--------|-------|-------|
| `GO/` | Go web frameworks, API integrations, learning projects | ~10 |
| `AI/` | ML research, AI frameworks (FinRL, DSPy, crawl4ai) | ~6 |
| `AGENTS/` | Autonomous agents, coding agents, MCP servers, infra | ~15 |
| `QKX/` | Invoice processing, financial automation, payment systems | ~13 |
| `XT/` | Climate data, sustainability (Klimatkollen, Garbo) | ~2 |
|
||||
|
||||
See `~/dev/PROJECT_SUMMARY.md` for detailed descriptions of each project.
|
||||
|
||||
### Key active projects
|
||||
|
||||
- **super-koala** (`AGENTS/`) — multi-component agent stack with LangGraph, DSPy, MCP
|
||||
- **azure-tiger** (`QKX/`) — invoice extraction → ISO 20022 payment instructions
|
||||
- **gocrwl** (`AGENTS/`) — Go web crawler with containerized deployment
|
||||
- **koala-ai-stack** (`AGENTS/`) — local AI server infrastructure management
|
||||
- **klimatkollen** (`XT/`) — Swedish municipal climate data platform
|
||||
|
||||
## Knowledge base
|
||||
|
||||
When available, agents can query the shared knowledge base:
|
||||
|
||||
- **MCP**: `mcp://hyperguild.<TAILNET>.ts.net:3100/knowledge`
|
||||
- **HTTP**: `http://hyperguild.<TAILNET>.ts.net:3100/api/v1/search`
|
||||
|
||||
<!-- TODO: replace <TAILNET> placeholder with the real Tailscale tailnet
name once hyperguild is deployed. Until then, agents that try to
reach the knowledge service on a host where it isn't running will
get DNS NXDOMAIN, which is the desired fail-loudly behavior. -->
|
||||
- **Scoping**: defaults to `public` collection; client projects filter to `{client}` + `public`
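
The scoping rule can be sketched as a small pure function. This is an illustration only: the name `collectionsFor` and the convention that an empty client string means "no client" are assumptions, not part of the knowledge service's actual API.

```go
package main

import "fmt"

// collectionsFor returns the collections a query may read.
// Hypothetical helper: an empty client means the project has no client tag.
func collectionsFor(client string) []string {
	if client == "" {
		// Default scope: public knowledge only.
		return []string{"public"}
	}
	// Client projects see their own collection plus public.
	return []string{client, "public"}
}

func main() {
	fmt.Println(collectionsFor(""))         // [public]
	fmt.Println(collectionsFor("personal")) // [personal public]
}
```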
|
||||
|
||||
## Client work rules
|
||||
|
||||
When working on a project tagged with a client name:
|
||||
1. Never send code, data, or context to cloud APIs — use local models only
|
||||
2. Never reference other client projects or their data
|
||||
3. Keep all artifacts within the client's git org / directory
|
||||
4. Treat everything as confidential unless told otherwise
|
||||
|
||||
## Harness-agnostic principles
|
||||
|
||||
This context is designed to work with any AI coding tool:
|
||||
- Claude Code, Cursor, Aider, Open WebUI, Charmbracelet Mods/Crush
|
||||
- Pi Coding Agent, Mistral Vibe, Antigravity
|
||||
- Any tool that accepts a system prompt or reads a markdown context file
|
||||
|
||||
The canonical source is always `.context/AGENT.md` (root) and `.context/PROJECT.md` (per-project).
|
||||
Derived files are committed (see *How context propagates* below) so a `git pull` on any host yields full agent context with no setup.
|
||||
|
||||
## How context propagates
|
||||
|
||||
Canonical sources of truth:
|
||||
- Universal: `~/dev/.context/AGENT.md` (this file)
|
||||
- Project: `<repo>/.context/PROJECT.md` (per-repo)
|
||||
|
||||
Derived files (committed, regenerated by `task context:sync`):
|
||||
- `CLAUDE.md`, `AGENTS.md`, `.cursorrules`, `.aider.conventions.md`,
|
||||
`.context/system-prompt.txt`
|
||||
|
||||
Workflow:
|
||||
1. Edit a canonical file. Run `task context:sync`. Commit canonical and
   derived together. Push.
2. On any other host, `git pull` brings both. Claude Code (tree-walking)
   uses `CLAUDE.md`; Crush / Pi / Antigravity (cwd-only) use `AGENTS.md`;
   Cursor uses `.cursorrules`; Aider uses `.aider.conventions.md`.
3. `task check` runs `context:sync` then asserts `git status --porcelain`
   is empty over the derived files (catches both modified-tracked drift
   and missing-untracked adapters). A drift fails the check with a
   message telling you to stage the regenerated files.
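
The drift assertion in step 3 could look roughly like the sketch below. It is not the actual `task check` implementation: `driftedFiles` is a hypothetical helper that takes the text of `git status --porcelain` (v1 format, two status characters, a space, then the path) and reports which derived files appear in it. Renames and quoted paths are deliberately ignored here.

```go
package main

import (
	"fmt"
	"strings"
)

// driftedFiles returns the derived files listed in porcelain v1 output.
// Hypothetical helper; paths are matched exactly, relative to the repo root.
func driftedFiles(porcelain string, derived []string) []string {
	want := make(map[string]bool, len(derived))
	for _, d := range derived {
		want[d] = true
	}
	var drifted []string
	for _, line := range strings.Split(porcelain, "\n") {
		// Each entry is "XY path"; shorter lines carry no path.
		if len(line) < 4 {
			continue
		}
		if path := line[3:]; want[path] {
			drifted = append(drifted, path)
		}
	}
	return drifted
}

func main() {
	// Sample output: one modified derived file, one untracked, one unrelated.
	out := " M CLAUDE.md\n?? .cursorrules\n M main.go\n"
	derived := []string{"CLAUDE.md", "AGENTS.md", ".cursorrules"}
	fmt.Println(driftedFiles(out, derived)) // [CLAUDE.md .cursorrules]
}
```

A non-empty result would be the point at which the check fails and asks for the regenerated files to be staged.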
|
||||
|
||||
Escape hatch: a derived file containing `<!-- HANDROLLED: do not regenerate -->`
near the top is skipped by sync. Used for ops repos where the derived file
is the canonical safety doc (e.g. `~/dev/AI/infra/CLAUDE.md`).
|
||||
|
||||
Behavior rules in this file and per-project rules in `PROJECT.md` apply
|
||||
unconditionally on every host, every harness.
|
||||
|
||||
## Engineering Skills
|
||||
|
||||
Shared engineering skills are available in `~/dev/.skills/`. Load on demand via the index.
|
||||
|
||||
See `~/dev/.skills/SKILLS_INDEX.md` for the full list with descriptions and "use when" triggers.
|
||||
|
||||
Key skills:
|
||||
- **TDD**: always write tests first — load `tdd` skill
|
||||
- **Code Review**: load `code-review` skill before any review
|
||||
- **SOLID/Clean Code**: load `solid` or `clean-code` skill for design work
|
||||
- **Problem first**: load `problem-analysis` skill before coding non-trivial features
|
||||
|
||||
---
|
||||
|
||||
# Project context
|
||||
|
||||
<!-- Canonical project context. Edit this, run `task context:sync`.
|
||||
Root agent context from ~/dev/.context/AGENT.md is automatically
|
||||
prepended for harnesses that don't walk the directory tree. -->
|
||||
|
||||
## Identity
|
||||
|
||||
- **Name**: supervisor
|
||||
- **Owner**: Mathias
|
||||
- **Client**: personal
|
||||
- **Repo**:
|
||||
- **Status**: active
|
||||
|
||||
## Stack
|
||||
|
||||
- **Primary language**: Go
|
||||
- **UI layer**: HTMX + Templ (when applicable)
|
||||
- **Fallback languages**: Python, TypeScript (justify in PR if used)
|
||||
- **Build**: Task (taskfile.dev), not Make
|
||||
- **Containers**: Docker (compose for dev, k3s for deploy)
|
||||
- **Target infra**: koala (GPU workloads), iguana (services), flamingo (edge)
|
||||
|
||||
## Conventions
|
||||
|
||||
### Code style
|
||||
- Go: follow `golines`, `gofumpt`, `golangci-lint` with project config
|
||||
- Tests: table-driven, in `_test.go` next to source, `testify` for assertions
|
||||
- Errors: wrap with `fmt.Errorf("operation: %w", err)`; never return a bare, unwrapped error
|
||||
- Naming: stdlib conventions, no stuttering (`http.Client` not `http.HTTPClient`)
|
||||
|
||||
### Architecture preferences
|
||||
- Prefer standard library over frameworks (net/http over gin/echo)
|
||||
- Dependency injection via constructor functions, not containers
|
||||
- Configuration via environment variables, parsed at startup into a typed struct
|
||||
- Structured logging via `slog`
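
The architecture preferences above can be combined in a short sketch. Everything named here is hypothetical (the `Config` fields, the `PORT` and `KNOWLEDGE_URL` variables, `Server`, `NewServer`); the point is only the shape: environment variables parsed once into a typed struct, dependencies passed through a constructor, and `slog` for structured output.

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
	"strconv"
)

// Config is a typed view of the environment; field names are illustrative.
type Config struct {
	Port         int
	KnowledgeURL string
}

// loadConfig takes the lookup function as a parameter so tests can pass a fake.
func loadConfig(getenv func(string) string) (Config, error) {
	port, err := strconv.Atoi(getenv("PORT"))
	if err != nil {
		return Config{}, fmt.Errorf("parse PORT: %w", err)
	}
	return Config{Port: port, KnowledgeURL: getenv("KNOWLEDGE_URL")}, nil
}

// Server receives its dependencies through the constructor, not a container.
type Server struct {
	cfg Config
	log *slog.Logger
}

// NewServer wires a Server from already-built dependencies.
func NewServer(cfg Config, log *slog.Logger) *Server {
	return &Server{cfg: cfg, log: log}
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stderr, nil))
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		// Errors are logged once, at the top of the call stack.
		logger.Error("startup failed", "err", err)
		os.Exit(1)
	}
	srv := NewServer(cfg, logger)
	srv.log.Info("configured", "port", srv.cfg.Port)
}
```

Passing `os.Getenv` as a function value keeps `loadConfig` free of global state, which suits table-driven tests.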
|
||||
|
||||
### Git
|
||||
- Conventional commits: `feat:`, `fix:`, `chore:`, `docs:`, `refactor:`
|
||||
- Branch naming: `feat/short-description`, `fix/short-description`
|
||||
- PRs: one concern per PR, description explains *why* not *what*
|
||||
|
||||
### Security
|
||||
- No secrets in code, ever — use env vars or SOPS-encrypted files
|
||||
- Client data never leaves local network unless explicitly cleared
|
||||
- Dependencies: audit with `govulncheck` before adding
|
||||
|
||||
## Knowledge base access
|
||||
|
||||
This project can query the shared knowledge base via MCP or HTTP:
|
||||
|
||||
- **MCP endpoint**: `mcp://localhost:3100/knowledge`
|
||||
- **HTTP fallback**: `http://localhost:3100/api/v1/search`
|
||||
- **Scoping**: queries are filtered to collection `personal` + `public`
|
||||
|
||||
## Agent instructions
|
||||
|
||||
When acting as a coding agent on this project:
|
||||
|
||||
1. Read this file and all `SKILL.md` files in `.skills/` before starting work
|
||||
2. Run `task check` before committing (lint + test + vet)
|
||||
3. If unsure about a convention, check `DECISIONS.md` or ask
|
||||
4. Never modify files outside the project root without explicit permission
|
||||
5. When adding a dependency, explain why in the commit message
|
||||
6. For client projects: never send code or context to cloud APIs — use local models via LiteLLM
|
||||
247
.context/system-prompt.txt
Normal file
@@ -0,0 +1,247 @@
|
||||
You are a coding assistant working on a specific project.
|
||||
Follow all conventions from both the root agent context and project context.
|
||||
|
||||
---
|
||||
|
||||
# Agent context — Mathias workspace
|
||||
|
||||
<!-- Canonical root context for all AI coding agents.
|
||||
Lives at: ~/dev/.context/AGENT.md
|
||||
Applies to every project under ~/dev/ unless overridden.
|
||||
|
||||
Run `task context:sync` from ~/dev/ to regenerate harness-specific files.
|
||||
Project-level context in .context/PROJECT.md layers on top of this. -->
|
||||
|
||||
## Who I am
|
||||
|
||||
I'm Mathias, a digital product manager and technology consultant based in Sweden.
|
||||
I build software, research emerging tech, and deliver consulting engagements
|
||||
for clients under NDA. I work across AI/ML, financial automation, web applications,
|
||||
and climate/sustainability tech.
|
||||
|
||||
## How I work with agents
|
||||
|
||||
- I think like a product manager — I care about *why* before *how*
|
||||
- I want agents to be opinionated and push back, not just execute blindly
|
||||
- I prefer concise responses; skip ceremony and get to the point
|
||||
- When I say "build this", I mean production-quality with tests, not a demo
|
||||
- Ask me before making irreversible changes or adding heavy dependencies
|
||||
- I work with confidential client data — never send it to cloud APIs unless I explicitly say it's OK
|
||||
|
||||
## Behavior rules
|
||||
|
||||
These rules apply to every task across every project, regardless of harness.
|
||||
|
||||
1. **No assumptions.** Don't hide confusion — surface it. Surface tradeoffs explicitly.
|
||||
Think before coding; if the problem is unclear, ask or state assumptions before acting.
|
||||
2. **Minimum viable code.** Solve with the smallest change that works. Nothing
|
||||
speculative, no "while we're here" cleanups, no premature abstractions. Simplicity first.
|
||||
3. **Surgical changes.** Touch only what the task requires. Leave unrelated code,
|
||||
files, and formatting alone. Diffs should be small and reviewable.
|
||||
4. **Goal-driven execution.** Define clear success criteria up front for every task.
|
||||
Loop — implement, verify, refine — until those criteria are met. Don't claim
|
||||
completion without evidence (tests pass, command output, observed behavior).
|
||||
|
||||
## Default stack
|
||||
|
||||
| Layer | Default | Fallback | Last resort |
|-------|---------|----------|-------------|
| Language | Go | Python | TypeScript, Java, C |
| UI | HTMX + Templ | Server-rendered HTML | React (only if SPA is justified) |
| Build | Task (taskfile.dev) | Make | — |
| Containers | Docker Compose (dev), k3s (prod) | — | — |
| DB | PostgreSQL + sqlc | SQLite | — |
| Search | Qdrant (vector), BM25 | — | — |
| Logging | slog (structured) | — | — |
| Testing | Table-driven, testify | — | — |
|
||||
|
||||
Exploratory: Rust, Zig — I'll tell you when I want these.
|
||||
|
||||
## Code conventions
|
||||
|
||||
- **Go style**: golines, gofumpt, golangci-lint
|
||||
- **Errors**: `fmt.Errorf("operation: %w", err)` — never naked, never log-and-return
|
||||
- **Naming**: stdlib conventions, no stuttering
|
||||
- **Architecture**: prefer stdlib over frameworks, constructor injection, env-var config parsed into typed structs
|
||||
- **Git**: conventional commits (`feat:`, `fix:`, `chore:`), one concern per PR, PR describes *why* not *what*
|
||||
- **Security**: no secrets in code, govulncheck before adding deps, SOPS for encrypted config
|
||||
- **Dependencies**: prefer stdlib. testify, slog, templ, sqlc are pre-approved; anything else needs justification in the commit message
|
||||
|
||||
## Infrastructure
|
||||
|
||||
Three machines on Tailscale:
|
||||
|
||||
| Machine | Role | Key specs |
|---------|------|-----------|
| koala | GPU inference, heavy compute | RTX 5070, runs llama-swap, Qdrant |
| iguana | Services, builds | M2 Ultra Mac |
| flamingo | Daily driver, edge | Mac mini, ~/dev is here |
|
||||
|
||||
- **Model routing**: LiteLLM in front of llama-swap (local) + cloud APIs (when permitted)
|
||||
- **Orchestration**: k3s cluster across all three machines
|
||||
- **Networking**: Tailscale mesh
|
||||
|
||||
## Project landscape
|
||||
|
||||
All development repos live at `~/dev/` (softlink from `~/Documents/local-dev/`).
|
||||
|
||||
Organized in thematic folders:
|
||||
|
||||
| Folder | Focus | Count |
|--------|-------|-------|
| `GO/` | Go web frameworks, API integrations, learning projects | ~10 |
| `AI/` | ML research, AI frameworks (FinRL, DSPy, crawl4ai) | ~6 |
| `AGENTS/` | Autonomous agents, coding agents, MCP servers, infra | ~15 |
| `QKX/` | Invoice processing, financial automation, payment systems | ~13 |
| `XT/` | Climate data, sustainability (Klimatkollen, Garbo) | ~2 |
|
||||
|
||||
See `~/dev/PROJECT_SUMMARY.md` for detailed descriptions of each project.
|
||||
|
||||
### Key active projects
|
||||
|
||||
- **super-koala** (`AGENTS/`) — multi-component agent stack with LangGraph, DSPy, MCP
|
||||
- **azure-tiger** (`QKX/`) — invoice extraction → ISO 20022 payment instructions
|
||||
- **gocrwl** (`AGENTS/`) — Go web crawler with containerized deployment
|
||||
- **koala-ai-stack** (`AGENTS/`) — local AI server infrastructure management
|
||||
- **klimatkollen** (`XT/`) — Swedish municipal climate data platform
|
||||
|
||||
## Knowledge base
|
||||
|
||||
When available, agents can query the shared knowledge base:
|
||||
|
||||
- **MCP**: `mcp://hyperguild.<TAILNET>.ts.net:3100/knowledge`
|
||||
- **HTTP**: `http://hyperguild.<TAILNET>.ts.net:3100/api/v1/search`
|
||||
|
||||
<!-- TODO: replace <TAILNET> placeholder with the real Tailscale tailnet
|
||||
name once hyperguild is deployed. Until then, agents that try to
|
||||
reach the knowledge service on a host where it isn't running will
|
||||
get DNS NXDOMAIN, which is the desired fail-loudly behavior. -->
|
||||
- **Scoping**: defaults to `public` collection; client projects filter to `{client}` + `public`
|
||||
|
||||
## Client work rules
|
||||
|
||||
When working on a project tagged with a client name:
|
||||
1. Never send code, data, or context to cloud APIs — use local models only
|
||||
2. Never reference other client projects or their data
|
||||
3. Keep all artifacts within the client's git org / directory
|
||||
4. Treat everything as confidential unless told otherwise
|
||||
|
||||
## Harness-agnostic principles
|
||||
|
||||
This context is designed to work with any AI coding tool:
|
||||
- Claude Code, Cursor, Aider, Open WebUI, Charmbracelet Mods/Crush
|
||||
- Pi Coding Agent, Mistral Vibe, Antigravity
|
||||
- Any tool that accepts a system prompt or reads a markdown context file
|
||||
|
||||
The canonical source is always `.context/AGENT.md` (root) and `.context/PROJECT.md` (per-project).
|
||||
Derived files are committed (see *How context propagates* below) so a `git pull` on any host yields full agent context with no setup.
|
||||
|
||||
## How context propagates
|
||||
|
||||
Canonical sources of truth:
|
||||
- Universal: `~/dev/.context/AGENT.md` (this file)
|
||||
- Project: `<repo>/.context/PROJECT.md` (per-repo)
|
||||
|
||||
Derived files (committed, regenerated by `task context:sync`):
|
||||
- `CLAUDE.md`, `AGENTS.md`, `.cursorrules`, `.aider.conventions.md`,
|
||||
`.context/system-prompt.txt`
|
||||
|
||||
Workflow:
|
||||
1. Edit a canonical file. Run `task context:sync`. Commit canonical and
|
||||
derived together. Push.
|
||||
2. On any other host, `git pull` brings both. Claude Code (tree-walking)
|
||||
uses `CLAUDE.md`; Crush / Pi / Antigravity (cwd-only) use `AGENTS.md`;
|
||||
Cursor uses `.cursorrules`; Aider uses `.aider.conventions.md`.
|
||||
3. `task check` runs `context:sync` then asserts `git status --porcelain`
|
||||
is empty over the derived files (catches both modified-tracked drift
|
||||
and missing-untracked adapters). A drift fails the check with a
|
||||
message telling you to stage the regenerated files.
|
||||
|
||||
Escape hatch: a derived file containing `<!-- HANDROLLED: do not regenerate -->`
|
||||
near the top is skipped by sync. Used for ops repos where the derived file
|
||||
is the canonical safety doc (e.g. `~/dev/AI/infra/CLAUDE.md`).
|
||||
|
||||
Behavior rules in this file and per-project rules in `PROJECT.md` apply
|
||||
unconditionally on every host, every harness.
|
||||
|
||||
## Engineering Skills
|
||||
|
||||
Shared engineering skills are available in `~/dev/.skills/`. Load on demand via the index.
|
||||
|
||||
See `~/dev/.skills/SKILLS_INDEX.md` for the full list with descriptions and "use when" triggers.
|
||||
|
||||
Key skills:
|
||||
- **TDD**: always write tests first — load `tdd` skill
|
||||
- **Code Review**: load `code-review` skill before any review
|
||||
- **SOLID/Clean Code**: load `solid` or `clean-code` skill for design work
|
||||
- **Problem first**: load `problem-analysis` skill before coding non-trivial features
|
||||
|
||||
---
|
||||
|
||||
# Project context
|
||||
|
||||
<!-- Canonical project context. Edit this, run `task context:sync`.
|
||||
Root agent context from ~/dev/.context/AGENT.md is automatically
|
||||
prepended for harnesses that don't walk the directory tree. -->
|
||||
|
||||
## Identity
|
||||
|
||||
- **Name**: supervisor
|
||||
- **Owner**: Mathias
|
||||
- **Client**: personal
|
||||
- **Repo**:
|
||||
- **Status**: active
|
||||
|
||||
## Stack
|
||||
|
||||
- **Primary language**: Go
|
||||
- **UI layer**: HTMX + Templ (when applicable)
|
||||
- **Fallback languages**: Python, TypeScript (justify in PR if used)
|
||||
- **Build**: Task (taskfile.dev), not Make
|
||||
- **Containers**: Docker (compose for dev, k3s for deploy)
|
||||
- **Target infra**: koala (GPU workloads), iguana (services), flamingo (edge)
|
||||
|
||||
## Conventions
|
||||
|
||||
### Code style
|
||||
- Go: follow `golines`, `gofumpt`, `golangci-lint` with project config
|
||||
- Tests: table-driven, in `_test.go` next to source, `testify` for assertions
|
||||
- Errors: wrap with `fmt.Errorf("operation: %w", err)`; never return a bare, unwrapped error
|
||||
- Naming: stdlib conventions, no stuttering (`http.Client` not `http.HTTPClient`)
|
||||
|
||||
### Architecture preferences
|
||||
- Prefer standard library over frameworks (net/http over gin/echo)
|
||||
- Dependency injection via constructor functions, not containers
|
||||
- Configuration via environment variables, parsed at startup into a typed struct
|
||||
- Structured logging via `slog`
|
||||
|
||||
### Git
|
||||
- Conventional commits: `feat:`, `fix:`, `chore:`, `docs:`, `refactor:`
|
||||
- Branch naming: `feat/short-description`, `fix/short-description`
|
||||
- PRs: one concern per PR, description explains *why* not *what*
|
||||
|
||||
### Security
|
||||
- No secrets in code, ever — use env vars or SOPS-encrypted files
|
||||
- Client data never leaves local network unless explicitly cleared
|
||||
- Dependencies: audit with `govulncheck` before adding
|
||||
|
||||
## Knowledge base access
|
||||
|
||||
This project can query the shared knowledge base via MCP or HTTP:
|
||||
|
||||
- **MCP endpoint**: `mcp://localhost:3100/knowledge`
|
||||
- **HTTP fallback**: `http://localhost:3100/api/v1/search`
|
||||
- **Scoping**: queries are filtered to collection `personal` + `public`
|
||||
|
||||
## Agent instructions
|
||||
|
||||
When acting as a coding agent on this project:
|
||||
|
||||
1. Read this file and all `SKILL.md` files in `.skills/` before starting work
|
||||
2. Run `task check` before committing (lint + test + vet)
|
||||
3. If unsure about a convention, check `DECISIONS.md` or ask
|
||||
4. Never modify files outside the project root without explicit permission
|
||||
5. When adding a dependency, explain why in the commit message
|
||||
6. For client projects: never send code or context to cloud APIs — use local models via LiteLLM
|
||||
|
||||
---
|
||||
243
.cursorrules
Normal file
@@ -0,0 +1,243 @@
|
||||
# Cursor rules — auto-generated
|
||||
# Do not edit. Run: task context:sync
|
||||
|
||||
# Agent context — Mathias workspace
|
||||
|
||||
<!-- Canonical root context for all AI coding agents.
|
||||
Lives at: ~/dev/.context/AGENT.md
|
||||
Applies to every project under ~/dev/ unless overridden.
|
||||
|
||||
Run `task context:sync` from ~/dev/ to regenerate harness-specific files.
|
||||
Project-level context in .context/PROJECT.md layers on top of this. -->
|
||||
|
||||
## Who I am
|
||||
|
||||
I'm Mathias, a digital product manager and technology consultant based in Sweden.
|
||||
I build software, research emerging tech, and deliver consulting engagements
|
||||
for clients under NDA. I work across AI/ML, financial automation, web applications,
|
||||
and climate/sustainability tech.
|
||||
|
||||
## How I work with agents
|
||||
|
||||
- I think like a product manager — I care about *why* before *how*
|
||||
- I want agents to be opinionated and push back, not just execute blindly
|
||||
- I prefer concise responses; skip ceremony and get to the point
|
||||
- When I say "build this", I mean production-quality with tests, not a demo
|
||||
- Ask me before making irreversible changes or adding heavy dependencies
|
||||
- I work with confidential client data — never send it to cloud APIs unless I explicitly say it's OK
|
||||
|
||||
## Behavior rules
|
||||
|
||||
These rules apply to every task across every project, regardless of harness.
|
||||
|
||||
1. **No assumptions.** Don't hide confusion — surface it. Surface tradeoffs explicitly.
|
||||
Think before coding; if the problem is unclear, ask or state assumptions before acting.
|
||||
2. **Minimum viable code.** Solve with the smallest change that works. Nothing
|
||||
speculative, no "while we're here" cleanups, no premature abstractions. Simplicity first.
|
||||
3. **Surgical changes.** Touch only what the task requires. Leave unrelated code,
|
||||
files, and formatting alone. Diffs should be small and reviewable.
|
||||
4. **Goal-driven execution.** Define clear success criteria up front for every task.
|
||||
Loop — implement, verify, refine — until those criteria are met. Don't claim
|
||||
completion without evidence (tests pass, command output, observed behavior).
|
||||
|
||||
## Default stack
|
||||
|
||||
| Layer | Default | Fallback | Last resort |
|-------|---------|----------|-------------|
| Language | Go | Python | TypeScript, Java, C |
| UI | HTMX + Templ | Server-rendered HTML | React (only if SPA is justified) |
| Build | Task (taskfile.dev) | Make | — |
| Containers | Docker Compose (dev), k3s (prod) | — | — |
| DB | PostgreSQL + sqlc | SQLite | — |
| Search | Qdrant (vector), BM25 | — | — |
| Logging | slog (structured) | — | — |
| Testing | Table-driven, testify | — | — |
|
||||
|
||||
Exploratory: Rust, Zig — I'll tell you when I want these.
|
||||
|
||||
## Code conventions
|
||||
|
||||
- **Go style**: golines, gofumpt, golangci-lint
|
||||
- **Errors**: `fmt.Errorf("operation: %w", err)` — never naked, never log-and-return
|
||||
- **Naming**: stdlib conventions, no stuttering
|
||||
- **Architecture**: prefer stdlib over frameworks, constructor injection, env-var config parsed into typed structs
|
||||
- **Git**: conventional commits (`feat:`, `fix:`, `chore:`), one concern per PR, PR describes *why* not *what*
|
||||
- **Security**: no secrets in code, govulncheck before adding deps, SOPS for encrypted config
|
||||
- **Dependencies**: prefer stdlib. testify, slog, templ, sqlc are pre-approved; anything else needs justification in the commit message
|
||||
|
||||
## Infrastructure
|
||||
|
||||
Three machines on Tailscale:
|
||||
|
||||
| Machine | Role | Key specs |
|---------|------|-----------|
| koala | GPU inference, heavy compute | RTX 5070, runs llama-swap, Qdrant |
| iguana | Services, builds | M2 Ultra Mac |
| flamingo | Daily driver, edge | Mac mini, ~/dev is here |
|
||||
|
||||
- **Model routing**: LiteLLM in front of llama-swap (local) + cloud APIs (when permitted)
|
||||
- **Orchestration**: k3s cluster across all three machines
|
||||
- **Networking**: Tailscale mesh
|
||||
|
||||
## Project landscape
|
||||
|
||||
All development repos live at `~/dev/` (softlink from `~/Documents/local-dev/`).
|
||||
|
||||
Organized in thematic folders:
|
||||
|
||||
| Folder | Focus | Count |
|--------|-------|-------|
| `GO/` | Go web frameworks, API integrations, learning projects | ~10 |
| `AI/` | ML research, AI frameworks (FinRL, DSPy, crawl4ai) | ~6 |
| `AGENTS/` | Autonomous agents, coding agents, MCP servers, infra | ~15 |
| `QKX/` | Invoice processing, financial automation, payment systems | ~13 |
| `XT/` | Climate data, sustainability (Klimatkollen, Garbo) | ~2 |
|
||||
|
||||
See `~/dev/PROJECT_SUMMARY.md` for detailed descriptions of each project.
|
||||
|
||||
### Key active projects
|
||||
|
||||
- **super-koala** (`AGENTS/`) — multi-component agent stack with LangGraph, DSPy, MCP
|
||||
- **azure-tiger** (`QKX/`) — invoice extraction → ISO 20022 payment instructions
|
||||
- **gocrwl** (`AGENTS/`) — Go web crawler with containerized deployment
|
||||
- **koala-ai-stack** (`AGENTS/`) — local AI server infrastructure management
|
||||
- **klimatkollen** (`XT/`) — Swedish municipal climate data platform
|
||||
|
||||
## Knowledge base
|
||||
|
||||
When available, agents can query the shared knowledge base:
|
||||
|
||||
- **MCP**: `mcp://hyperguild.<TAILNET>.ts.net:3100/knowledge`
|
||||
- **HTTP**: `http://hyperguild.<TAILNET>.ts.net:3100/api/v1/search`
|
||||
|
||||
<!-- TODO: replace <TAILNET> placeholder with the real Tailscale tailnet
|
||||
name once hyperguild is deployed. Until then, agents that try to
|
||||
reach the knowledge service on a host where it isn't running will
|
||||
get DNS NXDOMAIN, which is the desired fail-loudly behavior. -->
|
||||
- **Scoping**: defaults to `public` collection; client projects filter to `{client}` + `public`
|
||||
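The scoping rule above can be sketched as a small helper. This is a minimal illustration, not the knowledge service's actual code: the function name `collectionsFor` and the convention that an empty string means "no client tag" are assumptions.

```go
package main

import "fmt"

// publicCollection is the default scope named in the text above.
const publicCollection = "public"

// collectionsFor returns the collections a query may search.
// Assumption: an empty client string means the project has no client tag.
func collectionsFor(client string) []string {
	if client == "" || client == publicCollection {
		return []string{publicCollection}
	}
	// Client projects see their own collection plus the public one.
	return []string{client, publicCollection}
}

func main() {
	fmt.Println(collectionsFor(""))         // untagged project
	fmt.Println(collectionsFor("personal")) // project tagged "personal"
}
```

Returning the full list, rather than a boolean per collection, keeps the filter in one place so a caller cannot forget to add `public`.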
|
||||
## Client work rules
|
||||
|
||||
When working on a project tagged with a client name:
|
||||
1. Never send code, data, or context to cloud APIs — use local models only
|
||||
2. Never reference other client projects or their data
|
||||
3. Keep all artifacts within the client's git org / directory
|
||||
4. Treat everything as confidential unless told otherwise
|
||||
|
||||
## Harness-agnostic principles
|
||||
|
||||
This context is designed to work with any AI coding tool:
|
||||
- Claude Code, Cursor, Aider, Open WebUI, Charmbracelet Mods/Crush
|
||||
- Pi Coding Agent, Mistral Vibe, Antigravity
|
||||
- Any tool that accepts a system prompt or reads a markdown context file
|
||||
|
||||
The canonical source is always `.context/AGENT.md` (root) and `.context/PROJECT.md` (per-project).
|
||||
Derived files are committed (see *How context propagates* below) so a `git pull` on any host yields full agent context with no setup.
|
||||
|
||||
## How context propagates
|
||||
|
||||
Canonical sources of truth:
|
||||
- Universal: `~/dev/.context/AGENT.md` (this file)
|
||||
- Project: `<repo>/.context/PROJECT.md` (per-repo)
|
||||
|
||||
Derived files (committed, regenerated by `task context:sync`):
|
||||
- `CLAUDE.md`, `AGENTS.md`, `.cursorrules`, `.aider.conventions.md`,
|
||||
`.context/system-prompt.txt`
|
||||
|
||||
Workflow:
|
||||
1. Edit a canonical file. Run `task context:sync`. Commit canonical and
|
||||
derived together. Push.
|
||||
2. On any other host, `git pull` brings both. Claude Code (tree-walking)
|
||||
uses `CLAUDE.md`; Crush / Pi / Antigravity (cwd-only) use `AGENTS.md`;
|
||||
Cursor uses `.cursorrules`; Aider uses `.aider.conventions.md`.
|
||||
3. `task check` runs `context:sync` then asserts `git status --porcelain`
|
||||
is empty over the derived files (catches both modified-tracked drift
|
||||
and missing-untracked adapters). A drift fails the check with a
|
||||
message telling you to stage the regenerated files.
|
||||
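The drift check in step 3 can be sketched as a filter over `git status --porcelain` output. This is an illustration under assumptions, not the real `task check` implementation: it expects the v1 porcelain format (two status characters, a space, then the path), matches derived files by exact path relative to the repo root, and ignores renames and quoted paths.

```go
package main

import (
	"fmt"
	"strings"
)

// derived lists the generated files named in the text above.
var derived = map[string]bool{
	"CLAUDE.md":                  true,
	"AGENTS.md":                  true,
	".cursorrules":               true,
	".aider.conventions.md":      true,
	".context/system-prompt.txt": true,
}

// driftedFiles returns every derived file that appears in the status output,
// whether modified (" M") or untracked ("??").
func driftedFiles(porcelain string) []string {
	var out []string
	for _, line := range strings.Split(porcelain, "\n") {
		if len(line) < 4 {
			continue // too short to hold "XY path"
		}
		if path := line[3:]; derived[path] {
			out = append(out, path)
		}
	}
	return out
}

func main() {
	status := " M CLAUDE.md\n?? .cursorrules\n M main.go\n"
	if d := driftedFiles(status); len(d) > 0 {
		fmt.Println("context drift, stage the regenerated files:", strings.Join(d, ", "))
	}
}
```

Unrelated changes such as `main.go` are ignored, so the check fails only when a derived file is out of date or missing from the index.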
|
||||
Escape hatch: a derived file containing `<!-- HANDROLLED: do not regenerate -->`
|
||||
near the top is skipped by sync. Used for ops repos where the derived file
|
||||
is the canonical safety doc (e.g. `~/dev/AI/infra/CLAUDE.md`).
|
||||
|
||||
Behavior rules in this file and per-project rules in `PROJECT.md` apply
|
||||
unconditionally on every host, every harness.
|
||||
|
||||
## Engineering Skills
|
||||
|
||||
Shared engineering skills are available in `~/dev/.skills/`. Load on demand via the index.
|
||||
|
||||
See `~/dev/.skills/SKILLS_INDEX.md` for the full list with descriptions and "use when" triggers.
|
||||
|
||||
Key skills:
|
||||
- **TDD**: always write tests first — load `tdd` skill
|
||||
- **Code Review**: load `code-review` skill before any review
|
||||
- **SOLID/Clean Code**: load `solid` or `clean-code` skill for design work
|
||||
- **Problem first**: load `problem-analysis` skill before coding non-trivial features
|
||||
|
||||
---
|
||||
|
||||
# Project context
|
||||
|
||||
<!-- Canonical project context. Edit this, run `task context:sync`.
|
||||
Root agent context from ~/dev/.context/AGENT.md is automatically
|
||||
prepended for harnesses that don't walk the directory tree. -->
|
||||
|
||||
## Identity
|
||||
|
||||
- **Name**: supervisor
|
||||
- **Owner**: Mathias
|
||||
- **Client**: personal
|
||||
- **Repo**:
|
||||
- **Status**: active
|
||||
|
||||
## Stack
|
||||
|
||||
- **Primary language**: Go
|
||||
- **UI layer**: HTMX + Templ (when applicable)
|
||||
- **Fallback languages**: Python, TypeScript (justify in PR if used)
|
||||
- **Build**: Task (taskfile.dev), not Make
|
||||
- **Containers**: Docker (compose for dev, k3s for deploy)
|
||||
- **Target infra**: koala (GPU workloads), iguana (services), flamingo (edge)
|
||||
|
||||
## Conventions
|
||||
|
||||
### Code style
|
||||
- Go: follow `golines`, `gofumpt`, `golangci-lint` with project config
|
||||
- Tests: table-driven, in `_test.go` next to source, `testify` for assertions
|
||||
- Errors: wrap with `fmt.Errorf("operation: %w", err)`, no naked returns
|
||||
- Naming: stdlib conventions, no stuttering (`http.Client` not `http.HTTPClient`)
|
||||
|
||||
### Architecture preferences
|
||||
- Prefer standard library over frameworks (net/http over gin/echo)
|
||||
- Dependency injection via constructor functions, not containers
|
||||
- Configuration via environment variables, parsed at startup into a typed struct
|
||||
- Structured logging via `slog`
|
||||
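The architecture preferences above combine naturally. The sketch below is illustrative only: `Config`, `Server`, and the `APP_PORT` / `APP_BRAIN_DIR` variable names are assumptions, not this project's real configuration. It parses environment variables once into a typed struct, injects dependencies through a constructor, and logs with `slog`.

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
	"strconv"
)

// Config holds settings parsed once at startup. Field names are illustrative.
type Config struct {
	Port     int
	BrainDir string
}

// parseConfig reads settings through getenv so tests can pass a fake
// lookup instead of os.Getenv. The default port is an arbitrary example.
func parseConfig(getenv func(string) string) (Config, error) {
	cfg := Config{Port: 8080, BrainDir: getenv("APP_BRAIN_DIR")}
	if raw := getenv("APP_PORT"); raw != "" {
		port, err := strconv.Atoi(raw)
		if err != nil {
			return Config{}, fmt.Errorf("parse APP_PORT: %w", err)
		}
		cfg.Port = port
	}
	return cfg, nil
}

// Server receives everything it needs through its constructor.
type Server struct {
	cfg Config
	log *slog.Logger
}

// NewServer wires dependencies explicitly; there is no DI container.
func NewServer(cfg Config, log *slog.Logger) *Server {
	return &Server{cfg: cfg, log: log}
}

// Addr returns the listen address derived from the typed config.
func (s *Server) Addr() string { return ":" + strconv.Itoa(s.cfg.Port) }

func main() {
	cfg, err := parseConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	srv := NewServer(cfg, logger)
	srv.log.Info("configured", "addr", srv.Addr(), "brain_dir", cfg.BrainDir)
}
```

Passing `getenv` as a function keeps `parseConfig` free of global state, which makes table-driven tests straightforward.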
|
||||
### Git
|
||||
- Conventional commits: `feat:`, `fix:`, `chore:`, `docs:`, `refactor:`
|
||||
- Branch naming: `feat/short-description`, `fix/short-description`
|
||||
- PRs: one concern per PR, description explains *why* not *what*
|
||||
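A commit-subject check for the convention above could be sketched as follows. The type list comes from the text; the optional `(scope)` and `!` parts follow the general Conventional Commits format and are an assumption about what this workspace accepts, as is the helper name `validSubject`.

```go
package main

import (
	"fmt"
	"regexp"
)

// commitSubject accepts "type: summary" and "type(scope): summary"
// for the commit types listed in the Git conventions above.
var commitSubject = regexp.MustCompile(`^(feat|fix|chore|docs|refactor)(\([a-z0-9._-]+\))?!?: \S.*$`)

// validSubject reports whether s is an acceptable commit subject line.
func validSubject(s string) bool {
	return commitSubject.MatchString(s)
}

func main() {
	subjects := []string{
		"feat: add brain_query tool",
		"chore(deploy): bump image tag",
		"Update stuff",
	}
	for _, s := range subjects {
		fmt.Println(validSubject(s), s)
	}
}
```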
|
||||
### Security
|
||||
- No secrets in code, ever — use env vars or SOPS-encrypted files
|
||||
- Client data never leaves local network unless explicitly cleared
|
||||
- Dependencies: audit with `govulncheck` before adding
|
||||
|
||||
## Knowledge base access
|
||||
|
||||
This project can query the shared knowledge base via MCP or HTTP:
|
||||
|
||||
- **MCP endpoint**: `mcp://localhost:3100/knowledge`
|
||||
- **HTTP fallback**: `http://localhost:3100/api/v1/search`
|
||||
- **Scoping**: queries are filtered to collection `personal` + `public`
|
||||
|
||||
## Agent instructions
|
||||
|
||||
When acting as a coding agent on this project:
|
||||
|
||||
1. Read this file and all `SKILL.md` files in `.skills/` before starting work
|
||||
2. Run `task check` before committing (lint + test + vet)
|
||||
3. If unsure about a convention, check `DECISIONS.md` or ask
|
||||
4. Never modify files outside the project root without explicit permission
|
||||
5. When adding a dependency, explain why in the commit message
|
||||
6. For client projects: never send code or context to cloud APIs — use local models via LiteLLM
|
||||
10
.dockerignore
Normal file
@@ -0,0 +1,10 @@
.git
.gitea
.worktrees
.DS_Store
*.log
.env*
.vscode
.idea
bin/
brain/
93
.gitea/workflows/cd.yml
Normal file
@@ -0,0 +1,93 @@
name: cd

on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]
    branches: [main]

jobs:
  deploy:
    name: Build and deploy
    runs-on: self-hosted
    if: ${{ github.event.workflow_run.conclusion == 'success' && github.event.workflow_run.event == 'push' }}
    env:
      SERVICE: supervisor
      IMAGE: gitea.d-ma.be/mathias/supervisor
      INGESTION_IMAGE: gitea.d-ma.be/mathias/ingestion
      INFRA_REPO: git@gitea.d-ma.be:mathias/infra.git
      BUILDKIT_HOST: unix:///run/buildkit/buildkitd.sock
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Build and push supervisor image
        run: |
          set -e
          trap 'rm -f /tmp/supervisor-image.tar' EXIT
          IMAGE_TAG="${{ github.sha }}"
          echo "Building ${IMAGE}:${IMAGE_TAG}"

          buildctl --addr "${BUILDKIT_HOST}" build \
            --frontend dockerfile.v0 \
            --local context=. \
            --local dockerfile=. \
            --opt build-arg:VERSION="${IMAGE_TAG}" \
            --output type=oci,dest=/tmp/supervisor-image.tar

          skopeo copy \
            oci-archive:/tmp/supervisor-image.tar \
            docker://${IMAGE}:${IMAGE_TAG} \
            --dest-creds "${{ secrets.REGISTRY_CREDS }}"

          echo "Built and pushed ${IMAGE}:${IMAGE_TAG}"

      - name: Build and push ingestion image
        run: |
          set -e
          trap 'rm -f /tmp/ingestion-image.tar' EXIT
          IMAGE_TAG="${{ github.sha }}"
          echo "Building ${INGESTION_IMAGE}:${IMAGE_TAG}"

          buildctl --addr "${BUILDKIT_HOST}" build \
            --frontend dockerfile.v0 \
            --local context=ingestion \
            --local dockerfile=ingestion \
            --output type=oci,dest=/tmp/ingestion-image.tar

          skopeo copy \
            oci-archive:/tmp/ingestion-image.tar \
            docker://${INGESTION_IMAGE}:${IMAGE_TAG} \
            --dest-creds "${{ secrets.REGISTRY_CREDS }}"

          echo "Built and pushed ${INGESTION_IMAGE}:${IMAGE_TAG}"

      - name: Update infra repo
        run: |
          set -e
          trap 'rm -rf /tmp/infra-update; rm -f ~/.ssh/infra_deploy_key' EXIT
          IMAGE_TAG="${{ github.sha }}"
          mkdir -p ~/.ssh
          echo "${{ secrets.INFRA_DEPLOY_KEY }}" > ~/.ssh/infra_deploy_key
          chmod 600 ~/.ssh/infra_deploy_key
          printf 'Host gitea.d-ma.be\n HostName 127.0.0.1\n Port 30022\n StrictHostKeyChecking no\n' >> ~/.ssh/config

          GIT_SSH_COMMAND="ssh -i ~/.ssh/infra_deploy_key -o IdentitiesOnly=yes" \
            git clone "${INFRA_REPO}" /tmp/infra-update

          cd /tmp/infra-update

          sed -i "s|gitea.d-ma.be/mathias/supervisor:.*|gitea.d-ma.be/mathias/supervisor:${IMAGE_TAG}|" \
            "k3s/apps/${SERVICE}/deployment.yaml"

          sed -i "s|gitea.d-ma.be/mathias/ingestion:.*|gitea.d-ma.be/mathias/ingestion:${IMAGE_TAG}|" \
            "k3s/apps/${SERVICE}/ingestion-deployment.yaml"

          git config user.email "cd-bot@d-ma.be"
          git config user.name "CD Bot"
          git add "k3s/apps/${SERVICE}/deployment.yaml" "k3s/apps/${SERVICE}/ingestion-deployment.yaml"
          git commit -m "chore(deploy): ${SERVICE}+ingestion → ${IMAGE_TAG}"
          GIT_SSH_COMMAND="ssh -i ~/.ssh/infra_deploy_key -o IdentitiesOnly=yes" \
            git push

          echo "Infra repo updated: ${SERVICE}+ingestion → ${IMAGE_TAG}"
|
||||
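The `sed` substitutions in the "Update infra repo" step pin each image reference to the new commit SHA. For readers more at home in Go, the same rewrite can be sketched as below. The helper name `bumpImageTag` and the sample manifest are hypothetical (the real `deployment.yaml` is not shown in this diff), and unlike the `sed` pattern this version escapes the dots in the image name.

```go
package main

import (
	"fmt"
	"regexp"
)

// bumpImageTag replaces everything after "<image>:" up to the end of the
// line with tag, mirroring sed's "s|image:.*|image:tag|".
func bumpImageTag(manifest, image, tag string) string {
	// "." does not match newlines in Go regexps, so each match stays on one line.
	re := regexp.MustCompile(regexp.QuoteMeta(image) + `:.*`)
	// The literal variant avoids interpreting "$" in the replacement.
	return re.ReplaceAllLiteralString(manifest, image+":"+tag)
}

func main() {
	// Hypothetical manifest fragment for illustration only.
	manifest := "containers:\n  - name: supervisor\n    image: gitea.d-ma.be/mathias/supervisor:abc123\n"
	fmt.Print(bumpImageTag(manifest, "gitea.d-ma.be/mathias/supervisor", "def456"))
}
```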
@@ -53,6 +53,6 @@ jobs:
           chmod 600 ~/.ssh/id_rsa_gh_mirror
           ssh-keyscan github.com >> ~/.ssh/known_hosts 2>/dev/null
           GIT_SSH_COMMAND="ssh -i ~/.ssh/id_rsa_gh_mirror -o IdentitiesOnly=yes" \
-            git push git@github.com:mathiasb/hyperguild.git HEAD:main --tags
+            git push git@github.com:mathiasb/hyperguild.git HEAD:main --follow-tags
           rm ~/.ssh/id_rsa_gh_mirror
           echo "✓ Mirrored to GitHub"
9
.gitignore
vendored
@@ -13,15 +13,7 @@ brain/training-data/**/*.jsonl
|
||||
# Go
|
||||
vendor/
|
||||
|
||||
# ── Generated context files (adapter outputs) ──
|
||||
# Canonical sources: .context/PROJECT.md + .skills/*/SKILL.md
|
||||
# Everything below is disposable — regenerate with: task context:sync
|
||||
AGENTS.md
|
||||
CLAUDE.md
|
||||
.cursorrules
|
||||
.aider.conventions.md
|
||||
.aider.conf.yml
|
||||
.context/system-prompt.txt
|
||||
|
||||
# ── Sensitive ──
|
||||
.env
|
||||
@@ -34,6 +26,7 @@ secrets/
|
||||
# ── Documented examples (commit these) ──
|
||||
!.env.example
|
||||
!config/supervisor/CLAUDE.md
|
||||
!brain/CLAUDE.md
|
||||
|
||||
# IDE
|
||||
.idea/
|
||||
|
||||
8
.mcp.json
Normal file
@@ -0,0 +1,8 @@
{
  "mcpServers": {
    "supervisor": {
      "type": "http",
      "url": "http://koala:30320/mcp"
    }
  }
}
240
AGENTS.md
Normal file
@@ -0,0 +1,240 @@
|
||||
# Agent context — Mathias workspace
|
||||
|
||||
<!-- Canonical root context for all AI coding agents.
|
||||
Lives at: ~/dev/.context/AGENT.md
|
||||
Applies to every project under ~/dev/ unless overridden.
|
||||
|
||||
Run `task context:sync` from ~/dev/ to regenerate harness-specific files.
|
||||
Project-level context in .context/PROJECT.md layers on top of this. -->
|
||||
|
||||
## Who I am
|
||||
|
||||
I'm Mathias, a digital product manager and technology consultant based in Sweden.
|
||||
I build software, research emerging tech, and deliver consulting engagements
|
||||
for clients under NDA. I work across AI/ML, financial automation, web applications,
|
||||
and climate/sustainability tech.
|
||||
|
||||
## How I work with agents
|
||||
|
||||
- I think like a product manager — I care about *why* before *how*
|
||||
- I want agents to be opinionated and push back, not just execute blindly
|
||||
- I prefer concise responses; skip ceremony and get to the point
|
||||
- When I say "build this", I mean production-quality with tests, not a demo
|
||||
- Ask me before making irreversible changes or adding heavy dependencies
|
||||
- I work with confidential client data — never send it to cloud APIs unless I explicitly say it's OK
|
||||
|
||||
## Behavior rules
|
||||
|
||||
These rules apply to every task across every project, regardless of harness.
|
||||
|
||||
1. **No assumptions.** Don't hide confusion — surface it. Surface tradeoffs explicitly.
|
||||
Think before coding; if the problem is unclear, ask or state assumptions before acting.
|
||||
2. **Minimum viable code.** Solve with the smallest change that works. Nothing
|
||||
speculative, no "while we're here" cleanups, no premature abstractions. Simplicity first.
|
||||
3. **Surgical changes.** Touch only what the task requires. Leave unrelated code,
|
||||
files, and formatting alone. Diffs should be small and reviewable.
|
||||
4. **Goal-driven execution.** Define clear success criteria up front for every task.
|
||||
Loop — implement, verify, refine — until those criteria are met. Don't claim
|
||||
completion without evidence (tests pass, command output, observed behavior).
|
||||
|
||||
## Default stack
|
||||
|
||||
| Layer | Default | Fallback | Last resort |
|
||||
|-------|---------|----------|-------------|
|
||||
| Language | Go | Python | TypeScript, Java, C |
|
||||
| UI | HTMX + Templ | Server-rendered HTML | React (only if SPA is justified) |
|
||||
| Build | Task (taskfile.dev) | Make | — |
|
||||
| Containers | Docker Compose (dev), k3s (prod) | — | — |
|
||||
| DB | PostgreSQL + sqlc | SQLite | — |
|
||||
| Search | Qdrant (vector), BM25 | — | — |
|
||||
| Logging | slog (structured) | — | — |
|
||||
| Testing | Table-driven, testify | — | — |
|
||||
|
||||
Exploratory: Rust, Zig — I'll tell you when I want these.
|
||||
|
||||
## Code conventions
|
||||
|
||||
- **Go style**: golines, gofumpt, golangci-lint
|
||||
- **Errors**: `fmt.Errorf("operation: %w", err)` — never naked, never log-and-return
|
||||
- **Naming**: stdlib conventions, no stuttering
|
||||
- **Architecture**: prefer stdlib over frameworks, constructor injection, env-var config parsed into typed structs
|
||||
- **Git**: conventional commits (`feat:`, `fix:`, `chore:`), one concern per PR, PR describes *why* not *what*
|
||||
- **Security**: no secrets in code, govulncheck before adding deps, SOPS for encrypted config
|
||||
- **Dependencies**: prefer stdlib. testify, slog, templ, sqlc are pre-approved; anything else needs justification in the commit message
|
||||
|
||||
## Infrastructure
|
||||
|
||||
Three machines on Tailscale:
|
||||
|
||||
| Machine | Role | Key specs |
|
||||
|---------|------|-----------|
|
||||
| koala | GPU inference, heavy compute | RTX 5070, runs llama-swap, Qdrant |
|
||||
| iguana | Services, builds | M2 Ultra Mac |
|
||||
| flamingo | Daily driver, edge | Mac mini, ~/dev is here |
|
||||
|
||||
- **Model routing**: LiteLLM in front of llama-swap (local) + cloud APIs (when permitted)
|
||||
- **Orchestration**: k3s cluster across all three machines
|
||||
- **Networking**: Tailscale mesh
|
||||
|
||||
## Project landscape
|
||||
|
||||
All development repos live at `~/dev/` (softlink from `~/Documents/local-dev/`).
|
||||
|
||||
Organized in thematic folders:
|
||||
|
||||
| Folder | Focus | Count |
|
||||
|--------|-------|-------|
|
||||
| `GO/` | Go web frameworks, API integrations, learning projects | ~10 |
|
||||
| `AI/` | ML research, AI frameworks (FinRL, DSPy, crawl4ai) | ~6 |
|
||||
| `AGENTS/` | Autonomous agents, coding agents, MCP servers, infra | ~15 |
|
||||
| `QKX/` | Invoice processing, financial automation, payment systems | ~13 |
|
||||
| `XT/` | Climate data, sustainability (Klimatkollen, Garbo) | ~2 |
|
||||
|
||||
See `~/dev/PROJECT_SUMMARY.md` for detailed descriptions of each project.
|
||||
|
||||
### Key active projects
|
||||
|
||||
- **super-koala** (`AGENTS/`) — multi-component agent stack with LangGraph, DSPy, MCP
|
||||
- **azure-tiger** (`QKX/`) — invoice extraction → ISO 20022 payment instructions
|
||||
- **gocrwl** (`AGENTS/`) — Go web crawler with containerized deployment
|
||||
- **koala-ai-stack** (`AGENTS/`) — local AI server infrastructure management
|
||||
- **klimatkollen** (`XT/`) — Swedish municipal climate data platform
|
||||
|
||||
## Knowledge base
|
||||
|
||||
When available, agents can query the shared knowledge base:
|
||||
|
||||
- **MCP**: `mcp://hyperguild.<TAILNET>.ts.net:3100/knowledge`
|
||||
- **HTTP**: `http://hyperguild.<TAILNET>.ts.net:3100/api/v1/search`
|
||||
|
||||
<!-- TODO: replace <TAILNET> placeholder with the real Tailscale tailnet
|
||||
name once hyperguild is deployed. Until then, agents that try to
|
||||
reach the knowledge service on a host where it isn't running will
|
||||
get DNS NXDOMAIN, which is the desired fail-loudly behavior. -->
|
||||
- **Scoping**: defaults to `public` collection; client projects filter to `{client}` + `public`
|
||||
|
||||
## Client work rules
|
||||
|
||||
When working on a project tagged with a client name:
|
||||
1. Never send code, data, or context to cloud APIs — use local models only
|
||||
2. Never reference other client projects or their data
|
||||
3. Keep all artifacts within the client's git org / directory
|
||||
4. Treat everything as confidential unless told otherwise
|
||||
|
||||
## Harness-agnostic principles
|
||||
|
||||
This context is designed to work with any AI coding tool:
|
||||
- Claude Code, Cursor, Aider, Open WebUI, Charmbracelet Mods/Crush
|
||||
- Pi Coding Agent, Mistral Vibe, Antigravity
|
||||
- Any tool that accepts a system prompt or reads a markdown context file
|
||||
|
||||
The canonical source is always `.context/AGENT.md` (root) and `.context/PROJECT.md` (per-project).
|
||||
Derived files are committed (see *How context propagates* below) so a `git pull` on any host yields full agent context with no setup.
|
||||
|
||||
## How context propagates
|
||||
|
||||
Canonical sources of truth:
|
||||
- Universal: `~/dev/.context/AGENT.md` (this file)
|
||||
- Project: `<repo>/.context/PROJECT.md` (per-repo)
|
||||
|
||||
Derived files (committed, regenerated by `task context:sync`):
|
||||
- `CLAUDE.md`, `AGENTS.md`, `.cursorrules`, `.aider.conventions.md`,
|
||||
`.context/system-prompt.txt`
|
||||
|
||||
Workflow:
|
||||
1. Edit a canonical file. Run `task context:sync`. Commit canonical and
|
||||
derived together. Push.
|
||||
2. On any other host, `git pull` brings both. Claude Code (tree-walking)
|
||||
uses `CLAUDE.md`; Crush / Pi / Antigravity (cwd-only) use `AGENTS.md`;
|
||||
Cursor uses `.cursorrules`; Aider uses `.aider.conventions.md`.
|
||||
3. `task check` runs `context:sync` then asserts `git status --porcelain`
|
||||
is empty over the derived files (catches both modified-tracked drift
|
||||
and missing-untracked adapters). A drift fails the check with a
|
||||
message telling you to stage the regenerated files.
|
||||
|
||||
Escape hatch: a derived file containing `<!-- HANDROLLED: do not regenerate -->`
|
||||
near the top is skipped by sync. Used for ops repos where the derived file
|
||||
is the canonical safety doc (e.g. `~/dev/AI/infra/CLAUDE.md`).
|
||||
|
||||
Behavior rules in this file and per-project rules in `PROJECT.md` apply
|
||||
unconditionally on every host, every harness.
|
||||
|
||||
## Engineering Skills
|
||||
|
||||
Shared engineering skills are available in `~/dev/.skills/`. Load on demand via the index.
|
||||
|
||||
See `~/dev/.skills/SKILLS_INDEX.md` for the full list with descriptions and "use when" triggers.
|
||||
|
||||
Key skills:
|
||||
- **TDD**: always write tests first — load `tdd` skill
|
||||
- **Code Review**: load `code-review` skill before any review
|
||||
- **SOLID/Clean Code**: load `solid` or `clean-code` skill for design work
|
||||
- **Problem first**: load `problem-analysis` skill before coding non-trivial features
|
||||
|
||||
---
|
||||
|
||||
# Project context
|
||||
|
||||
<!-- Canonical project context. Edit this, run `task context:sync`.
|
||||
Root agent context from ~/dev/.context/AGENT.md is automatically
|
||||
prepended for harnesses that don't walk the directory tree. -->
|
||||
|
||||
## Identity
|
||||
|
||||
- **Name**: supervisor
|
||||
- **Owner**: Mathias
|
||||
- **Client**: personal
|
||||
- **Repo**:
|
||||
- **Status**: active
|
||||
|
||||
## Stack
|
||||
|
||||
- **Primary language**: Go
|
||||
- **UI layer**: HTMX + Templ (when applicable)
|
||||
- **Fallback languages**: Python, TypeScript (justify in PR if used)
|
||||
- **Build**: Task (taskfile.dev), not Make
|
||||
- **Containers**: Docker (compose for dev, k3s for deploy)
|
||||
- **Target infra**: koala (GPU workloads), iguana (services), flamingo (edge)
|
||||
|
||||
## Conventions
|
||||
|
||||
### Code style
|
||||
- Go: follow `golines`, `gofumpt`, `golangci-lint` with project config
|
||||
- Tests: table-driven, in `_test.go` next to source, `testify` for assertions
|
||||
- Errors: wrap with `fmt.Errorf("operation: %w", err)`, no naked returns
|
||||
- Naming: stdlib conventions, no stuttering (`http.Client` not `http.HTTPClient`)
|
||||
|
||||
### Architecture preferences
|
||||
- Prefer standard library over frameworks (net/http over gin/echo)
|
||||
- Dependency injection via constructor functions, not containers
|
||||
- Configuration via environment variables, parsed at startup into a typed struct
|
||||
- Structured logging via `slog`
|
||||
|
||||
### Git
|
||||
- Conventional commits: `feat:`, `fix:`, `chore:`, `docs:`, `refactor:`
|
||||
- Branch naming: `feat/short-description`, `fix/short-description`
|
||||
- PRs: one concern per PR, description explains *why* not *what*
|
||||
|
||||
### Security
|
||||
- No secrets in code, ever — use env vars or SOPS-encrypted files
|
||||
- Client data never leaves local network unless explicitly cleared
|
||||
- Dependencies: audit with `govulncheck` before adding
|
||||
|
||||
## Knowledge base access
|
||||
|
||||
This project can query the shared knowledge base via MCP or HTTP:
|
||||
|
||||
- **MCP endpoint**: `mcp://localhost:3100/knowledge`
|
||||
- **HTTP fallback**: `http://localhost:3100/api/v1/search`
|
||||
- **Scoping**: queries are filtered to collection `personal` + `public`
|
||||
|
||||
## Agent instructions
|
||||
|
||||
When acting as a coding agent on this project:
|
||||
|
||||
1. Read this file and all `SKILL.md` files in `.skills/` before starting work
|
||||
2. Run `task check` before committing (lint + test + vet)
|
||||
3. If unsure about a convention, check `DECISIONS.md` or ask
|
||||
4. Never modify files outside the project root without explicit permission
|
||||
5. When adding a dependency, explain why in the commit message
|
||||
6. For client projects: never send code or context to cloud APIs — use local models via LiteLLM
|
||||
65
CLAUDE.md
Normal file
@@ -0,0 +1,65 @@
|
||||
# Project context
|
||||
|
||||
<!-- Canonical project context. Edit this, run `task context:sync`.
|
||||
Root agent context from ~/dev/.context/AGENT.md is automatically
|
||||
prepended for harnesses that don't walk the directory tree. -->
|
||||
|
||||
## Identity
|
||||
|
||||
- **Name**: supervisor
|
||||
- **Owner**: Mathias
|
||||
- **Client**: personal
|
||||
- **Repo**:
|
||||
- **Status**: active
|
||||
|
||||
## Stack
|
||||
|
||||
- **Primary language**: Go
|
||||
- **UI layer**: HTMX + Templ (when applicable)
|
||||
- **Fallback languages**: Python, TypeScript (justify in PR if used)
|
||||
- **Build**: Task (taskfile.dev), not Make
|
||||
- **Containers**: Docker (compose for dev, k3s for deploy)
|
||||
- **Target infra**: koala (GPU workloads), iguana (services), flamingo (edge)
|
||||
|
||||
## Conventions
|
||||
|
||||
### Code style
|
||||
- Go: follow `golines`, `gofumpt`, `golangci-lint` with project config
|
||||
- Tests: table-driven, in `_test.go` next to source, `testify` for assertions
|
||||
- Errors: wrap with `fmt.Errorf("operation: %w", err)`, no naked returns
|
||||
- Naming: stdlib conventions, no stuttering (`http.Client` not `http.HTTPClient`)
|
||||
|
||||
### Architecture preferences
|
||||
- Prefer standard library over frameworks (net/http over gin/echo)
|
||||
- Dependency injection via constructor functions, not containers
|
||||
- Configuration via environment variables, parsed at startup into a typed struct
|
||||
- Structured logging via `slog`
|
||||
|
||||
### Git
|
||||
- Conventional commits: `feat:`, `fix:`, `chore:`, `docs:`, `refactor:`
|
||||
- Branch naming: `feat/short-description`, `fix/short-description`
|
||||
- PRs: one concern per PR, description explains *why* not *what*
|
||||
|
||||
### Security
|
||||
- No secrets in code, ever — use env vars or SOPS-encrypted files
|
||||
- Client data never leaves local network unless explicitly cleared
|
||||
- Dependencies: audit with `govulncheck` before adding
|
||||
|
||||
## Knowledge base access
|
||||
|
||||
This project can query the shared knowledge base via MCP or HTTP:
|
||||
|
||||
- **MCP endpoint**: `mcp://localhost:3100/knowledge`
|
||||
- **HTTP fallback**: `http://localhost:3100/api/v1/search`
|
||||
- **Scoping**: queries are filtered to collection `personal` + `public`
|
||||
|
||||
## Agent instructions
|
||||
|
||||
When acting as a coding agent on this project:
|
||||
|
||||
1. Read this file and all `SKILL.md` files in `.skills/` before starting work
|
||||
2. Run `task check` before committing (lint + test + vet)
|
||||
3. If unsure about a convention, check `DECISIONS.md` or ask
|
||||
4. Never modify files outside the project root without explicit permission
|
||||
5. When adding a dependency, explain why in the commit message
|
||||
6. For client projects: never send code or context to cloud APIs — use local models via LiteLLM
|
||||
23
DECISIONS.md
@@ -44,6 +44,29 @@ Record *why* things are the way they are. Future-you will thank present-you.
|
||||
|
||||
**Consequences**: More operational complexity than Chroma, but isolation is non-negotiable for client work.
|
||||
|
||||
## 2026-04-22 — Hyperguild scope reset: drop parametric learning, simplify brain
|
||||
|
||||
**Context**: After shipping Phases 1–4 (MCP server, 6 skills, model orchestration, session logging, CD pipeline), we critically reviewed what was theater vs genuinely useful.
|
||||
|
||||
**Decisions**:
|
||||
|
||||
1. **Drop the parametric learning pipeline.** SFT/DPO/RL extraction, `brain/training-data/` directory structure, Axolotl/LLaMA-Factory fine-tuning loop — all cut. The loop requires thousands of high-quality examples to move the needle, which a solo consultant won't generate. Better base models ship faster than any fine-tuning effort could keep up with. This is a research project, not a productivity tool.
|
||||
|
||||
2. **Simplify the brain to plain markdown.** `brain/knowledge/` replaces `brain/wiki/ + brain/raw/ + brain/training-data/`. The trainer and retrospective workers write markdown entries. `brain_query` searches markdown. No ingestion pipeline, no tagging for significance review, no structured JSONL formats.
|
||||
|
||||
3. **Measure the escalation chain before assuming it's useful.** Local model (phi4) only belongs in a skill's chain if it passes Claude verification at a meaningful rate. Where it fails >70% of the time, it adds cost not value. Per-skill hit rate logging is the prerequisite to honest chain configuration.
|
||||
|
||||
4. **Keep what's real**: MCP tool surface, session logging with attempt records, tier detection, CD pipeline, bridge to Claude Code.
|
||||
|
||||
**What to build next** (in priority order):
|
||||
- `brain_query` injection into skill handlers before spawning workers — this makes the declarative brain actually function
|
||||
- `protocols.md` — behavioral contract injected into every worker prompt
|
||||
- Per-skill pass rate logging and chain tuning
|
||||
|
||||
**Consequences**: Simpler system with a shorter feedback loop. The brain becomes real only when skill handlers query it. Training data ambitions deferred indefinitely — revisit if local model capabilities improve enough that fine-tuning becomes worthwhile.
|
||||
|
||||
---
|
||||
|
||||
## 2026-04-08 — Mistral Vibe gets its own adapter
|
||||
|
||||
**Context**: Vibe doesn't read `AGENTS.md` — it uses `~/.vibe/prompts/` and `~/.vibe/agents/` with TOML config.
|
||||
|
||||
50
Dockerfile
Normal file
@@ -0,0 +1,50 @@
# syntax=docker/dockerfile:1

# ── Build stage ───────────────────────────────────────────────────────────────
FROM golang:1.26-bookworm AS builder

ARG VERSION=dev
WORKDIR /src

COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -trimpath -ldflags="-s -w -X main.version=${VERSION}" \
    -o /out/supervisor ./cmd/supervisor

# ── Runtime stage ─────────────────────────────────────────────────────────────
# Node.js 22 slim — needed for claude CLI subprocess
FROM node:22-slim

# Install claude CLI (provides the `claude` binary the supervisor shells out to)
RUN npm install -g @anthropic-ai/claude-code \
    && claude --version \
    && echo "claude CLI installed"

# Copy supervisor binary
COPY --from=builder /out/supervisor /usr/local/bin/supervisor

# Bake in config (models.yaml + skill discipline files)
COPY config/ /app/config/

# Run as non-root
RUN groupadd -r supervisor && useradd -r -g supervisor -d /app supervisor

WORKDIR /app

# brain/ is writable state — mount a PersistentVolume here
VOLUME /app/brain

ENV SUPERVISOR_CONFIG_DIR=/app/config/supervisor
ENV SUPERVISOR_MODELS_FILE=/app/config/models.yaml
ENV SUPERVISOR_BRAIN_DIR=/app/brain
ENV SUPERVISOR_SESSIONS_DIR=/app/brain/sessions
ENV SUPERVISOR_PORT=3200

USER supervisor

EXPOSE 3200

ENTRYPOINT ["/usr/local/bin/supervisor"]
4
Procfile
@@ -1,2 +1,2 @@
-ingestion: cd ingestion && INGEST_BRAIN_DIR=../brain INGEST_PORT=3300 go run ./cmd/server/
-supervisor: SUPERVISOR_CONFIG_DIR=./config/supervisor SUPERVISOR_MODELS_FILE=./config/models.yaml SUPERVISOR_SESSIONS_DIR=./brain/sessions INGEST_BASE_URL=http://localhost:3300 go run ./cmd/supervisor/
+ingestion: cd ingestion && INGEST_BRAIN_DIR=../brain INGEST_PORT=3300 INGEST_WATCH_INTERVAL=30 go run ./cmd/server/
+supervisor: SUPERVISOR_CONFIG_DIR=./config/supervisor SUPERVISOR_MODELS_FILE=./config/models.yaml SUPERVISOR_SESSIONS_DIR=./brain/sessions INGEST_BASE_URL=http://localhost:3300 INGEST_SVC_URL=http://localhost:3300 go run ./cmd/supervisor/
18
README.md
@@ -10,10 +10,10 @@ into a searchable brain.
 ```
 Your Claude Code session (in any project)
 │
-│ MCP tools (over stdio bridge → HTTP)
+│ MCP over HTTP (Tailscale)
 ▼
-supervisor :3200 — skill workers: tdd, retrospective
-ingestion :3300 — brain HTTP API: query wiki, write notes
+supervisor :3200 (NodePort 30320 on koala) — skill workers: tdd, retrospective
+ingestion :3300 — brain HTTP API: query wiki, write notes
 │
 ▼
 brain/
@@ -55,18 +55,18 @@ Create `.mcp.json` in your project root:
|
||||
{
|
||||
"mcpServers": {
|
||||
"supervisor": {
|
||||
"command": "/Users/mathias/dev/AI/supervisor/bin/supervisor-bridge",
|
||||
"env": {
|
||||
"SUPERVISOR_URL": "http://localhost:3200/mcp"
|
||||
}
|
||||
"type": "http",
|
||||
"url": "http://koala:30320/mcp"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Build the bridge binary once: `task bridge:build`
|
||||
The supervisor MCP server is reachable over Tailscale at `koala:30320` (NodePort
|
||||
to the in-cluster service on port 3200). No local binary or stdio shim is
|
||||
required — Claude Code talks to it directly via HTTP.
|
||||
|
||||
Then open Claude Code in your project — run `/mcp` to confirm `supervisor` is listed.
|
||||
Open Claude Code in your project — run `/mcp` to confirm `supervisor` is listed.
|
||||
|
||||
## A typical TDD session
|
||||
|
||||
|
||||
20
Taskfile.yml
@@ -57,7 +57,6 @@ tasks:
     desc: Build all binaries
     cmds:
       - task: supervisor:build
-      - task: bridge:build
       - task: ingestion:build

   supervisor:build:
@@ -65,11 +64,6 @@ tasks:
     cmds:
       - go build -trimpath -ldflags="-s -w -X main.version={{.VERSION}}" -o bin/supervisor ./cmd/supervisor

-  bridge:build:
-    desc: Build stdio↔HTTP bridge for Claude Code MCP integration
-    cmds:
-      - go build -trimpath -ldflags="-s -w" -o bin/supervisor-bridge ./cmd/bridge
-
   ingestion:build:
     desc: Build ingestion server binary
     dir: ingestion
@@ -79,8 +73,20 @@ tasks:
   # ── Quality ────────────────────────────────────────────────────────────────

   check:
-    desc: Run all checks (lint + test + vet) across all modules
+    desc: Run all checks (context freshness + lint + test + vet) across all modules
     cmds:
+      - task: context:sync
+      - cmd: |
+          drift=$(git status --porcelain -- AGENTS.md CLAUDE.md .cursorrules .aider.conventions.md .context/system-prompt.txt 2>/dev/null)
+          if [ -n "$drift" ]; then
+            echo "ERROR: derived adapters drifted from canonical context." >&2
+            echo "$drift" >&2
+            echo "" >&2
+            echo "Run: git add AGENTS.md CLAUDE.md .cursorrules .aider.conventions.md .context/system-prompt.txt" >&2
+            echo " git commit -m 'chore: re-sync context adapters'" >&2
+            exit 1
+          fi
+          echo "✓ context: canonical and adapters are in sync"
       - task: lint
       - task: test
       - task: vet
137
brain/schema.md
Normal file
@@ -0,0 +1,137 @@
# Brain Wiki Schema

This document defines the three page types in the brain wiki.
The LLM must follow this schema exactly when generating wiki pages.

## Output Format

Return a JSON array. Each element:

```json
{
  "title": "exact page title",
  "type": "source | concept | entity",
  "subtype": "see below — omit for concept",
  "domain": "see domains — omit if none fits",
  "content": "Markdown body only — no frontmatter, no path"
}
```

- `subtype` for **source**: `article | pdf | book | video | note | project`
- `subtype` for **entity**: `person | company | tool | model | framework | technology`
- The pipeline computes slugs and frontmatter — never include them in output.

## Wikilink Format

All cross-references use `[[Display Name]]` — just the display name, no slug, no pipe.

Rules:
- Only link to pages in the inventory or pages you are creating in this response
- The pipeline converts `[[Display Name]]` to `[[slug|Display Name]]` automatically
- Section links must match their section type (Related Concepts → concept pages only, etc.)

Examples: `[[Domain Driven Design]]`, `[[Ryan Singer]]`, `[[Shape Up]]`

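The wikilink rules above say the pipeline turns `[[Display Name]]` into `[[slug|Display Name]]`. A minimal Python sketch of that rewrite follows. It is not the pipeline's code: the function names and, in particular, the slug rule (lowercase, hyphen-separated) are assumptions made for illustration, since the schema does not say how slugs are computed.

```python
import re

# Matches [[Display Name]] but skips links that already carry a pipe.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)\]\]")


def slugify(name: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def expand_wikilinks(body: str) -> str:
    """Rewrite every [[Display Name]] in body as [[slug|Display Name]]."""
    def repl(match: re.Match) -> str:
        name = match.group(1).strip()
        return f"[[{slugify(name)}|{name}]]"

    return WIKILINK.sub(repl, body)


print(expand_wikilinks("See [[Shape Up]] by [[Ryan Singer]]."))
# prints: See [[shape-up|Shape Up]] by [[ryan-singer|Ryan Singer]].
```

Keeping the slug out of the model's output means a change to the slug rule only touches the pipeline, not the prompt.
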
## Domains

Use one of: `ai-llm`, `software-engineering`, `product-strategy`, `finance-markets`,
`personal`, `consulting`, `climate`, `infrastructure`, `security`

---

## Source Pages — wiki/sources/<slug>.md

One page per ingested source. Books are NEVER split across multiple source pages — update the existing one.

Body sections (in this order):

### Summary
2–3 sentences. Core argument or finding.

### Key Claims
Bulleted list. Paraphrase — no verbatim quotes or code.

### Concepts Introduced or Reinforced
Wikilinks to concept pages ONLY. One per line.

### Entities Mentioned
Wikilinks to entity pages ONLY. One per line.

### Open Questions Raised
Gaps or follow-up questions from this source.

For books only, also add:

### Chapters
One bullet per chapter with 1–2 sentence summary.

### Argument Arc
Overall narrative as it becomes clear across chapters.

### Updates
Dated entries appended on re-ingestion. NEVER rewrite — only append.

---

## Concept Pages — wiki/concepts/<slug>.md

One page per idea, framework, methodology, or pattern.

Body sections (in this order):

### Definition
One-paragraph plain-language explanation.

### Why It Matters
Practical significance. Why should anyone care?

### Related Concepts
Wikilinks to concept pages ONLY.

### Related Entities
Wikilinks to entity pages ONLY.

### Sources
Wikilinks to source pages ONLY.

### Evolving Notes
Updated as new sources arrive. Append, do not rewrite.

---

## Entity Pages — wiki/entities/<slug>.md

One page per person, tool, organisation, technology, or product.

Body sections (in this order):

### Description
One-line description.

### Relevance
Why this entity matters to this knowledge base.

### Key Positions, Products, or Claims
With dates where known.

### Related Concepts
Wikilinks to concept pages ONLY.

### Related Entities
Wikilinks to entity pages ONLY.

### Sources
Wikilinks to source pages ONLY.

---

## Non-Negotiable Rules

1. Output ONLY a valid JSON array — no markdown fences, no prose before or after
2. Each element: `{"title": "...", "type": "...", "subtype": "...", "domain": "...", "content": "..."}`
3. Never include slugs, paths, or frontmatter in output — the pipeline handles these
4. Wikilinks: `[[Display Name]]` only — no pipe, no slug
5. Dates always YYYY-MM-DD (used only in content body where contextually relevant)
6. Never reproduce verbatim code — describe the pattern or technique
7. Section links must match their section type
8. One source page per book — if inventory shows it exists, include it as an UPDATE
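Several of the rules above can be checked mechanically before a page reaches the wiki. The Python sketch below shows one way to do that. It is illustrative only: `validate_pages` is a hypothetical helper, not part of the repository, and it assumes that `subtype` is required for source and entity pages, which the schema implies but does not state outright.

```python
import json
import re

PAGE_TYPES = {"source", "concept", "entity"}
SUBTYPES = {
    "source": {"article", "pdf", "book", "video", "note", "project"},
    "entity": {"person", "company", "tool", "model", "framework", "technology"},
}
DOMAINS = {
    "ai-llm", "software-engineering", "product-strategy", "finance-markets",
    "personal", "consulting", "climate", "infrastructure", "security",
}
PIPED_LINK = re.compile(r"\[\[[^\]]*\|")  # a wikilink that contains a pipe


def validate_pages(raw: str) -> list:
    """Return a list of problems found in raw model output; empty means it passed."""
    try:
        pages = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Markdown fences or surrounding prose end up here (rule 1).
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(pages, list):
        return ["top-level value must be a JSON array"]

    errors = []
    for i, page in enumerate(pages):
        if not isinstance(page, dict):
            errors.append(f"element {i}: must be an object")
            continue
        for key in ("title", "content"):
            if not isinstance(page.get(key), str) or not page[key].strip():
                errors.append(f"element {i}: missing or empty '{key}'")
        ptype = page.get("type")
        if ptype not in PAGE_TYPES:
            errors.append(f"element {i}: unknown type {ptype!r}")
            continue
        subtype = page.get("subtype")
        if ptype == "concept":
            if subtype is not None:
                errors.append(f"element {i}: concept pages omit 'subtype'")
        elif subtype not in SUBTYPES[ptype]:  # assumption: required for source/entity
            errors.append(f"element {i}: invalid subtype {subtype!r} for {ptype}")
        if "domain" in page and page["domain"] not in DOMAINS:
            errors.append(f"element {i}: unknown domain {page['domain']!r}")
        content = page.get("content")
        if isinstance(content, str) and PIPED_LINK.search(content):
            errors.append(f"element {i}: wikilinks must not contain a pipe (rule 4)")
    return errors
```

Rules that depend on the page inventory or on judgement (rules 6 to 8) are out of reach for a check like this and would still need the pipeline or a reviewer.
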
@@ -1,59 +0,0 @@
// bridge is a stdio↔HTTP adapter that lets Claude Code connect to the
// supervisor MCP server via the stdio transport.
//
// Claude Code spawns this binary as a subprocess and communicates over
// stdin/stdout. Each newline-delimited JSON-RPC message from stdin is
// forwarded to the supervisor HTTP server and the response is written back.
//
// Usage:
//
//	SUPERVISOR_URL=http://localhost:3200/mcp bridge
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	url := os.Getenv("SUPERVISOR_URL")
	if url == "" {
		url = "http://localhost:3200/mcp"
	}

	client := &http.Client{}
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 1024*1024), 1024*1024)

	for scanner.Scan() {
		line := scanner.Bytes()
		if len(bytes.TrimSpace(line)) == 0 {
			continue
		}

		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(line))
		if err != nil {
			fmt.Fprintf(os.Stderr, "bridge: build request: %v\n", err)
			continue
		}
		req.Header.Set("Content-Type", "application/json")

		resp, err := client.Do(req)
		if err != nil {
			fmt.Fprintf(os.Stderr, "bridge: request failed: %v\n", err)
			continue
		}
		_, _ = io.Copy(os.Stdout, resp.Body)
		_ = resp.Body.Close()
		_, _ = os.Stdout.Write([]byte("\n"))
	}

	if err := scanner.Err(); err != nil {
		fmt.Fprintf(os.Stderr, "bridge: scanner: %v\n", err)
		os.Exit(1)
	}
}
@@ -13,6 +13,10 @@ import (
 	"github.com/mathiasbq/supervisor/internal/skills/brain"
 	"github.com/mathiasbq/supervisor/internal/skills/org"
 	"github.com/mathiasbq/supervisor/internal/skills/retrospective"
+	skilldebug "github.com/mathiasbq/supervisor/internal/skills/debug"
+	"github.com/mathiasbq/supervisor/internal/skills/review"
+	"github.com/mathiasbq/supervisor/internal/skills/spec"
+	"github.com/mathiasbq/supervisor/internal/skills/trainer"
 	"github.com/mathiasbq/supervisor/internal/skills/sessionlog"
 	"github.com/mathiasbq/supervisor/internal/skills/tdd"
 	"github.com/mathiasbq/supervisor/internal/tier"
@@ -33,12 +37,17 @@ func main() {
 		os.Exit(1)
 	}

-	systemPrompt, err := os.ReadFile(cfg.ConfigDir + "/CLAUDE.md")
+	protocolsPrompt, err := os.ReadFile(cfg.ConfigDir + "/protocols.md")
 	if err != nil {
-		logger.Error("read supervisor CLAUDE.md", "path", cfg.ConfigDir+"/CLAUDE.md", "err", err)
+		logger.Error("read protocols.md", "path", cfg.ConfigDir+"/protocols.md", "err", err)
 		os.Exit(1)
 	}

+	// prependProtocols prepends the shared protocols to a skill discipline file.
+	prependProtocols := func(skillPrompt []byte) string {
+		return string(protocolsPrompt) + "\n---\n\n" + string(skillPrompt)
+	}
+
 	tddPrompt, err := os.ReadFile(cfg.ConfigDir + "/tdd.md")
 	if err != nil {
 		logger.Error("read tdd.md", "path", cfg.ConfigDir+"/tdd.md", "err", err)
@@ -51,11 +60,36 @@ func main() {
 		os.Exit(1)
 	}

-	executor := iexec.New(iexec.Config{
-		SystemPrompt:   string(systemPrompt),
-		LiteLLMBaseURL: cfg.LiteLLMBaseURL,
-		LiteLLMAPIKey:  cfg.LiteLLMAPIKey,
-	})
+	reviewPrompt, err := os.ReadFile(cfg.ConfigDir + "/review.md")
+	if err != nil {
+		logger.Error("read review.md", "path", cfg.ConfigDir+"/review.md", "err", err)
+		os.Exit(1)
+	}
+
+	debugPrompt, err := os.ReadFile(cfg.ConfigDir + "/debug.md")
+	if err != nil {
+		logger.Error("read debug.md", "path", cfg.ConfigDir+"/debug.md", "err", err)
+		os.Exit(1)
+	}
+
+	specPrompt, err := os.ReadFile(cfg.ConfigDir + "/spec.md")
+	if err != nil {
+		logger.Error("read spec.md", "path", cfg.ConfigDir+"/spec.md", "err", err)
+		os.Exit(1)
+	}
+
+	trainerReaderPrompt, err := os.ReadFile(cfg.ConfigDir + "/trainer-reader.md")
+	if err != nil {
+		logger.Error("read trainer-reader.md", "path", cfg.ConfigDir+"/trainer-reader.md", "err", err)
+		os.Exit(1)
+	}
+	trainerWriterPrompt, err := os.ReadFile(cfg.ConfigDir + "/trainer-writer.md")
+	if err != nil {
+		logger.Error("read trainer-writer.md", "path", cfg.ConfigDir+"/trainer-writer.md", "err", err)
+		os.Exit(1)
+	}
+
+	litellm := iexec.NewLiteLLM(cfg.LiteLLMBaseURL, cfg.LiteLLMAPIKey, 0)

 	tierFn := func(ctx context.Context) tier.Info {
 		return tier.Detect(ctx, "https://api.anthropic.com", cfg.LiteLLMBaseURL)
@@ -63,13 +97,16 @@ func main() {

 	reg := registry.New()
 	reg.Register(tdd.New(tdd.Config{
-		SystemPrompt: string(systemPrompt),
-		SkillPrompt:  string(tddPrompt),
-		DefaultModel: models.Resolve("tdd", ""),
-		ExecutorFn:   executor.Run,
+		SkillPrompt:   prependProtocols(tddPrompt),
+		DefaultModel:  models.ModelFor("tdd", ""),
+		CompleteFunc:  litellm.Complete,
+		SessionsDir:   cfg.SessionsDir,
+		IngestBaseURL: cfg.IngestBaseURL,
 	}))
 	reg.Register(brain.New(brain.Config{
-		IngestBaseURL: cfg.IngestBaseURL,
+		IngestBaseURL:  cfg.IngestBaseURL,
+		IngestSvcURL:   cfg.IngestSvcURL,
+		KBRetrievalURL: cfg.KBRetrievalURL,
 	}))
 	reg.Register(org.New(org.Config{
 		TierFn: tierFn,
@@ -78,10 +115,39 @@ func main() {
 		SessionsDir: cfg.SessionsDir,
 	}))
 	reg.Register(retrospective.New(retrospective.Config{
-		SkillPrompt:  string(retroPrompt),
-		DefaultModel: models.Resolve("retrospective", ""),
+		SkillPrompt:  prependProtocols(retroPrompt),
+		DefaultModel: models.ModelFor("retrospective", ""),
 		SessionsDir:  cfg.SessionsDir,
-		ExecutorFn:   executor.Run,
+		CompleteFunc: litellm.Complete,
 	}))
+	reg.Register(review.New(review.Config{
+		SkillPrompt:   prependProtocols(reviewPrompt),
+		DefaultModel:  models.ModelFor("review", ""),
+		CompleteFunc:  litellm.Complete,
+		SessionsDir:   cfg.SessionsDir,
+		IngestBaseURL: cfg.IngestBaseURL,
+	}))
+	reg.Register(skilldebug.New(skilldebug.Config{
+		SkillPrompt:   prependProtocols(debugPrompt),
+		DefaultModel:  models.ModelFor("debug", ""),
+		CompleteFunc:  litellm.Complete,
+		SessionsDir:   cfg.SessionsDir,
+		IngestBaseURL: cfg.IngestBaseURL,
+	}))
+	reg.Register(spec.New(spec.Config{
+		SkillPrompt:   prependProtocols(specPrompt),
+		DefaultModel:  models.ModelFor("spec", ""),
+		CompleteFunc:  litellm.Complete,
+		SessionsDir:   cfg.SessionsDir,
+		IngestBaseURL: cfg.IngestBaseURL,
+	}))
+	reg.Register(trainer.New(trainer.Config{
+		ReaderPrompt: prependProtocols(trainerReaderPrompt),
+		WriterPrompt: prependProtocols(trainerWriterPrompt),
+		DefaultModel: models.ModelFor("trainer", ""),
+		CompleteFunc: litellm.Complete,
+		SessionsDir:  cfg.SessionsDir,
+		BrainDir:     cfg.BrainDir,
+	}))

 	srv := mcp.NewServer(reg)
@@ -89,7 +155,7 @@ func main() {
 	mux.Handle("/mcp", srv)

 	addr := ":" + cfg.Port
-	logger.Info("supervisor starting", "addr", addr)
+	logger.Info("supervisor starting", "addr", addr, "version", "v0.5.0")
 	if err := http.ListenAndServe(addr, mux); err != nil {
 		logger.Error("server stopped", "err", err)
 		os.Exit(1)
@@ -1,11 +1,26 @@
-# Model routing table — three-layer priority:
-# 1. model param in MCP tool call (caller override)
-# 2. per-skill entry here
-# 3. default (fallback)
-default: ollama/qwen3-coder-30b-tuned
+# Model selection — first entry per skill is used.
+# Override per-call by passing model in the MCP tool args.
+# Model names come from LiteLLM /v1/models (host/name format).
+
+default_chain:
+  - iguana/qwen3-coder-next

 skills:
-  tdd: ollama/qwen3-coder-30b-tuned
-  review: ollama/devstral-tuned
-  debug: ollama/deepseek-r1-tuned
-  retrospective: ollama/qwen3-coder-30b-tuned
+  tdd:
+    chain:
+      - koala/qwen3-coder-30b
+  review:
+    chain:
+      - iguana/devstral
+  debug:
+    chain:
+      - iguana/deepseek-r1-14b
+  spec:
+    chain:
+      - koala/phi4-14b
+  retrospective:
+    chain:
+      - iguana/qwen3-coder-next
+  trainer:
+    chain:
+      - iguana/qwen3-coder-next
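The comments in the new `models.yaml` describe the selection order: a per-call override wins, otherwise the first entry of the skill's chain is used. The Python sketch below walks through that order with the values from the diff. It is not the Go `models.ModelFor` implementation, which this diff does not show. The function name `model_for` is hypothetical, and falling back to `default_chain` for a skill with no entry is an assumption carried over from the old file's "default (fallback)" comment.

```python
# The new models.yaml, written as a plain dict so the sketch needs no YAML library.
MODELS = {
    "default_chain": ["iguana/qwen3-coder-next"],
    "skills": {
        "tdd": {"chain": ["koala/qwen3-coder-30b"]},
        "review": {"chain": ["iguana/devstral"]},
        "debug": {"chain": ["iguana/deepseek-r1-14b"]},
        "spec": {"chain": ["koala/phi4-14b"]},
        "retrospective": {"chain": ["iguana/qwen3-coder-next"]},
        "trainer": {"chain": ["iguana/qwen3-coder-next"]},
    },
}


def model_for(config: dict, skill: str, override: str = "") -> str:
    """Pick a model: caller override, then the skill's first chain entry, then the default."""
    if override:
        return override
    skill_chain = config.get("skills", {}).get(skill, {}).get("chain")
    # Assumption: skills without their own chain use default_chain.
    chain = skill_chain or config.get("default_chain") or []
    if not chain:
        raise ValueError(f"no model configured for skill {skill!r}")
    return chain[0]


print(model_for(MODELS, "tdd"))                    # koala/qwen3-coder-30b
print(model_for(MODELS, "tdd", "koala/phi4-14b"))  # koala/phi4-14b
print(model_for(MODELS, "unlisted-skill"))         # iguana/qwen3-coder-next
```

Only the first chain entry is read here, matching the file's comment; the remaining entries of a longer chain would presumably serve as fallbacks, but the diff does not say how.
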
31
config/supervisor/debug.md
Normal file
@@ -0,0 +1,31 @@
# Debug Discipline

You are a systematic debugger. Form hypotheses before suggesting fixes.

## Iron laws
1. Never suggest "try X and see what happens" — every hypothesis must have a specific expected outcome if correct
2. Generate exactly 3-5 hypotheses, ordered by likelihood (most likely first)
3. Never fix the bug — diagnose only; the caller decides what to do with the hypotheses

## Output contract
Return JSON result with:
- `status`: "pass" (hypotheses generated) or "error" (error too ambiguous to analyse)
- `phase`: "debug"
- `skill`: "debug"
- `file_path`: the most relevant file to the error (read it)
- `runner_output`: your hypotheses, formatted as:
  ```
  HYPOTHESIS 1 (likelihood: high): <mechanism>
  VERIFY: <exact command or file to check> → expected if correct: <specific output>

  HYPOTHESIS 2 (likelihood: medium): <mechanism>
  VERIFY: <exact command or file to check> → expected if correct: <specific output>
  ```
- `verified`: false — verification is the caller's job
- `message`: "N hypotheses for: <one-line error summary>"

## Rules
1. Read the error and any context files provided before forming hypotheses
2. Identify the failure mode first — what actually went wrong, not just what the error says
3. For each hypothesis: name the mechanism, explain why it would produce this exact error, give a concrete verification command with expected output
4. If the error is clearly a typo or trivial mistake, still form 3 hypotheses — surface the most likely cause as #1
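The `runner_output` format in `debug.md` is regular enough for a caller to parse. The Python sketch below shows one possible parser that also enforces the 3 to 5 hypothesis range from the iron laws. Nothing in the diff shows how the supervisor consumes this text, so `parse_hypotheses` and the sample hypotheses are hypothetical and only illustrate the format.

```python
import re

# One HYPOTHESIS line followed directly by its VERIFY line.
HYPOTHESIS = re.compile(
    r"HYPOTHESIS (\d+) \(likelihood: (\w+)\): (.+)\n"
    r"VERIFY: (.+?) → expected if correct: (.+)"
)


def parse_hypotheses(runner_output: str) -> list:
    """Extract hypotheses from runner_output and check the 3-5 count rule."""
    found = []
    for match in HYPOTHESIS.finditer(runner_output):
        number, likelihood, mechanism, check, expected = match.groups()
        found.append({
            "number": int(number),
            "likelihood": likelihood,
            "mechanism": mechanism.strip(),
            "verify": check.strip(),
            "expected": expected.strip(),
        })
    if not 3 <= len(found) <= 5:
        raise ValueError(f"expected 3-5 hypotheses, got {len(found)}")
    return found


# Made-up sample output, used only to exercise the parser.
SAMPLE = """\
HYPOTHESIS 1 (likelihood: high): config directory is missing from the image
VERIFY: ls /app/config → expected if correct: No such file or directory

HYPOTHESIS 2 (likelihood: medium): port is already in use
VERIFY: lsof -i :3200 → expected if correct: a listening process is shown

HYPOTHESIS 3 (likelihood: low): environment variable is unset
VERIFY: printenv SUPERVISOR_PORT → expected if correct: no output
"""

for hypothesis in parse_hypotheses(SAMPLE):
    print(hypothesis["number"], hypothesis["likelihood"], hypothesis["verify"])
```
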
@@ -1,27 +1,31 @@
-# The Hyperguild Way
+# Hyperguild Skill Protocols

-These protocols are injected into every worker invocation. They define how you behave as a member of the hyperguild.
+**IMPORTANT: DO NOT OUTPUT JSON. DO NOT USE JSON CODE BLOCKS.**
+Your response must be plain markdown text. No `{"status":...}`, no ` ```json `, nothing.
+If you output JSON you will be ignored. Respond in prose and markdown only.

-## Output contract
+---

-Every response is raw JSON matching the response schema. No preamble, no prose, no markdown. Malformed output is treated as a failed invocation.
+## Role

-## Quality gate
+You are a consultant. You analyse, suggest, and explain.
+Claude Code has the tools to read files, run commands, and write code.
+You provide the thinking; Claude Code provides the action.

-`verified: true` only when a subprocess exit code confirms the outcome. Never self-assess. "I think the tests pass" is not verified.
+## Output

-## Escalation
+Write in clear markdown. Lead with the key finding. Use headers and bullet lists
+where they help. Be concise — Claude Code reads your full response.

-If stuck after 3 attempts, return `status: error` with a clear `message` explaining why. Do not retry silently. Do not fabricate a passing result.
+Do not make up file contents, test results, or command output you have not seen.
+If you lack context to give a useful answer, say so and state what you need.

-## Working offline
+## Context blocks

-If brain context is absent from your prompt, proceed using your discipline file only. Note the gap in your `message` field: "no brain context available".
+You may receive one or both of these blocks before your task:

-## Handoff format
+**`## Relevant knowledge`** — patterns and decisions from past sessions. Let them
+inform your approach. Do not contradict them without reason.

-Structure your output so the next worker in a chain can consume it without transformation. Use the standard result schema. Do not add extra fields.
-
-## Session logging
-
-The Go skill handler records your invocation in the session log automatically. You do not need to do this yourself.
+**`## Session history`** — what has already happened in this session. Build on it,
+do not repeat it.
@@ -1,40 +1,33 @@
-# Retrospective Worker Discipline
+# Retrospective Discipline

-You are the retrospective worker. Your job is to review a completed coding session and identify knowledge worth preserving in the hyperguild brain.
+You review a completed coding session and identify knowledge worth preserving.

 ## What you receive

-- A session log in JSON format listing every skill invocation: what was attempted, what failed, what passed, how long it took.
-
-## What you produce
-
-For each significant learning, call brain_write with a structured markdown note. Then return a JSON result summarising what you wrote.
+A session log in JSON format listing every skill invocation: what was attempted,
+what failed, what passed, how long it took.

 ## What is worth preserving

 - Patterns that worked and should be repeated
-- Failures that revealed something non-obvious about the codebase or the discipline
+- Failures that revealed something non-obvious about the codebase or the approach
 - Decisions made during the session (architectural, structural, tooling)
-- Anything that contradicts or extends what the brain already knows
+- Anything that contradicts or extends established patterns

 ## What is NOT worth preserving

-- Routine TDD cycles with no surprises
+- Routine cycles with no surprises
 - Single-attempt passes with no interesting context
 - Mechanical operations (file moves, renames, formatting)

 ## Output format

-Return JSON matching the standard result schema:
+Respond in markdown. For each learning worth preserving:

-```json
-{
-  "status": "pass",
-  "phase": "retrospective",
-  "skill": "retrospective",
-  "verified": true,
-  "message": "wrote N entries to brain/raw/"
-}
-```
+**Learning:** One sentence describing what was learned.
+**Context:** Why this session surfaced it — what made it non-obvious.
+**Recommendation:** What should be done differently or repeated going forward.

-`verified` is true when you successfully called brain_write at least once and received a confirmation. If the session had nothing worth writing, return `verified: true` with `message: "no novel learnings in this session"`.
+End with a summary: "N learnings worth writing to brain" or "No novel learnings in this session."
+
+The caller will decide which learnings to write to the brain using brain_write.
25
config/supervisor/review.md
Normal file
@@ -0,0 +1,25 @@
# Code Review Discipline

You are a disciplined code reviewer. Read files carefully before commenting.

## Iron laws — any violation is a blocking issue
1. No security vulnerabilities: command injection, SQL injection, credential exposure, path traversal, unchecked input at system boundaries
2. No silently swallowed errors — `err != nil` without wrapping or handling is always wrong
3. No missing validation at system boundaries (user input, external APIs, file reads)

## Output format

Respond in markdown. Group findings by severity:

**CRITICAL:** Issues that violate an iron law or will cause data loss / security breach.
**WARNING:** Issues that will likely cause bugs or maintenance problems.
**SUGGESTION:** Style, clarity, or optional improvements.

For each finding include the file and line number. If nothing is wrong, explain specifically which iron law checks you ran and why they passed — never rubber-stamp.

## Rules
1. Read every file listed before writing feedback
2. Check iron laws first — if any are violated, flag them before anything else
3. Then check: correctness, test coverage for new code, Go style conventions
4. Line references required for every finding
5. End with a one-line summary: "N critical, M warnings, K suggestions" or "Clean — no issues found"
37
config/supervisor/spec.md
Normal file
@@ -0,0 +1,37 @@
# Spec Writing Discipline

You write structured implementation specs. Nothing is left ambiguous.

## Iron laws
1. Success criteria must be measurable — "the system is fast" is banned; "p99 < 200ms under 100 RPS" is valid
2. Always include an explicit "Out of scope" section — if you don't draw the boundary, the developer will guess wrong
3. Every technical decision in the approach must have a rationale

## Output format

Write the spec as markdown using this structure:

```
# [Feature] Spec

## Problem statement
What problem does this solve? For whom? Why now?

## Success criteria
- [ ] Criterion 1 — measurable and verifiable
- [ ] Criterion 2 — measurable and verifiable

## Constraints
Non-negotiable requirements the solution must satisfy.

## Out of scope
What we are explicitly NOT doing in this iteration.

## Technical approach
Architecture decisions, key components, rationale for each choice.

## Risks
What could go wrong, and how we'd mitigate it.
```

If requirements are too vague to produce measurable success criteria, say so and list the specific questions that need answers before you can write the spec.
@@ -1,26 +1,35 @@
-# TDD Skill
+# TDD Discipline

 ## Iron Law

 NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.

-## Red phase
+## Red phase — write a failing test

 - Write exactly one test. One behavior. Name must describe the behavior clearly.
-- Run the test suite. Confirm the test FAILS.
-- If the test passes immediately: it tests existing behavior or is vacuous.
-  Return status "fail" with message explaining why the test is wrong.
+- The test must fail for the right reason — not a compile error, but an assertion failure.
 - Do not write any implementation code in this phase.

-## Green phase
+Respond with:
+- The test code to write (file path + content)
+- The exact failure you expect to see when running it
+- Why that failure confirms the test is meaningful
+
+## Green phase — make the test pass

 - Write the minimal code to make the failing test pass. Nothing more.
 - YAGNI: no extra parameters, no future-proofing, no clever abstractions.
-- Run the test suite. Confirm it PASSES.
-- If tests fail: fix the implementation, not the test. Max 3 attempts.

-## Refactor phase
+Respond with:
+- The implementation code to write (file path + content)
+- Confirmation of which test it targets and how it satisfies the assertion
+
+## Refactor phase — improve without changing behavior

 - Improve structure, naming, or clarity only. No new behavior.
 - Tests must remain green after every change.
-- If tests break during refactor: revert that change, return status "fail".
+
+Respond with:
+- Specific refactoring suggestions with rationale
+- Which files to touch and what to change
+- Any risks that could break existing tests
26
config/supervisor/trainer-reader.md
Normal file
@@ -0,0 +1,26 @@
# Trainer Reader Discipline

You scan session logs and identify candidate learning moments worth preserving in the brain.

## What to look for

- **Patterns that worked**: the approach was clean and correct — worth reinforcing
- **Corrections**: something was first done wrong, then corrected — both sides are valuable

## Scoring (1–5)

- 5: novel pattern, clearly correct, generalises across projects
- 4: good pattern, correct, somewhat project-specific but still useful
- 3: correct but obvious — include only if especially clean
- 2 or below: skip

## Output format

Respond in markdown. List each candidate:

**Candidate N (score: X/5, type: pattern|correction)**
- **What happened:** Brief description of the learning moment
- **Why it's valuable:** What makes this worth preserving
- **Key insight:** The distilled lesson in one sentence

End with: "N candidates found (M scoring ≥ 3)" — the writer will use these to produce knowledge entries.
31
config/supervisor/trainer-writer.md
Normal file
@@ -0,0 +1,31 @@
# Trainer Writer Discipline

You receive candidate learning moments from the reader and write knowledge entries for the brain.

## Quality gate (apply before writing each entry)

- The lesson must be phrased so it could apply to any project, not just this one
- No project-specific paths, variable names, or identifiers
- The insight must be stated clearly enough that someone reading it cold would understand it

## Output format

For each candidate that passes the quality gate, write a knowledge entry in this format:

```
# [Topic]

## Lesson
[The key insight in 1-3 sentences]

## When it applies
[Conditions under which this pattern is relevant]

## Example
[A brief, generic example that illustrates the lesson]
```

After presenting all entries, end with a summary:
"N entries ready for brain_write" or "0 entries passed quality gate — [reason]"

The caller will write passing entries to the brain using brain_write.
241
docs/multi-model-routing.md
Normal file
@@ -0,0 +1,241 @@
|
||||
# Multi-Model Routing for supervisor
|
||||
|
||||
Reference document for implementing multi-model access within the supervisor project.
|
||||
Researched April 2026. Constraints: Claude Max subscription (ToS must be respected).
|
||||
|
||||
---
|
||||
|
||||
## Goal
|
||||
|
||||
Route tasks to specialized, cheaper, or local models during agent and skill flows — without
|
||||
violating Anthropic's terms or introducing unnecessary infrastructure risk.
|
||||
|
||||
---
|
||||
|
||||
## Hard Constraints
|
||||
|
||||
- Claude Max subscription is in use. Anthropic's April 2026 terms **prohibit using the
|
||||
subscription with third-party harnesses that spoof the Anthropic API surface**.
|
||||
- `ANTHROPIC_BASE_URL` → LiteLLM workaround is explicitly out of scope.
|
||||
- Claude must remain the reasoning engine. Other models are tools, not replacements.
|
||||
|
||||
---
|
||||
|
||||
## Infrastructure Available
|
||||
|
||||
| Machine | Role | Relevant services |
|---------|------|-------------------|
| koala | GPU inference | llama-swap, Ollama, Qdrant, LiteLLM proxy |
| iguana | Services, builds | k3s, general services |
| flamingo | Daily driver | Claude Code runs here |
|
||||
|
||||
LiteLLM proxy on koala exposes 100+ models (local + cloud) through a unified API.
|
||||
All machines connected via Tailscale.
|
||||
|
||||
---
|
||||
|
||||
## Approved Patterns
|
||||
|
||||
### Pattern 1 — Native Claude model tiering (zero build)
|
||||
|
||||
Claude Code subagents support per-agent model selection via frontmatter.
|
||||
Use this for cost routing within the Claude model family.
|
||||
|
||||
```yaml
|
||||
# ~/.claude/agents/explorer.md
|
||||
---
|
||||
name: explorer
|
||||
description: File reading, code search, codebase mapping — use for all exploration tasks
|
||||
model: haiku
|
||||
---
|
||||
```
|
||||
|
||||
- `haiku` for exploration, summarization, classification
|
||||
- `sonnet` (default) for main reasoning and implementation
|
||||
- `opus` for deep analysis, architecture decisions
|
||||
|
||||
**When to use**: Always. Add `model: haiku` to any subagent that does read-heavy or
|
||||
classification work. Cheapest and fastest path to cost control.
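
The tiering rule above can be sketched as a small lookup that renders subagent frontmatter. This is only an illustration: the task categories, `TIER_BY_TASK`, `pick_model`, and `render_frontmatter` are hypothetical names, not part of Claude Code; only the `name`, `description`, and `model` frontmatter keys come from the example above.

```python
# Hypothetical mapping from task category to the model aliases listed above.
TIER_BY_TASK = {
    "exploration": "haiku",
    "summarization": "haiku",
    "classification": "haiku",
    "implementation": "sonnet",
    "architecture": "opus",
}


def pick_model(task: str, default: str = "sonnet") -> str:
    """Return the model alias for a task category, falling back to the default tier."""
    return TIER_BY_TASK.get(task, default)


def render_frontmatter(name: str, description: str, task: str) -> str:
    """Render the frontmatter block for a subagent definition file."""
    lines = [
        "---",
        f"name: {name}",
        f"description: {description}",
        f"model: {pick_model(task)}",
        "---",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_frontmatter("explorer", "File reading and code search", "exploration"))
```

Keeping the mapping in one place makes it easy to audit which subagents run on the cheaper tier.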
|
||||
|
||||
---
|
||||
|
||||
### Pattern 2 — MCP tools wrapping local models (primary build target)
|
||||
|
||||
Expose local models on koala as named MCP tools. Claude remains the orchestrator and
|
||||
reasoning engine — it calls local models as tools the same way it calls any other tool.
|
||||
|
||||
This is the intended MCP use case and carries zero ToS risk.
|
||||
|
||||
**Semantic contract**: Claude decides *when* to delegate based on the tool description.
|
||||
Write descriptions that tell Claude what the model is good for.
|
||||
|
||||
#### MCP server implementation
|
||||
|
||||
Small Python server, run on koala or flamingo, registered in Claude Code settings.
|
||||
|
||||
```python
# supervisor/scripts/mcp_local_models.py
# Assumes the official MCP Python SDK (`pip install mcp`) and its FastMCP helper.
import requests
from mcp.server.fastmcp import FastMCP

server = FastMCP("local-models")

LITELLM_BASE = "http://koala:4000"
OLLAMA_BASE = "http://koala:11434"
REQUEST_TIMEOUT = 120  # seconds; local models can be slow on a cold start


def _litellm_chat(model: str, prompt: str) -> str:
    """Send a single-turn chat request to the LiteLLM proxy and return the reply text."""
    r = requests.post(
        f"{LITELLM_BASE}/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 2048,
        },
        timeout=REQUEST_TIMEOUT,
    )
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]


@server.tool()
def ask_local_llama(prompt: str) -> str:
    """Ask the local Llama model on koala.
    Use for: bulk summarization, first-pass analysis, classification, simple Q&A,
    anything that does not require deep reasoning or up-to-date knowledge.
    Faster and cheaper than cloud models for routine subtasks."""
    return _litellm_chat("llama3-local", prompt)


@server.tool()
def ask_coding_model(code: str, question: str) -> str:
    """Ask a code-specialized local model.
    Use for: syntax checking, boilerplate generation, code formatting questions,
    simple refactors where pattern-matching is sufficient."""
    return _litellm_chat("codellama-local", f"Code:\n{code}\n\nQuestion: {question}")


@server.tool()
def list_available_local_models() -> list[str]:
    """List all models currently available on the local LiteLLM proxy."""
    r = requests.get(f"{LITELLM_BASE}/v1/models", timeout=REQUEST_TIMEOUT)
    r.raise_for_status()
    return [m["id"] for m in r.json()["data"]]


if __name__ == "__main__":
    server.run()  # stdio transport by default
```
|
||||
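
Because Claude only sees what a tool returns, it can help to keep the request and response handling in small pure functions that are testable without a running proxy. The sketch below assumes the OpenAI-compatible chat completion shape used in the server above; `build_chat_payload` and `extract_content` are hypothetical helper names, not LiteLLM or MCP APIs.

```python
from typing import Any


def build_chat_payload(model: str, prompt: str, max_tokens: int = 2048) -> dict[str, Any]:
    """Build the JSON body for a single-turn /v1/chat/completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def extract_content(response: dict[str, Any]) -> str:
    """Pull the reply text out of a chat completion response.

    Returns a short bracketed marker instead of raising, so the calling
    model gets a readable tool result when the local model returns nothing.
    """
    choices = response.get("choices") or []
    if not choices:
        return "[local model returned no choices]"
    message = choices[0].get("message") or {}
    return message.get("content") or "[local model returned empty content]"
```

Returning a readable marker rather than raising is a design choice: the orchestrating model can then decide to retry or to handle the subtask itself.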
|
||||
#### Register in Claude Code
|
||||
|
||||
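Add to the project-level `.mcp.json` (or register the server for your user with `claude mcp add`); Claude Code reads MCP server definitions from there rather than from `settings.json`: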
|
||||
|
||||
```json
{
  "mcpServers": {
    "local-models": {
      "command": "python3",
      "args": ["/path/to/supervisor/scripts/mcp_local_models.py"]
    }
  }
}
```
|
||||
|
||||
#### LiteLLM config additions needed on koala
|
||||
|
||||
```yaml
# litellm config.yaml: add model entries for local models
model_list:
  - model_name: llama3-local
    litellm_params:
      model: ollama/llama3.2
      api_base: http://localhost:11434

  - model_name: codellama-local
    litellm_params:
      model: ollama/codellama
      api_base: http://localhost:11434
```
|
||||
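
Once the config is in place, the proxy's `/v1/models` response can be checked for the two aliases the MCP tools rely on. The sketch below works on an already-fetched response body so it stays offline; `missing_models` is a hypothetical helper, and the response shape (`{"data": [{"id": ...}]}`) is the one the MCP server above already assumes.

```python
def missing_models(models_response: dict, required: list[str]) -> list[str]:
    """Return the required model aliases that the proxy does not list."""
    available = {entry.get("id") for entry in models_response.get("data", [])}
    return [name for name in required if name not in available]


if __name__ == "__main__":
    # Example response body, shaped like the /v1/models output used above.
    sample = {"data": [{"id": "llama3-local"}]}
    print(missing_models(sample, ["llama3-local", "codellama-local"]))
```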
|
||||
---
|
||||
|
||||
### Pattern 3 — External orchestration scripts (for pipeline workflows)
|
||||
|
||||
For multi-model pipelines that don't need to live inside a Claude Code session.
|
||||
These scripts use their own API key (separate from Max subscription — API billing),
|
||||
so they can call Claude API + LiteLLM freely.
|
||||
|
||||
Claude Code invokes them via the Bash tool.
|
||||
|
||||
```
|
||||
Claude Code → [Bash tool] → ./scripts/orchestrate.py → {Claude API, LiteLLM, local models}
|
||||
```
|
||||
|
||||
```python
# supervisor/scripts/orchestrate.py
import anthropic
import requests

# Reads ANTHROPIC_API_KEY from the environment (API billing, separate from the Max subscription).
claude = anthropic.Anthropic()


def analyze_document(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        content = f.read()

    # Step 1: local Llama extracts structure (fast, cheap)
    resp = requests.post(
        "http://koala:4000/v1/chat/completions",
        json={
            "model": "llama3-local",
            "messages": [{"role": "user", "content": f"Extract key sections from:\n{content}"}],
        },
        timeout=120,
    )
    resp.raise_for_status()  # fail loudly instead of indexing into an error body
    structure = resp.json()["choices"][0]["message"]["content"]

    # Step 2: Claude synthesizes and reasons over it
    synthesis = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": f"Synthesize these findings:\n{structure}"}],
    )
    return synthesis.content[0].text
```
|
||||
|
||||
**When to use**: Batch processing, automated pipelines, workflows triggered by cron or
|
||||
external events. Not for interactive Claude Code sessions.
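
For batch runs, long documents may exceed what the local model handles well in one request. A minimal sketch of one way to handle that, assuming a simple character-based split: the pipeline takes the two model calls as injected callables, so it can be exercised without network access. `chunk_text`, `run_pipeline`, and the `max_chars` default are hypothetical and not taken from the script above.

```python
from typing import Callable


def chunk_text(text: str, max_chars: int) -> list[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def run_pipeline(
    content: str,
    extract: Callable[[str], str],
    synthesize: Callable[[str], str],
    max_chars: int = 8000,
) -> str:
    """Run the cheap extraction step per chunk, then one synthesis step over the results."""
    extracted = [extract(chunk) for chunk in chunk_text(content, max_chars)]
    return synthesize("\n".join(extracted))


if __name__ == "__main__":
    # Stand-ins for the local-model and Claude calls.
    print(run_pipeline("abcdef", str.upper, lambda s: f"<{s}>", max_chars=4))
```

In a real script, `extract` would wrap the LiteLLM request and `synthesize` the Claude API call.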
|
||||
|
||||
---
|
||||
|
||||
## What to Skip
|
||||
|
||||
| Approach | Why skip |
|----------|----------|
| `ANTHROPIC_BASE_URL` → LiteLLM | ToS violation with Max subscription (April 2026 terms) |
| Third-party harnesses (OpenClaw etc.) | Explicitly banned for subscription users |
| A2A in Claude Code | Not implemented by Anthropic yet; revisit late 2026 |
| OpenAI agent handoffs | Loses execution context, not worth the complexity |
|
||||
|
||||
---
|
||||
|
||||
## Protocol Landscape (for awareness, not immediate action)
|
||||
|
||||
- **MCP** — production, 97M monthly downloads, your primary tool-access protocol. LiteLLM
|
||||
natively supports it as both MCP gateway and MCP client as of v1.60+.
|
||||
- **A2A v1.0** — Google/Linux Foundation, 150+ orgs in production, but Anthropic has not
|
||||
shipped it in Claude Code. The intent is agent-to-agent peer delegation (vs MCP's
|
||||
agent-to-tool). Worth watching for H2 2026.
|
||||
- **AGNTCY** — Cisco/Linux Foundation, discovery and identity layer beneath MCP+A2A.
|
||||
Potentially relevant for multi-machine routing across koala/iguana/flamingo once mature.
|
||||
|
||||
---
|
||||
|
||||
## Build Priority
|
||||
|
||||
| Step | Effort | Value | When |
|------|--------|-------|------|
| Add `model: haiku` to explorer subagents | 10 min | Immediate cost saving | Now |
| Write MCP server for local models | 2–3h | Local model access in sessions | Soon |
| Register MCP server in Claude Code settings | 15 min | Activates pattern 2 | With above |
| Write orchestration script template | 1–2h | Pipeline workflows | When needed |
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- LiteLLM MCP docs: https://docs.litellm.ai/docs/mcp
|
||||
- Community MCP wrapper for LiteLLM: https://github.com/itsDarianNgo/mcp-server-litellm
|
||||
- Ollama MCP server: https://github.com/rawveg/ollama-mcp
|
||||
- A2A protocol status: https://www.linuxfoundation.org/press/a2a-protocol-surpasses-150-organizations-lands-in-major-cloud-platforms-and-sees-enterprise-production-use-in-first-year
|
||||
- AGNTCY: https://github.com/agntcy
|
||||
2138
docs/superpowers/plans/2026-04-17-hyperguild-phase1.md
Normal file
File diff suppressed because it is too large
Load Diff
1871
docs/superpowers/plans/2026-04-19-hyperguild-phase2.md
Normal file
File diff suppressed because it is too large
Load Diff
923
docs/superpowers/plans/2026-04-20-cd-pipeline.md
Normal file
@@ -0,0 +1,923 @@
|
||||
# CD Pipeline Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Build a GitOps CD pipeline that automatically builds a container image on `main` push and deploys it to k3s on koala via Flux.
|
||||
|
||||
**Architecture:** BuildKit runs as a systemd service on koala (same host as the Gitea runner); CD pushes images to the Gitea registry and commits image tag updates to the infra repo; Flux reconciles within 60s. App secrets (including ANTHROPIC_API_KEY) are SOPS-encrypted in the infra repo and decrypted by Flux at apply time.
|
||||
|
||||
**Tech Stack:** Go 1.26, Node.js 22 (for claude CLI), BuildKit (buildctl), Gitea Actions, Flux (kustomize-controller), SOPS + age, k3s/containerd
|
||||
|
||||
---
|
||||
|
||||
## Environment context
|
||||
|
||||
This plan spans three environments. Each task header notes which environment it runs in:
|
||||
|
||||
- **[this-repo]** — `/Users/mathias/Documents/local-dev/AI/supervisor` on flamingo
|
||||
- **[koala-ssh]** — `ssh koala` (run commands via `ssh koala "..."`)
|
||||
- **[infra-repo]** — `gitea.d-ma.be/mathias/infra` (clone to a temp dir, work there, push)
|
||||
- **[gitea-ui]** — Gitea web UI at `https://gitea.d-ma.be`
|
||||
- **[kubectl]** — kubectl from flamingo (home LAN)
|
||||
|
||||
---
|
||||
|
||||
## File map
|
||||
|
||||
**This repo (supervisor):**
|
||||
- Create: `Dockerfile`
|
||||
- Create: `.gitea/workflows/cd.yml`
|
||||
|
||||
**koala host:**
|
||||
- Create: `/etc/systemd/system/buildkitd.service` (or user-level equivalent)
|
||||
- Create: `/etc/buildkit/buildkitd.toml` (registry transport config; matches the path used in Task 2)
|
||||
|
||||
**Infra repo (`gitea.d-ma.be/mathias/infra`):**
|
||||
- Create: `apps/supervisor/namespace.yaml`
|
||||
- Create: `apps/supervisor/deployment.yaml`
|
||||
- Create: `apps/supervisor/service.yaml`
|
||||
- Create: `apps/supervisor/secrets.enc.yaml` (SOPS-encrypted)
|
||||
- Create: `apps/supervisor/kustomization.yaml`
|
||||
- Create: `apps/imagepullsecret/secret.enc.yaml` (SOPS-encrypted)
|
||||
- Create: `apps/imagepullsecret/kustomization.yaml`
|
||||
- Modify: `clusters/koala/kustomization.yaml` (add supervisor + imagepullsecret)
|
||||
- Modify: `flux-system/kustomization.yaml` or relevant Flux Kustomization CRD (add SOPS decryption)
|
||||
|
||||
---
|
||||
|
||||
## Task 1: Dockerfile [this-repo]
|
||||
|
||||
The supervisor binary depends on the `claude` CLI as a subprocess. The image uses a multi-stage build: Go builder stage compiles the binary; the runtime stage is Node.js (for `npm install -g @anthropic-ai/claude-code`). Config files are baked in. The `brain/` directory is a volume mount.
|
||||
|
||||
**Files:**
|
||||
- Create: `Dockerfile`
|
||||
|
||||
- [ ] **Step 1: Verify no Dockerfile exists**
|
||||
|
||||
```bash
|
||||
ls Dockerfile 2>/dev/null || echo "confirmed: no Dockerfile"
|
||||
```
|
||||
|
||||
Expected: `confirmed: no Dockerfile`
|
||||
|
||||
- [ ] **Step 2: Create the Dockerfile**
|
||||
|
||||
```dockerfile
|
||||
# syntax=docker/dockerfile:1
|
||||
|
||||
# ── Build stage ───────────────────────────────────────────────────────────────
|
||||
FROM golang:1.26-bookworm AS builder
|
||||
|
||||
ARG VERSION=dev
|
||||
WORKDIR /src
|
||||
|
||||
COPY go.mod go.sum ./
|
||||
RUN go mod download
|
||||
|
||||
COPY . .
|
||||
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
|
||||
go build -trimpath -ldflags="-s -w -X main.version=${VERSION}" \
|
||||
-o /out/supervisor ./cmd/supervisor
|
||||
|
||||
# ── Runtime stage ─────────────────────────────────────────────────────────────
|
||||
# Node.js 22 slim — needed for claude CLI subprocess
|
||||
FROM node:22-slim
|
||||
|
||||
# Install claude CLI (provides the `claude` binary the supervisor shells out to)
|
||||
RUN npm install -g @anthropic-ai/claude-code \
|
||||
&& claude --version \
|
||||
&& echo "claude CLI installed"
|
||||
|
||||
# Copy supervisor binary
|
||||
COPY --from=builder /out/supervisor /usr/local/bin/supervisor
|
||||
|
||||
# Bake in config (models.yaml + skill discipline files)
|
||||
COPY config/ /app/config/
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
# brain/ is writable state — mount a PersistentVolume here
|
||||
VOLUME /app/brain
|
||||
|
||||
ENV SUPERVISOR_CONFIG_DIR=/app/config/supervisor
|
||||
ENV SUPERVISOR_MODELS_FILE=/app/config/models.yaml
|
||||
ENV SUPERVISOR_BRAIN_DIR=/app/brain
|
||||
ENV SUPERVISOR_SESSIONS_DIR=/app/brain/sessions
|
||||
ENV SUPERVISOR_PORT=3200
|
||||
|
||||
EXPOSE 3200
|
||||
|
||||
ENTRYPOINT ["/usr/local/bin/supervisor"]
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Build locally to verify it compiles (no push)**
|
||||
|
||||
```bash
|
||||
# buildctl must be available locally, OR use docker if available on flamingo
|
||||
docker build --target builder -t supervisor-build-test . && echo "build stage OK"
|
||||
# If no docker on flamingo, skip this step and verify at Task 3 on koala instead
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add Dockerfile
|
||||
git commit -m "feat: add multi-stage Dockerfile with claude CLI runtime"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 2: BuildKit systemd service on koala [koala-ssh]
|
||||
|
||||
Install `buildkitd` as a root-level systemd service on koala. The Gitea runner process runs as root (confirmed by PID/cgroup), so the root socket at `/run/buildkit/buildkitd.sock` is accessible to it.
|
||||
|
||||
**Files:**
|
||||
- Create: `/etc/systemd/system/buildkitd.service` on koala
|
||||
- Create: `/etc/buildkit/buildkitd.toml` on koala (registry auth)
|
||||
|
||||
- [ ] **Step 1: Check if buildkitd is already installed**
|
||||
|
||||
```bash
|
||||
ssh koala "buildkitd --version 2>/dev/null || echo 'not installed'"
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Install buildkitd on koala**
|
||||
|
||||
Download the pinned BuildKit release binary for amd64 (koala is x86_64):
|
||||
|
||||
```bash
|
||||
ssh koala "
|
||||
BUILDKIT_VERSION=v0.21.0
|
||||
curl -sSL https://github.com/moby/buildkit/releases/download/\${BUILDKIT_VERSION}/buildkit-\${BUILDKIT_VERSION}.linux-amd64.tar.gz \
|
||||
| tar -xz -C /usr/local/
|
||||
buildkitd --version
|
||||
"
|
||||
```
|
||||
|
||||
Expected output includes: `buildkitd github.com/moby/buildkit v0.21.0`
|
||||
|
||||
- [ ] **Step 3: Create buildkitd.toml with Gitea registry auth**
|
||||
|
||||
The `[registry]` block sets transport options (HTTPS, certificate verification) for `gitea.d-ma.be`; it does not hold credentials. Credentials come from `/root/.docker/config.json`, which `buildctl` reads on the client side and forwards to the daemon for each build (created in Task 3):
|
||||
|
||||
```bash
|
||||
ssh koala "
|
||||
mkdir -p /etc/buildkit
|
||||
cat > /etc/buildkit/buildkitd.toml << 'EOF'
|
||||
[worker.containerd]
|
||||
enabled = false
|
||||
|
||||
[worker.oci]
|
||||
enabled = true
|
||||
|
||||
[registry.\"gitea.d-ma.be\"]
|
||||
http = false
|
||||
insecure = false
|
||||
EOF
|
||||
"
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Create systemd unit**
|
||||
|
||||
```bash
|
||||
ssh koala "
|
||||
cat > /etc/systemd/system/buildkitd.service << 'EOF'
|
||||
[Unit]
|
||||
Description=BuildKit daemon
|
||||
After=network.target
|
||||
|
||||
[Service]
|
||||
Type=notify
|
||||
ExecStart=/usr/local/bin/buildkitd --config /etc/buildkit/buildkitd.toml
|
||||
Restart=on-failure
|
||||
LimitNOFILE=1048576
|
||||
LimitNPROC=1048576
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
systemctl daemon-reload
|
||||
systemctl enable buildkitd
|
||||
systemctl start buildkitd
|
||||
"
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Verify the socket exists and is responsive**
|
||||
|
||||
```bash
|
||||
ssh koala "
|
||||
systemctl status buildkitd --no-pager
|
||||
buildctl --addr unix:///run/buildkit/buildkitd.sock debug info
|
||||
"
|
||||
```
|
||||
|
||||
Expected: service `active (running)`, buildctl shows BuildKit version info.
|
||||
|
||||
- [ ] **Step 6: Smoke-test build with trivial Dockerfile**
|
||||
|
||||
```bash
# buildctl takes the Dockerfile from a local directory, not from stdin,
# so write it to a temp dir and pass that dir as both context and dockerfile.
ssh koala "
set -e
CTX=\$(mktemp -d)
printf 'FROM alpine:3.21\nRUN echo hello\n' > \$CTX/Dockerfile
buildctl --addr unix:///run/buildkit/buildkitd.sock build \
  --frontend dockerfile.v0 \
  --local context=\$CTX \
  --local dockerfile=\$CTX \
  --output type=image,name=localhost/smoke-test:latest
rm -rf \$CTX
echo 'smoke test OK'
"
```
|
||||
|
||||
Expected: `smoke test OK`
|
||||
|
||||
---
|
||||
|
||||
## Task 3: Gitea registry push auth for buildctl [koala-ssh]
|
||||
|
||||
`buildctl` reads Docker-style credentials from `/root/.docker/config.json`. Create the credentials file so the runner can push to `gitea.d-ma.be`.
|
||||
|
||||
**Prerequisites:** A Gitea user token or password with `write:packages` scope for the `mathias` org. Create one in Gitea → User Settings → Applications → Generate Token (scopes: `write:packages`).
|
||||
|
||||
- [ ] **Step 1: Create Gitea access token**
|
||||
|
||||
In Gitea UI (`https://gitea.d-ma.be`) → top-right avatar → Settings → Applications → Generate New Token.
|
||||
- Token name: `buildkit-push`
|
||||
- Scopes: `write:packages` (container registry write)
|
||||
- Copy the token — it won't be shown again.
|
||||
|
||||
- [ ] **Step 2: Write docker config.json on koala**
|
||||
|
||||
Replace `<TOKEN>` with the token from Step 1:
|
||||
|
||||
```bash
|
||||
ssh koala "
|
||||
mkdir -p /root/.docker
|
||||
TOKEN=<TOKEN>
|
||||
AUTH=\$(echo -n 'mathias:'\${TOKEN} | base64)
|
||||
cat > /root/.docker/config.json << EOF
|
||||
{
|
||||
\"auths\": {
|
||||
\"gitea.d-ma.be\": {
|
||||
\"auth\": \"\${AUTH}\"
|
||||
}
|
||||
}
|
||||
}
|
||||
EOF
|
||||
chmod 600 /root/.docker/config.json
|
||||
echo 'credentials written'
|
||||
"
|
||||
```
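
The same Docker-style credentials file can be generated programmatically, which avoids quoting mistakes in the nested heredoc. A minimal sketch, assuming only the `auths` layout shown above; `docker_auth_config` is a hypothetical helper and the token value is a placeholder.

```python
import base64
import json


def docker_auth_config(registry: str, user: str, token: str) -> str:
    """Return the contents of a Docker config.json with one registry auth entry."""
    # The auth field is base64("user:token"), as in the shell version above.
    auth = base64.b64encode(f"{user}:{token}".encode("utf-8")).decode("ascii")
    return json.dumps({"auths": {registry: {"auth": auth}}}, indent=2)


if __name__ == "__main__":
    # Placeholder token; substitute the real one from Step 1.
    print(docker_auth_config("gitea.d-ma.be", "mathias", "REPLACE_WITH_TOKEN"))
```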
|
||||
|
||||
- [ ] **Step 3: Verify push works**
|
||||
|
||||
```bash
# As in the smoke test, the Dockerfile must live in a local directory.
ssh koala "
set -e
CTX=\$(mktemp -d)
echo 'FROM alpine:3.21' > \$CTX/Dockerfile
buildctl --addr unix:///run/buildkit/buildkitd.sock build \
  --frontend dockerfile.v0 \
  --local context=\$CTX \
  --local dockerfile=\$CTX \
  --output type=image,name=gitea.d-ma.be/mathias/supervisor:push-test,push=true
rm -rf \$CTX
echo 'push OK'
"
```
|
||||
|
||||
Expected: `push OK`. Verify in Gitea UI: `https://gitea.d-ma.be/mathias/supervisor/packages` should show a `push-test` tag.
|
||||
|
||||
- [ ] **Step 4: Delete the test image tag**
|
||||
|
||||
In Gitea UI → supervisor repo → Packages tab → delete the `push-test` tag.
|
||||
|
||||
---
|
||||
|
||||
## Task 4: age keypair + Flux SOPS decryption [kubectl + flamingo]
|
||||
|
||||
Flux decrypts SOPS-encrypted secrets at apply time. It needs the age private key stored as a k8s Secret in the `flux-system` namespace.
|
||||
|
||||
- [ ] **Step 1: Verify age is installed**
|
||||
|
||||
```bash
|
||||
age --version || brew install age
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Generate age keypair**
|
||||
|
||||
```bash
|
||||
age-keygen -o /tmp/supervisor-age.key
|
||||
cat /tmp/supervisor-age.key
|
||||
```
|
||||
|
||||
Output includes two lines:
|
||||
```
|
||||
# public key: age1xxxxxx...
|
||||
AGE-SECRET-KEY-1xxxxxxx...
|
||||
```
|
||||
|
||||
**Copy the public key** (the `age1...` value): you'll need it in Task 6 for encrypting secrets.
|
||||
**Store the private key file securely** — back it up outside the cluster (e.g., 1Password or encrypted note).
|
||||
|
||||
- [ ] **Step 3: Create the SOPS age secret in flux-system**
|
||||
|
||||
```bash
|
||||
kubectl create secret generic sops-age \
|
||||
--from-file=age.agekey=/tmp/supervisor-age.key \
|
||||
-n flux-system
|
||||
kubectl get secret sops-age -n flux-system
|
||||
```
|
||||
|
||||
Expected: secret exists with `age.agekey` key.
|
||||
|
||||
- [ ] **Step 4: Shred the temp key file**
|
||||
|
||||
```bash
|
||||
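# shred is not installed by default on macOS; fall back to a plain delete
shred -u /tmp/supervisor-age.key 2>/dev/null || rm -f /tmp/supervisor-age.key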
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Check what Flux Kustomization CRDs exist in the infra repo**
|
||||
|
||||
```bash
|
||||
git clone git@gitea.d-ma.be:mathias/infra.git /tmp/infra-sops-setup
|
||||
ls /tmp/infra-sops-setup/flux-system/
|
||||
```
|
||||
|
||||
Look for a `kustomization.yaml` or `gotk-sync.yaml` that defines the main Flux Kustomization resource pointing at the `clusters/koala/` path.
|
||||
|
||||
- [ ] **Step 6: Patch the Flux Kustomization to enable SOPS decryption**
|
||||
|
||||
Find the Kustomization resource that syncs `clusters/koala/`. It will look like:
|
||||
|
||||
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  path: ./clusters/koala
  ...
```
|
||||
|
||||
Add the `decryption` block under `spec`:

```yaml
spec:
  decryption:
    provider: sops
    secretRef:
      name: sops-age
```
|
||||
|
||||
Edit the file in `/tmp/infra-sops-setup/flux-system/` and commit:
|
||||
|
||||
```bash
|
||||
cd /tmp/infra-sops-setup
|
||||
# Edit the relevant Kustomization yaml to add decryption block (shown above)
|
||||
git add flux-system/
|
||||
git commit -m "feat: enable SOPS decryption via age key in flux-system"
|
||||
git push
|
||||
```
|
||||
|
||||
- [ ] **Step 7: Verify Flux picks up the change**
|
||||
|
||||
```bash
|
||||
flux reconcile source git flux-system
|
||||
flux get kustomizations
|
||||
```
|
||||
|
||||
Expected: `flux-system` Kustomization shows `Ready True` with no errors.
|
||||
|
||||
- [ ] **Step 8: Clean up temp clone**
|
||||
|
||||
```bash
|
||||
rm -rf /tmp/infra-sops-setup
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 5: Infra repo — supervisor app manifests [infra-repo]
|
||||
|
||||
Create the full k8s manifest set for the supervisor service in the infra repo. The deployment uses an `IMAGE_TAG` placeholder; the CD job patches this with the actual git sha before pushing.
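
The placeholder substitution can be sketched as follows. This is an illustration of the idea only, not the CD job's actual mechanism: `patch_image_tag` is a hypothetical helper, and it assumes the manifest contains a single-line `image: <repository>:<tag>` reference.

```python
import re


def patch_image_tag(manifest: str, image: str, tag: str) -> str:
    """Replace the tag on every `image: <image>:<old>` line with the given tag."""
    pattern = re.compile(r"(image:\s*" + re.escape(image) + r"):\S+")
    # A function replacement avoids backslash or group-reference surprises in the tag.
    patched, count = pattern.subn(lambda m: f"{m.group(1)}:{tag}", manifest)
    if count == 0:
        raise ValueError(f"no image reference for {image} found")
    return patched


if __name__ == "__main__":
    line = "        image: gitea.d-ma.be/mathias/supervisor:IMAGE_TAG\n"
    print(patch_image_tag(line, "gitea.d-ma.be/mathias/supervisor", "abc1234"), end="")
```

Raising when nothing matches keeps a silent no-op from being committed to the infra repo as if it were a deploy.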
|
||||
|
||||
**Prerequisites:** age public key from Task 4 Step 2.
|
||||
|
||||
- [ ] **Step 1: Clone the infra repo**
|
||||
|
||||
```bash
|
||||
git clone git@gitea.d-ma.be:mathias/infra.git /tmp/infra-supervisor
|
||||
cd /tmp/infra-supervisor
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Create namespace**
|
||||
|
||||
```bash
mkdir -p apps/supervisor
cat > apps/supervisor/namespace.yaml << 'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: supervisor
EOF
```
|
||||
|
||||
- [ ] **Step 3: Create deployment**
|
||||
|
||||
The `brain` volume is a `hostPath` on koala (simplest for a single-node service; add a PVC later if needed). The image uses `imagePullSecrets` to pull from the Gitea registry.
|
||||
|
||||
```bash
cat > apps/supervisor/deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: supervisor
  namespace: supervisor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: supervisor
  template:
    metadata:
      labels:
        app: supervisor
    spec:
      nodeSelector:
        kubernetes.io/hostname: koala
      imagePullSecrets:
        - name: gitea-registry
      containers:
        - name: supervisor
          image: gitea.d-ma.be/mathias/supervisor:IMAGE_TAG
          ports:
            - containerPort: 3200
          envFrom:
            - secretRef:
                name: supervisor-secrets
          env:
            - name: SUPERVISOR_PORT
              value: "3200"
            - name: LITELLM_BASE_URL
              value: "http://iguana:4000"
            - name: LLAMA_SWAP_URL
              value: "http://koala:8080"
            - name: INGEST_BASE_URL
              value: "http://localhost:3300"
          volumeMounts:
            - name: brain
              mountPath: /app/brain
      volumes:
        - name: brain
          hostPath:
            path: /var/lib/supervisor/brain
            type: DirectoryOrCreate
EOF
```
|
||||
|
||||
- [ ] **Step 4: Create service**
|
||||
|
||||
```bash
cat > apps/supervisor/service.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: supervisor
  namespace: supervisor
spec:
  selector:
    app: supervisor
  ports:
    - port: 3200
      targetPort: 3200
  type: ClusterIP
EOF
```
|
||||
|
||||
- [ ] **Step 5: Create kustomization.yaml for supervisor**
|
||||
|
||||
```bash
|
||||
cat > apps/supervisor/kustomization.yaml << 'EOF'
|
||||
apiVersion: kustomize.config.k8s.io/v1beta1
|
||||
kind: Kustomization
|
||||
resources:
|
||||
- namespace.yaml
|
||||
- deployment.yaml
|
||||
- service.yaml
|
||||
- secrets.enc.yaml
|
||||
EOF
|
||||
```
|
||||
|
||||
- [ ] **Step 6: Ensure clusters/koala/kustomization.yaml exists and includes supervisor**
|
||||
|
||||
Check if the file exists:
|
||||
|
||||
```bash
|
||||
cat clusters/koala/kustomization.yaml 2>/dev/null || echo "need to create"
|
||||
```
|
||||
|
||||
If it exists, add supervisor and imagepullsecret resources. If it does not exist, create it:
|
||||
|
||||
```bash
|
||||
cat > clusters/koala/kustomization.yaml << 'EOF'
|
||||
apiVersion: kustomize.config.k8s.io/v1beta1
|
||||
kind: Kustomization
|
||||
resources:
|
||||
- ../../apps/imagepullsecret
|
||||
- ../../apps/supervisor
|
||||
EOF
|
||||
```
|
||||
|
||||
If it already exists, add the two resource lines (preserving existing entries).
|
||||
|
||||
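- [ ] **Step 7: Commit (without secrets; those come in Task 6)**

Until Task 6 lands, `apps/supervisor/kustomization.yaml` and `clusters/koala/kustomization.yaml` reference files that do not exist yet (`secrets.enc.yaml`, `apps/imagepullsecret`), so expect Flux to report a build error for this path in the meantime.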
|
||||
|
||||
```bash
|
||||
cd /tmp/infra-supervisor
|
||||
git add apps/supervisor/ clusters/koala/
|
||||
git commit -m "feat(supervisor): add k8s manifests for supervisor service"
|
||||
git push
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 6: SOPS-encrypted secrets in infra repo [infra-repo + flamingo]
|
||||
|
||||
Two encrypted secret files: the imagePullSecret for the Gitea container registry, and the supervisor app secrets (ANTHROPIC_API_KEY, LITELLM_API_KEY).
|
||||
|
||||
**Prerequisites:**
|
||||
- age public key from Task 4 Step 2 (format: `age1xxxxx...`)
|
||||
- `sops` installed (`brew install sops` if missing)
|
||||
- Gitea registry token (same one used in Task 3, or create a read-only one for pulling)
|
||||
|
||||
- [ ] **Step 1: Verify sops is installed**
|
||||
|
||||
```bash
|
||||
sops --version || brew install sops
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Create .sops.yaml in infra repo root**
|
||||
|
||||
This tells sops which key to use for all files in the repo:
|
||||
|
||||
```bash
|
||||
cd /tmp/infra-supervisor
|
||||
cat > .sops.yaml << 'EOF'
|
||||
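creation_rules:
  # Encrypt only the secret payload; Flux needs kind, apiVersion and metadata in plain text
  - encrypted_regex: ^(data|stringData)$
    age: age1REPLACE_WITH_YOUR_PUBLIC_KEY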
|
||||
EOF
|
||||
git add .sops.yaml
|
||||
git commit -m "chore: add sops config (age key)"
|
||||
git push
|
||||
```
|
||||
|
||||
Replace `age1REPLACE_WITH_YOUR_PUBLIC_KEY` with the actual age public key from Task 4 before running the commit; sops cannot encrypt against the placeholder value.
|
||||
|
||||
- [ ] **Step 3: Create and encrypt the imagePullSecret**
|
||||
|
||||
The imagePullSecret is created in the `supervisor` namespace, because a pod can only reference pull secrets from its own namespace. Create it in the `imagepullsecret` app:
|
||||
|
||||
```bash
mkdir -p apps/imagepullsecret

# Create a registry pull token in Gitea: Settings → Applications → Generate Token
# Scopes: read:packages
# Use that token here (or reuse the buildkit-push token; read access is enough for pulling)
PULL_TOKEN=<gitea-read-packages-token>
PULL_AUTH=$(echo -n "mathias:${PULL_TOKEN}" | base64)

cat > /tmp/gitea-pull-secret.yaml << EOF
apiVersion: v1
kind: Secret
metadata:
  name: gitea-registry
  namespace: supervisor
type: kubernetes.io/dockerconfigjson
stringData:
  .dockerconfigjson: |
    {
      "auths": {
        "gitea.d-ma.be": {
          "auth": "${PULL_AUTH}"
        }
      }
    }
EOF

sops --encrypt /tmp/gitea-pull-secret.yaml > apps/imagepullsecret/secret.enc.yaml
rm /tmp/gitea-pull-secret.yaml

cat > apps/imagepullsecret/kustomization.yaml << 'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- secret.enc.yaml
EOF
```
|
||||
|
||||
Verify the encrypted file looks correct (should show `sops:` metadata at the bottom):
|
||||
|
||||
```bash
|
||||
tail -20 apps/imagepullsecret/secret.enc.yaml
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Create and encrypt supervisor app secrets**
|
||||
|
||||
```bash
# ANTHROPIC_API_KEY: your Anthropic API key
# LITELLM_API_KEY: the key your LiteLLM instance expects (can be any string if it's local)
cat > /tmp/supervisor-secrets.yaml << 'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: supervisor-secrets
  namespace: supervisor
type: Opaque
stringData:
  ANTHROPIC_API_KEY: "REPLACE_WITH_REAL_KEY"
  LITELLM_API_KEY: "REPLACE_WITH_REAL_KEY"
EOF

# Edit /tmp/supervisor-secrets.yaml to insert real values, then:
sops --encrypt /tmp/supervisor-secrets.yaml > apps/supervisor/secrets.enc.yaml
rm /tmp/supervisor-secrets.yaml
```
|
||||
|
||||
Verify:
|
||||
|
||||
```bash
|
||||
tail -20 apps/supervisor/secrets.enc.yaml
|
||||
# Should show encrypted values and sops metadata
|
||||
```
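
The visual check can also be scripted as a guard before committing, so a plaintext secret never reaches the repo by accident. A minimal sketch using plain text checks only: it relies on the top-level `sops:` metadata key and the `ENC[` prefix that SOPS puts on encrypted values. `looks_sops_encrypted` is a hypothetical helper and is not a substitute for `sops --decrypt` round-tripping.

```python
def looks_sops_encrypted(text: str) -> bool:
    """Heuristic: the file has top-level sops metadata and at least one ENC[...] value."""
    lines = text.splitlines()
    has_metadata = any(line.startswith("sops:") for line in lines)
    has_ciphertext = any("ENC[" in line for line in lines)
    return has_metadata and has_ciphertext


if __name__ == "__main__":
    import sys

    # Usage: python check_sops.py apps/supervisor/secrets.enc.yaml
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            status = "ok" if looks_sops_encrypted(f.read()) else "NOT ENCRYPTED"
        print(f"{path}: {status}")
```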
|
||||
|
||||
- [ ] **Step 5: Commit encrypted secrets**
|
||||
|
||||
```bash
|
||||
cd /tmp/infra-supervisor
|
||||
git add apps/imagepullsecret/ apps/supervisor/secrets.enc.yaml .sops.yaml
|
||||
git commit -m "feat: add SOPS-encrypted imagePullSecret and supervisor app secrets"
|
||||
git push
|
||||
```
|
||||
|
||||
- [ ] **Step 6: Verify Flux reconciles and creates the secrets**
|
||||
|
||||
Wait ~60s then:
|
||||
|
||||
```bash
|
||||
flux reconcile kustomization flux-system --with-source
|
||||
kubectl get secrets -n supervisor
|
||||
```
|
||||
|
||||
Expected: `gitea-registry` and `supervisor-secrets` appear in the `supervisor` namespace.
|
||||
|
||||
- [ ] **Step 7: Clean up temp clone**
|
||||
|
||||
```bash
|
||||
rm -rf /tmp/infra-supervisor
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 7: Gitea org-level secrets [gitea-ui + koala-ssh]
|
||||
|
||||
Set the two secrets that all repos in the `mathias` org will inherit. These go in the Gitea org (not individual repos).
|
||||
|
||||
**Files:** No files — Gitea UI configuration.
|
||||
|
||||
- [ ] **Step 1: Generate SSH deploy key for infra repo**
|
||||
|
||||
On flamingo:
|
||||
|
||||
```bash
|
||||
ssh-keygen -t ed25519 -C "cd-bot infra deploy key" -f /tmp/infra-deploy-key -N ""
|
||||
cat /tmp/infra-deploy-key # private key → INFRA_DEPLOY_KEY secret
|
||||
cat /tmp/infra-deploy-key.pub # public key → add to Gitea infra repo as deploy key
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Add public key to infra repo as a deploy key (write access)**
|
||||
|
||||
In Gitea UI: `https://gitea.d-ma.be/mathias/infra` → Settings → Deploy Keys → Add Deploy Key.
|
||||
- Title: `cd-bot`
|
||||
- Key: paste content of `/tmp/infra-deploy-key.pub`
|
||||
- Enable write access: ✓
|
||||
|
||||
- [ ] **Step 3: Set org-level secrets in Gitea**
|
||||
|
||||
In Gitea UI: `https://gitea.d-ma.be/org/mathias/settings/secrets` → Add Secret.
|
||||
|
||||
Set these three secrets:
|
||||
|
||||
| Secret name | Value |
|
||||
|-------------|-------|
|
||||
| `INFRA_DEPLOY_KEY` | content of `/tmp/infra-deploy-key` (private key, including `-----BEGIN...` lines) |
|
||||
| `BUILDKIT_REGISTRY_AUTH` | same base64 auth string as used in Task 3 Step 2 (format: `mathias:<token>` base64-encoded) |
|
||||
|
||||
Note: `BUILDKIT_REGISTRY_AUTH` is redundant if `/root/.docker/config.json` is already on the runner host from Task 3 — but setting it as a secret allows the `cd.yml` to explicitly pass it to `buildctl` for clarity and rotation.
|
||||
|
||||
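The `mathias:<token>` base64 format mentioned in the table can be sketched in Go as follows. This is only an illustration of the encoding: the `registryAuth` function name and the placeholder token are assumptions, not part of the plan.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// registryAuth returns base64("user:token"), the format described for
// BUILDKIT_REGISTRY_AUTH. The function name is hypothetical.
func registryAuth(user, token string) string {
	return base64.StdEncoding.EncodeToString([]byte(user + ":" + token))
}

func main() {
	// "example-token" is a placeholder, not a real credential.
	fmt.Println(registryAuth("mathias", "example-token"))
}
```

Standard (padded) base64 is used here because that is what Docker-style `auth` fields conventionally carry.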
- [ ] **Step 4: Clean up temp key files**

```bash
shred -u /tmp/infra-deploy-key /tmp/infra-deploy-key.pub
```

- [ ] **Step 5: Verify secrets appear in Gitea**

In Gitea UI: `https://gitea.d-ma.be/org/mathias/settings/secrets`. Confirm both secrets are listed (values are hidden; only names are shown).

---

## Task 8: cd.yml workflow [this-repo]

Create the CD workflow that triggers after CI passes, builds the image with buildctl, and commits the updated tag to the infra repo.

**Files:**

- Create: `.gitea/workflows/cd.yml`

- [ ] **Step 1: Create cd.yml**

```bash
cat > .gitea/workflows/cd.yml << 'EOF'
name: cd

on:
  push:
    branches: [main]

jobs:
  deploy:
    name: Build and deploy
    needs: [check]  # 'check' is the job name in ci.yml
    runs-on: self-hosted
    env:
      SERVICE: supervisor
      REGISTRY: gitea.d-ma.be
      IMAGE: gitea.d-ma.be/mathias/supervisor
      INFRA_REPO: git@gitea.d-ma.be:mathias/infra.git
      BUILDKIT_HOST: unix:///run/buildkit/buildkitd.sock
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Build and push image
        run: |
          IMAGE_TAG="${{ github.sha }}"
          echo "Building ${IMAGE}:${IMAGE_TAG}"
          buildctl --addr "${BUILDKIT_HOST}" build \
            --frontend dockerfile.v0 \
            --local context=. \
            --local dockerfile=. \
            --opt build-arg:VERSION="${IMAGE_TAG}" \
            --output "type=image,name=${IMAGE}:${IMAGE_TAG},push=true"
          echo "IMAGE_TAG=${IMAGE_TAG}" >> $GITHUB_OUTPUT
        id: build

      - name: Update infra repo
        run: |
          IMAGE_TAG="${{ github.sha }}"
          # Write SSH key for infra repo
          mkdir -p ~/.ssh
          echo "${{ secrets.INFRA_DEPLOY_KEY }}" > ~/.ssh/infra_deploy_key
          chmod 600 ~/.ssh/infra_deploy_key
          ssh-keyscan gitea.d-ma.be >> ~/.ssh/known_hosts 2>/dev/null

          # Clone infra repo
          GIT_SSH_COMMAND="ssh -i ~/.ssh/infra_deploy_key -o IdentitiesOnly=yes" \
            git clone "${INFRA_REPO}" /tmp/infra-update

          # Patch the image tag
          cd /tmp/infra-update
          sed -i "s|gitea.d-ma.be/mathias/supervisor:.*|gitea.d-ma.be/mathias/supervisor:${IMAGE_TAG}|" \
            "apps/${SERVICE}/deployment.yaml"

          # Commit and push
          git config user.email "cd-bot@d-ma.be"
          git config user.name "CD Bot"
          git add "apps/${SERVICE}/deployment.yaml"
          git commit -m "chore(deploy): ${SERVICE} → ${IMAGE_TAG}"
          GIT_SSH_COMMAND="ssh -i ~/.ssh/infra_deploy_key -o IdentitiesOnly=yes" \
            git push

          # Clean up
          rm -rf /tmp/infra-update
          rm ~/.ssh/infra_deploy_key
          echo "Infra repo updated: ${SERVICE} → ${IMAGE_TAG}"
EOF
```

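The `sed` substitution in the `Update infra repo` step treats each `.` in the image name as a wildcard and rewrites everything after the colon up to the end of the line. To reason about the pattern away from the runner, the same rewrite can be sketched in Go with the dots escaped. The `patchTag` helper and the sample manifest are hypothetical, and unlike the `sed` expression this version stops at the first whitespace after the tag.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageLine matches the supervisor image reference and captures everything
// before the tag. Dots are escaped so they only match literal dots.
var imageLine = regexp.MustCompile(`(gitea\.d-ma\.be/mathias/supervisor):\S+`)

// patchTag is a hypothetical helper that swaps the tag in a manifest string.
func patchTag(manifest, tag string) string {
	return imageLine.ReplaceAllString(manifest, "${1}:"+tag)
}

func main() {
	// Sample manifest fragment, invented for illustration.
	manifest := "containers:\n  - name: supervisor\n    image: gitea.d-ma.be/mathias/supervisor:oldsha\n"
	fmt.Print(patchTag(manifest, "abc123"))
}
```

The sketch replaces every occurrence, so it shares the plan's assumption that the manifest contains a single supervisor `image:` reference.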
- [ ] **Step 2: Verify the `needs` job name matches ci.yml**

```bash
grep "^  [a-z].*:$" .gitea/workflows/ci.yml
```

The output should show `check:` as the quality-gate job name. The `cd.yml` uses `needs: [check]`; confirm this matches. Note that in GitHub Actions semantics, which Gitea Actions largely follows, `needs` resolves only job IDs defined in the same workflow file, so also confirm that the runner accepts a reference to a job that lives in `ci.yml`.

- [ ] **Step 3: Commit**

```bash
git add .gitea/workflows/cd.yml
git commit -m "feat: add CD workflow (buildctl → Gitea registry → infra repo update)"
```

---

## Task 9: End-to-end smoke test

Trigger the full pipeline and verify each stage.

- [ ] **Step 1: Push to main to trigger CI + CD**

```bash
git push origin main
```

- [ ] **Step 2: Monitor CI job in Gitea**

Open `https://gitea.d-ma.be/mathias/supervisor/actions` and wait for the `ci` workflow's `check` job to pass.

- [ ] **Step 3: Monitor CD job**

In the same actions view, the `cd` workflow should start after `ci` passes. Check the `Build and push image` step output for:

```
Building gitea.d-ma.be/mathias/supervisor:<sha>
```

And the `Update infra repo` step for:

```
Infra repo updated: supervisor → <sha>
```

- [ ] **Step 4: Verify image in Gitea registry**

```
https://gitea.d-ma.be/mathias/supervisor/packages
```

Should show a new tag matching the commit sha.

- [ ] **Step 5: Verify infra repo commit**

```bash
git clone git@gitea.d-ma.be:mathias/infra.git /tmp/infra-verify
cd /tmp/infra-verify
git log --oneline -3
```

Expected: most recent commit message is `chore(deploy): supervisor → <sha>`.

```bash
grep "image:" apps/supervisor/deployment.yaml
```

Expected: `image: gitea.d-ma.be/mathias/supervisor:<sha>`

- [ ] **Step 6: Verify Flux reconciles**

```bash
flux get kustomizations
```

Expected: `flux-system` shows `Ready True` and `Applied revision: main/<infra-sha>`.

```bash
kubectl get pods -n supervisor
```

Expected: supervisor pod is `Running` with the new image sha.

- [ ] **Step 7: Verify pod started correctly**

```bash
kubectl logs -n supervisor deployment/supervisor --tail=20
```

Expected: supervisor startup logs (MCP server listening on port 3200, no errors).

- [ ] **Step 8: Clean up verify clone**

```bash
rm -rf /tmp/infra-verify
```

---

## Task 10: Post-deploy: registry retention policy [gitea-ui]

Prevent the Gitea container registry from filling up by setting a tag retention policy.

- [ ] **Step 1: Set tag retention in Gitea**

In Gitea UI: `https://gitea.d-ma.be/mathias/supervisor` → Settings → Packages → Container Registry.

Set: Keep last **20** tags per image name.

If Gitea does not expose a UI retention policy there, note this for manual cleanup and open a task to automate it (e.g., a weekly Actions job that deletes old container image versions through the Gitea package API).

- [ ] **Step 2: Verify existing test tags are cleaned up**

Manually delete any test tags pushed during Task 3 if not already done.

---

## Self-review checklist (for plan author, not a task)

- [x] **Spec coverage:** BuildKit systemd ✓, cd.yml ✓, Flux SOPS ✓, infra repo structure ✓, imagePullSecret ✓, app secrets ✓, Gitea org secrets ✓, error handling (implicit in workflow failures) ✓, registry retention ✓, smoke test ✓
- [x] **Placeholders:** `REPLACE_WITH_YOUR_PUBLIC_KEY` and `REPLACE_WITH_REAL_KEY` are intentional: real values come from the user's secrets and are marked clearly
- [x] **Type consistency:** No shared types across tasks (infra-only plan)
- [x] **Known gaps:** `needs: [check]` assumes the ci.yml job name is `check` (verified in Task 8 Step 2). The `sed` image tag patch assumes no other image line in deployment.yaml; the deployment template has only one `image:` line.
1617
docs/superpowers/plans/2026-04-20-model-orchestration-plan.md
Normal file
File diff suppressed because it is too large
2608
docs/superpowers/plans/2026-04-22-brain-ingestion-pipeline.md
Normal file
File diff suppressed because it is too large
858
docs/superpowers/plans/2026-04-22-brain-ingestion-quality.md
Normal file
@@ -0,0 +1,858 @@
# Brain Ingestion Quality: PDF Extraction + Entity Resolution

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Fix PDF ingestion (currently passes raw bytes to LLM) and add fuzzy entity resolution (prevents slug proliferation at scale).

**Architecture:** Two independent improvements wired into the existing pipeline. A new `extract` package handles text extraction by file type (pdftotext subprocess, passthrough for .md/.txt). A new `resolve.go` in the `pipeline` package normalizes proposed entity/concept titles against the loaded inventory to reuse existing slugs instead of creating duplicates. Both changes are wired into `watcher.go` and `api/handler.go` with no new dependencies except `poppler-utils` in the Docker image.

**Tech Stack:** Go stdlib (`os/exec`, `bufio`, `strings`), testify, poppler-utils (`pdftotext`)

---

## File Structure

**New files:**

- `ingestion/internal/extract/extract.go`: `Text(path string) (string, error)` dispatcher
- `ingestion/internal/extract/pdf.go`: `pdftotext` subprocess extraction
- `ingestion/internal/extract/extract_test.go`: table-driven tests for all paths
- `ingestion/internal/pipeline/resolve.go`: `Resolve(proposed []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) []wiki.Page`
- `ingestion/internal/pipeline/resolve_test.go`: table-driven tests

**Modified files:**

- `ingestion/internal/wiki/types.go`: add `Aliases []string` to `Entry`
- `ingestion/internal/wiki/inventory.go`: `readFrontmatter` reads both title and aliases
- `ingestion/internal/wiki/inventory_test.go`: add alias coverage
- `ingestion/internal/pipeline/pipeline.go`: call `Resolve` after `ParsePages`
- `ingestion/internal/watcher/watcher.go`: call `extract.Text` instead of `os.ReadFile`
- `ingestion/internal/api/handler.go`: call `extract.Text` for path-based ingestion
- `ingestion/Dockerfile`: `apk add poppler-utils`

---

### Task 1: `extract` package with `Text()` dispatcher and .md/.txt passthrough

**Files:**

- Create: `ingestion/internal/extract/extract.go`
- Create: `ingestion/internal/extract/extract_test.go`

- [ ] **Step 1: Write the failing test**

```go
// ingestion/internal/extract/extract_test.go
package extract

import (
	"os"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestText_Markdown(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "note.md")
	require.NoError(t, os.WriteFile(path, []byte("# Hello\n\nWorld."), 0o644))

	got, err := Text(path)
	require.NoError(t, err)
	assert.Equal(t, "# Hello\n\nWorld.", got)
}

func TestText_Txt(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "note.txt")
	require.NoError(t, os.WriteFile(path, []byte("plain text"), 0o644))

	got, err := Text(path)
	require.NoError(t, err)
	assert.Equal(t, "plain text", got)
}

func TestText_UnsupportedExtension(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "data.csv")
	require.NoError(t, os.WriteFile(path, []byte("a,b,c"), 0o644))

	_, err := Text(path)
	assert.ErrorContains(t, err, "unsupported")
}
```

- [ ] **Step 2: Run to verify it fails**

```bash
cd ingestion && go test ./internal/extract/... -v
```
Expected: compile error, because the package does not exist yet.

- [ ] **Step 3: Implement extract.go**

```go
// ingestion/internal/extract/extract.go
package extract

import (
	"fmt"
	"os"
	"strings"
)

// Text reads the file at path and returns its plain-text content.
// Supported extensions: .md, .txt (passthrough), .pdf (via pdftotext).
func Text(path string) (string, error) {
	ext := strings.ToLower(fileExt(path))
	switch ext {
	case ".md", ".txt":
		b, err := os.ReadFile(path)
		if err != nil {
			return "", fmt.Errorf("read %s: %w", path, err)
		}
		return string(b), nil
	case ".pdf":
		return extractPDF(path)
	default:
		return "", fmt.Errorf("unsupported file extension: %s", ext)
	}
}

// fileExt returns the file extension including the dot, in its original
// case (Text lowercases it), or "" if the last path element has no dot.
func fileExt(path string) string {
	for i := len(path) - 1; i >= 0; i-- {
		if path[i] == '.' {
			return path[i:]
		}
		if path[i] == '/' || path[i] == '\\' {
			break
		}
	}
	return ""
}
```

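For comparison only, the dispatch key that `Text` switches on can also be derived from the standard library. The sketch below assumes a hypothetical `extKey` helper built on `filepath.Ext`; the plan itself keeps its own `fileExt`, which additionally treats `\` as a separator on every platform.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extKey is a hypothetical helper (not part of the plan): it returns the
// lowercased extension, including the dot, that a dispatcher could switch on.
func extKey(path string) string {
	return strings.ToLower(filepath.Ext(path))
}

func main() {
	// Example paths are invented for illustration.
	for _, p := range []string{"notes/Report.PDF", "note.md", "README", "archive.tar.gz"} {
		fmt.Printf("%q -> %q\n", p, extKey(p))
	}
}
```

Only the final extension is considered, so `archive.tar.gz` dispatches on `.gz` and a file without a dot yields an empty key, which falls through to the unsupported branch.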
- [ ] **Step 4: Add pdf.go stub so it compiles**

```go
// ingestion/internal/extract/pdf.go
package extract

import "fmt"

func extractPDF(_ string) (string, error) {
	return "", fmt.Errorf("PDF extraction not implemented")
}
```

- [ ] **Step 5: Run tests to verify they pass**

```bash
cd ingestion && go test ./internal/extract/... -v
```
Expected: PASS, 3 tests passing.

- [ ] **Step 6: Commit**

```bash
cd ingestion && git add internal/extract/
git commit -m "feat(extract): add Text() dispatcher with md/txt passthrough"
```

---

### Task 2: PDF extraction via pdftotext

**Files:**

- Modify: `ingestion/internal/extract/pdf.go`
- Modify: `ingestion/internal/extract/extract_test.go`

- [ ] **Step 1: Add PDF test (skip if pdftotext absent)**

Append to `extract_test.go`:

```go
func TestText_PDF(t *testing.T) {
	if _, err := exec.LookPath("pdftotext"); err != nil {
		t.Skip("pdftotext not available")
	}
	// The test verifies the round-trip: a PDF containing "Hello PDF" yields that string.
	dir := t.TempDir()
	pdfPath := filepath.Join(dir, "test.pdf")

	// Build a minimal single-page PDF from a string literal.
	// It contains the text "Hello PDF".
	minimalPDF := "%PDF-1.4\n1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj\n" +
		"2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj\n" +
		"3 0 obj<</Type/Page/MediaBox[0 0 612 792]/Parent 2 0 R/Contents 4 0 R/Resources<</Font<</F1<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>>>>>>>endobj\n" +
		"4 0 obj<</Length 44>>\nstream\nBT /F1 12 Tf 100 700 Td (Hello PDF) Tj ET\nendstream\nendobj\n" +
		"xref\n0 5\n0000000000 65535 f\n0000000009 00000 n\n0000000058 00000 n\n0000000115 00000 n\n0000000310 00000 n\n" +
		"trailer<</Size 5/Root 1 0 R>>\nstartxref\n406\n%%EOF\n"
	require.NoError(t, os.WriteFile(pdfPath, []byte(minimalPDF), 0o644))

	got, err := Text(pdfPath)
	require.NoError(t, err)
	assert.Contains(t, got, "Hello PDF")
}
```

Add `"os/exec"` to imports in `extract_test.go`.

- [ ] **Step 2: Run to verify it fails (or skips)**

```bash
cd ingestion && go test ./internal/extract/... -v -run TestText_PDF
```
Expected: SKIP (pdftotext not installed locally) or FAIL with "not implemented".

- [ ] **Step 3: Implement pdf.go**

```go
// ingestion/internal/extract/pdf.go
package extract

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// extractPDF runs pdftotext on path and returns the extracted text.
// pdftotext must be installed (package: poppler-utils on Alpine/Debian, poppler on Homebrew).
func extractPDF(path string) (string, error) {
	// "-" as the output argument makes pdftotext write to stdout.
	cmd := exec.Command("pdftotext", "-q", path, "-")
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	if err := cmd.Run(); err != nil {
		errMsg := strings.TrimSpace(stderr.String())
		if errMsg == "" {
			errMsg = err.Error()
		}
		return "", fmt.Errorf("pdftotext: %s", errMsg)
	}

	return strings.TrimSpace(stdout.String()), nil
}
```

- [ ] **Step 4: Run all extract tests**

```bash
cd ingestion && go test ./internal/extract/... -v
```
Expected: PASS (PDF test skips if pdftotext absent, passes if present).

- [ ] **Step 5: Commit**

```bash
cd ingestion && git add internal/extract/pdf.go internal/extract/extract_test.go
git commit -m "feat(extract): implement PDF extraction via pdftotext"
```

---

### Task 3: `Entry.Aliases` + inventory reads aliases from frontmatter

**Files:**

- Modify: `ingestion/internal/wiki/types.go`
- Modify: `ingestion/internal/wiki/inventory.go`
- Modify: `ingestion/internal/wiki/inventory_test.go`

- [ ] **Step 1: Write failing test for alias loading**

Add to `inventory_test.go`:

```go
func TestLoadInventory_ReadsAliases(t *testing.T) {
	dir := t.TempDir()
	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))

	require.NoError(t, os.WriteFile(
		filepath.Join(dir, "wiki", "entities", "ryan-singer.md"),
		[]byte("---\ntitle: Ryan Singer\naliases:\n - Singer\n - R. Singer\n---\n\n## Description\n\nDesigner.\n"),
		0o644,
	))

	inv, err := LoadInventory(dir)
	require.NoError(t, err)

	require.Len(t, inv[PageTypeEntity], 1)
	e := inv[PageTypeEntity][0]
	assert.Equal(t, "Ryan Singer", e.Title)
	assert.Equal(t, []string{"Singer", "R. Singer"}, e.Aliases)
}
```

- [ ] **Step 2: Run to verify it fails**

```bash
cd ingestion && go test ./internal/wiki/... -v -run TestLoadInventory_ReadsAliases
```
Expected: compile error, because `Entry` has no `Aliases` field.

- [ ] **Step 3: Add Aliases to Entry in types.go**

```go
// Entry is a summary of an existing wiki page used to build the inventory.
type Entry struct {
	Slug    string
	Title   string
	Aliases []string
	Type    PageType
}
```

- [ ] **Step 4: Replace readTitle with readFrontmatter in inventory.go**

Replace the `readTitle` function and its call site:

```go
// readFrontmatter extracts title and aliases from YAML frontmatter.
// Falls back to slug for title and empty aliases on any error.
func readFrontmatter(path, fallbackSlug string) (title string, aliases []string) {
	title = fallbackSlug
	f, err := os.Open(path)
	if err != nil {
		return
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	inFM := false
	inAliases := false
	for scanner.Scan() {
		line := scanner.Text()
		if strings.TrimSpace(line) == "---" {
			if !inFM {
				inFM = true
				continue
			}
			break // end of frontmatter
		}
		if !inFM {
			continue
		}

		// Detect alias list items (lines starting with "  - ").
		if inAliases {
			trimmed := strings.TrimSpace(line)
			if strings.HasPrefix(trimmed, "- ") {
				aliases = append(aliases, strings.TrimPrefix(trimmed, "- "))
				continue
			}
			inAliases = false // end of alias block
		}

		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		switch strings.TrimSpace(key) {
		case "title":
			title = strings.Trim(strings.TrimSpace(val), `"'`)
		case "aliases":
			inAliases = true
		}
	}
	return
}
```

Update `LoadInventory` to use `readFrontmatter`:

```go
title, aliases := readFrontmatter(path, slug)
result[pt] = append(result[pt], Entry{Slug: slug, Title: title, Aliases: aliases, Type: pt})
```

Remove the old `readTitle` function entirely.

- [ ] **Step 5: Run all wiki tests**

```bash
cd ingestion && go test ./internal/wiki/... -v
```
Expected: PASS, all existing tests plus the new alias test.

- [ ] **Step 6: Commit**

```bash
cd ingestion && git add internal/wiki/types.go internal/wiki/inventory.go internal/wiki/inventory_test.go
git commit -m "feat(wiki): add Aliases to Entry and read from YAML frontmatter"
```

---

### Task 4: Fuzzy entity resolution

**Files:**

- Create: `ingestion/internal/pipeline/resolve.go`
- Create: `ingestion/internal/pipeline/resolve_test.go`

- [ ] **Step 1: Write failing tests**

```go
// ingestion/internal/pipeline/resolve_test.go
package pipeline

import (
	"testing"

	"github.com/stretchr/testify/assert"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

func TestResolve_NoMatch(t *testing.T) {
	proposed := []wiki.Page{
		{Path: "wiki/entities/new-person.md", Content: "---\ntitle: New Person\n---\n"},
	}
	inventory := map[wiki.PageType][]wiki.Entry{
		wiki.PageTypeEntity: {
			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: []string{"Singer"}},
		},
	}
	got := Resolve(proposed, inventory)
	assert.Len(t, got, 1)
	assert.Equal(t, "wiki/entities/new-person.md", got[0].Path)
}

func TestResolve_TitleMatchRedirectsSlug(t *testing.T) {
	// Proposed slug differs from existing but title matches.
	proposed := []wiki.Page{
		{Path: "wiki/entities/ryan-singer-the-designer.md", Content: "---\ntitle: Ryan Singer\n---\n"},
	}
	inventory := map[wiki.PageType][]wiki.Entry{
		wiki.PageTypeEntity: {
			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: nil},
		},
	}
	got := Resolve(proposed, inventory)
	assert.Len(t, got, 1)
	assert.Equal(t, "wiki/entities/ryan-singer.md", got[0].Path)
}

func TestResolve_AliasMatchRedirectsSlug(t *testing.T) {
	// Proposed title matches an existing alias.
	proposed := []wiki.Page{
		{Path: "wiki/entities/singer.md", Content: "---\ntitle: Singer\n---\n"},
	}
	inventory := map[wiki.PageType][]wiki.Entry{
		wiki.PageTypeEntity: {
			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: []string{"Singer", "R. Singer"}},
		},
	}
	got := Resolve(proposed, inventory)
	assert.Len(t, got, 1)
	assert.Equal(t, "wiki/entities/ryan-singer.md", got[0].Path)
}

func TestResolve_NormalizationCaseAndArticles(t *testing.T) {
	// "the shape up method" normalizes to "shape up method" which matches "Shape Up Method".
	proposed := []wiki.Page{
		{Path: "wiki/concepts/the-shape-up-method.md", Content: "---\ntitle: The Shape Up Method\n---\n"},
	}
	inventory := map[wiki.PageType][]wiki.Entry{
		wiki.PageTypeConcept: {
			{Slug: "shape-up-method", Title: "Shape Up Method", Aliases: nil},
		},
	}
	got := Resolve(proposed, inventory)
	assert.Len(t, got, 1)
	assert.Equal(t, "wiki/concepts/shape-up-method.md", got[0].Path)
}

func TestResolve_OnlyMatchesSamePageType(t *testing.T) {
	// A concept slug must not redirect to an entity with the same normalized name.
	proposed := []wiki.Page{
		{Path: "wiki/concepts/ryan-singer.md", Content: "---\ntitle: Ryan Singer\n---\n"},
	}
	inventory := map[wiki.PageType][]wiki.Entry{
		wiki.PageTypeEntity: {
			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: nil},
		},
		wiki.PageTypeConcept: {},
	}
	got := Resolve(proposed, inventory)
	assert.Len(t, got, 1)
	// Not redirected: different page type.
	assert.Equal(t, "wiki/concepts/ryan-singer.md", got[0].Path)
}

func TestResolve_EmptyInventory(t *testing.T) {
	proposed := []wiki.Page{
		{Path: "wiki/entities/first.md", Content: "---\ntitle: First\n---\n"},
	}
	inventory := map[wiki.PageType][]wiki.Entry{}
	got := Resolve(proposed, inventory)
	assert.Equal(t, proposed, got)
}
```

- [ ] **Step 2: Run to verify it fails**

```bash
cd ingestion && go test ./internal/pipeline/... -v -run TestResolve
```
Expected: compile error, because `Resolve` is not defined.

- [ ] **Step 3: Implement resolve.go**

```go
// ingestion/internal/pipeline/resolve.go
package pipeline

import (
	"path/filepath"
	"strings"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

// Resolve remaps proposed pages to existing slugs when a fuzzy title match is found.
// It only matches within the same page type (entities→entities, concepts→concepts).
// Pages with no inventory match are returned unchanged.
func Resolve(proposed []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) []wiki.Page {
	// Build normalized lookup: normalized_title → canonical slug, keyed by page type.
	type key struct {
		pt         wiki.PageType
		normalized string
	}
	lookup := make(map[key]string) // key → canonical slug
	for pt, entries := range inventory {
		for _, e := range entries {
			k := key{pt: pt, normalized: normalizeTitle(e.Title)}
			lookup[k] = e.Slug
			for _, alias := range e.Aliases {
				ak := key{pt: pt, normalized: normalizeTitle(alias)}
				// Never let an alias overwrite an existing mapping.
				if _, exists := lookup[ak]; !exists {
					lookup[ak] = e.Slug
				}
			}
		}
	}

	out := make([]wiki.Page, 0, len(proposed))
	for _, page := range proposed {
		pt := pageTypeFromPath(page.Path)
		title := extractTitle(page.Content)
		k := key{pt: pt, normalized: normalizeTitle(title)}
		if canonicalSlug, ok := lookup[k]; ok {
			// Redirect path to canonical slug.
			dir := filepath.Dir(page.Path)
			page.Path = dir + "/" + canonicalSlug + ".md"
		}
		out = append(out, page)
	}
	return out
}

// normalizeTitle lowercases, removes one leading article, replaces hyphens
// with spaces, and collapses whitespace.
// "The Shape Up Method" → "shape up method"
func normalizeTitle(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	// Strip at most one leading article, so "The A Team" keeps its "a".
	for _, article := range []string{"the ", "a ", "an "} {
		if strings.HasPrefix(s, article) {
			s = strings.TrimPrefix(s, article)
			break
		}
	}
	// Replace hyphens and collapse internal whitespace.
	s = strings.ReplaceAll(s, "-", " ")
	return strings.Join(strings.Fields(s), " ")
}

// pageTypeFromPath extracts the wiki.PageType from a path like "wiki/entities/foo.md".
func pageTypeFromPath(path string) wiki.PageType {
	parts := strings.Split(filepath.ToSlash(path), "/")
	if len(parts) >= 2 {
		return wiki.PageType(parts[1])
	}
	return ""
}

// extractTitle reads the title field from YAML frontmatter in content.
// Falls back to empty string if not found.
func extractTitle(content string) string {
	lines := strings.SplitN(content, "\n", 30)
	inFM := false
	for _, line := range lines {
		if strings.TrimSpace(line) == "---" {
			if !inFM {
				inFM = true
				continue
			}
			break
		}
		if inFM {
			key, val, ok := strings.Cut(line, ":")
			if ok && strings.TrimSpace(key) == "title" {
				return strings.Trim(strings.TrimSpace(val), `"'`)
			}
		}
	}
	return ""
}
```

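To make the matching rule concrete, here is a small standalone sketch of normalization feeding a lookup table. The `normalize` function is an illustrative stand-in that strips at most one leading article, and the lookup contents and proposed titles are invented for the example. Note that the match is an exact comparison of normalized strings, so near-misses such as typos do not resolve.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize is an illustrative stand-in for the plan's normalizeTitle:
// lowercase, drop at most one leading article, turn hyphens into spaces,
// and collapse runs of whitespace.
func normalize(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	for _, article := range []string{"the ", "a ", "an "} {
		if strings.HasPrefix(s, article) {
			s = strings.TrimPrefix(s, article)
			break
		}
	}
	s = strings.ReplaceAll(s, "-", " ")
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	// Hypothetical inventory: normalized title or alias -> canonical slug.
	lookup := map[string]string{
		normalize("Shape Up Method"): "shape-up-method",
		normalize("Ryan Singer"):     "ryan-singer",
		normalize("R. Singer"):       "ryan-singer",
	}
	for _, proposed := range []string{"The Shape-Up  Method", "ryan singer", "R. Singer", "Jason Fried"} {
		slug, ok := lookup[normalize(proposed)]
		fmt.Printf("%q -> slug=%q matched=%v\n", proposed, slug, ok)
	}
}
```

Titles that differ only in case, a leading article, hyphenation, or spacing land on the same key; anything else creates a new page.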
- [ ] **Step 4: Run resolve tests**

```bash
cd ingestion && go test ./internal/pipeline/... -v -run TestResolve
```
Expected: PASS, 6 tests passing.

- [ ] **Step 5: Commit**

```bash
cd ingestion && git add internal/pipeline/resolve.go internal/pipeline/resolve_test.go
git commit -m "feat(pipeline): add fuzzy entity resolution to prevent slug proliferation"
```

---

### Task 5: Wire Resolve into pipeline.Run

**Files:**

- Modify: `ingestion/internal/pipeline/pipeline.go`

- [ ] **Step 1: Add Resolve call after ParsePages in Run()**

In `pipeline.go`, locate the loop that builds `allPages`. After `allPages = append(allPages, pages...)`, we have all pages from all chunks. Resolve must run after all chunks are merged, against the snapshot inventory loaded at the start of the run.

Replace the `merged := mergeAll(allPages)` line with:

```go
resolved := Resolve(allPages, inventory)
merged := mergeAll(resolved)
```

The full relevant section of `Run` after this change:

```go
for _, chunk := range chunks {
	userPrompt := BuildPrompt(schema, source, chunk, inventory)
	output, err := cfg.Complete(ctx, systemPrompt, userPrompt)
	if err != nil {
		return Result{}, fmt.Errorf("LLM call: %w", err)
	}
	pages, warnings := ParsePages(output)
	allPages = append(allPages, pages...)
	allWarnings = append(allWarnings, warnings...)
}

resolved := Resolve(allPages, inventory)
merged := mergeAll(resolved)
```

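The ordering matters because pages proposed by different chunks may name the same entity under different slugs. The sketch below illustrates this with simplified, hypothetical types: `page`, `redirect`, and `groupByPath` are stand-ins, and it assumes that the merge step combines pages sharing a path (the real `mergeAll` is not shown in this plan).

```go
package main

import "fmt"

// page is a hypothetical stand-in for wiki.Page with the title pre-extracted.
type page struct {
	Path  string
	Title string
}

// redirect rewrites a proposed path to the canonical one when the title is
// known. canonical is an assumed title -> path table built from the inventory.
func redirect(p page, canonical map[string]string) page {
	if path, ok := canonical[p.Title]; ok {
		p.Path = path
	}
	return p
}

// groupByPath stands in for the merge step: it counts pages per path, so
// pages that share a path would be merged into one.
func groupByPath(pages []page) map[string]int {
	counts := make(map[string]int)
	for _, p := range pages {
		counts[p.Path]++
	}
	return counts
}

func main() {
	canonical := map[string]string{"Ryan Singer": "wiki/entities/ryan-singer.md"}
	// Two chunks proposed different slugs for the same entity.
	proposed := []page{
		{Path: "wiki/entities/ryan-singer-the-designer.md", Title: "Ryan Singer"},
		{Path: "wiki/entities/r-singer.md", Title: "Ryan Singer"},
	}

	resolved := make([]page, 0, len(proposed))
	for _, p := range proposed {
		resolved = append(resolved, redirect(p, canonical))
	}

	fmt.Println("distinct paths without resolve:", len(groupByPath(proposed)))
	fmt.Println("distinct paths with resolve:", len(groupByPath(resolved)))
}
```

Resolving first means the merge step sees one path per entity; merging first would leave two separate pages that are only redirected afterwards.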
- [ ] **Step 2: Run all pipeline tests**

```bash
cd ingestion && go test ./internal/pipeline/... -v
```
Expected: PASS, all existing tests still pass (Resolve is a no-op when inventory is empty or no title matches).

- [ ] **Step 3: Commit**

```bash
cd ingestion && git add internal/pipeline/pipeline.go
git commit -m "feat(pipeline): resolve proposed pages against inventory before writing"
```

---

### Task 6: Wire extract.Text into watcher and handler
|
||||
|
||||
**Files:**
|
||||
- Modify: `ingestion/internal/watcher/watcher.go`
|
||||
- Modify: `ingestion/internal/api/handler.go`
|
||||
|
||||
- [ ] **Step 1: Update watcher.go**
|
||||
|
||||
In `processFile`, replace:
|
||||
|
||||
```go
|
||||
content, err := os.ReadFile(path)
|
||||
if err != nil {
|
||||
return fmt.Errorf("read file: %w", err)
|
||||
}
|
||||
|
||||
_, runErr := pipeline.Run(ctx, cfg.Pipeline, cfg.BrainDir, string(content), source, false)
|
||||
```
|
||||
|
||||
With:
|
||||
|
||||
```go
|
||||
content, err := extract.Text(path)
|
||||
if err != nil {
|
||||
return fmt.Errorf("extract text: %w", err)
|
||||
}
|
||||
|
||||
_, runErr := pipeline.Run(ctx, cfg.Pipeline, cfg.BrainDir, content, source, false)
|
||||
```
|
||||
|
||||
Add import: `"github.com/mathiasbq/hyperguild/ingestion/internal/extract"`
|
||||
|
||||
Keep the `"os"` import: `watcher.go` still uses `os.MkdirAll`, `os.WriteFile`, and `os.Stat`.
|
||||
|
||||
- [ ] **Step 2: Update handler.go — single-file path**
|
||||
|
||||
In `IngestPath`, the single-file branch reads:
|
||||
|
||||
```go
|
||||
content, readErr := os.ReadFile(req.Path)
|
||||
if readErr != nil {
|
||||
writeError(w, http.StatusInternalServerError, fmt.Sprintf("read file: %v", readErr))
|
||||
return
|
||||
}
|
||||
```
|
||||
|
||||
Replace with:
|
||||
|
||||
```go
|
||||
content, readErr := extract.Text(req.Path)
|
||||
if readErr != nil {
|
||||
writeError(w, http.StatusInternalServerError, fmt.Sprintf("extract text: %v", readErr))
|
||||
return
|
||||
}
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Update handler.go — directory walk branch**
|
||||
|
||||
In `IngestPath`, the directory walk reads:
|
||||
|
||||
```go
|
||||
content, readErr := os.ReadFile(path)
|
||||
if readErr != nil {
|
||||
allWarnings = append(allWarnings, fmt.Sprintf("read %s: %v", path, readErr))
|
||||
return nil
|
||||
}
|
||||
source := req.Source
|
||||
if source == "" {
|
||||
source = filepath.Base(path)
|
||||
}
|
||||
result, runErr := pipeline.Run(r.Context(), h.pipeline, h.brainDir, string(content), source, req.DryRun)
|
||||
```
|
||||
|
||||
Replace with:
|
||||
|
||||
```go
|
||||
content, readErr := extract.Text(path)
|
||||
if readErr != nil {
|
||||
allWarnings = append(allWarnings, fmt.Sprintf("extract %s: %v", path, readErr))
|
||||
return nil
|
||||
}
|
||||
source := req.Source
|
||||
if source == "" {
|
||||
source = filepath.Base(path)
|
||||
}
|
||||
result, runErr := pipeline.Run(r.Context(), h.pipeline, h.brainDir, content, source, req.DryRun)
|
||||
```
|
||||
|
||||
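Add import: `"github.com/mathiasbq/hyperguild/ingestion/internal/extract"` to handler.go. If `os` is no longer referenced in handler.go after both replacements, remove its import as well; the Go compiler rejects unused imports, and the build in Step 4 will surface this.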
|
||||
|
||||
- [ ] **Step 4: Build to verify no compile errors**
|
||||
|
||||
```bash
|
||||
cd ingestion && go build ./...
|
||||
```
|
||||
Expected: success, no errors.
|
||||
|
||||
- [ ] **Step 5: Run all tests**
|
||||
|
||||
```bash
|
||||
cd ingestion && go test ./...
|
||||
```
|
||||
Expected: PASS — all tests pass (watcher tests use .md files, already covered by extract passthrough).
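For orientation, the behaviour this task relies on (passthrough for `.md` and `.txt`, `pdftotext` for `.pdf`, a clear error otherwise) could be sketched as below. This is a minimal sketch, not the actual `extract` package from Tasks 1 and 2: only the signature `Text(path string) (string, error)` comes from the plan, while the error wording and the exact `pdftotext` invocation are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// Text is a stand-in for extract.Text. It dispatches on the file extension:
// plain-text formats are read as-is, PDFs are converted by the pdftotext CLI.
func Text(path string) (string, error) {
	switch ext := strings.ToLower(filepath.Ext(path)); ext {
	case ".md", ".txt":
		b, err := os.ReadFile(path)
		if err != nil {
			return "", fmt.Errorf("read %s: %w", path, err)
		}
		return string(b), nil
	case ".pdf":
		// "-" as the output file makes pdftotext write the text to stdout.
		out, err := exec.Command("pdftotext", path, "-").Output()
		if err != nil {
			return "", fmt.Errorf("pdftotext %s: %w", path, err)
		}
		return string(out), nil
	default:
		return "", fmt.Errorf("unsupported file extension %q", ext)
	}
}

func main() {
	// An unsupported extension is rejected before any file access happens.
	if _, err := Text("notes.docx"); err != nil {
		fmt.Println("error:", err)
	}
}
```

Because the function already returns a `string`, call sites no longer need the `string(content)` conversion that `os.ReadFile` required.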
|
||||
|
||||
- [ ] **Step 6: Commit**
|
||||
|
||||
```bash
|
||||
cd ingestion && git add internal/watcher/watcher.go internal/api/handler.go
|
||||
git commit -m "feat(watcher,api): use extract.Text() for file reading — fixes PDF ingestion"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 7: Add poppler-utils to Dockerfile
|
||||
|
||||
**Files:**
|
||||
- Modify: `ingestion/Dockerfile`
|
||||
|
||||
- [ ] **Step 1: Add apk install for poppler-utils**
|
||||
|
||||
In `ingestion/Dockerfile`, add `poppler-utils` to the Alpine runtime stage. The current final stage is:
|
||||
|
||||
```dockerfile
|
||||
FROM alpine:3.21
|
||||
|
||||
COPY --from=builder /out/ingestion /usr/local/bin/ingestion
|
||||
|
||||
RUN addgroup -S ingestion && adduser -S -G ingestion ingestion
|
||||
```
|
||||
|
||||
Replace with:
|
||||
|
||||
```dockerfile
|
||||
FROM alpine:3.21
|
||||
|
||||
RUN apk add --no-cache poppler-utils
|
||||
|
||||
COPY --from=builder /out/ingestion /usr/local/bin/ingestion
|
||||
|
||||
RUN addgroup -S ingestion && adduser -S -G ingestion ingestion
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Verify Dockerfile builds (local Docker)**
|
||||
|
||||
```bash
|
||||
cd ingestion && docker build -t ingestion:test .
|
||||
```
|
||||
Expected: image builds successfully; `pdftotext` is available inside.
|
||||
|
||||
- [ ] **Step 3: Verify pdftotext is accessible in the image**
|
||||
|
||||
```bash
|
||||
docker run --rm ingestion:test pdftotext -v
|
||||
```
|
||||
Expected: prints version string like `pdftotext version 24.x.x`.
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
cd ingestion && git add Dockerfile
|
||||
git commit -m "chore(docker): add poppler-utils for PDF text extraction"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Self-Review
|
||||
|
||||
**Spec coverage check:**
|
||||
|
||||
| Requirement | Task |
|---|---|
| PDF extraction via pdftotext | Tasks 2, 6, 7 |
| .md and .txt passthrough (no regression) | Task 1 |
| Unsupported extension → clear error | Task 1 |
| Entry.Aliases loaded from frontmatter | Task 3 |
| Fuzzy normalization (case, articles, hyphens) | Task 4 |
| Alias matching | Task 4 |
| Title matching across different proposed slugs | Task 4 |
| Cross-page-type isolation (concept ≠ entity) | Task 4 |
| Resolve wired into pipeline.Run | Task 5 |
| extract.Text wired into watcher | Task 6 |
| extract.Text wired into handler (single + dir) | Task 6 |
| Dockerfile includes poppler-utils | Task 7 |
|
||||
|
||||
**Placeholder scan:** None found.
|
||||
|
||||
**Type consistency:**
|
||||
- `Resolve([]wiki.Page, map[wiki.PageType][]wiki.Entry) []wiki.Page` — consistent across Tasks 4 and 5.
|
||||
- `extract.Text(path string) (string, error)` — consistent across Tasks 1, 2, and 6.
|
||||
- `Entry.Aliases []string` — added in Task 3, used by Resolve in Task 4 (reads `e.Aliases`).
|
||||
- `readFrontmatter` replaces `readTitle` entirely in Task 3 — no lingering `readTitle` calls.
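
The fuzzy normalization named in the coverage table (case, articles, hyphens) might look roughly like the sketch below. `normalizeTitle` is a hypothetical helper name and the exact rules are an assumption; the real matching logic is defined in Task 4 and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeTitle is a hypothetical helper: it lowercases the title, turns
// hyphens and underscores into spaces, collapses whitespace, and drops a
// leading English article so that near-identical titles compare equal.
func normalizeTitle(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	s = strings.NewReplacer("-", " ", "_", " ").Replace(s)
	fields := strings.Fields(s)
	if len(fields) > 1 {
		switch fields[0] {
		case "a", "an", "the":
			fields = fields[1:]
		}
	}
	return strings.Join(fields, " ")
}

func main() {
	// Both spellings normalize to the same key, so they would resolve
	// to the same inventory entry.
	fmt.Println(normalizeTitle("The Shape-Up") == normalizeTitle("shape up"))
}
```

Comparing normalized keys only within one page type would keep the concept/entity isolation listed in the table.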
|
||||
docs/superpowers/plans/2026-04-22-phase4-attempt-wiring.md (Normal file, 1073 lines; file diff suppressed because it is too large)
docs/superpowers/plans/2026-04-23-level3-slug-authority.md (Normal file, 1323 lines; file diff suppressed because it is too large)
docs/superpowers/plans/2026-04-23-source-backrefs.md (Normal file, 433 lines)
@@ -0,0 +1,433 @@
|
||||
# Source Back-References Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** After the LLM produces wiki pages for an ingestion, automatically inject a `## Sources` back-reference on every concept and entity page that the source page links to.
|
||||
|
||||
**Architecture:** A new `injectSourceRefs` post-processing step is inserted between `Resolve` and `mergeAll` in `pipeline.Run`. It finds the source page in the proposed batch, extracts all `[[slug|...]]` wikilinks, then calls `wiki.Merge` with a minimal patch page to add the back-reference. `wiki.Merge` already treats `## Sources` as a bullet section with deduplication — no custom section parsing is needed. For concepts/entities that exist on disk but weren't proposed in the current batch (the common case on re-ingestion), the function loads them from disk and adds them to the pages list so they are updated.
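
To make the deduplicating bullet-section behaviour concrete, here is a minimal standalone sketch. It is not `wiki.Merge`: `appendSourceBullet` is a hypothetical function that only illustrates the property the plan relies on, namely that adding the same `## Sources` bullet twice leaves the page unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// appendSourceBullet adds bullet to the "## Sources" section of content,
// creating the section if it is missing and doing nothing if the bullet
// is already listed there. Hypothetical illustration only.
func appendSourceBullet(content, bullet string) string {
	lines := strings.Split(strings.TrimRight(content, "\n"), "\n")

	// Find the section heading.
	start := -1
	for i, l := range lines {
		if strings.TrimSpace(l) == "## Sources" {
			start = i
			break
		}
	}
	if start == -1 {
		return strings.Join(lines, "\n") + "\n\n## Sources\n\n" + bullet + "\n"
	}

	// The section ends at the next level-2 heading or at end of file.
	end := len(lines)
	for i := start + 1; i < len(lines); i++ {
		if strings.HasPrefix(lines[i], "## ") {
			end = i
			break
		}
	}
	for _, l := range lines[start+1 : end] {
		if strings.TrimSpace(l) == bullet {
			return content // already present: no change
		}
	}

	// Insert after the last non-blank line of the section.
	insertAt := end
	for insertAt > start+1 && strings.TrimSpace(lines[insertAt-1]) == "" {
		insertAt--
	}
	out := append([]string{}, lines[:insertAt]...)
	if insertAt == start+1 {
		out = append(out, "") // blank line between heading and first bullet
	}
	out = append(out, bullet)
	out = append(out, lines[insertAt:]...)
	return strings.Join(out, "\n") + "\n"
}

func main() {
	page := "---\ntitle: DDD\n---\n\n## Definition\n\nA thing.\n"
	once := appendSourceBullet(page, "- [[my-article|My Article]]")
	twice := appendSourceBullet(once, "- [[my-article|My Article]]")
	fmt.Println(once == twice) // re-ingestion does not duplicate the bullet
}
```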
|
||||
|
||||
**Tech Stack:** Go stdlib (`regexp`, `os`, `path/filepath`, `strings`), existing `wiki.Merge` and `wiki.Page` types.
|
||||
|
||||
---
|
||||
|
||||
## File Structure
|
||||
|
||||
**New files:**
|
||||
- `ingestion/internal/pipeline/refs.go` — `injectSourceRefs`, `addSourceRef`, `extractWikilinks`, `findSourcePage`, `findInInventory`
|
||||
- `ingestion/internal/pipeline/refs_test.go` — table-driven tests
|
||||
|
||||
**Modified files:**
|
||||
- `ingestion/internal/pipeline/pipeline.go` — insert `injectSourceRefs` call between `Resolve` and `mergeAll`
|
||||
|
||||
---
|
||||
|
||||
### Task 1: `refs.go` — source back-reference injection
|
||||
|
||||
**Files:**
|
||||
- Create: `ingestion/internal/pipeline/refs_test.go`
|
||||
- Create: `ingestion/internal/pipeline/refs.go`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
|
||||
```go
|
||||
// ingestion/internal/pipeline/refs_test.go
|
||||
package pipeline
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"testing"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stretchr/testify/require"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||||
)
|
||||
|
||||
// makeInventory builds a minimal inventory for test use.
|
||||
func makeInventory(concepts, entities []string) map[wiki.PageType][]wiki.Entry {
|
||||
inv := map[wiki.PageType][]wiki.Entry{
|
||||
wiki.PageTypeConcept: {},
|
||||
wiki.PageTypeEntity: {},
|
||||
wiki.PageTypeSource: {},
|
||||
}
|
||||
for _, slug := range concepts {
|
||||
inv[wiki.PageTypeConcept] = append(inv[wiki.PageTypeConcept], wiki.Entry{Slug: slug, Title: slug})
|
||||
}
|
||||
for _, slug := range entities {
|
||||
inv[wiki.PageTypeEntity] = append(inv[wiki.PageTypeEntity], wiki.Entry{Slug: slug, Title: slug})
|
||||
}
|
||||
return inv
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_NoSourcePage(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nFoo.\n"},
|
||||
}
|
||||
got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
|
||||
assert.Equal(t, pages, got)
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_InjectsIntoProposedConcept(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/my-article.md",
|
||||
Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[domain-driven-design|Domain Driven Design]].\n",
|
||||
},
|
||||
{
|
||||
Path: "wiki/concepts/domain-driven-design.md",
|
||||
Content: "---\ntitle: Domain Driven Design\n---\n\n## Definition\n\nA methodology.\n",
|
||||
},
|
||||
}
|
||||
|
||||
got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
|
||||
|
||||
require.Len(t, got, 2)
|
||||
assert.Contains(t, got[1].Content, "## Sources")
|
||||
assert.Contains(t, got[1].Content, "[[my-article|My Article]]")
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_LoadsConceptFromDisk(t *testing.T) {
|
||||
brainDir := t.TempDir()
|
||||
conceptDir := filepath.Join(brainDir, "wiki", "concepts")
|
||||
require.NoError(t, os.MkdirAll(conceptDir, 0o755))
|
||||
require.NoError(t, os.WriteFile(
|
||||
filepath.Join(conceptDir, "shape-up.md"),
|
||||
[]byte("---\ntitle: Shape Up\n---\n\n## Definition\n\nA methodology.\n"),
|
||||
0o644,
|
||||
))
|
||||
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/my-article.md",
|
||||
Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[shape-up|Shape Up]].\n",
|
||||
},
|
||||
}
|
||||
inv := makeInventory([]string{"shape-up"}, nil)
|
||||
|
||||
got := injectSourceRefs(pages, inv, brainDir)
|
||||
|
||||
// Should have loaded shape-up.md from disk and added it with source ref.
|
||||
require.Len(t, got, 2)
|
||||
var conceptPage wiki.Page
|
||||
for _, p := range got {
|
||||
if p.Path == "wiki/concepts/shape-up.md" {
|
||||
conceptPage = p
|
||||
}
|
||||
}
|
||||
assert.Contains(t, conceptPage.Content, "## Sources")
|
||||
assert.Contains(t, conceptPage.Content, "[[my-article|My Article]]")
|
||||
// Original content preserved.
|
||||
assert.Contains(t, conceptPage.Content, "## Definition")
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_NoSelfReference(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/my-article.md",
|
||||
Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSelf-link [[my-article|My Article]].\n",
|
||||
},
|
||||
}
|
||||
|
||||
got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
|
||||
|
||||
// Only one page — source should not reference itself.
|
||||
assert.Len(t, got, 1)
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_DeduplicatesOnReingestion(t *testing.T) {
|
||||
// Concept already has source ref from a prior ingestion.
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/my-article.md",
|
||||
Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[ddd|DDD]].\n",
|
||||
},
|
||||
{
|
||||
Path: "wiki/concepts/ddd.md",
|
||||
Content: "---\ntitle: DDD\n---\n\n## Definition\n\nA thing.\n\n## Sources\n\n- [[my-article|My Article]]\n",
|
||||
},
|
||||
}
|
||||
|
||||
got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
|
||||
|
||||
require.Len(t, got, 2)
|
||||
// The source ref must appear exactly once.
|
||||
count := 0
|
||||
for _, line := range splitLines(got[1].Content) {
|
||||
if line == "- [[my-article|My Article]]" {
|
||||
count++
|
||||
}
|
||||
}
|
||||
assert.Equal(t, 1, count, "source ref should appear exactly once")
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_InjectsIntoEntity(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/book.md",
|
||||
Content: "---\ntitle: Book\n---\n\n## Summary\n\nBy [[ryan-singer|Ryan Singer]].\n",
|
||||
},
|
||||
{
|
||||
Path: "wiki/entities/ryan-singer.md",
|
||||
Content: "---\ntitle: Ryan Singer\n---\n\n## Description\n\nA designer.\n",
|
||||
},
|
||||
}
|
||||
|
||||
got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
|
||||
|
||||
require.Len(t, got, 2)
|
||||
var entity wiki.Page
|
||||
for _, p := range got {
|
||||
if p.Path == "wiki/entities/ryan-singer.md" {
|
||||
entity = p
|
||||
}
|
||||
}
|
||||
assert.Contains(t, entity.Content, "[[book|Book]]")
|
||||
}
|
||||
|
||||
func TestExtractWikilinks(t *testing.T) {
|
||||
content := "See [[foo|Foo]] and [[bar|Bar]] and [[foo|Foo again]]."
|
||||
got := extractWikilinks(content)
|
||||
assert.True(t, got["foo"])
|
||||
assert.True(t, got["bar"])
|
||||
assert.Len(t, got, 2, "duplicate slugs should be deduplicated")
|
||||
}
|
||||
|
||||
// splitLines is a test helper.
|
||||
func splitLines(s string) []string {
|
||||
var out []string
|
||||
for _, l := range splitNewlines(s) {
|
||||
if l != "" {
|
||||
out = append(out, l)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func splitNewlines(s string) []string {
|
||||
var lines []string
|
||||
start := 0
|
||||
for i, c := range s {
|
||||
if c == '\n' {
|
||||
lines = append(lines, s[start:i])
|
||||
start = i + 1
|
||||
}
|
||||
}
|
||||
lines = append(lines, s[start:])
|
||||
return lines
|
||||
}
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run to verify they fail**
|
||||
|
||||
```bash
|
||||
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./internal/pipeline/... -run "TestInjectSourceRefs|TestExtractWikilinks" -v
|
||||
```
|
||||
Expected: compile error — `injectSourceRefs` and `extractWikilinks` not defined.
|
||||
|
||||
- [ ] **Step 3: Implement refs.go**
|
||||
|
||||
```go
// ingestion/internal/pipeline/refs.go
package pipeline

import (
	"os"
	"path/filepath"
	"regexp"
	"strings"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

var wikilinkRE = regexp.MustCompile(`\[\[([^|\]]+)\|`)

// injectSourceRefs finds the source page in the proposed batch, extracts its wikilinks,
// and injects a back-reference into every linked concept or entity page.
// Pages that exist on disk but are not in the current batch are loaded and appended
// so they will be updated on write.
func injectSourceRefs(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry, brainDir string) []wiki.Page {
	sourceSlug, sourceTitle, found := findSourcePage(pages)
	if !found {
		return pages
	}

	// Locate source page content for wikilink extraction.
	var sourceContent string
	for _, p := range pages {
		if strings.HasPrefix(p.Path, "wiki/sources/") &&
			strings.TrimSuffix(filepath.Base(p.Path), ".md") == sourceSlug {
			sourceContent = p.Content
			break
		}
	}

	linkedSlugs := extractWikilinks(sourceContent)
	sourceRef := "- [[" + sourceSlug + "|" + sourceTitle + "]]"

	// Build slug → index map for proposed pages (excluding wiki/sources/).
	bySlug := make(map[string]int, len(pages))
	for i, p := range pages {
		if !strings.HasPrefix(p.Path, "wiki/sources/") {
			bySlug[strings.TrimSuffix(filepath.Base(p.Path), ".md")] = i
		}
	}

	for slug := range linkedSlugs {
		if slug == sourceSlug {
			continue // no self-reference
		}

		if idx, ok := bySlug[slug]; ok {
			// Concept/entity is in the proposed batch — inject inline.
			pages[idx] = addSourceRef(pages[idx], sourceRef)
			continue
		}

		// Not in proposed batch — look for it in the inventory (exists on disk).
		pt, ok := findInInventory(slug, inventory)
		if !ok {
			continue
		}
		diskPath := filepath.Join(brainDir, "wiki", string(pt), slug+".md")
		b, err := os.ReadFile(diskPath)
		if err != nil {
			continue // page not found on disk; skip
		}
		page := wiki.Page{
			Path:    "wiki/" + string(pt) + "/" + slug + ".md",
			Content: string(b),
		}
		pages = append(pages, addSourceRef(page, sourceRef))
	}

	return pages
}

// addSourceRef injects sourceRef into the ## Sources bullet section of page.
// Uses wiki.Merge so that existing Sources entries are deduplicated and all
// other sections are preserved unchanged.
func addSourceRef(page wiki.Page, sourceRef string) wiki.Page {
	patch := wiki.Page{
		Path:    page.Path,
		Content: "\n## Sources\n\n" + sourceRef + "\n",
	}
	return wiki.Merge(page, patch)
}

// extractWikilinks returns the set of slugs referenced as [[slug|...]] in content.
func extractWikilinks(content string) map[string]bool {
	slugs := make(map[string]bool)
	for _, m := range wikilinkRE.FindAllStringSubmatch(content, -1) {
		slugs[m[1]] = true
	}
	return slugs
}

// findSourcePage returns the slug and title of the first wiki/sources/ page in pages.
func findSourcePage(pages []wiki.Page) (slug, title string, found bool) {
	for _, p := range pages {
		if strings.HasPrefix(p.Path, "wiki/sources/") {
			slug = strings.TrimSuffix(filepath.Base(p.Path), ".md")
			title = extractTitle(p.Content)
			if title == "" {
				title = slug
			}
			return slug, title, true
		}
	}
	return "", "", false
}

// findInInventory returns the PageType for a slug if it appears in the inventory.
func findInInventory(slug string, inventory map[wiki.PageType][]wiki.Entry) (wiki.PageType, bool) {
	for pt, entries := range inventory {
		for _, e := range entries {
			if e.Slug == slug {
				return pt, true
			}
		}
	}
	return "", false
}
```
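
One design note on the loop over linked slugs: Go does not specify map iteration order, so pages loaded from disk are appended in a varying order from run to run. If a stable order matters (for example for reproducible dry-run output), the slugs can be sorted first. The sketch below reuses the plan's regular expression; `sortedWikilinks` is a hypothetical helper, not part of the plan.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// Same pattern as in refs.go: captures the slug of [[slug|label]] links.
// Links without a "|" label, such as [[slug]], are not matched.
var wikilinkRE = regexp.MustCompile(`\[\[([^|\]]+)\|`)

// sortedWikilinks returns the distinct slugs linked from content in
// lexical order, giving callers a deterministic iteration order.
func sortedWikilinks(content string) []string {
	seen := make(map[string]bool)
	var slugs []string
	for _, m := range wikilinkRE.FindAllStringSubmatch(content, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			slugs = append(slugs, m[1])
		}
	}
	sort.Strings(slugs)
	return slugs
}

func main() {
	fmt.Println(sortedWikilinks("See [[foo|Foo]] and [[bar|Bar]] and [[foo|Foo again]]."))
}
```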
|
||||
|
||||
- [ ] **Step 4: Run all pipeline tests**
|
||||
|
||||
```bash
|
||||
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./internal/pipeline/... -v
|
||||
```
|
||||
Expected: all existing tests PASS + 7 new refs tests PASS.
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs && git add ingestion/internal/pipeline/refs.go ingestion/internal/pipeline/refs_test.go && git commit -m "feat(pipeline): inject source back-references into concept and entity pages"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 2: Wire injectSourceRefs into pipeline.Run
|
||||
|
||||
**Files:**
|
||||
- Modify: `ingestion/internal/pipeline/pipeline.go`
|
||||
|
||||
- [ ] **Step 1: Insert the call**
|
||||
|
||||
In `pipeline.go`, locate:
|
||||
|
||||
```go
|
||||
resolved := Resolve(allPages, inventory)
|
||||
merged := mergeAll(resolved)
|
||||
```
|
||||
|
||||
Replace with:
|
||||
|
||||
```go
|
||||
resolved := Resolve(allPages, inventory)
|
||||
withRefs := injectSourceRefs(resolved, inventory, brainDir)
|
||||
merged := mergeAll(withRefs)
|
||||
```
|
||||
|
||||
No import changes needed — same package.
|
||||
|
||||
- [ ] **Step 2: Run all pipeline tests**
|
||||
|
||||
```bash
|
||||
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./internal/pipeline/... -v
|
||||
```
|
||||
Expected: all tests PASS. The existing `TestRun_WritesPages` and `TestRun_DryRunDoesNotWrite` use LLM mocks that return source pages with no wikilinks to concepts — `injectSourceRefs` is a no-op for them.
|
||||
|
||||
- [ ] **Step 3: Run full test suite + lint**
|
||||
|
||||
```bash
|
||||
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./... && golangci-lint run ./...
|
||||
```
|
||||
Expected: all packages PASS, 0 lint issues.
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs && git add ingestion/internal/pipeline/pipeline.go && git commit -m "feat(pipeline): wire source back-reference injection into Run"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Self-Review
|
||||
|
||||
**Spec coverage:**
|
||||
|
||||
| Requirement | Task |
|---|---|
| Concepts get `## Sources` back-link to ingested source | Task 1 |
| Entities get `## Sources` back-link | Task 1 (TestInjectSourceRefs_InjectsIntoEntity) |
| Existing pages on disk get updated with new source | Task 1 (TestInjectSourceRefs_LoadsConceptFromDisk) |
| Re-ingestion of same source does not duplicate the ref | Task 1 (TestInjectSourceRefs_DeduplicatesOnReingestion) |
| Source page does not reference itself | Task 1 (TestInjectSourceRefs_NoSelfReference) |
| No-op when batch has no source page | Task 1 (TestInjectSourceRefs_NoSourcePage) |
| Wired into Run between Resolve and mergeAll | Task 2 |
| Full test suite and lint pass | Task 2 Step 3 |
|
||||
|
||||
**Placeholder scan:** None.
|
||||
|
||||
**Type consistency:** `injectSourceRefs([]wiki.Page, map[wiki.PageType][]wiki.Entry, string) []wiki.Page` — used identically in refs.go (definition) and pipeline.go (call site).
|
||||
docs/superpowers/specs/2026-04-20-cd-pipeline-design.md (Normal file, 218 lines)
@@ -0,0 +1,218 @@
|
||||
# CD Pipeline Design
|
||||
|
||||
**Date:** 2026-04-20
|
||||
**Status:** Approved for implementation
|
||||
|
||||
## Problem statement
|
||||
|
||||
The supervisor (and future services on the koala k3s cluster) have no automated deployment path after CI passes. Images are not built, the cluster is updated manually, and there is no audit trail for what is running where.
|
||||
|
||||
## Goal
|
||||
|
||||
After a push to `main` passes CI, automatically build a container image, push it to the Gitea registry, and update the cluster via GitOps — with a design that scales to many repos and services without per-repo kubeconfig or secret sprawl.
|
||||
|
||||
## Success criteria
|
||||
|
||||
- [ ] Successful `main` push triggers image build and push to `gitea.d-ma.be/<org>/<repo>:<git-sha>`
|
||||
- [ ] Infra repo receives a commit updating the image tag for the deployed service
|
||||
- [ ] Flux reconciles within 60s of the infra repo commit; pod runs the new image
|
||||
- [ ] Rollback = one commit to infra repo reverting the tag
|
||||
- [ ] Secrets (app secrets, registry pull) are SOPS-encrypted in infra repo; no manual `kubectl create secret`
|
||||
- [ ] Adding a new service requires only: adding `apps/<service>/` to infra repo + `cd.yml` to the app repo
|
||||
- [ ] Zero changes to the k3s cluster networking or runner configuration
|
||||
|
||||
## Constraints
|
||||
|
||||
- Gitea Actions self-hosted runner runs as a **systemd host process** on koala — not a k8s pod; cannot use cluster DNS
|
||||
- k3s uses containerd; no Docker daemon, no nerdctl on koala
|
||||
- Flux is already running (core controllers only); image-reflector/image-automation are NOT installed and will NOT be added
|
||||
- SOPS + age is the secret management standard; no plaintext Secrets in git
|
||||
- All org-level Gitea secrets are shared across repos — minimize the set
|
||||
|
||||
## Out of scope
|
||||
|
||||
- Multi-cluster promotion (koala only for now; infra repo structure supports adding clusters later)
|
||||
- Automated rollback on health check failure (manual rollback via infra repo commit)
|
||||
- Build caching beyond BuildKit's local disk cache
|
||||
- PR preview environments
|
||||
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
```
App repo (supervisor, n8n, etc.)
    ↓ push to main
Gitea Actions: ci.yml (lint + test)
    ↓ passes
Gitea Actions: cd.yml
    ├─ 1. buildctl → BuildKit (unix socket on koala host)
    │      → pushes gitea.d-ma.be/<org>/<repo>:<git-sha>
    ├─ 2. Clone infra repo (SSH deploy key)
    │      → patch apps/<service>/deployment.yaml IMAGE_TAG → <git-sha>
    │      → git commit + push
    └─ done

gitea.d-ma.be/mathias/infra (Flux source)
    ↓ Flux source-controller detects new commit (30s interval)
kustomize-controller
    └─ applies apps/<service>/kustomization.yaml → k3s namespace
    ↓
pod runs new image (pulls from gitea.d-ma.be with imagePullSecret)
```
|
||||
|
||||
---
|
||||
|
||||
## Components
|
||||
|
||||
### 1. BuildKit — systemd service on koala
|
||||
|
||||
BuildKit runs as a rootless systemd service on the koala host, identical to the Gitea runner pattern already in use.
|
||||
|
||||
- Socket: `unix:///run/user/<uid>/buildkit/buildkitd.sock` (rootless) or `/run/buildkit/buildkitd.sock` (root)
|
||||
- Cache: local disk at default BuildKit cache path — persists across builds
|
||||
- Access: `buildctl --addr unix:///run/buildkit/buildkitd.sock` from the runner process (same host, same user)
|
||||
- No k3s involvement for builds
|
||||
|
||||
### 2. Gitea Actions — `cd.yml`
|
||||
|
||||
Separate workflow file; triggers on `main` push after `ci.yml` succeeds.
|
||||
|
||||
```yaml
name: cd
on:
  push:
    branches: [main]

jobs:
  deploy:
    needs: [ci] # or workflow_run trigger; see implementation plan
    runs-on: [self-hosted, koala]
    env:
      IMAGE: gitea.d-ma.be/${{ github.repository }}:${{ github.sha }}
    steps:
      - uses: actions/checkout@v4
      - name: Build and push
        run: |
          buildctl --addr unix:///run/buildkit/buildkitd.sock \
            build \
            --frontend dockerfile.v0 \
            --local context=. \
            --local dockerfile=. \
            --output type=image,name=$IMAGE,push=true
        env:
          BUILDKIT_HOST: unix:///run/buildkit/buildkitd.sock
      - name: Update infra repo
        run: |
          # The runner is a persistent host process, so clear any clone left by a previous run.
          rm -rf /tmp/infra
          git clone git@gitea.d-ma.be:mathias/infra.git /tmp/infra
          cd /tmp/infra
          # Replace whatever tag follows the image name, so the patch still works
          # after the first deploy has consumed the IMAGE_TAG placeholder.
          sed -i -E "s|(image: gitea.d-ma.be/${{ github.repository }}):.*|\1:${{ github.sha }}|" apps/${{ env.SERVICE_NAME }}/deployment.yaml
          git config user.email "cd-bot@d-ma.be"
          git config user.name "CD Bot"
          git add apps/${{ env.SERVICE_NAME }}/deployment.yaml
          git commit -m "chore(deploy): ${{ env.SERVICE_NAME }} → ${{ github.sha }}"
          git push
        env:
          GIT_SSH_COMMAND: ssh -i /tmp/infra-deploy-key -o StrictHostKeyChecking=no
```
|
||||
|
||||
`SERVICE_NAME` is set per-repo (either hardcoded in `cd.yml` or derived from the repo name).
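
Patching the tag by substituting a literal `IMAGE_TAG` placeholder only works while that placeholder is still present in the manifest; a pattern that replaces whatever tag currently follows the image name stays valid across deploys. The Go sketch below illustrates that idea under stated assumptions: `patchImageTag` is a hypothetical helper, and it assumes the manifest contains a single-line `image: <name>:<tag>` entry.

```go
package main

import (
	"fmt"
	"regexp"
)

// patchImageTag rewrites the tag of the given image in a manifest.
// It matches "image: <image>:<anything>" so it works both for the initial
// IMAGE_TAG placeholder and for a previously deployed git SHA.
// The tag is assumed to be a plain git SHA (no "$" characters).
func patchImageTag(manifest, image, tag string) (string, error) {
	re, err := regexp.Compile(`(?m)^([ \t-]*image:[ \t]*` + regexp.QuoteMeta(image) + `):\S+`)
	if err != nil {
		return "", err
	}
	if !re.MatchString(manifest) {
		return "", fmt.Errorf("no image line found for %s", image)
	}
	return re.ReplaceAllString(manifest, "${1}:"+tag), nil
}

func main() {
	manifest := "      containers:\n        - name: supervisor\n          image: gitea.d-ma.be/mathias/supervisor:IMAGE_TAG\n"
	first, _ := patchImageTag(manifest, "gitea.d-ma.be/mathias/supervisor", "abc1234")
	second, _ := patchImageTag(first, "gitea.d-ma.be/mathias/supervisor", "def5678")
	fmt.Print(second)
}
```

Returning an error when no image line matches lets the CD job fail loudly instead of committing an unchanged manifest.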
|
||||
|
||||
### 3. Org-level Gitea secrets
|
||||
|
||||
Three secrets, set once, inherited by all repos:
|
||||
|
||||
| Secret | Purpose |
|--------|---------|
| `BUILDKIT_REGISTRY_AUTH` | credentials for pushing to `gitea.d-ma.be` (buildctl `--opt` or `~/.docker/config.json`) |
| `INFRA_DEPLOY_KEY` | SSH private key with write access to `gitea.d-ma.be/mathias/infra` |
| `KUBECONFIG_KOALA` | (optional) kubeconfig for manual `kubectl` steps if ever needed; scoped ServiceAccount |
|
||||
|
||||
### 4. Infra repo structure
|
||||
|
||||
```
gitea.d-ma.be/mathias/infra
├── clusters/
│   └── koala/
│       └── kustomization.yaml      # points at ../../apps/*/
├── apps/
│   ├── supervisor/
│   │   ├── namespace.yaml
│   │   ├── deployment.yaml         # image: gitea.d-ma.be/mathias/supervisor:IMAGE_TAG
│   │   ├── service.yaml
│   │   ├── secrets.enc.yaml        # SOPS-encrypted app secrets (ANTHROPIC_API_KEY, etc.)
│   │   └── kustomization.yaml
│   ├── n8n/
│   │   └── ...
│   └── imagepullsecret/
│       └── secret.enc.yaml         # SOPS-encrypted imagePullSecret for gitea.d-ma.be
└── flux-system/                    # existing Flux bootstrap manifests
```
|
||||
|
||||
Adding a new service means adding an `apps/<service>/` directory and listing it in `clusters/koala/kustomization.yaml`. Kustomize `resources` entries do not expand glob patterns, so each app directory is listed explicitly.
|
||||
|
||||
### 5. SOPS + age for Flux
|
||||
|
||||
Flux decrypts SOPS-encrypted files at apply time using an age key stored as a k8s Secret in the `flux-system` namespace. Setup:
|
||||
|
||||
1. Generate age keypair: `age-keygen -o age.agekey`
2. Store private key: `kubectl create secret generic sops-age --from-file=age.agekey -n flux-system`
3. Configure Flux Kustomization with `decryption.provider: sops` and `decryption.secretRef.name: sops-age`
4. Encrypt secrets before committing: `sops --encrypt --age <pubkey> secret.yaml > secret.enc.yaml`
|
||||
|
||||
App secrets (e.g., `ANTHROPIC_API_KEY`) and the registry pull secret live as encrypted files in `apps/<service>/` and `apps/imagepullsecret/` respectively.
|
||||
|
||||
### 6. Image pull secret
|
||||
|
||||
Each app namespace needs a `kubernetes.io/dockerconfigjson` Secret to pull from `gitea.d-ma.be`. This Secret is SOPS-encrypted in `apps/imagepullsecret/` and applied to each app namespace via Kustomize `namespace` field or a shared Kustomize component.
|
||||
|
||||
---
|
||||
|
||||
## Data flow: supervisor deploy
|
||||
|
||||
1. Push to `supervisor` main → CI passes (lint/test/vet)
|
||||
2. CD job builds image: `gitea.d-ma.be/mathias/supervisor:abc1234`
|
||||
3. CD job clones infra repo, patches `apps/supervisor/deployment.yaml`, commits
|
||||
4. Flux source-controller detects infra commit within 30s
|
||||
5. kustomize-controller applies `apps/supervisor/kustomization.yaml`
|
||||
6. Flux decrypts `secrets.enc.yaml` → k8s Secret in `supervisor` namespace
|
||||
7. k3s pulls `gitea.d-ma.be/mathias/supervisor:abc1234` using imagePullSecret
|
||||
8. Pod starts with new image; previous pod terminates
|
||||
|
||||
Rollback: `git revert <tag-commit>` in infra repo → Flux reconciles → old image deployed.
|
||||
|
||||
---
|
||||
|
||||
## Error handling
|
||||
|
||||
| Scenario | Behaviour |
|----------|-----------|
| CI fails | `cd.yml` does not run (`needs: ci` gate) |
| BuildKit unreachable | `buildctl` exits non-zero → workflow fails; infra repo untouched |
| Image push fails | Workflow fails; infra repo untouched; cluster unchanged |
| Infra repo push conflict | Retry once with rebase; fail and alert if still conflicting |
| Flux reconcile error | Notification-controller fires alert; pods stay on previous image |
| Pod image pull fails | `ImagePullBackOff`; Flux reports degraded Kustomization |
| SOPS decrypt fails | Kustomization fails; Flux reports error; no partial apply |
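
The "retry once with rebase" rule for infra repo push conflicts can be expressed as a small control-flow sketch. The function and parameter names below are hypothetical; in the real workflow `push` and `rebase` would wrap `git push` and `git pull --rebase`, which are injected here as plain functions so the logic can be exercised without a repository.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for a rejected push (remote has new commits).
var errConflict = errors.New("push rejected: remote contains newer commits")

// pushWithRebaseRetry tries to push; on failure it rebases once and
// pushes again. A second failure is returned so the workflow can fail
// and alert, as described in the error-handling table.
func pushWithRebaseRetry(push, rebase func() error) error {
	if err := push(); err == nil {
		return nil
	}
	if err := rebase(); err != nil {
		return fmt.Errorf("rebase before retry: %w", err)
	}
	if err := push(); err != nil {
		return fmt.Errorf("push still failing after rebase: %w", err)
	}
	return nil
}

func main() {
	attempts := 0
	push := func() error {
		attempts++
		if attempts == 1 {
			return errConflict // first push loses the race
		}
		return nil
	}
	err := pushWithRebaseRetry(push, func() error { return nil })
	fmt.Println(attempts, err)
}
```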
|
||||
|
||||
---
|
||||
|
||||
## Testing approach
|
||||
|
||||
1. **BuildKit smoke test** — `buildctl build` with a trivial one-line Dockerfile; verify image appears in Gitea registry
|
||||
2. **cd.yml dry run** — trigger manually on a test branch; verify infra repo commit contains correct sha
|
||||
3. **Flux reconcile test** — push infra commit; verify `flux get kustomizations` shows `Ready` and pod runs new image sha
|
||||
4. **Pull secret test** — delete pod, verify it restarts and pulls from Gitea registry without `ImagePullBackOff`
|
||||
5. **SOPS round-trip test** — encrypt a dummy secret, push to infra repo, verify Flux decrypts and `kubectl get secret` shows correct data
|
||||
|
||||
---
|
||||
|
||||
## Risks
|
||||
|
||||
| Risk | Mitigation |
|------|------------|
| BuildKit socket path varies by user/rootless mode | Confirm path during setup; hardcode in `cd.yml` |
| Infra repo concurrent pushes (multiple repos deploying simultaneously) | Git rebase retry handles this; unlikely at current scale |
| age private key lost | Back up to SOPS-accessible location; document recovery procedure |
| Registry storage fills up | Set Gitea registry tag retention policy (keep last 20 per repo) |
| Gitea deploy key compromised | Rotate via Gitea UI; single key for infra repo only |
|
||||
322
docs/superpowers/specs/2026-04-20-model-orchestration-design.md
Normal file
@@ -0,0 +1,322 @@
|
||||
# Model Orchestration Design
|
||||
|
||||
**Date:** 2026-04-20
|
||||
**Status:** Approved for implementation
|
||||
|
||||
## Problem statement
|
||||
|
||||
The hyperguild supervisor currently spawns a `claude --print` subprocess for every skill call. The model routing config (`models.yaml`) exists but is dead weight — the model name is injected as text into the task prompt and ignored. Every skill call costs Claude tokens regardless of task complexity or data sensitivity.
|
||||
|
||||
## Goal
|
||||
|
||||
Route skill work to the most appropriate model — weighing cost, latency, and quality — with Claude acting as the real supervisor: verifying outputs and deciding when to escalate. Local models on owned hardware handle the common case; Claude escalates through a chain to frontier models only when local quality is insufficient.
|
||||
|
||||
## Success criteria
|
||||
|
||||
- [ ] Each skill dispatches generation to its configured local model via LiteLLM by default
|
||||
- [ ] Claude verifies every local output and either accepts or escalates
|
||||
- [ ] Escalation walks a per-skill chain (local small → local large → Sonnet → Opus) with one attempt per tier
|
||||
- [ ] Every attempt (model, tier, duration, warm state, verdict) is logged in the session JSONL
|
||||
- [ ] Cloud tiers (Sonnet/Opus) self-certify — no separate verifier call
|
||||
- [ ] Zero changes to skill handlers — they call `ExecutorFn` exactly as today
|
||||
- [ ] `LiteLLMBaseURL` is already in config; no new env vars are required beyond `LLAMA_SWAP_URL`
|
||||
|
||||
## Constraints
|
||||
|
||||
- One attempt per tier before escalating (no retry within a tier)
|
||||
- Anthropic T&C: Claude is called normally via Anthropic API; local models are called directly via LiteLLM HTTP — no API redirection
|
||||
- `models.yaml` remains the single routing config file
|
||||
|
||||
## Out of scope
|
||||
|
||||
- Auto-rerouting based on real-time warm state (logged, not acted on — Phase 4)
|
||||
- Multi-tenant / public service exposure
|
||||
- RAG/CAG model boosting
|
||||
- Managed Agent cloud delegation (chain stub only in Phase 3)
|
||||
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
```
MCP tool call (Claude Code)
        ↓
Skill handler: calls ExecutorFn (unchanged)
        ↓
Orchestrator.Run (implements ExecutorFn)
  ├─ Resolve chain from models.yaml
  ├─ For each model in chain:
  │     ├─ [ollama/*] → LiteLLM executor → generate
  │     │                    ↓
  │     │              Claude verifier (task + output + discipline)
  │     │                ├─ accept   → return Result (log attempt)
  │     │                └─ escalate → next tier (log attempt)
  │     │
  │     └─ [claude-*] → Claude executor (current) → generate + self-certify
  │                          └─ return Result (log attempt)
  │
  └─ All tiers exhausted → return best attempt with escalation note
```
|
||||
|
||||
Claude is always the verifier for local tiers. At cloud tiers, Claude generates and self-certifies — the verifier call is skipped.
|
||||
|
||||
---
|
||||
|
||||
## Components
|
||||
|
||||
### 1. `internal/exec/litellm.go` — LiteLLM executor
|
||||
|
||||
Calls `POST /v1/chat/completions` on the configured LiteLLM server. Implements the same `ExecutorFn` signature as the existing claude executor.
|
||||
|
||||
```go
type LiteLLMExecutor struct {
	BaseURL    string
	APIKey     string
	HTTPClient *http.Client
	Timeout    time.Duration
}

func NewLiteLLM(baseURL, apiKey string, timeout time.Duration) *LiteLLMExecutor

func (e *LiteLLMExecutor) Run(ctx context.Context, req Request) (Result, error)
```
|
||||
|
||||
Request mapping:
|
||||
- `req.SkillPrompt` → system message
|
||||
- `req.TaskPrompt` → user message
|
||||
- `req.Model` → `model` field in the chat completions request
|
||||
|
||||
Response handling: local models are prompted (via the discipline file output contract) to return a JSON object matching the `Result` schema. The executor attempts `json.Unmarshal` into `Result` directly — no envelope unwrapping needed (unlike the `--output-format json` claude envelope). If unmarshalling fails, the executor returns an error that the orchestrator treats as an automatic escalation trigger.
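The request mapping and response handling above could be sketched roughly as follows. This is a minimal illustration, not the executor itself: `Request` carries only the three fields named in the mapping, `Result` is a placeholder with a hypothetical `Summary` field (the real schema lives in the discipline files), and `buildChatRequest` / `parseResult` are assumed helper names. The `choices[0].message.content` shape is the standard OpenAI-compatible chat completions response that LiteLLM proxies.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Request mirrors only the fields named in the mapping; the real type may carry more.
type Request struct {
	SkillPrompt string
	TaskPrompt  string
	Model       string
}

// Result is a placeholder; Summary is a hypothetical field for illustration.
type Result struct {
	Summary string `json:"summary"`
}

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// chatResponse covers only the part of the response the executor needs.
type chatResponse struct {
	Choices []struct {
		Message chatMessage `json:"message"`
	} `json:"choices"`
}

// buildChatRequest maps SkillPrompt to the system message and TaskPrompt to the user message.
func buildChatRequest(req Request) chatRequest {
	return chatRequest{
		Model: req.Model,
		Messages: []chatMessage{
			{Role: "system", Content: req.SkillPrompt},
			{Role: "user", Content: req.TaskPrompt},
		},
	}
}

// parseResult unmarshals the model's message content directly into Result.
// Any error returned here is what the orchestrator would treat as an escalation trigger.
func parseResult(body []byte) (Result, error) {
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return Result{}, fmt.Errorf("decode chat response: %w", err)
	}
	if len(resp.Choices) == 0 {
		return Result{}, errors.New("chat response has no choices")
	}
	var res Result
	if err := json.Unmarshal([]byte(resp.Choices[0].Message.Content), &res); err != nil {
		return Result{}, fmt.Errorf("model output is not valid Result JSON: %w", err)
	}
	return res, nil
}

func main() {
	body, _ := json.Marshal(buildChatRequest(Request{
		SkillPrompt: "discipline text", TaskPrompt: "review foo.go", Model: "ollama/phi4",
	}))
	fmt.Println(string(body))

	_, err := parseResult([]byte(`{"choices":[{"message":{"role":"assistant","content":"not json"}}]}`))
	fmt.Println("escalate:", err != nil)
}
```

Keeping the parse step separate from the HTTP call makes the "unparseable output escalates" rule easy to cover with a table-driven test.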
|
||||
|
||||
### 2. `internal/exec/verifier.go` — Claude verifier
|
||||
|
||||
A focused Claude call that judges local model output. Uses the existing `Executor` (claude subprocess) internally.
|
||||
|
||||
```go
type Verdict struct {
	Accept   bool   `json:"accept"`
	Feedback string `json:"feedback"` // reason if not accepting; empty if accept
}

type Verifier struct {
	executor *Executor // the existing claude executor
}

func NewVerifier(executor *Executor) *Verifier

func (v *Verifier) Verify(ctx context.Context, skillPrompt, taskPrompt string, output Result) (Verdict, error)
```
|
||||
|
||||
The verifier prompt gives Claude:
|
||||
1. The skill discipline file (so it knows the iron laws and output contract)
|
||||
2. The original task prompt (informed verification — Claude sees what was asked)
|
||||
3. The generated output
|
||||
4. A short instruction: "Does this output satisfy the discipline's iron laws and output contract? Reply with JSON: `{\"accept\": true|false, \"feedback\": \"...\"}`"
|
||||
|
||||
The verifier uses a lightweight JSON schema for its own output (a `Verdict` schema), keeping the call fast.
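One way the verifier's reply could be turned into a `Verdict` is sketched below. `parseVerdict` is an assumed helper name, and mapping an unparseable reply to "escalate" is an interpretation of the safe default listed in the error-handling table, not confirmed behaviour.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Verdict struct {
	Accept   bool   `json:"accept"`
	Feedback string `json:"feedback"`
}

// parseVerdict decodes the verifier's JSON reply.
// Anything that is not valid Verdict JSON becomes a rejection, so a confused
// verifier can never silently accept a local model's output.
func parseVerdict(raw string) Verdict {
	var v Verdict
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return Verdict{Accept: false, Feedback: "verifier output unparseable: " + err.Error()}
	}
	return v
}

func main() {
	fmt.Printf("%+v\n", parseVerdict(`{"accept": false, "feedback": "missing line references"}`))
	fmt.Printf("%+v\n", parseVerdict("Looks fine to me."))
}
```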
|
||||
|
||||
### 3. `internal/exec/orchestrator.go` — chain walker
|
||||
|
||||
Implements `ExecutorFn`. Walks the escalation chain, delegating generation and verification per tier.
|
||||
|
||||
```go
type Chain []ChainEntry

type ChainEntry struct {
	Model   string // e.g. "ollama/phi4", "claude-sonnet-4-5"
	Tier    string // "local" | "subagent" | "managed"
	IsCloud bool   // true for claude-* models; skips verifier
}

type Orchestrator struct {
	chain        Chain
	litellm      *LiteLLMExecutor
	claude       *Executor
	verifier     *Verifier
	llamaSwapURL string // for warm-state probe
}

func NewOrchestrator(chain Chain, litellm *LiteLLMExecutor, claude *Executor, verifier *Verifier, llamaSwapURL string) *Orchestrator

func (o *Orchestrator) Run(ctx context.Context, req Request) (Result, error)
```
|
||||
|
||||
Algorithm:
|
||||
```
for each entry in chain:
    warm  = probe llama-swap (if local tier)
    start = now()
    if entry.IsCloud:
        result, err = claude.Run(ctx, req with entry.Model)
        duration = now() - start
        log attempt(model, tier, duration, warm, verified=(err == nil))
        if err == nil: return result
    else:
        result, err = litellm.Run(ctx, req with entry.Model)
        duration = now() - start
        if err != nil:
            log attempt(model, tier, duration, warm, verified=false)
            continue  // automatic escalation on parse/network error
        verdict, verr = verifier.Verify(ctx, req.SkillPrompt, req.TaskPrompt, result)
        if verr != nil: verdict.Accept = false  // verifier failure is treated as escalate
        log attempt(model, tier, duration, warm, verified=verdict.Accept)
        if verdict.Accept: return result
        // inject verifier feedback into next tier's task prompt
        req.TaskPrompt = req.TaskPrompt + "\n\nPrior attempt feedback: " + verdict.Feedback

return error("all tiers exhausted")
```
|
||||
|
||||
### 4. `internal/config/models.go` — chain parser
|
||||
|
||||
Replaces the current single-model resolution with chain parsing.
|
||||
|
||||
Updated `models.yaml` format:
|
||||
|
||||
```yaml
verifier: claude-sonnet-4-6          # fixed verifier for all local tiers

llama_swap_url: http://koala:8080    # for warm-state probing

default_chain:
  - ollama/qwen3-coder-30b-tuned
  - claude-sonnet-4-5

skills:
  tdd:
    chain:
      - ollama/qwen3-coder-30b-tuned
      - claude-sonnet-4-5
  review:
    chain:
      - ollama/devstral-tuned
      - ollama/gemma4
      - claude-sonnet-4-5
  debug:
    chain:
      - ollama/deepseek-r1-tuned
      - claude-sonnet-4-5
  spec:
    chain:
      - ollama/phi4
      - ollama/gemma4
      - claude-sonnet-4-5
      - claude-opus-4-6
  retrospective:
    chain:
      - ollama/qwen3-coder-30b-tuned
      - claude-sonnet-4-5
  trainer:
    chain:
      - ollama/qwen3-coder-30b-tuned
      - claude-sonnet-4-5
```
|
||||
|
||||
The parser exposes:
|
||||
```go
func (m *Models) ChainFor(skill string) Chain
func (m *Models) Verifier() string
func (m *Models) LlamaSwapURL() string
```
|
||||
|
||||
Caller override (`model` param in MCP tool call) pins the chain to a single entry — one model, no escalation. This preserves the existing override behaviour for power users.
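Chain resolution with the `default_chain` fallback and the single-entry override might be sketched like this. The sketch works on an already-decoded config, so YAML parsing is omitted. The field names on `Models`, the `Pinned` helper, and the choice of `"subagent"` as the tier label for `claude-*` models are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

type ChainEntry struct {
	Model   string
	Tier    string
	IsCloud bool
}

type Chain []ChainEntry

// Models holds the decoded contents of models.yaml (decoding itself is omitted).
type Models struct {
	DefaultChain []string
	Skills       map[string][]string
}

// entryFor derives IsCloud from the "claude-" model prefix.
// The "subagent" tier label for cloud models is an assumption.
func entryFor(model string) ChainEntry {
	if strings.HasPrefix(model, "claude-") {
		return ChainEntry{Model: model, Tier: "subagent", IsCloud: true}
	}
	return ChainEntry{Model: model, Tier: "local"}
}

// ChainFor returns the skill's chain, falling back to the default chain.
func (m *Models) ChainFor(skill string) Chain {
	names, ok := m.Skills[skill]
	if !ok || len(names) == 0 {
		names = m.DefaultChain
	}
	chain := make(Chain, 0, len(names))
	for _, name := range names {
		chain = append(chain, entryFor(name))
	}
	return chain
}

// Pinned is a hypothetical helper for the caller override: one model, no escalation.
func Pinned(model string) Chain {
	return Chain{entryFor(model)}
}

func main() {
	m := &Models{
		DefaultChain: []string{"ollama/qwen3-coder-30b-tuned", "claude-sonnet-4-5"},
		Skills: map[string][]string{
			"review": {"ollama/devstral-tuned", "ollama/gemma4", "claude-sonnet-4-5"},
		},
	}
	fmt.Println(m.ChainFor("review"))
	fmt.Println(m.ChainFor("unknown-skill"))
	fmt.Println(Pinned("ollama/phi4"))
}
```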
|
||||
|
||||
### 5. `internal/session/session.go` — updated `Attempt` struct
|
||||
|
||||
```go
type Attempt struct {
	Attempt       int    `json:"attempt"`
	Model         string `json:"model"`
	Tier          string `json:"tier"` // local | subagent | managed
	DurationMs    int64  `json:"duration_ms"`
	WarmStart     bool   `json:"warm_start"` // model was already loaded in llama-swap
	Verified      bool   `json:"verified"`
	Verdict       string `json:"verdict,omitempty"`  // accept | escalate | error
	Feedback      string `json:"feedback,omitempty"` // verifier feedback on escalation
	OutputSummary string `json:"output_summary,omitempty"`
	RunnerOutput  string `json:"runner_output,omitempty"`
}
```
|
||||
|
||||
### 6. `cmd/supervisor/main.go` — one wiring change
|
||||
|
||||
```go
// Before:
reg.Register(review.New(review.Config{ExecutorFn: executor.Run, ...}))

// After:
chain := models.ChainFor("review")
orch := exec.NewOrchestrator(chain, litellmExec, claudeExec, verifier, models.LlamaSwapURL())
reg.Register(review.New(review.Config{ExecutorFn: orch.Run, ...}))
```
|
||||
|
||||
One orchestrator per skill, sharing the same `litellmExec`, `claudeExec`, and `verifier` instances.
|
||||
|
||||
---
|
||||
|
||||
## Data flow example: `review` skill call
|
||||
|
||||
1. Claude Code calls `review` tool with `files: ["internal/foo.go"]`
|
||||
2. Skill handler builds task prompt, calls `orch.Run`
|
||||
3. Orchestrator resolves chain: `[devstral, gemma4, sonnet]`
|
||||
4. Probes llama-swap: devstral is warm
|
||||
5. LiteLLM calls devstral → returns JSON result
|
||||
6. Verifier asks Claude: "does this review satisfy the iron laws?"
|
||||
7. Claude: `{"accept": false, "feedback": "missing line references for all findings"}`
|
||||
8. Orchestrator logs attempt #1 (devstral, local, 4200ms, warm, escalate)
|
||||
9. Injects feedback into task prompt, calls gemma4
|
||||
10. Verifier: `{"accept": true}`
|
||||
11. Orchestrator logs attempt #2 (gemma4, local, 6100ms, cold, accept)
|
||||
12. Returns result to skill handler → MCP response
|
||||
|
||||
Session JSONL records both attempts. You can see: devstral was warm but produced weak output; gemma4 was cold but passed.
|
||||
|
||||
---
|
||||
|
||||
## Observability
|
||||
|
||||
Session JSONL is the primary store. Each `Entry.Attempts` slice records the full escalation trail. To analyse across sessions:
|
||||
|
||||
```bash
# Which models are escalating most?
jq -r '.attempts[] | select(.verdict == "escalate") | .model' brain/sessions/*.jsonl | sort | uniq -c

# Average latency per model
jq -r '.attempts[] | [.model, .duration_ms] | @tsv' brain/sessions/*.jsonl | awk '{sum[$1]+=$2; n[$1]++} END {for (m in sum) print m, sum[m]/n[m]}'

# Cold start frequency
jq -r '.attempts[] | select(.warm_start == false) | .model' brain/sessions/*.jsonl | sort | uniq -c
```
|
||||
|
||||
No new metrics infrastructure needed for Phase 3. Phase 4 can build a dashboard on top of this data.
|
||||
|
||||
---
|
||||
|
||||
## Error handling
|
||||
|
||||
| Scenario | Behaviour |
|----------|-----------|
| LiteLLM unreachable | Log attempt as error, escalate immediately |
| Local model returns unparseable JSON | Log attempt as error, escalate |
| Verifier call fails | Log, treat as escalate (safe default) |
| All tiers exhausted | Return error to skill handler; skill returns MCP error to caller |
| Caller passes `model` override | Single-entry chain, no escalation, no verifier call |
|
||||
|
||||
---
|
||||
|
||||
## Testing approach
|
||||
|
||||
- `TestLiteLLMExecutor`: mock HTTP server returning valid/invalid JSON; verify parse logic and error escalation
|
||||
- `TestVerifier`: fake claude executor returning accept/escalate verdicts; verify prompt construction
|
||||
- `TestOrchestrator`: table-driven — chains of 1/2/3 tiers, various accept/escalate/error combinations; verify attempt log contents and final result
|
||||
- `TestModelsChainFor`: YAML parsing for all skill overrides and default_chain fallback
|
||||
- Integration smoke test: start real LiteLLM (or mock), call `review` tool via MCP, verify attempt log written
|
||||
|
||||
---
|
||||
|
||||
## Risks
|
||||
|
||||
| Risk | Mitigation |
|------|------------|
| Local models ignore output contract → bad JSON | Discipline files already specify JSON output contract; parse failure auto-escalates |
| Verifier Claude call adds latency to every local attempt | Verifier prompt is small and fast; acceptable tradeoff for quality gate |
| llama-swap warm probe adds overhead | Probe is a single lightweight HTTP GET; timeout at 200ms, treat failure as `warm_start: false` |
| Chain exhaustion leaves caller with no result | Return structured error via MCP; caller can retry with explicit `model` override |
|
||||
@@ -0,0 +1,240 @@
|
||||
# Brain Ingestion Pipeline — Design Spec
|
||||
|
||||
**Date:** 2026-04-22
|
||||
**Status:** approved
|
||||
**Author:** Mathias + Claude
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Add a structured ingestion pipeline to the hyperguild brain. The pipeline accepts raw content (directly or from files) and uses an LLM to produce structured wiki pages in `brain/wiki/` — the declarative layer of the Two-Layer Brain. Three fixed knowledge classes: **concepts**, **entities**, **sources**.
|
||||
|
||||
This spec covers:
|
||||
- Four new packages in the `ingestion` Go module (`llm`, `wiki`, `pipeline`, `watcher`)
|
||||
- Two new HTTP endpoints on the ingestion server (`/ingest`, `/ingest-path`)
|
||||
- A background file watcher for `brain/raw/`
|
||||
- Config additions to both the ingestion server and the supervisor
|
||||
|
||||
It does **not** cover Layer 2 (training data, `brain/training-data/`) — that is the trainer worker's concern.
|
||||
|
||||
---
|
||||
|
||||
## Information Model
|
||||
|
||||
Three fixed wiki page classes, matching the Two-Layer Brain design spec and the existing `ingestion-svc` model:
|
||||
|
||||
### `wiki/sources/<slug>.md`
|
||||
One page per ingested source (project, book, article, note). Updated (not replaced) on re-ingestion.
|
||||
|
||||
Required frontmatter: `title`, `type` (article|pdf|book|video|note|project), `domain`, `source_url`, `date_ingested`, `last_updated`, `aliases`.
|
||||
|
||||
Body sections: Summary · Key Claims · Concepts Introduced or Reinforced · Entities Mentioned · Open Questions Raised. Books add: Chapters · Argument Arc · Updates (dated, append-only).
|
||||
|
||||
### `wiki/concepts/<slug>.md`
|
||||
One page per idea, framework, methodology, or pattern (e.g. Domain Driven Design, TDD, event sourcing).
|
||||
|
||||
Required frontmatter: `title`, `domain`, `last_updated`, `aliases`.
|
||||
|
||||
Body sections: Definition · Why It Matters · Related Concepts · Related Entities · Sources · Evolving Notes.
|
||||
|
||||
### `wiki/entities/<slug>.md`
|
||||
One page per person, tool, organisation, technology, or product.
|
||||
|
||||
Required frontmatter: `title`, `type` (person|company|tool|model|framework|technology), `domain`, `last_updated`, `aliases`.
|
||||
|
||||
Body sections: Description · Relevance · Key Positions/Products/Claims · Related Concepts · Related Entities · Sources.
|
||||
|
||||
### Wikilink format
|
||||
All cross-references use `[[slug|Display Text]]`. Slug = lowercase title, spaces→hyphens, non-alphanumeric stripped. Slugs must resolve to an existing file in the wiki.
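The slug rule could be implemented along these lines. This is only a sketch of the rule as stated: the `Slug` name matches how the function is referred to elsewhere in these specs, but collapsing runs of whitespace into one hyphen and treating existing hyphens as separators are assumptions the text does not spell out.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Slug lowercases the title, turns runs of whitespace (and existing hyphens)
// into a single hyphen, and drops every other non-alphanumeric character.
func Slug(title string) string {
	var b strings.Builder
	pendingHyphen := false
	for _, r := range strings.ToLower(strings.TrimSpace(title)) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			// Emit at most one hyphen between words, never at the start.
			if pendingHyphen && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingHyphen = false
			b.WriteRune(r)
		case unicode.IsSpace(r) || r == '-':
			pendingHyphen = true
		}
		// Anything else (apostrophes, colons, dots) is stripped.
	}
	return b.String()
}

func main() {
	fmt.Println(Slug("Domain Driven Design"))
	fmt.Println(Slug("Shape Up: Stop Running in Circles"))
	fmt.Println(Slug("O'Reilly  Media"))
}
```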
|
||||
|
||||
### Supporting files
|
||||
- `brain/wiki/index.md` — auto-rebuilt on every ingest: one-sentence summary per page, grouped by type
|
||||
- `brain/log.md` — append-only audit trail: date, source, pages written, warnings
|
||||
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
### New packages (`ingestion` module)
|
||||
|
||||
```
ingestion/internal/
  llm/       - OpenAI-compatible HTTP client (chat completions, retry on 429,
               configurable timeout and temperature)
  wiki/      - Page types, slug utilities, merge logic, inventory loader,
               index rebuilder, log appender
  pipeline/  - Orchestrates one ingest run end-to-end (content or extracted file text)
  watcher/   - Polls brain/raw/ and triggers pipeline on new files
```
|
||||
|
||||
The existing `api/` and `search/` packages are updated; no other existing packages change.
|
||||
|
||||
### Brain directory layout
|
||||
|
||||
```
brain/
  wiki/
    concepts/      ← LLM-structured concept pages
    entities/      ← LLM-structured entity pages
    sources/       ← LLM-structured source pages
    index.md       ← auto-rebuilt on each ingest
  knowledge/       ← quick raw notes via brain_write (BM25-searchable, unchanged)
  raw/             ← drop zone; watcher picks up files here
    processed/     ← moved here on success (organised by date: processed/YYYY-MM-DD/)
    failed/        ← moved here on failure
  sessions/        ← session logs (retrospective/trainer concern, not touched here)
  training-data/   ← Layer 2 (trainer worker concern, not touched here)
  log.md           ← append-only audit trail
  CLAUDE.md        ← schema document injected into every ingest prompt
```
|
||||
|
||||
If `brain/CLAUDE.md` is absent, the pipeline falls back to an embedded default schema compiled into the binary.
|
||||
|
||||
---
|
||||
|
||||
## API
|
||||
|
||||
### `POST /ingest`
|
||||
|
||||
Ingest content provided directly by the caller.
|
||||
|
||||
**Request:**
|
||||
```json
{
  "content": "...",
  "source": "shape-up-book",
  "dry_run": false
}
```
|
||||
|
||||
**Response:**
|
||||
```json
{
  "pages": ["wiki/sources/shape-up.md", "wiki/concepts/betting-table.md"],
  "warnings": []
}
```
|
||||
|
||||
`source` is the human-readable name used when writing/updating `wiki/sources/<slug>.md`. `dry_run: true` returns the page contents without writing.
|
||||
|
||||
### `POST /ingest-path`
|
||||
|
||||
Ingest a file or walk a directory recursively. Supports `.md`, `.txt`, `.pdf`.
|
||||
|
||||
**Request:**
|
||||
```json
{
  "path": "/Users/mathias/brain/raw/shape-up.pdf",
  "source": "shape-up-book",
  "dry_run": false
}
```
|
||||
|
||||
If `path` is a directory, all supported files within it are ingested in sequence. `source` is optional for directory ingestion — if omitted, the LLM derives it from each file's name and content.
|
||||
|
||||
**Response:** same shape as `/ingest`, with pages and warnings aggregated across all files.
|
||||
|
||||
### Supervisor skill update
|
||||
|
||||
`brain_ingest` in `internal/skills/brain/handlers.go` gains an optional `path` field. If `path` is set, it calls `/ingest-path`; otherwise `/ingest`.
|
||||
|
||||
---
|
||||
|
||||
## Pipeline
|
||||
|
||||
`pipeline.Run(ctx, cfg, brainDir, content, source, dryRun)` — called by both HTTP handlers after any file reading is done.
|
||||
|
||||
Steps:
|
||||
|
||||
1. **Load inventory** — walk `brain/wiki/{concepts,entities,sources}/`, build slug index grouped by type. Injected into prompt so LLM knows what to update vs create.
|
||||
2. **Load schema** — read `brain/CLAUDE.md`; fall back to embedded default if absent.
|
||||
3. **Chunk** — split content at `INGEST_CHUNK_SIZE` chars (default 6000; split on paragraph boundary). If `INGEST_CHUNK_SIZE=0`, no chunking.
|
||||
4. **LLM call per chunk** — returns JSON array of `{"path": "wiki/concepts/foo.md", "content": "..."}`. Prompt structure: system instruction → date → schema → inventory → non-negotiable slug/wikilink rules → source content.
|
||||
5. **Parse + truncation recovery** — strip markdown fences if present. If JSON array is truncated mid-object (token limit), salvage all complete objects before the break and log a warning.
|
||||
6. **Merge** — combine pages with the same path across chunks:
|
||||
- Bullet sections (Related Concepts, Related Entities, Sources, Key Claims): union unique lines
|
||||
- Append sections (Evolving Notes, Updates, Open Questions): append new content
|
||||
- All other sections: keep first occurrence
|
||||
- Frontmatter: keep first occurrence
|
||||
7. **Write** — create subdirs as needed, write files atomically. In dry-run mode, return page map without writing.
|
||||
8. **Rebuild `index.md`** — one-sentence summary per page (derived from first body paragraph), grouped by type, with page count header.
|
||||
9. **Append to `log.md`** — date, source, list of pages written, warning count.
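Step 3 (chunking on paragraph boundaries) could be sketched as below. The function name `chunk` is hypothetical, paragraphs are assumed to be separated by blank lines, and the size is measured in bytes rather than characters for simplicity. A single paragraph longer than the limit is kept whole instead of being cut mid-sentence, which is a design choice of this sketch rather than something the spec states.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk groups paragraphs into pieces of at most size bytes.
// size <= 0 disables chunking, mirroring INGEST_CHUNK_SIZE=0.
func chunk(content string, size int) []string {
	if size <= 0 || len(content) <= size {
		return []string{content}
	}
	var chunks []string
	var cur strings.Builder
	for _, para := range strings.Split(content, "\n\n") {
		// Flush the current chunk if adding this paragraph would exceed the limit.
		if cur.Len() > 0 && cur.Len()+2+len(para) > size {
			chunks = append(chunks, cur.String())
			cur.Reset()
		}
		if cur.Len() > 0 {
			cur.WriteString("\n\n")
		}
		cur.WriteString(para)
	}
	if cur.Len() > 0 {
		chunks = append(chunks, cur.String())
	}
	return chunks
}

func main() {
	for i, c := range chunk("aaa\n\nbbb\n\nccc", 8) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```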
|
||||
|
||||
---
|
||||
|
||||
## File Watcher
|
||||
|
||||
Background goroutine started at server startup (when `INGEST_WATCH_INTERVAL > 0`).
|
||||
|
||||
**Poll loop:**
|
||||
1. Walk `brain/raw/` for files with supported extensions (`.md`, `.txt`, `.pdf`), excluding `processed/` and `failed/` subdirs.
|
||||
2. For each file found: derive source from filename (strip extension, kebab-to-title), call `pipeline.Run` with the file content.
|
||||
3. On success: move file to `brain/raw/processed/YYYY-MM-DD/<filename>`.
|
||||
4. On failure: move file to `brain/raw/failed/<filename>`, append error to `brain/log.md`.
|
||||
5. Sleep `INGEST_WATCH_INTERVAL` seconds, repeat.
|
||||
|
||||
Files are processed one at a time (no concurrency within the watcher) to avoid LLM rate-limit collisions.
|
||||
|
||||
---
|
||||
|
||||
## LLM Prompt
|
||||
|
||||
**System:**
|
||||
> You are a wiki agent. Read the source material and produce structured wiki pages following the schema provided. Output ONLY a valid JSON array — no markdown fences, no other text. Each element must have: `"path"` (relative path within wiki, e.g. `"wiki/sources/foo.md"`) and `"content"` (full markdown including YAML frontmatter). Follow the schema strictly: correct frontmatter fields, wikilinks as `[[slug|Display Text]]`, dates in YYYY-MM-DD format, paraphrase rather than quoting verbatim.
|
||||
|
||||
**User (built dynamically):**
|
||||
1. Today's date
|
||||
2. Full schema (`brain/CLAUDE.md` content)
|
||||
3. Existing wiki inventory grouped by type (for update-vs-create decisions)
|
||||
4. Non-negotiable rules: slug format, wikilink format, one-source-per-book, section type enforcement
|
||||
5. Source content (the chunk)
|
||||
|
||||
Temperature: 0.2 for reproducibility.
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### Ingestion server (new env vars)
|
||||
|
||||
| Variable | Default | Description |
|---|---|---|
| `INGEST_LLM_URL` | `http://iguana:4000/v1` | OpenAI-compatible endpoint |
| `INGEST_LLM_KEY` | (empty) | API key |
| `INGEST_LLM_MODEL` | `koala/qwen35-9b-fast` | Model name |
| `INGEST_LLM_TIMEOUT` | `15` | LLM call timeout (minutes) |
| `INGEST_CHUNK_SIZE` | `6000` | Max chars per LLM call (0 = no chunking) |
| `INGEST_WATCH_INTERVAL` | `30` | Watcher poll interval in seconds (0 = disabled) |
|
||||
|
||||
### Supervisor (new env vars + wiring)
|
||||
|
||||
| Variable | Default | Description |
|---|---|---|
| `INGEST_SVC_URL` | (empty) | URL of ingestion server for `brain_ingest` |
| `KB_RETRIEVAL_URL` | (empty) | URL of KB retrieval server for `brain_search` |
|
||||
|
||||
`config.go` gets two new fields. `main.go` passes them to `brain.New()`. Both tools are only registered as MCP tools when the respective URL is configured (already implemented in `skill.go`).
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
| Package | What is tested |
|---|---|
| `wiki/` | Slug generation (edge cases: apostrophes, colons, version strings), merge logic (bullets union, append, keep-first), inventory loading from temp dir, truncation recovery (valid partial JSON), index rebuild output |
| `pipeline/` | Integration test: temp brain dir + mock LLM HTTP server returning fixture JSON; verify files written to correct paths, index rebuilt, log appended |
| `api/` | Handler tests for `/ingest` and `/ingest-path` using mock pipeline; 400 on missing fields, 200 with expected response shape |
| `watcher/` | File placed in `brain/raw/` is moved to `processed/` on mock-pipeline success; moved to `failed/` on error |
|
||||
|
||||
All tests are table-driven. No real LLM calls in tests.
|
||||
|
||||
---
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- Python validation/correction loop (can be added later; the LLM prompt enforces schema rules as non-negotiable instructions)
|
||||
- `brain/training-data/` — trainer worker concern
|
||||
- `brain/sessions/` — retrospective/sessionlog concern
|
||||
- Upload endpoint (multipart HTTP) — `scp`/rsync to `brain/raw/` + watcher covers this
|
||||
- Qdrant vector indexing — `brain_search` calls a separate KB retrieval service; ingestion does not write to Qdrant
|
||||
@@ -0,0 +1,148 @@
|
||||
# Level 3: Strip Slug Authority from LLM — Design Spec
|
||||
|
||||
## Problem
|
||||
|
||||
The ingestion pipeline currently asks the LLM to produce full wiki pages including the file path (e.g. `wiki/sources/finbert-huggingface.md`). This causes two classes of bug:
|
||||
|
||||
1. **Slug proliferation** — the LLM invents different slugs for the same concept across chunks or runs, producing duplicate pages that diverge in content.
|
||||
2. **Unstable paths** — the LLM may shorten, expand, or vary titles, making deduplication via `Resolve` unreliable because the slug mismatch is upstream of the normalizer.
|
||||
|
||||
## Solution
|
||||
|
||||
Strip slug authority from the LLM entirely. The LLM returns a minimal structured object. The pipeline computes all slugs deterministically from titles using `wiki.Slug(title)`.
|
||||
|
||||
---
|
||||
|
||||
## LLM JSON Contract
|
||||
|
||||
### Output format (per page)
|
||||
|
||||
```json
{
  "title": "FinBERT",
  "type": "concept",
  "subtype": "framework",
  "domain": "ai-llm",
  "content": "## Definition\n\nA BERT-based model fine-tuned for financial sentiment...\n\n## Related\n\n- [[Sentiment Analysis]]\n- [[Hugging Face]]\n"
}
```
|
||||
|
||||
**Fields:**
|
||||
|
||||
| Field | Required | Values |
|-------|----------|--------|
| `title` | yes | Human-readable title, e.g. "FinBERT" |
| `type` | yes | `"source"` \| `"concept"` \| `"entity"` |
| `subtype` | for entity/source | entity: `person\|company\|tool\|model\|framework\|technology`; source: `article\|pdf\|book\|video\|note\|project` |
| `domain` | no | tag string, e.g. `ai-llm`, `finance` |
| `content` | yes | Markdown body sections only, with no frontmatter and no path |
|
||||
|
||||
**Wikilinks in content:** `[[Display Name]]` only. No slug. The pipeline canonicalizes to `[[slug|Display Name]]` in a post-processing step.
|
||||
|
||||
**The LLM never writes slugs, paths, or frontmatter.**
|
||||
|
||||
---
|
||||
|
||||
## Pipeline Changes
|
||||
|
||||
### New type: `RawPage`
|
||||
|
||||
```go
type RawPage struct {
	Title   string
	Type    string // "source" | "concept" | "entity"
	Subtype string
	Domain  string
	Content string
}
```
|
||||
|
||||
### New step order
|
||||
|
||||
```
|
||||
ParseRawPages → BuildPages → Resolve → CanonicalizeLinks → injectSourceRefs → mergeAll → write
|
||||
```
|
||||
|
||||
### Step descriptions
|
||||
|
||||
**`ParseRawPages(output string) ([]RawPage, []string)`**
|
||||
Replaces `ParsePages`. Deserializes JSON objects with the new schema. Same truncation-recovery logic as today. Returns `(pages, warnings)`.
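The existing truncation-recovery code is not shown in this spec, so the following is only one possible approach: stream the array with `json.Decoder`, decode one element at a time, and stop at the first element that fails to decode. Everything decoded before the break is kept. The lowercase `parseRawPages` name and the JSON struct tags are assumptions of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type RawPage struct {
	Title   string `json:"title"`
	Type    string `json:"type"`
	Subtype string `json:"subtype"`
	Domain  string `json:"domain"`
	Content string `json:"content"`
}

// parseRawPages strips optional markdown fences, then decodes array elements
// one by one so that complete objects survive a truncated tail.
func parseRawPages(output string) ([]RawPage, []string) {
	s := strings.TrimSpace(output)
	s = strings.TrimPrefix(s, "```json")
	s = strings.TrimPrefix(s, "```")
	s = strings.TrimSuffix(s, "```")

	dec := json.NewDecoder(strings.NewReader(s))
	if tok, err := dec.Token(); err != nil || tok != json.Delim('[') {
		return nil, []string{"output is not a JSON array"}
	}

	var pages []RawPage
	truncated := false
	for dec.More() {
		var p RawPage
		if err := dec.Decode(&p); err != nil {
			truncated = true // object cut off mid-way; keep what we have
			break
		}
		pages = append(pages, p)
	}
	if !truncated {
		// The closing bracket must still be present for a complete array.
		if _, err := dec.Token(); err != nil {
			truncated = true
		}
	}
	var warnings []string
	if truncated {
		warnings = append(warnings, fmt.Sprintf("output truncated; salvaged %d complete pages", len(pages)))
	}
	return pages, warnings
}

func main() {
	cut := `[{"title":"FinBERT","type":"concept","content":"x"},{"title":"Hugging Fa`
	pages, warnings := parseRawPages(cut)
	fmt.Println(len(pages), warnings)
}
```

Validation of `type` values and missing titles, which the testing strategy also lists, would sit on top of this and is not shown.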
|
||||
|
||||
**`BuildPages(rawPages []RawPage, sourceSlug, date string) []wiki.Page`**
|
||||
Converts `RawPage → wiki.Page`:
|
||||
- Computes slug: `wiki.Slug(page.Title)`
|
||||
- Computes path: `wiki/<type>/<slug>.md`
|
||||
- Assembles frontmatter:
|
||||
```
---
title: <Title>
type: <type>
subtype: <subtype>    # omitted if empty
domain: <domain>      # omitted if empty
created: <date>
source: <sourceSlug>  # omitted for the source page itself
---
```
|
||||
- Concatenates frontmatter + content
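The frontmatter assembly could be sketched as follows. The `frontmatter` helper is hypothetical and takes the already-computed `sourceSlug`. It does not quote or escape values, so a title containing a colon would need extra handling in a real implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// frontmatter renders the YAML header described above.
// Empty subtype/domain are omitted, and source pages get no "source:" line.
func frontmatter(title, typ, subtype, domain, date, sourceSlug string) string {
	var b strings.Builder
	b.WriteString("---\n")
	fmt.Fprintf(&b, "title: %s\n", title)
	fmt.Fprintf(&b, "type: %s\n", typ)
	if subtype != "" {
		fmt.Fprintf(&b, "subtype: %s\n", subtype)
	}
	if domain != "" {
		fmt.Fprintf(&b, "domain: %s\n", domain)
	}
	fmt.Fprintf(&b, "created: %s\n", date)
	if typ != "source" && sourceSlug != "" {
		fmt.Fprintf(&b, "source: %s\n", sourceSlug)
	}
	b.WriteString("---\n")
	return b.String()
}

func main() {
	body := "## Definition\n\nA BERT-based model fine-tuned for financial sentiment...\n"
	page := frontmatter("FinBERT", "concept", "framework", "ai-llm", "2026-04-22", "finbert-huggingface") + "\n" + body
	fmt.Print(page)
}
```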
|
||||
|
||||
**`Resolve(pages []wiki.Page, inventory) []wiki.Page`**
|
||||
Unchanged. Normalizes near-duplicate titles to existing inventory slugs.
|
||||
|
||||
**`CanonicalizeLinks(pages []wiki.Page, inventory) ([]wiki.Page, []string)`**
|
||||
New. Builds a title→slug map from inventory + current batch. Replaces `[[Display Name]]` with `[[slug|Display Name]]` in each page's content. Titles with no known slug are left as-is and returned as warnings.
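A minimal sketch of the link rewrite, assuming an exact, case-sensitive title lookup (the spec does not say whether matching is fuzzy). The regular expression only matches links without a `|`, so links that are already canonical are left alone. `canonicalize` is a hypothetical per-page helper, not the `CanonicalizeLinks` signature itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// bareLink matches [[Display Name]] but not [[slug|Display Name]].
var bareLink = regexp.MustCompile(`\[\[([^\]|]+)\]\]`)

// canonicalize rewrites bare wikilinks using the title→slug map and
// returns a warning for every title it could not resolve.
func canonicalize(content string, slugByTitle map[string]string) (string, []string) {
	var warnings []string
	out := bareLink.ReplaceAllStringFunc(content, func(match string) string {
		title := match[2 : len(match)-2] // strip the surrounding [[ ]]
		slug, ok := slugByTitle[title]
		if !ok {
			warnings = append(warnings, "no slug for wikilink: "+title)
			return match // leave unknown links as-is
		}
		return "[[" + slug + "|" + title + "]]"
	})
	return out, warnings
}

func main() {
	inventory := map[string]string{"Sentiment Analysis": "sentiment-analysis"}
	out, warnings := canonicalize("- [[Sentiment Analysis]]\n- [[Hugging Face]]\n", inventory)
	fmt.Print(out)
	fmt.Println(warnings)
}
```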
|
||||
|
||||
**`injectSourceRefs`**
|
||||
Unchanged. Reads `[[slug|...]]` links (post-canonicalization) to inject back-references.
|
||||
|
||||
**`mergeAll → write`**
|
||||
Unchanged.
|
||||
|
||||
### `pipeline.Run` signature (unchanged)
|
||||
|
||||
```go
|
||||
func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryRun bool) (Result, error)
|
||||
```
|
||||
|
||||
`source` is already passed (it's the display name / filename). A new internal `sourceSlug` is derived from it via `wiki.Slug(source)` before calling `BuildPages`. No API change needed.
|
||||
|
||||
---
|
||||
|
||||
## Files Changed
|
||||
|
||||
| File | Change |
|------|--------|
| `ingestion/internal/pipeline/parse.go` | Replace `ParsePages` with `ParseRawPages` + `RawPage` type |
| `ingestion/internal/pipeline/build.go` | New file: `BuildPages` |
| `ingestion/internal/pipeline/links.go` | New file: `CanonicalizeLinks` |
| `ingestion/internal/pipeline/pipeline.go` | Wire new steps; derive `sourceSlug` from `source` |
| `ingestion/internal/pipeline/prompt.go` | New system prompt + `BuildPrompt` for new JSON format |
| `brain/schema.md` | Update wikilink format and JSON schema docs |
|
||||
|
||||
`resolve.go`, `refs.go`, `backfill.go`, `merge.go` — no changes.

---

## Wikilink Format

- **LLM output**: `[[Display Name]]`
- **Stored on disk**: `[[slug|Display Name]]`
- **`CanonicalizeLinks`** converts between the two using the inventory

This matches Obsidian's display-alias syntax that the existing codebase already uses.

---

## Testing Strategy

- `ParseRawPages`: table-driven, cover valid JSON, truncated output, unknown type, missing title
- `BuildPages`: table-driven, cover slug computation, frontmatter assembly, source page (no `source:` field), entity with subtype
- `CanonicalizeLinks`: cover known title → replaced, unknown title → left as-is + warning, multiple links in one page
- Integration test: full `Run` call with mock LLM returning new JSON format, assert no slug duplication across two chunks of the same source

---

## Out of Scope

- Re-ingesting existing pages (user will trigger manually after deploy)
- Changing the `BackfillRefs` endpoint (already correct, slug-based)
- Changing the `Resolve` fuzzy-match algorithm

36
ingestion/Dockerfile
Normal file
@@ -0,0 +1,36 @@
# syntax=docker/dockerfile:1

FROM golang:1.26-bookworm AS builder

ARG VERSION=dev
WORKDIR /src

COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -trimpath -ldflags="-s -w" \
    -o /out/ingestion ./cmd/server

FROM alpine:3.21

RUN apk add --no-cache poppler-utils

COPY --from=builder /out/ingestion /usr/local/bin/ingestion

RUN addgroup -S ingestion && adduser -S -G ingestion ingestion

WORKDIR /app

# brain/ is writable state; mount a PersistentVolume here
VOLUME /app/brain

ENV INGEST_BRAIN_DIR=/app/brain
ENV INGEST_PORT=3300

USER ingestion

EXPOSE 3300

ENTRYPOINT ["/usr/local/bin/ingestion"]
@@ -2,34 +2,88 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
"log/slog"
|
||||
"net/http"
|
||||
"os"
|
||||
"strconv"
|
||||
"time"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/api"
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/llm"
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/watcher"
|
||||
)
|
||||
|
||||
func envOr(key, fallback string) string {
|
||||
if v := os.Getenv(key); v != "" {
|
||||
return v
|
||||
}
|
||||
return fallback
|
||||
}
|
||||
|
||||
func envInt(key string, fallback int) int {
|
||||
if v := os.Getenv(key); v != "" {
|
||||
if n, err := strconv.Atoi(v); err == nil {
|
||||
return n
|
||||
}
|
||||
}
|
||||
return fallback
|
||||
}
|
||||
|
||||
func main() {
|
||||
logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
|
||||
|
||||
brainDir := os.Getenv("INGEST_BRAIN_DIR")
|
||||
if brainDir == "" {
|
||||
brainDir = "../brain"
|
||||
brainDir := envOr("INGEST_BRAIN_DIR", "../brain")
|
||||
port := envOr("INGEST_PORT", "3300")
|
||||
|
||||
llmURL := envOr("INGEST_LLM_URL", "http://iguana:4000/v1")
|
||||
llmKey := os.Getenv("INGEST_LLM_KEY")
|
||||
llmModel := envOr("INGEST_LLM_MODEL", "koala/qwen35-9b-fast")
|
||||
llmTimeoutMins := envInt("INGEST_LLM_TIMEOUT", 15)
|
||||
chunkSize := envInt("INGEST_CHUNK_SIZE", 6000)
|
||||
watchInterval := envInt("INGEST_WATCH_INTERVAL", 30)
|
||||
|
||||
llmClient := llm.New(llmURL, llmKey, llmModel, time.Duration(llmTimeoutMins)*time.Minute)
|
||||
|
||||
pipelineCfg := pipeline.Config{
|
||||
Complete: llmClient.Complete,
|
||||
ChunkSize: chunkSize,
|
||||
}
|
||||
|
||||
port := os.Getenv("INGEST_PORT")
|
||||
if port == "" {
|
||||
port = "3300"
|
||||
}
|
||||
h := api.NewHandler(brainDir, logger, pipelineCfg)
|
||||
|
||||
h := api.NewHandler(brainDir, logger)
|
||||
ctx := context.Background()
|
||||
if watchInterval > 0 {
|
||||
watcher.Start(ctx, watcher.Config{
|
||||
BrainDir: brainDir,
|
||||
Interval: time.Duration(watchInterval) * time.Second,
|
||||
Pipeline: pipelineCfg,
|
||||
})
|
||||
}
|
||||
|
||||
mux := http.NewServeMux()
|
||||
mux.HandleFunc("/query", h.Query)
|
||||
mux.HandleFunc("/write", h.Write)
|
||||
mux.HandleFunc("POST /query", h.Query)
|
||||
mux.HandleFunc("POST /write", h.Write)
|
||||
mux.HandleFunc("POST /ingest", h.Ingest)
|
||||
mux.HandleFunc("POST /ingest-path", h.IngestPath)
|
||||
mux.HandleFunc("POST /ingest-raw", h.IngestRaw)
|
||||
mux.HandleFunc("POST /backfill-refs", h.BackfillRefs)
|
||||
|
||||
addr := ":" + port
|
||||
logger.Info("ingestion server starting", "addr", addr, "brain_dir", brainDir)
|
||||
watchIntervalLog := "disabled"
|
||||
if watchInterval > 0 {
|
||||
watchIntervalLog = fmt.Sprintf("%ds", watchInterval)
|
||||
}
|
||||
logger.Info("ingestion server starting",
|
||||
"addr", addr,
|
||||
"brain_dir", brainDir,
|
||||
"llm_url", llmURL,
|
||||
"llm_model", llmModel,
|
||||
"chunk_size", chunkSize,
|
||||
"watch_interval", watchIntervalLog,
|
||||
)
|
||||
if err := http.ListenAndServe(addr, mux); err != nil {
|
||||
logger.Error("server stopped", "err", err)
|
||||
os.Exit(1)
|
||||
|
||||
@@ -11,6 +11,8 @@ import (
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/extract"
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/search"
|
||||
)
|
||||
|
||||
@@ -18,11 +20,15 @@ import (
|
||||
type Handler struct {
|
||||
brainDir string
|
||||
logger *slog.Logger
|
||||
pipeline pipeline.Config
|
||||
}
|
||||
|
||||
// NewHandler constructs a Handler. brainDir is the absolute path to brain/.
|
||||
func NewHandler(brainDir string, logger *slog.Logger) *Handler {
|
||||
return &Handler{brainDir: brainDir, logger: logger}
|
||||
func NewHandler(brainDir string, logger *slog.Logger, pipelineCfg pipeline.Config) *Handler {
|
||||
if logger == nil {
|
||||
logger = slog.Default()
|
||||
}
|
||||
return &Handler{brainDir: brainDir, logger: logger, pipeline: pipelineCfg}
|
||||
}
|
||||
|
||||
type queryRequest struct {
|
||||
@@ -37,15 +43,32 @@ type writeRequest struct {
|
||||
Domain string `json:"domain,omitempty"`
|
||||
}
|
||||
|
||||
type ingestRequest struct {
|
||||
Content string `json:"content"`
|
||||
Source string `json:"source"`
|
||||
DryRun bool `json:"dry_run"`
|
||||
}
|
||||
|
||||
type ingestPathRequest struct {
|
||||
Path string `json:"path"`
|
||||
Source string `json:"source"`
|
||||
DryRun bool `json:"dry_run"`
|
||||
}
|
||||
|
||||
type ingestResponse struct {
|
||||
Pages []string `json:"pages"`
|
||||
Warnings []string `json:"warnings"`
|
||||
}
|
||||
|
||||
// Query handles POST /query — full-text search across the brain wiki.
|
||||
func (h *Handler) Query(w http.ResponseWriter, r *http.Request) {
|
||||
var req queryRequest
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
http.Error(w, "invalid JSON", http.StatusBadRequest)
|
||||
writeError(w, http.StatusBadRequest, "invalid JSON")
|
||||
return
|
||||
}
|
||||
if strings.TrimSpace(req.Query) == "" {
|
||||
http.Error(w, "query is required", http.StatusBadRequest)
|
||||
writeError(w, http.StatusBadRequest, "query is required")
|
||||
return
|
||||
}
|
||||
if req.Limit == 0 {
|
||||
@@ -55,22 +78,22 @@ func (h *Handler) Query(w http.ResponseWriter, r *http.Request) {
|
||||
results, err := search.Query(h.brainDir, req.Query, req.Limit)
|
||||
if err != nil {
|
||||
h.logger.Error("query failed", "err", err)
|
||||
http.Error(w, "search error", http.StatusInternalServerError)
|
||||
writeError(w, http.StatusInternalServerError, "search error")
|
||||
return
|
||||
}
|
||||
|
||||
writeJSON(w, map[string]any{"results": results})
|
||||
}
|
||||
|
||||
// Write handles POST /write — write raw content to brain/raw/.
|
||||
// Write handles POST /write — write raw content to brain/knowledge/.
|
||||
func (h *Handler) Write(w http.ResponseWriter, r *http.Request) {
|
||||
var req writeRequest
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
http.Error(w, "invalid JSON", http.StatusBadRequest)
|
||||
writeError(w, http.StatusBadRequest, "invalid JSON")
|
||||
return
|
||||
}
|
||||
if req.Content == "" {
|
||||
http.Error(w, "content is required", http.StatusBadRequest)
|
||||
writeError(w, http.StatusBadRequest, "content is required")
|
||||
return
|
||||
}
|
||||
|
||||
@@ -79,9 +102,9 @@ func (h *Handler) Write(w http.ResponseWriter, r *http.Request) {
|
||||
filename = fmt.Sprintf("%s-auto.md", time.Now().UTC().Format("2006-01-02-150405"))
|
||||
}
|
||||
|
||||
rawDir := filepath.Join(h.brainDir, "raw")
|
||||
rawDir := filepath.Join(h.brainDir, "knowledge")
|
||||
if err := os.MkdirAll(rawDir, 0o755); err != nil {
|
||||
http.Error(w, "failed to create raw dir", http.StatusInternalServerError)
|
||||
writeError(w, http.StatusInternalServerError, "failed to create raw dir")
|
||||
return
|
||||
}
|
||||
|
||||
@@ -99,10 +122,18 @@ func (h *Handler) Write(w http.ResponseWriter, r *http.Request) {
|
||||
finalContent = fm.String() + req.Content
|
||||
}
|
||||
|
||||
dest := filepath.Join(rawDir, filepath.Base(filename))
|
||||
base := filepath.Base(filename)
|
||||
if !strings.HasSuffix(base, ".md") {
|
||||
base += ".md"
|
||||
}
|
||||
dest := filepath.Join(rawDir, base)
|
||||
if !strings.HasPrefix(filepath.Clean(dest)+string(os.PathSeparator), filepath.Clean(rawDir)+string(os.PathSeparator)) {
|
||||
writeError(w, http.StatusBadRequest, "invalid filename")
|
||||
return
|
||||
}
|
||||
if err := os.WriteFile(dest, []byte(finalContent), 0o644); err != nil {
|
||||
h.logger.Error("write failed", "err", err)
|
||||
http.Error(w, "write error", http.StatusInternalServerError)
|
||||
writeError(w, http.StatusInternalServerError, "write error")
|
||||
return
|
||||
}
|
||||
|
||||
@@ -110,7 +141,198 @@ func (h *Handler) Write(w http.ResponseWriter, r *http.Request) {
|
||||
writeJSON(w, map[string]string{"path": filepath.ToSlash(rel)})
|
||||
}
|
||||
|
||||
// Ingest handles POST /ingest — run the pipeline on provided content.
|
||||
func (h *Handler) Ingest(w http.ResponseWriter, r *http.Request) {
|
||||
var req ingestRequest
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
writeError(w, http.StatusBadRequest, "invalid JSON")
|
||||
return
|
||||
}
|
||||
if strings.TrimSpace(req.Content) == "" {
|
||||
writeError(w, http.StatusBadRequest, "content is required")
|
||||
return
|
||||
}
|
||||
if strings.TrimSpace(req.Source) == "" {
|
||||
writeError(w, http.StatusBadRequest, "source is required")
|
||||
return
|
||||
}
|
||||
|
||||
result, err := pipeline.Run(r.Context(), h.pipeline, h.brainDir, req.Content, req.Source, req.DryRun)
|
||||
if err != nil {
|
||||
h.logger.Error("ingest failed", "source", req.Source, "err", err)
|
||||
writeError(w, http.StatusInternalServerError, "ingest error")
|
||||
return
|
||||
}
|
||||
|
||||
pages := result.Pages
|
||||
if pages == nil {
|
||||
pages = []string{}
|
||||
}
|
||||
warnings := result.Warnings
|
||||
if warnings == nil {
|
||||
warnings = []string{}
|
||||
}
|
||||
writeJSON(w, ingestResponse{Pages: pages, Warnings: warnings})
|
||||
}
|
||||
|
||||
// supportedExtensions lists file extensions that IngestPath will process.
|
||||
var supportedExtensions = map[string]bool{
|
||||
".md": true,
|
||||
".txt": true,
|
||||
".pdf": true,
|
||||
}
|
||||
|
||||
// IngestPath handles POST /ingest-path — ingest a file or directory.
|
||||
func (h *Handler) IngestPath(w http.ResponseWriter, r *http.Request) {
|
||||
var req ingestPathRequest
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
writeError(w, http.StatusBadRequest, "invalid JSON")
|
||||
return
|
||||
}
|
||||
if strings.TrimSpace(req.Path) == "" {
|
||||
writeError(w, http.StatusBadRequest, "path is required")
|
||||
return
|
||||
}
|
||||
|
||||
info, err := os.Stat(req.Path)
|
||||
if err != nil {
|
||||
writeError(w, http.StatusBadRequest, fmt.Sprintf("path not accessible: %v", err))
|
||||
return
|
||||
}
|
||||
|
||||
var allPages []string
|
||||
var allWarnings []string
|
||||
|
||||
if info.IsDir() {
|
||||
err = filepath.WalkDir(req.Path, func(path string, d os.DirEntry, walkErr error) error {
|
||||
if walkErr != nil {
|
||||
return walkErr
|
||||
}
|
||||
if d.IsDir() {
|
||||
return nil
|
||||
}
|
||||
ext := strings.ToLower(filepath.Ext(path))
|
||||
if !supportedExtensions[ext] {
|
||||
return nil
|
||||
}
|
||||
content, readErr := extract.Text(path)
|
||||
if readErr != nil {
|
||||
allWarnings = append(allWarnings, fmt.Sprintf("extract %s: %v", path, readErr))
|
||||
return nil
|
||||
}
|
||||
source := req.Source
|
||||
if source == "" {
|
||||
source = filepath.Base(path)
|
||||
}
|
||||
result, runErr := pipeline.Run(r.Context(), h.pipeline, h.brainDir, content, source, req.DryRun)
|
||||
if runErr != nil {
|
||||
allWarnings = append(allWarnings, fmt.Sprintf("ingest %s: %v", path, runErr))
|
||||
return nil
|
||||
}
|
||||
allPages = append(allPages, result.Pages...)
|
||||
allWarnings = append(allWarnings, result.Warnings...)
|
||||
return nil
|
||||
})
|
||||
if err != nil {
|
||||
h.logger.Error("walk dir failed", "path", req.Path, "err", err)
|
||||
writeError(w, http.StatusInternalServerError, fmt.Sprintf("walk error: %v", err))
|
||||
return
|
||||
}
|
||||
} else {
|
||||
ext := strings.ToLower(filepath.Ext(req.Path))
|
||||
if !supportedExtensions[ext] {
|
||||
writeError(w, http.StatusBadRequest, fmt.Sprintf("unsupported file extension: %s", ext))
|
||||
return
|
||||
}
|
||||
content, readErr := extract.Text(req.Path)
|
||||
if readErr != nil {
|
||||
writeError(w, http.StatusInternalServerError, fmt.Sprintf("extract text: %v", readErr))
|
||||
return
|
||||
}
|
||||
source := req.Source
|
||||
if source == "" {
|
||||
source = filepath.Base(req.Path)
|
||||
}
|
||||
result, runErr := pipeline.Run(r.Context(), h.pipeline, h.brainDir, content, source, req.DryRun)
|
||||
if runErr != nil {
|
||||
h.logger.Error("ingest-path failed", "path", req.Path, "err", runErr)
|
||||
writeError(w, http.StatusInternalServerError, "ingest error")
|
||||
return
|
||||
}
|
||||
allPages = result.Pages
|
||||
allWarnings = result.Warnings
|
||||
}
|
||||
|
||||
if allPages == nil {
|
||||
allPages = []string{}
|
||||
}
|
||||
if allWarnings == nil {
|
||||
allWarnings = []string{}
|
||||
}
|
||||
writeJSON(w, ingestResponse{Pages: allPages, Warnings: allWarnings})
|
||||
}
|
||||
|
||||
type ingestRawRequest struct {
|
||||
Source string `json:"source"`
|
||||
Pages []pipeline.RawPage `json:"pages"`
|
||||
DryRun bool `json:"dry_run"`
|
||||
}
|
||||
|
||||
// IngestRaw handles POST /ingest-raw — run the pipeline on pre-parsed RawPages,
|
||||
// skipping the LLM extraction step. Use when the caller has already produced
|
||||
// structured page data (e.g. from a more capable model or manual curation).
|
||||
func (h *Handler) IngestRaw(w http.ResponseWriter, r *http.Request) {
|
||||
var req ingestRawRequest
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
writeError(w, http.StatusBadRequest, "invalid JSON")
|
||||
return
|
||||
}
|
||||
if strings.TrimSpace(req.Source) == "" {
|
||||
writeError(w, http.StatusBadRequest, "source is required")
|
||||
return
|
||||
}
|
||||
if len(req.Pages) == 0 {
|
||||
writeError(w, http.StatusBadRequest, "pages is required and must be non-empty")
|
||||
return
|
||||
}
|
||||
|
||||
result, err := pipeline.RunRaw(h.brainDir, req.Source, req.Pages, req.DryRun)
|
||||
if err != nil {
|
||||
h.logger.Error("ingest-raw failed", "source", req.Source, "err", err)
|
||||
writeError(w, http.StatusInternalServerError, "ingest error")
|
||||
return
|
||||
}
|
||||
|
||||
pages := result.Pages
|
||||
if pages == nil {
|
||||
pages = []string{}
|
||||
}
|
||||
warnings := result.Warnings
|
||||
if warnings == nil {
|
||||
warnings = []string{}
|
||||
}
|
||||
writeJSON(w, ingestResponse{Pages: pages, Warnings: warnings})
|
||||
}
|
||||
|
||||
// BackfillRefs handles POST /backfill-refs — injects source back-references
|
||||
// into all concept and entity pages based on existing wiki/sources/ pages.
|
||||
func (h *Handler) BackfillRefs(w http.ResponseWriter, r *http.Request) {
|
||||
n, err := pipeline.BackfillRefs(r.Context(), h.brainDir)
|
||||
if err != nil {
|
||||
h.logger.Error("backfill-refs failed", "err", err)
|
||||
writeError(w, http.StatusInternalServerError, "backfill error")
|
||||
return
|
||||
}
|
||||
writeJSON(w, map[string]int{"updated": n})
|
||||
}
|
||||
|
||||
func writeJSON(w http.ResponseWriter, v any) {
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
json.NewEncoder(w).Encode(v) //nolint:errcheck
|
||||
}
|
||||
|
||||
func writeError(w http.ResponseWriter, code int, msg string) {
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
w.WriteHeader(code)
|
||||
json.NewEncoder(w).Encode(map[string]string{"error": msg}) //nolint:errcheck
|
||||
}
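
As an aside on the `Write` handler above: the destination is built from `filepath.Base(filename)`, a `.md` suffix is forced, and the result is then checked against the target directory by prefix. A minimal standalone sketch of that pattern follows; `safeDest` is a hypothetical helper name, not a function in this change.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// safeDest maps a caller-supplied filename to a path inside dir. It keeps
// only the final path element, forces a .md suffix, and rejects anything
// that would still resolve outside dir.
func safeDest(dir, filename string) (string, bool) {
	base := filepath.Base(filename)
	if !strings.HasSuffix(base, ".md") {
		base += ".md"
	}
	dest := filepath.Join(dir, base)
	sep := string(os.PathSeparator)
	if !strings.HasPrefix(filepath.Clean(dest)+sep, filepath.Clean(dir)+sep) {
		return "", false
	}
	return dest, true
}

func main() {
	for _, name := range []string{"note", "typed-note.md", "../../etc/passwd"} {
		dest, ok := safeDest("brain/knowledge", name)
		fmt.Println(name, "->", dest, ok)
	}
}
```

Since `filepath.Base` already drops any directory components, the prefix comparison mostly serves as defense in depth in this sketch.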
|
||||
|
||||
@@ -3,6 +3,7 @@ package api_test
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"context"
|
||||
"encoding/json"
|
||||
"log/slog"
|
||||
"net/http"
|
||||
@@ -12,25 +13,43 @@ import (
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/api"
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stretchr/testify/require"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/api"
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
|
||||
)
|
||||
|
||||
// stubComplete returns a fixed JSON RawPage so tests never call a real LLM.
|
||||
func stubComplete(_ context.Context, _, _ string) (string, error) {
|
||||
return `[{"title":"Test Source","type":"source","subtype":"article","content":"## Summary\n\nSome content here.\n"}]`, nil
|
||||
}
|
||||
|
||||
func stubPipelineCfg() pipeline.Config {
|
||||
return pipeline.Config{
|
||||
Complete: stubComplete,
|
||||
ChunkSize: 0,
|
||||
Schema: "# Test Schema\nwiki/sources/, wiki/concepts/, wiki/entities/",
|
||||
}
|
||||
}
|
||||
|
||||
func setup(t *testing.T) (string, *api.Handler) {
|
||||
t.Helper()
|
||||
dir := t.TempDir()
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "raw"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "knowledge"), 0o755))
|
||||
require.NoError(t, os.WriteFile(
|
||||
filepath.Join(dir, "wiki", "concepts", "tdd.md"),
|
||||
filepath.Join(dir, "knowledge", "tdd.md"),
|
||||
[]byte("---\ntitle: TDD\ndomain: software\n---\n\nTest-driven development is a discipline.\n"),
|
||||
0o644,
|
||||
))
|
||||
logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
|
||||
return dir, api.NewHandler(dir, logger)
|
||||
return dir, api.NewHandler(dir, logger, stubPipelineCfg())
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Existing tests (Write / Query)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
func TestQuery_ReturnsResults(t *testing.T) {
|
||||
_, h := setup(t)
|
||||
body, _ := json.Marshal(map[string]any{"query": "test driven", "limit": 5})
|
||||
@@ -46,7 +65,7 @@ func TestQuery_ReturnsResults(t *testing.T) {
|
||||
assert.NotEmpty(t, results)
|
||||
}
|
||||
|
||||
func TestWrite_CreatesRawFile(t *testing.T) {
|
||||
func TestWrite_CreatesKnowledgeFile(t *testing.T) {
|
||||
dir, h := setup(t)
|
||||
body, _ := json.Marshal(map[string]any{
|
||||
"content": "# Test note\n\nSome content.",
|
||||
@@ -62,8 +81,7 @@ func TestWrite_CreatesRawFile(t *testing.T) {
|
||||
require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
|
||||
assert.NotEmpty(t, resp["path"])
|
||||
|
||||
written := filepath.Join(dir, "raw", "test-note.md")
|
||||
content, err := os.ReadFile(written)
|
||||
content, err := os.ReadFile(filepath.Join(dir, "knowledge", "test-note.md"))
|
||||
require.NoError(t, err)
|
||||
assert.Contains(t, string(content), "Some content.")
|
||||
}
|
||||
@@ -93,7 +111,7 @@ func TestWrite_IncludesFrontmatterWhenTypeProvided(t *testing.T) {
|
||||
h.Write(rec, req)
|
||||
|
||||
assert.Equal(t, http.StatusOK, rec.Code)
|
||||
content, err := os.ReadFile(filepath.Join(dir, "raw", "typed-note.md"))
|
||||
content, err := os.ReadFile(filepath.Join(dir, "knowledge", "typed-note.md"))
|
||||
require.NoError(t, err)
|
||||
assert.Contains(t, string(content), "type: concept")
|
||||
assert.Contains(t, string(content), "domain: software")
|
||||
@@ -109,7 +127,206 @@ func TestWrite_GeneratesFilenameIfAbsent(t *testing.T) {
|
||||
h.Write(rec, req)
|
||||
|
||||
assert.Equal(t, http.StatusOK, rec.Code)
|
||||
entries, _ := os.ReadDir(filepath.Join(dir, "raw"))
|
||||
assert.Len(t, entries, 1)
|
||||
assert.True(t, strings.HasSuffix(entries[0].Name(), ".md"))
|
||||
entries, _ := os.ReadDir(filepath.Join(dir, "knowledge"))
|
||||
// +1 because setup already wrote tdd.md
|
||||
assert.Len(t, entries, 2)
|
||||
assert.True(t, strings.HasSuffix(entries[1].Name(), ".md"))
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// POST /ingest
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
func TestIngest_Validation(t *testing.T) {
|
||||
cases := []struct {
|
||||
name string
|
||||
body map[string]any
|
||||
}{
|
||||
{"missing content", map[string]any{"source": "test-source"}},
|
||||
{"missing source", map[string]any{"content": "some content"}},
|
||||
{"whitespace content", map[string]any{"content": " ", "source": "test-source"}},
|
||||
{"whitespace source", map[string]any{"content": "some content", "source": " "}},
|
||||
}
|
||||
for _, tc := range cases {
|
||||
t.Run(tc.name, func(t *testing.T) {
|
||||
_, h := setup(t)
|
||||
body, _ := json.Marshal(tc.body)
|
||||
req := httptest.NewRequest(http.MethodPost, "/ingest", bytes.NewReader(body))
|
||||
rec := httptest.NewRecorder()
|
||||
|
||||
h.Ingest(rec, req)
|
||||
|
||||
assert.Equal(t, http.StatusBadRequest, rec.Code)
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestIngest_Success(t *testing.T) {
|
||||
_, h := setup(t)
|
||||
body, _ := json.Marshal(map[string]any{
|
||||
"content": "some content about shape-up methodology",
|
||||
"source": "shape-up-book",
|
||||
"dry_run": true,
|
||||
})
|
||||
req := httptest.NewRequest(http.MethodPost, "/ingest", bytes.NewReader(body))
|
||||
rec := httptest.NewRecorder()
|
||||
|
||||
h.Ingest(rec, req)
|
||||
|
||||
require.Equal(t, http.StatusOK, rec.Code)
|
||||
var resp map[string]any
|
||||
require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
|
||||
pages, ok := resp["pages"]
|
||||
require.True(t, ok, "response must have pages field")
|
||||
pagesSlice, ok := pages.([]any)
|
||||
require.True(t, ok, "pages must be an array")
|
||||
assert.NotEmpty(t, pagesSlice)
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// POST /ingest-path
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
func TestIngestPath_MissingPath(t *testing.T) {
|
||||
_, h := setup(t)
|
||||
body, _ := json.Marshal(map[string]any{"source": "test-source"})
|
||||
req := httptest.NewRequest(http.MethodPost, "/ingest-path", bytes.NewReader(body))
|
||||
rec := httptest.NewRecorder()
|
||||
|
||||
h.IngestPath(rec, req)
|
||||
|
||||
assert.Equal(t, http.StatusBadRequest, rec.Code)
|
||||
}
|
||||
|
||||
func TestIngestPath_File(t *testing.T) {
|
||||
_, h := setup(t)
|
||||
|
||||
// Create a temp file with content
|
||||
dir := t.TempDir()
|
||||
f := filepath.Join(dir, "doc.md")
|
||||
require.NoError(t, os.WriteFile(f, []byte("# Hello\nThis is markdown content."), 0o644))
|
||||
|
||||
body, _ := json.Marshal(map[string]any{
|
||||
"path": f,
|
||||
"source": "test-doc",
|
||||
"dry_run": true,
|
||||
})
|
||||
req := httptest.NewRequest(http.MethodPost, "/ingest-path", bytes.NewReader(body))
|
||||
rec := httptest.NewRecorder()
|
||||
|
||||
h.IngestPath(rec, req)
|
||||
|
||||
require.Equal(t, http.StatusOK, rec.Code)
|
||||
var resp map[string]any
|
||||
require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
|
||||
pages, ok := resp["pages"]
|
||||
require.True(t, ok, "response must have pages field")
|
||||
pagesSlice, ok := pages.([]any)
|
||||
require.True(t, ok, "pages must be an array")
|
||||
assert.NotEmpty(t, pagesSlice)
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// POST /ingest-raw
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
func TestIngestRaw_Validation(t *testing.T) {
|
||||
cases := []struct {
|
||||
name string
|
||||
body map[string]any
|
||||
}{
|
||||
{"missing source", map[string]any{"pages": []any{map[string]any{"title": "X", "type": "concept", "content": "x"}}}},
|
||||
{"missing pages", map[string]any{"source": "test-source"}},
|
||||
{"empty pages", map[string]any{"source": "test-source", "pages": []any{}}},
|
||||
}
|
||||
for _, tc := range cases {
|
||||
t.Run(tc.name, func(t *testing.T) {
|
||||
_, h := setup(t)
|
||||
body, _ := json.Marshal(tc.body)
|
||||
req := httptest.NewRequest(http.MethodPost, "/ingest-raw", bytes.NewReader(body))
|
||||
rec := httptest.NewRecorder()
|
||||
|
||||
h.IngestRaw(rec, req)
|
||||
|
||||
assert.Equal(t, http.StatusBadRequest, rec.Code)
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestIngestRaw_Success(t *testing.T) {
|
||||
dir, h := setup(t)
|
||||
body, _ := json.Marshal(map[string]any{
|
||||
"source": "test-article",
|
||||
"pages": []any{
|
||||
map[string]any{"title": "Test Article", "type": "source", "subtype": "article", "domain": "Testing", "content": "## Summary\n\nThis is a test article about [[Test Concept]].\n"},
|
||||
map[string]any{"title": "Test Concept", "type": "concept", "domain": "Testing", "content": "A concept for testing.\n"},
|
||||
},
|
||||
})
|
||||
req := httptest.NewRequest(http.MethodPost, "/ingest-raw", bytes.NewReader(body))
|
||||
rec := httptest.NewRecorder()
|
||||
|
||||
h.IngestRaw(rec, req)
|
||||
|
||||
require.Equal(t, http.StatusOK, rec.Code)
|
||||
var resp map[string]any
|
||||
require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
|
||||
pages := resp["pages"].([]any)
|
||||
assert.Len(t, pages, 2)
|
||||
|
||||
// Verify files were written
|
||||
sourcePath := filepath.Join(dir, "wiki", "sources", "test-article.md")
|
||||
assert.FileExists(t, sourcePath)
|
||||
conceptPath := filepath.Join(dir, "wiki", "concepts", "test-concept.md")
|
||||
assert.FileExists(t, conceptPath)
|
||||
}
|
||||
|
||||
func TestIngestRaw_DryRun(t *testing.T) {
|
||||
dir, h := setup(t)
|
||||
body, _ := json.Marshal(map[string]any{
|
||||
"source": "dry-run-test",
|
||||
"pages": []any{
|
||||
map[string]any{"title": "Dry Run Source", "type": "source", "subtype": "article", "content": "Content."},
|
||||
},
|
||||
"dry_run": true,
|
||||
})
|
||||
req := httptest.NewRequest(http.MethodPost, "/ingest-raw", bytes.NewReader(body))
|
||||
rec := httptest.NewRecorder()
|
||||
|
||||
h.IngestRaw(rec, req)
|
||||
|
||||
require.Equal(t, http.StatusOK, rec.Code)
|
||||
var resp map[string]any
|
||||
require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
|
||||
pages := resp["pages"].([]any)
|
||||
assert.NotEmpty(t, pages)
|
||||
|
||||
// Verify no files were written
|
||||
sourcePath := filepath.Join(dir, "wiki", "sources", "dry-run-test.md")
|
||||
assert.NoFileExists(t, sourcePath)
|
||||
}
|
||||
|
||||
func TestIngestPath_Directory(t *testing.T) {
|
||||
_, h := setup(t)
|
||||
|
||||
// Create a temp dir with one .md file
|
||||
dir := t.TempDir()
|
||||
require.NoError(t, os.WriteFile(filepath.Join(dir, "notes.md"), []byte("# Notes\nSome notes."), 0o644))
|
||||
|
||||
body, _ := json.Marshal(map[string]any{
|
||||
"path": dir,
|
||||
"dry_run": true,
|
||||
})
|
||||
req := httptest.NewRequest(http.MethodPost, "/ingest-path", bytes.NewReader(body))
|
||||
rec := httptest.NewRecorder()
|
||||
|
||||
h.IngestPath(rec, req)
|
||||
|
||||
require.Equal(t, http.StatusOK, rec.Code)
|
||||
var resp map[string]any
|
||||
require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
|
||||
pages, ok := resp["pages"]
|
||||
require.True(t, ok, "response must have pages field")
|
||||
pagesSlice, ok := pages.([]any)
|
||||
require.True(t, ok, "pages must be an array")
|
||||
assert.NotEmpty(t, pagesSlice)
|
||||
}
|
||||
|
||||
39
ingestion/internal/extract/extract.go
Normal file
@@ -0,0 +1,39 @@
// ingestion/internal/extract/extract.go
package extract

import (
	"fmt"
	"os"
	"strings"
)

// Text reads the file at path and returns its plain-text content.
// Supported extensions: .md, .txt (passthrough), .pdf (via pdftotext).
func Text(path string) (string, error) {
	ext := strings.ToLower(fileExt(path))
	switch ext {
	case ".md", ".txt":
		b, err := os.ReadFile(path)
		if err != nil {
			return "", fmt.Errorf("read %s: %w", path, err)
		}
		return string(b), nil
	case ".pdf":
		return extractPDF(path)
	default:
		return "", fmt.Errorf("unsupported file extension: %s", ext)
	}
}

// fileExt returns the file extension including the dot. Case is preserved; callers lowercase it.
func fileExt(path string) string {
	for i := len(path) - 1; i >= 0; i-- {
		if path[i] == '.' {
			return path[i:]
		}
		if path[i] == '/' || path[i] == '\\' {
			break
		}
	}
	return ""
}
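
To make the behavior of `fileExt` in the diff above concrete, the small program below copies the function and runs it on a few sample paths. The sample paths are made up for illustration. Note that `fileExt` itself preserves case; `Text` lowercases the result before matching, so `notes.MD` would still be treated as Markdown.

```go
package main

import "fmt"

// fileExt is copied from the diff above: it returns the suffix starting at
// the last dot of the final path element, with case preserved.
func fileExt(path string) string {
	for i := len(path) - 1; i >= 0; i-- {
		if path[i] == '.' {
			return path[i:]
		}
		if path[i] == '/' || path[i] == '\\' {
			break
		}
	}
	return ""
}

func main() {
	// Hypothetical sample paths.
	for _, p := range []string{"notes.MD", "archive.tar.gz", "dir.v2/README", "noext"} {
		fmt.Printf("%q -> %q\n", p, fileExt(p))
	}
}
```

A dot in a parent directory (`dir.v2/README`) does not count as an extension, because the scan stops at the first path separator it meets from the right.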
62
ingestion/internal/extract/extract_test.go
Normal file
@@ -0,0 +1,62 @@
// ingestion/internal/extract/extract_test.go
package extract

import (
	"os"
	"os/exec"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestText_Markdown(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "note.md")
	require.NoError(t, os.WriteFile(path, []byte("# Hello\n\nWorld."), 0o644))

	got, err := Text(path)
	require.NoError(t, err)
	assert.Equal(t, "# Hello\n\nWorld.", got)
}

func TestText_Txt(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "note.txt")
	require.NoError(t, os.WriteFile(path, []byte("plain text"), 0o644))

	got, err := Text(path)
	require.NoError(t, err)
	assert.Equal(t, "plain text", got)
}

func TestText_UnsupportedExtension(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "data.csv")
	require.NoError(t, os.WriteFile(path, []byte("a,b,c"), 0o644))

	_, err := Text(path)
	assert.ErrorContains(t, err, "unsupported")
}

func TestText_PDF(t *testing.T) {
	if _, err := exec.LookPath("pdftotext"); err != nil {
		t.Skip("pdftotext not available")
	}
	dir := t.TempDir()
	pdfPath := filepath.Join(dir, "test.pdf")

	// Minimal valid PDF containing the text "Hello PDF".
	minimalPDF := "%PDF-1.4\n1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj\n" +
		"2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj\n" +
		"3 0 obj<</Type/Page/MediaBox[0 0 612 792]/Parent 2 0 R/Contents 4 0 R/Resources<</Font<</F1<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>>>>>>>endobj\n" +
		"4 0 obj<</Length 44>>\nstream\nBT /F1 12 Tf 100 700 Td (Hello PDF) Tj ET\nendstream\nendobj\n" +
		"xref\n0 5\n0000000000 65535 f\n0000000009 00000 n\n0000000058 00000 n\n0000000115 00000 n\n0000000310 00000 n\n" +
		"trailer<</Size 5/Root 1 0 R>>\nstartxref\n406\n%%EOF\n"
	require.NoError(t, os.WriteFile(pdfPath, []byte(minimalPDF), 0o644))

	got, err := Text(pdfPath)
	require.NoError(t, err)
	assert.Contains(t, got, "Hello PDF")
}
28
ingestion/internal/extract/pdf.go
Normal file
@@ -0,0 +1,28 @@
// ingestion/internal/extract/pdf.go
package extract

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// extractPDF runs pdftotext on path and returns the extracted text.
// pdftotext must be installed (package: poppler-utils on Alpine/Debian, poppler on Homebrew).
func extractPDF(path string) (string, error) {
	cmd := exec.Command("pdftotext", "-q", path, "-")
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	if err := cmd.Run(); err != nil {
		errMsg := strings.TrimSpace(stderr.String())
		if errMsg == "" {
			errMsg = err.Error()
		}
		return "", fmt.Errorf("pdftotext: %s", errMsg)
	}

	return strings.TrimSpace(stdout.String()), nil
}
119
ingestion/internal/llm/client.go
Normal file
@@ -0,0 +1,119 @@
package llm

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// Client calls an OpenAI-compatible chat completions endpoint.
type Client struct {
	baseURL    string
	apiKey     string
	model      string
	httpClient *http.Client
}

// New constructs a Client.
func New(baseURL, apiKey, model string, timeout time.Duration) *Client {
	return &Client{
		baseURL:    strings.TrimRight(baseURL, "/"),
		apiKey:     apiKey,
		model:      model,
		httpClient: &http.Client{Timeout: timeout},
	}
}

type chatRequest struct {
	Model       string    `json:"model"`
	Messages    []message `json:"messages"`
	Temperature float64   `json:"temperature"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatResponse struct {
	Choices []struct {
		Message message `json:"message"`
	} `json:"choices"`
}

// Complete sends a system + user message and returns the assistant's reply.
// Retries once on HTTP 429 using Retry-After header or 5s backoff.
func (c *Client) Complete(ctx context.Context, system, user string) (string, error) {
	body := chatRequest{
		Model: c.model,
		Messages: []message{
			{Role: "system", Content: system},
			{Role: "user", Content: user},
		},
		Temperature: 0.2,
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", fmt.Errorf("marshal request: %w", err)
	}

	do := func() (*http.Response, error) {
		req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.baseURL+"/chat/completions", bytes.NewReader(b))
		if err != nil {
			return nil, fmt.Errorf("build request: %w", err)
		}
		req.Header.Set("Content-Type", "application/json")
		if c.apiKey != "" {
			req.Header.Set("Authorization", "Bearer "+c.apiKey)
		}
		return c.httpClient.Do(req)
	}

	resp, err := do()
	if err != nil {
		return "", fmt.Errorf("call LLM: %w", err)
	}

	if resp.StatusCode == http.StatusTooManyRequests {
		_ = resp.Body.Close()
		wait := 5 * time.Second
		if ra := resp.Header.Get("Retry-After"); ra != "" {
			if secs, err := strconv.Atoi(ra); err == nil {
				wait = time.Duration(secs) * time.Second
			}
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-time.After(wait):
		}
		resp, err = do()
		if err != nil {
			return "", fmt.Errorf("retry LLM call: %w", err)
		}
	}
	defer resp.Body.Close() //nolint:errcheck

	out, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("read response: %w", err)
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("LLM returned %d: %s", resp.StatusCode, out)
	}

	var cr chatResponse
	if err := json.Unmarshal(out, &cr); err != nil {
		return "", fmt.Errorf("parse response: %w", err)
	}
	if len(cr.Choices) == 0 {
		return "", fmt.Errorf("LLM returned no choices")
	}
	return cr.Choices[0].Message.Content, nil
}
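Complete reads Retry-After only as an integer number of seconds. The header may also carry an HTTP-date, so a slightly more tolerant parser could look like the sketch below. retryAfter and defaultWait are hypothetical names introduced here for illustration; they are not part of this diff, and the choice to fall back to 5s on negative or unparseable values is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// defaultWait mirrors the 5s fallback used by Complete in the diff.
const defaultWait = 5 * time.Second

// retryAfter is a hypothetical helper (not part of the diff). It interprets a
// Retry-After value as delta-seconds or as an HTTP-date, and falls back to
// defaultWait when the value is empty, negative, or unparseable.
func retryAfter(header string) time.Duration {
	if header == "" {
		return defaultWait
	}
	if secs, err := strconv.Atoi(header); err == nil {
		if secs < 0 {
			return defaultWait
		}
		return time.Duration(secs) * time.Second
	}
	if when, err := http.ParseTime(header); err == nil {
		if d := time.Until(when); d > 0 {
			return d
		}
		return 0 // the date is already in the past: retry immediately
	}
	return defaultWait
}

func main() {
	for _, h := range []string{"", "7", "-3", "Mon, 02 Jan 2006 15:04:05 GMT", "soon"} {
		fmt.Printf("%q -> %s\n", h, retryAfter(h))
	}
}
```

In Complete, the result would replace the inline strconv.Atoi branch; a caller may also want to cap the returned duration so that a large server-supplied value cannot stall the pipeline.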
86
ingestion/internal/llm/client_test.go
Normal file
@@ -0,0 +1,86 @@
package llm

import (
	"context"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func mockServer(t *testing.T, response string) *httptest.Server {
	t.Helper()
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		assert.Equal(t, "/chat/completions", r.URL.Path)
		assert.Equal(t, "application/json", r.Header.Get("Content-Type"))
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string]any{
			"choices": []map[string]any{
				{"message": map[string]any{"role": "assistant", "content": response}},
			},
		})
	}))
}

func TestClient_Complete(t *testing.T) {
	srv := mockServer(t, "hello world")
	defer srv.Close()

	c := New(srv.URL, "", "test-model", 10*time.Second)
	got, err := c.Complete(context.Background(), "you are helpful", "say hello")
	require.NoError(t, err)
	assert.Equal(t, "hello world", got)
}

func TestClient_ReturnsErrorOnNon200(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "overloaded", http.StatusServiceUnavailable)
	}))
	defer srv.Close()

	c := New(srv.URL, "", "test-model", 10*time.Second)
	_, err := c.Complete(context.Background(), "sys", "user")
	assert.Error(t, err)
}

func TestClient_SendsAuthHeader(t *testing.T) {
	var gotAuth string
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		gotAuth = r.Header.Get("Authorization")
		_ = json.NewEncoder(w).Encode(map[string]any{
			"choices": []map[string]any{{"message": map[string]any{"content": "ok"}}},
		})
	}))
	defer srv.Close()

	c := New(srv.URL, "my-key", "test-model", 10*time.Second)
	_, err := c.Complete(context.Background(), "sys", "user")
	require.NoError(t, err)
	assert.Equal(t, "Bearer my-key", gotAuth)
}

func TestClient_Retries429(t *testing.T) {
	calls := 0
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		calls++
		if calls == 1 {
			w.Header().Set("Retry-After", "0")
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		_ = json.NewEncoder(w).Encode(map[string]any{
			"choices": []map[string]any{{"message": map[string]any{"content": "retried"}}},
		})
	}))
	defer srv.Close()

	c := New(srv.URL, "", "test-model", 10*time.Second)
	got, err := c.Complete(context.Background(), "sys", "user")
	require.NoError(t, err)
	assert.Equal(t, "retried", got)
	assert.Equal(t, 2, calls)
}
91
ingestion/internal/pipeline/backfill.go
Normal file
@@ -0,0 +1,91 @@
// ingestion/internal/pipeline/backfill.go
package pipeline

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

// BackfillRefs walks wiki/sources/ and injects source back-references into every
// concept and entity page that each source links to.
// Changes for all sources are accumulated in memory before writing, so multiple
// sources referencing the same concept are merged in one pass.
// Deduplication is handled by wiki.Merge, so running this multiple times is safe.
// Returns the number of concept/entity pages written.
func BackfillRefs(ctx context.Context, brainDir string) (int, error) {
	inventory, err := wiki.LoadInventory(brainDir)
	if err != nil {
		return 0, fmt.Errorf("load inventory: %w", err)
	}

	sourcesDir := filepath.Join(brainDir, "wiki", "sources")
	entries, err := os.ReadDir(sourcesDir)
	if err != nil {
		if os.IsNotExist(err) {
			return 0, nil
		}
		return 0, fmt.Errorf("read sources dir: %w", err)
	}

	// Accumulate all changes before writing: relPath → updated Page.
	// Collecting first means two sources that both link the same concept
	// get both refs merged before a single write.
	pending := make(map[string]wiki.Page)

	for _, e := range entries {
		if ctx.Err() != nil {
			return 0, ctx.Err()
		}
		if e.IsDir() || !strings.HasSuffix(e.Name(), ".md") {
			continue
		}

		b, err := os.ReadFile(filepath.Join(sourcesDir, e.Name()))
		if err != nil {
			continue
		}
		sourceContent := string(b)
		sourceSlug := strings.TrimSuffix(e.Name(), ".md")
		sourceTitle := extractTitle(sourceContent)
		if sourceTitle == "" {
			sourceTitle = sourceSlug
		}
		sourceRef := "- [[" + sourceSlug + "|" + sourceTitle + "]]"

		for slug := range extractWikilinks(sourceContent) {
			if slug == sourceSlug {
				continue
			}
			pt, ok := findInInventory(slug, inventory)
			if !ok {
				continue
			}
			relPath := "wiki/" + string(pt) + "/" + slug + ".md"

			// Start from already-accumulated version if we've seen this page.
			page, seen := pending[relPath]
			if !seen {
				raw, err := os.ReadFile(filepath.Join(brainDir, filepath.FromSlash(relPath)))
				if err != nil {
					continue
				}
				page = wiki.Page{Path: relPath, Content: string(raw)}
			}
			pending[relPath] = addSourceRef(page, sourceRef)
		}
	}

	for relPath, page := range pending {
		dest := filepath.Join(brainDir, filepath.FromSlash(relPath))
		if err := os.WriteFile(dest, []byte(page.Content), 0o644); err != nil {
			return 0, fmt.Errorf("write %s: %w", relPath, err)
		}
	}

	return len(pending), nil
}
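The accumulate-then-write idea in BackfillRefs can be shown in isolation. The sketch below works on in-memory maps instead of the filesystem. addRef and backfill are hypothetical stand-ins written for this example; the real addSourceRef and wiki.Merge are not shown in this part of the diff and may behave differently. The sketch also assumes that "## Sources" is the last section of a page.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// addRef is a hypothetical stand-in for addSourceRef/wiki.Merge. It appends
// ref under a trailing "## Sources" heading unless the exact line exists.
func addRef(content, ref string) string {
	for _, line := range strings.Split(content, "\n") {
		if line == ref {
			return content // already present, so repeated runs change nothing
		}
	}
	if !strings.Contains(content, "## Sources") {
		content = strings.TrimRight(content, "\n") + "\n\n## Sources\n"
	}
	return strings.TrimRight(content, "\n") + "\n" + ref + "\n"
}

// backfill collects every change in memory (page -> content) before anything
// would be written, so refs from several sources end up in one final version.
// links maps a source slug to the slugs it links to.
func backfill(pages map[string]string, links map[string][]string) map[string]string {
	pending := make(map[string]string)
	sources := make([]string, 0, len(links))
	for s := range links {
		sources = append(sources, s)
	}
	sort.Strings(sources) // deterministic order for the example
	for _, src := range sources {
		for _, target := range links[src] {
			content, seen := pending[target]
			if !seen {
				orig, ok := pages[target]
				if !ok {
					continue // unknown target: skip it
				}
				content = orig
			}
			pending[target] = addRef(content, "- [["+src+"]]")
		}
	}
	return pending
}

func main() {
	pages := map[string]string{"shaping": "## Definition\n\nA design activity.\n"}
	links := map[string][]string{"book-a": {"shaping"}, "book-b": {"shaping", "ghost"}}
	for path, content := range backfill(pages, links) {
		fmt.Printf("%s:\n%s", path, content)
	}
}
```

Because the second source starts from the pending version rather than the original page, both refs survive, and only one write per page would be needed.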
107
ingestion/internal/pipeline/backfill_test.go
Normal file
@@ -0,0 +1,107 @@
// ingestion/internal/pipeline/backfill_test.go
package pipeline

import (
	"context"
	"os"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func setupBrainDir(t *testing.T) string {
	t.Helper()
	dir := t.TempDir()
	for _, sub := range []string{"wiki/sources", "wiki/concepts", "wiki/entities"} {
		require.NoError(t, os.MkdirAll(filepath.Join(dir, sub), 0o755))
	}
	return dir
}

func writeFile(t *testing.T, path, content string) {
	t.Helper()
	require.NoError(t, os.MkdirAll(filepath.Dir(path), 0o755))
	require.NoError(t, os.WriteFile(path, []byte(content), 0o644))
}

func TestBackfillRefs_UpdatesConcept(t *testing.T) {
	dir := setupBrainDir(t)
	writeFile(t, filepath.Join(dir, "wiki/sources/shape-up.md"),
		"---\ntitle: Shape Up\n---\n\n## Summary\n\nSee [[betting|Betting]].\n")
	writeFile(t, filepath.Join(dir, "wiki/concepts/betting.md"),
		"---\ntitle: Betting\n---\n\n## Definition\n\nA resource allocation technique.\n")

	n, err := BackfillRefs(context.Background(), dir)
	require.NoError(t, err)
	assert.Equal(t, 1, n)

	got, err := os.ReadFile(filepath.Join(dir, "wiki/concepts/betting.md"))
	require.NoError(t, err)
	assert.Contains(t, string(got), "## Sources")
	assert.Contains(t, string(got), "[[shape-up|Shape Up]]")
	assert.Contains(t, string(got), "## Definition") // original content preserved
}

func TestBackfillRefs_Deduplication(t *testing.T) {
	dir := setupBrainDir(t)
	writeFile(t, filepath.Join(dir, "wiki/sources/shape-up.md"),
		"---\ntitle: Shape Up\n---\n\n## Summary\n\nSee [[betting|Betting]].\n")
	writeFile(t, filepath.Join(dir, "wiki/concepts/betting.md"),
		"---\ntitle: Betting\n---\n\n## Definition\n\nA technique.\n")

	// Run twice; the ref should not be duplicated.
	_, err := BackfillRefs(context.Background(), dir)
	require.NoError(t, err)
	_, err = BackfillRefs(context.Background(), dir)
	require.NoError(t, err)

	got, err := os.ReadFile(filepath.Join(dir, "wiki/concepts/betting.md"))
	require.NoError(t, err)

	count := 0
	for _, line := range splitLines(string(got)) {
		if line == "- [[shape-up|Shape Up]]" {
			count++
		}
	}
	assert.Equal(t, 1, count, "ref should appear exactly once after two runs")
}

func TestBackfillRefs_MultipleSources(t *testing.T) {
	dir := setupBrainDir(t)
	writeFile(t, filepath.Join(dir, "wiki/sources/book-a.md"),
		"---\ntitle: Book A\n---\n\n## Summary\n\nSee [[shaping|Shaping]].\n")
	writeFile(t, filepath.Join(dir, "wiki/sources/book-b.md"),
		"---\ntitle: Book B\n---\n\n## Summary\n\nAlso [[shaping|Shaping]].\n")
	writeFile(t, filepath.Join(dir, "wiki/concepts/shaping.md"),
		"---\ntitle: Shaping\n---\n\n## Definition\n\nA design activity.\n")

	n, err := BackfillRefs(context.Background(), dir)
	require.NoError(t, err)
	assert.Equal(t, 1, n) // one concept page written

	got, err := os.ReadFile(filepath.Join(dir, "wiki/concepts/shaping.md"))
	require.NoError(t, err)
	assert.Contains(t, string(got), "[[book-a|Book A]]")
	assert.Contains(t, string(got), "[[book-b|Book B]]")
}

func TestBackfillRefs_NoSourcesDir(t *testing.T) {
	dir := t.TempDir() // no wiki/sources subdir
	n, err := BackfillRefs(context.Background(), dir)
	require.NoError(t, err)
	assert.Equal(t, 0, n)
}

func TestBackfillRefs_SkipsUnknownSlugs(t *testing.T) {
	dir := setupBrainDir(t)
	// Source links to a slug not in inventory and not on disk.
	writeFile(t, filepath.Join(dir, "wiki/sources/article.md"),
		"---\ntitle: Article\n---\n\n## Summary\n\nSee [[ghost-slug|Ghost]].\n")

	n, err := BackfillRefs(context.Background(), dir)
	require.NoError(t, err)
	assert.Equal(t, 0, n)
}
106
ingestion/internal/pipeline/build.go
Normal file
@@ -0,0 +1,106 @@
// ingestion/internal/pipeline/build.go
package pipeline

import (
	"fmt"
	"strings"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

// BuildPages converts RawPages from the LLM into wiki.Pages with computed slugs,
// paths, and YAML frontmatter. sourceSlug is the slug of the source being ingested
// (derived from the filename, not the LLM title). Pages whose title resolves to an
// empty slug are skipped and returned as warnings instead.
func BuildPages(rawPages []RawPage, sourceSlug, date string) ([]wiki.Page, []string) {
	out := make([]wiki.Page, 0, len(rawPages))
	var warnings []string
	for _, rp := range rawPages {
		slug := computeSlug(rp, sourceSlug)
		if slug == "" {
			warnings = append(warnings, fmt.Sprintf("skipped page with empty title (type: %s)", rp.Type))
			continue
		}
		out = append(out, buildPage(rp, sourceSlug, date))
	}
	return out, warnings
}

func computeSlug(rp RawPage, sourceSlug string) string {
	if rp.Type == "source" {
		return sourceSlug
	}
	return wiki.Slug(rp.Title)
}

func buildPage(rp RawPage, sourceSlug, date string) wiki.Page {
	var slug, dir string
	switch rp.Type {
	case "source":
		slug = sourceSlug
		dir = "wiki/sources"
	case "concept":
		slug = wiki.Slug(rp.Title)
		dir = "wiki/concepts"
	case "entity":
		slug = wiki.Slug(rp.Title)
		dir = "wiki/entities"
	default:
		slug = wiki.Slug(rp.Title)
		dir = "wiki/" + rp.Type
	}

	path := dir + "/" + slug + ".md"
	fm := buildFrontmatter(rp, date)

	return wiki.Page{
		Path:    path,
		Content: fm + "\n" + rp.Content,
	}
}

func buildFrontmatter(rp RawPage, date string) string {
	var sb strings.Builder
	sb.WriteString("---\n")
	fmt.Fprintf(&sb, "title: %s\n", yamlScalar(rp.Title))

	switch rp.Type {
	case "source":
		subtype := rp.Subtype
		if subtype == "" {
			subtype = "article"
		}
		fmt.Fprintf(&sb, "type: %s\n", yamlScalar(subtype))
		if rp.Domain != "" {
			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
		}
		fmt.Fprintf(&sb, "date_ingested: %s\n", date)
		fmt.Fprintf(&sb, "last_updated: %s\n", date)
	case "concept":
		if rp.Domain != "" {
			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
		}
		fmt.Fprintf(&sb, "last_updated: %s\n", date)
	case "entity":
		if rp.Subtype != "" {
			fmt.Fprintf(&sb, "type: %s\n", yamlScalar(rp.Subtype))
		}
		if rp.Domain != "" {
			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
		}
		fmt.Fprintf(&sb, "last_updated: %s\n", date)
	default:
		if rp.Domain != "" {
			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
		}
		fmt.Fprintf(&sb, "last_updated: %s\n", date)
	}

	fmt.Fprintf(&sb, "aliases:\n - %s\n", yamlScalar(rp.Title))
	sb.WriteString("---\n")
	return sb.String()
}

func yamlScalar(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}
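yamlScalar relies on YAML's single-quoted style, in which the only escape is a doubled single quote. The short program below copies yamlScalar from the diff and prints a few titles that would be misread as plain scalars; the sample titles are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// yamlScalar is copied from the diff: wrap s in single quotes and escape any
// embedded single quote by doubling it.
func yamlScalar(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

func main() {
	titles := []string{
		"Shape Up: The Basecamp Method", // colon followed by a space
		"Don't Repeat Yourself",         // embedded single quote
		"# not a comment",               // leading hash
	}
	for _, title := range titles {
		fmt.Printf("title: %s\n", yamlScalar(title))
	}
}
```

One limitation worth keeping in mind: a title containing a newline would still break the one-line frontmatter entry, so stripping or replacing line breaks before quoting may be worth considering if LLM output can contain them.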
167
ingestion/internal/pipeline/build_test.go
Normal file
@@ -0,0 +1,167 @@
// ingestion/internal/pipeline/build_test.go
package pipeline

import (
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestBuildPages_SourcePage(t *testing.T) {
	raw := []RawPage{
		{
			Title:   "Shape Up",
			Type:    "source",
			Subtype: "book",
			Domain:  "product-strategy",
			Content: "## Summary\n\nA book about shaping product work.\n",
		},
	}
	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Empty(t, warnings)

	p := pages[0]
	assert.Equal(t, "wiki/sources/shape-up.md", p.Path)
	assert.Contains(t, p.Content, "title: 'Shape Up'")
	assert.Contains(t, p.Content, "type: 'book'")
	assert.Contains(t, p.Content, "domain: 'product-strategy'")
	assert.Contains(t, p.Content, "date_ingested: 2026-04-23")
	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
	assert.Contains(t, p.Content, "aliases:\n - 'Shape Up'")
	assert.Contains(t, p.Content, "## Summary")
	assert.True(t, strings.HasPrefix(p.Content, "---\n"), "content must start with frontmatter")
}

func TestBuildPages_ConceptPage(t *testing.T) {
	raw := []RawPage{
		{
			Title:   "Betting",
			Type:    "concept",
			Domain:  "product-strategy",
			Content: "## Definition\n\nA resource allocation technique.\n",
		},
	}
	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Empty(t, warnings)

	p := pages[0]
	assert.Equal(t, "wiki/concepts/betting.md", p.Path)
	assert.Contains(t, p.Content, "title: 'Betting'")
	assert.Contains(t, p.Content, "domain: 'product-strategy'")
	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
	assert.Contains(t, p.Content, "aliases:\n - 'Betting'")
	assert.NotContains(t, p.Content, "date_ingested")
	assert.Contains(t, p.Content, "## Definition")
}

func TestBuildPages_EntityPage(t *testing.T) {
	raw := []RawPage{
		{
			Title:   "Ryan Singer",
			Type:    "entity",
			Subtype: "person",
			Domain:  "product-strategy",
			Content: "## Description\n\nA product designer.\n",
		},
	}
	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Empty(t, warnings)

	p := pages[0]
	assert.Equal(t, "wiki/entities/ryan-singer.md", p.Path)
	assert.Contains(t, p.Content, "title: 'Ryan Singer'")
	assert.Contains(t, p.Content, "type: 'person'")
	assert.Contains(t, p.Content, "domain: 'product-strategy'")
	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
	assert.Contains(t, p.Content, "aliases:\n - 'Ryan Singer'")
	assert.NotContains(t, p.Content, "date_ingested")
}

func TestBuildPages_SourceSlugUsedForSourcePage(t *testing.T) {
	// LLM title differs from filename: the pipeline uses sourceSlug for the source page path.
	raw := []RawPage{
		{Title: "FinBERT: A Pretrained Model", Type: "source", Subtype: "article", Content: "## Summary\n\nA model.\n"},
	}
	pages, _ := BuildPages(raw, "finbert-huggingface", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Equal(t, "wiki/sources/finbert-huggingface.md", pages[0].Path)
}

func TestBuildPages_ConceptSlugDerivedFromTitle(t *testing.T) {
	raw := []RawPage{
		{Title: "Domain-Driven Design", Type: "concept", Content: "## Definition\n\nFoo.\n"},
	}
	pages, _ := BuildPages(raw, "some-source", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Equal(t, "wiki/concepts/domain-driven-design.md", pages[0].Path)
}

func TestBuildPages_SourceDefaultSubtype(t *testing.T) {
	// If subtype is omitted for a source, default to "article"
	raw := []RawPage{
		{Title: "Some Post", Type: "source", Content: "## Summary\n\nA post.\n"},
	}
	pages, _ := BuildPages(raw, "some-post", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Contains(t, pages[0].Content, "type: 'article'")
}

func TestBuildPages_OmitsDomainWhenEmpty(t *testing.T) {
	raw := []RawPage{
		{Title: "Betting", Type: "concept", Content: "## Definition\n\nFoo.\n"},
	}
	pages, _ := BuildPages(raw, "src", "2026-04-23")
	require.Len(t, pages, 1)
	assert.NotContains(t, pages[0].Content, "domain:")
}

func TestBuildPages_MultiplePages(t *testing.T) {
	raw := []RawPage{
		{Title: "Shape Up", Type: "source", Subtype: "book", Content: "## Summary\n\nA book.\n"},
		{Title: "Betting", Type: "concept", Content: "## Definition\n\nA technique.\n"},
		{Title: "Ryan Singer", Type: "entity", Subtype: "person", Content: "## Description\n\nA designer.\n"},
	}
	pages, _ := BuildPages(raw, "shape-up", "2026-04-23")
	require.Len(t, pages, 3)
	assert.Equal(t, "wiki/sources/shape-up.md", pages[0].Path)
	assert.Equal(t, "wiki/concepts/betting.md", pages[1].Path)
	assert.Equal(t, "wiki/entities/ryan-singer.md", pages[2].Path)
}

func TestBuildPages_TitleWithColon(t *testing.T) {
	raw := []RawPage{
		{Title: "Shape Up: The Basecamp Method", Type: "source", Subtype: "book", Content: "## Summary\n\nA book.\n"},
	}
	pages, _ := BuildPages(raw, "shape-up", "2026-04-23")
	require.Len(t, pages, 1)
	// Title with colon must be quoted in YAML
	assert.Contains(t, pages[0].Content, "title: 'Shape Up: The Basecamp Method'")
	assert.Contains(t, pages[0].Content, "aliases:\n - 'Shape Up: The Basecamp Method'")
}

func TestBuildPages_EntityNoSubtype(t *testing.T) {
	raw := []RawPage{
		{Title: "Basecamp", Type: "entity", Content: "## Description\n\nA company.\n"},
	}
	pages, _ := BuildPages(raw, "src", "2026-04-23")
	require.Len(t, pages, 1)
	assert.NotContains(t, pages[0].Content, "type:")
	assert.Contains(t, pages[0].Content, "title: 'Basecamp'")
}

func TestBuildPages_EmptyTitleSkippedWithWarning(t *testing.T) {
	raw := []RawPage{
		{Title: "", Type: "concept", Content: "## Definition\n\nFoo.\n"},
		{Title: "Betting", Type: "concept", Content: "## Definition\n\nA technique.\n"},
	}
	pages, warnings := BuildPages(raw, "src", "2026-04-23")
	require.Len(t, pages, 1, "empty-title page should be skipped")
	assert.Equal(t, "wiki/concepts/betting.md", pages[0].Path)
	assert.Len(t, warnings, 1)
	assert.Contains(t, warnings[0], "empty title")
}
39
ingestion/internal/pipeline/chunk.go
Normal file
@@ -0,0 +1,39 @@
// ingestion/internal/pipeline/chunk.go
package pipeline

import "strings"

// Chunk splits content at paragraph boundaries (\n\n) into pieces that target
// at most maxSize bytes each. A single paragraph longer than maxSize is never
// split, so it is returned as one oversized chunk. If maxSize <= 0, or the
// trimmed content already fits, the content is returned as a single chunk.
func Chunk(content string, maxSize int) []string {
	content = strings.TrimSpace(content)
	if maxSize <= 0 || len(content) <= maxSize {
		return []string{content}
	}

	paragraphs := strings.Split(content, "\n\n")
	var chunks []string
	var cur strings.Builder

	for _, para := range paragraphs {
		para = strings.TrimSpace(para)
		if para == "" {
			continue
		}
		addition := para
		if cur.Len() > 0 {
			addition = "\n\n" + para
		}
		if cur.Len() > 0 && cur.Len()+len(addition) > maxSize {
			chunks = append(chunks, cur.String())
			cur.Reset()
			cur.WriteString(para)
		} else {
			cur.WriteString(addition)
		}
	}
	if cur.Len() > 0 {
		chunks = append(chunks, cur.String())
	}
	return chunks
}
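Chunk only splits between paragraphs, so one very long paragraph passes through at its full size. If a hard upper bound were needed (for example to stay inside a model's context window), an extra pass could cut oversized pieces. The sketch below is one possible approach, not part of this diff: splitOversized is a hypothetical helper, and cutting on byte counts while respecting UTF-8 boundaries is an assumption about what "size" should mean.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// splitOversized is a hypothetical helper (not part of the diff). It cuts s
// into pieces of at most maxSize bytes and avoids splitting inside a UTF-8
// sequence whenever maxSize is large enough to hold a whole rune.
func splitOversized(s string, maxSize int) []string {
	if maxSize <= 0 || len(s) <= maxSize {
		return []string{s}
	}
	var parts []string
	for len(s) > maxSize {
		cut := maxSize
		// Back up to the start of a rune so no character is torn in half.
		for cut > 0 && !utf8.RuneStart(s[cut]) {
			cut--
		}
		if cut == 0 {
			cut = maxSize // maxSize is smaller than one rune: cut anyway
		}
		parts = append(parts, s[:cut])
		s = s[cut:]
	}
	if s != "" {
		parts = append(parts, s)
	}
	return parts
}

func main() {
	for _, p := range splitOversized("héllo wörld", 4) {
		fmt.Printf("%q (%d bytes)\n", p, len(p))
	}
}
```

Applying such a helper to each element returned by Chunk would keep the paragraph-first behaviour while bounding the worst case.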
36
ingestion/internal/pipeline/chunk_test.go
Normal file
@@ -0,0 +1,36 @@
// ingestion/internal/pipeline/chunk_test.go
package pipeline

import (
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestChunk_NoChunkingWhenZero(t *testing.T) {
	content := strings.Repeat("word ", 1000)
	chunks := Chunk(content, 0)
	assert.Len(t, chunks, 1)
}

func TestChunk_SplitsAtParagraph(t *testing.T) {
	content := "First paragraph here.\n\nSecond paragraph here."
	chunks := Chunk(content, 40)
	assert.Len(t, chunks, 2)
	assert.Equal(t, "First paragraph here.", chunks[0])
	assert.Equal(t, "Second paragraph here.", chunks[1])
}

func TestChunk_SingleLargeParagraph(t *testing.T) {
	content := strings.Repeat("x", 100)
	chunks := Chunk(content, 50)
	assert.Len(t, chunks, 1)
}

func TestChunk_NoChunkingWhenContentFits(t *testing.T) {
	content := "Short content."
	chunks := Chunk(content, 1000)
	assert.Len(t, chunks, 1)
	assert.Equal(t, "Short content.", chunks[0])
}
70
ingestion/internal/pipeline/links.go
Normal file
@@ -0,0 +1,70 @@
|
||||
// ingestion/internal/pipeline/links.go
|
||||
package pipeline
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"path/filepath"
|
||||
"regexp"
|
||||
"strings"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||||
)
|
||||
|
||||
// plainLinkRE matches [[Display Name]] — wikilinks without a slug pipe.
|
||||
// It does NOT match [[slug|Display]] (those already have a pipe).
|
||||
var plainLinkRE = regexp.MustCompile(`\[\[([^\]|]+)\]\]`)

// CanonicalizeLinks converts [[Display Name]] wikilinks to [[slug|Display Name]]
// using a title→slug map built from the inventory and current batch.
// Unknown titles are left as-is and returned as warnings.
func CanonicalizeLinks(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) ([]wiki.Page, []string) {
	titleToSlug := buildTitleMap(pages, inventory)

	var allWarnings []string
	out := make([]wiki.Page, len(pages))
	for i, p := range pages {
		newContent, warnings := canonicalizeContent(p.Content, titleToSlug)
		p.Content = newContent
		out[i] = p
		allWarnings = append(allWarnings, warnings...)
	}
	return out, allWarnings
}

// buildTitleMap builds a lowercase-title → slug map from inventory and current batch.
// Current batch entries take precedence over inventory (they may be updates).
func buildTitleMap(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) map[string]string {
	m := make(map[string]string)
	for _, entries := range inventory {
		for _, e := range entries {
			m[strings.ToLower(e.Title)] = e.Slug
		}
	}
	// Current batch overrides inventory
	for _, p := range pages {
		title := extractTitle(p.Content)
		slug := strings.TrimSuffix(filepath.Base(p.Path), ".md")
		if title != "" && slug != "" {
			m[strings.ToLower(title)] = slug
		}
	}
	return m
}

func canonicalizeContent(content string, titleToSlug map[string]string) (string, []string) {
	var warnings []string
	result := plainLinkRE.ReplaceAllStringFunc(content, func(match string) string {
		sub := plainLinkRE.FindStringSubmatch(match)
		if len(sub) < 2 {
			return match
		}
		displayName := sub[1]
		slug, ok := titleToSlug[strings.ToLower(displayName)]
		if !ok {
			warnings = append(warnings, fmt.Sprintf("unknown wikilink: [[%s]]", displayName))
			return match
		}
		return "[[" + slug + "|" + displayName + "]]"
	})
	return result, warnings
}
|
||||
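To make the rewrite rule in links.go concrete, here is a minimal standalone sketch. It assumes a hand-built lowercase title-to-slug map in place of the real inventory; `plainLink` and `canonicalize` are illustrative names and are not part of the package.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// plainLink matches [[Display Name]] but not [[slug|Display Name]],
// because the character class excludes both ']' and '|'.
var plainLink = regexp.MustCompile(`\[\[([^\]|]+)\]\]`)

// canonicalize is a hypothetical stand-in for the pipeline's rewrite step.
// titleToSlug is assumed to be keyed by lowercase title.
func canonicalize(content string, titleToSlug map[string]string) (string, []string) {
	var unknown []string
	out := plainLink.ReplaceAllStringFunc(content, func(match string) string {
		name := match[2 : len(match)-2] // strip the surrounding [[ and ]]
		slug, ok := titleToSlug[strings.ToLower(name)]
		if !ok {
			unknown = append(unknown, name)
			return match // leave unknown links untouched
		}
		return "[[" + slug + "|" + name + "]]"
	})
	return out, unknown
}

func main() {
	titles := map[string]string{"betting": "betting", "shape up": "shape-up"}
	out, unknown := canonicalize("See [[Betting]], [[shape-up|Shape Up]] and [[Ghost]].", titles)
	fmt.Println(out)     // See [[betting|Betting]], [[shape-up|Shape Up]] and [[Ghost]].
	fmt.Println(unknown) // [Ghost]
}
```

Because the pattern cannot match across a `|`, running the rewrite twice over the same content should be a no-op for links that were already canonical.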
125
ingestion/internal/pipeline/links_test.go
Normal file
@@ -0,0 +1,125 @@
|
||||
// ingestion/internal/pipeline/links_test.go
|
||||
package pipeline
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stretchr/testify/require"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||||
)
|
||||
|
||||
func TestCanonicalizeLinks_KnownTitle(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/shape-up.md",
|
||||
Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Betting]].\n",
|
||||
},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{
|
||||
wiki.PageTypeConcept: {
|
||||
{Slug: "betting", Title: "Betting"},
|
||||
},
|
||||
}
|
||||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||||
require.Len(t, got, 1)
|
||||
assert.Empty(t, warnings)
|
||||
assert.Contains(t, got[0].Content, "[[betting|Betting]]")
|
||||
assert.NotContains(t, got[0].Content, "[[Betting]]")
|
||||
}
|
||||
|
||||
func TestCanonicalizeLinks_UnknownTitleLeftAsIs(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/shape-up.md",
|
||||
Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Ghost Concept]].\n",
|
||||
},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{}
|
||||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||||
require.Len(t, got, 1)
|
||||
assert.NotEmpty(t, warnings)
|
||||
assert.Contains(t, got[0].Content, "[[Ghost Concept]]")
|
||||
}
|
||||
|
||||
func TestCanonicalizeLinks_AlreadyCanonicalLinkUntouched(t *testing.T) {
|
||||
// Links already in [[slug|Display]] format must not be double-converted
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/shape-up.md",
|
||||
Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[betting|Betting]].\n",
|
||||
},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{
|
||||
wiki.PageTypeConcept: {
|
||||
{Slug: "betting", Title: "Betting"},
|
||||
},
|
||||
}
|
||||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||||
require.Len(t, got, 1)
|
||||
assert.Empty(t, warnings)
|
||||
// Should remain exactly as-is — not double-wrapped
|
||||
assert.Contains(t, got[0].Content, "[[betting|Betting]]")
|
||||
assert.NotContains(t, got[0].Content, "[[betting|[[betting|Betting]]]]")
|
||||
}
|
||||
|
||||
func TestCanonicalizeLinks_CaseInsensitiveMatch(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/foo.md",
|
||||
Content: "---\ntitle: 'Foo'\n---\n\n## Summary\n\nSee [[domain driven design]].\n",
|
||||
},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{
|
||||
wiki.PageTypeConcept: {
|
||||
{Slug: "domain-driven-design", Title: "Domain Driven Design"},
|
||||
},
|
||||
}
|
||||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||||
require.Len(t, got, 1)
|
||||
assert.Empty(t, warnings)
|
||||
assert.Contains(t, got[0].Content, "[[domain-driven-design|domain driven design]]")
|
||||
}
|
||||
|
||||
func TestCanonicalizeLinks_CurrentBatchPagesResolved(t *testing.T) {
|
||||
// A concept created in the same batch should be canonicalizable
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/shape-up.md",
|
||||
Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Betting]].\n",
|
||||
},
|
||||
{
|
||||
Path: "wiki/concepts/betting.md",
|
||||
Content: "---\ntitle: 'Betting'\n---\n\n## Definition\n\nA technique.\n",
|
||||
},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{} // empty — Betting is in the batch, not inventory
|
||||
|
||||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||||
require.Len(t, got, 2)
|
||||
assert.Empty(t, warnings)
|
||||
assert.Contains(t, got[0].Content, "[[betting|Betting]]")
|
||||
}
|
||||
|
||||
func TestCanonicalizeLinks_MultipleLinksInOnePage(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/foo.md",
|
||||
Content: "---\ntitle: 'Foo'\n---\n\n## Summary\n\nSee [[Betting]] and [[Shape Up]].\n",
|
||||
},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{
|
||||
wiki.PageTypeConcept: {
|
||||
{Slug: "betting", Title: "Betting"},
|
||||
},
|
||||
wiki.PageTypeSource: {
|
||||
{Slug: "shape-up", Title: "Shape Up"},
|
||||
},
|
||||
}
|
||||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||||
require.Len(t, got, 1)
|
||||
assert.Empty(t, warnings)
|
||||
assert.Contains(t, got[0].Content, "[[betting|Betting]]")
|
||||
assert.Contains(t, got[0].Content, "[[shape-up|Shape Up]]")
|
||||
}
|
||||
110
ingestion/internal/pipeline/parse.go
Normal file
@@ -0,0 +1,110 @@
|
||||
// ingestion/internal/pipeline/parse.go
package pipeline

import (
	"encoding/json"
	"fmt"
	"strings"
)

// RawPage is the LLM's output format: minimal structured data with no path or frontmatter.
// The pipeline derives slugs, paths, and frontmatter from these fields.
type RawPage struct {
	Title   string `json:"title"`
	Type    string `json:"type"`    // "source" | "concept" | "entity"
	Subtype string `json:"subtype"` // entity: person|company|tool|model|framework|technology; source: article|pdf|book|video|note|project
	Domain  string `json:"domain"`
	Content string `json:"content"` // Markdown body only, no frontmatter
}

// ParseRawPages parses LLM output as a JSON array of RawPage objects.
// If the output contains invalid JSON escape sequences (e.g. \. from Markdown),
// it attempts repair before falling back to truncation recovery.
func ParseRawPages(output string) ([]RawPage, []string) {
	output = strings.TrimSpace(output)
	if output == "" {
		return nil, []string{"LLM returned empty output"}
	}

	output = stripFences(output)

	// Fast path: valid JSON.
	var pages []RawPage
	if err := json.Unmarshal([]byte(output), &pages); err == nil {
		return pages, nil
	}

	// Repair pass: fix invalid escape sequences (e.g. \. \d from Markdown content).
	repaired := repairJSON(output)
	if err := json.Unmarshal([]byte(repaired), &pages); err == nil {
		return pages, []string{"repaired invalid JSON escape sequences in LLM output"}
	}

	// Truncation recovery: find last `}` that closes a complete object.
	idx := strings.LastIndex(repaired, "}")
	if idx < 0 {
		return nil, []string{"LLM output contained no complete JSON objects"}
	}

	start := strings.Index(repaired, "[")
	if start < 0 {
		return nil, []string{"LLM output contained no JSON array opening bracket"}
	}
	if idx < start {
		// The last `}` precedes the first `[`; slicing below would panic.
		return nil, []string{"LLM output contained no complete JSON objects"}
	}

	candidate := repaired[start:idx+1] + "]"
	if err := json.Unmarshal([]byte(candidate), &pages); err != nil {
		return nil, []string{fmt.Sprintf("truncation recovery failed: %v", err)}
	}

	return pages, []string{fmt.Sprintf("LLM output was truncated; recovered %d page(s)", len(pages))}
}

// repairJSON replaces invalid JSON escape sequences (e.g. \. \d \p) with
// a properly escaped backslash followed by the same character.
// It iterates byte-by-byte to correctly skip already-valid escape sequences
// (including \\) without requiring lookbehind support.
func repairJSON(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	i := 0
	for i < len(s) {
		if s[i] != '\\' {
			b.WriteByte(s[i])
			i++
			continue
		}
		// We have a backslash. Peek at the next character.
		if i+1 >= len(s) {
			// Trailing backslash: emit as-is.
			b.WriteByte(s[i])
			i++
			continue
		}
		next := s[i+1]
		switch next {
		case '"', '\\', '/', 'b', 'f', 'n', 'r', 't', 'u':
			// Valid JSON escape sequence: emit both characters as-is.
			b.WriteByte(s[i])
			b.WriteByte(next)
			i += 2
		default:
			// Invalid escape: double the backslash.
			b.WriteByte('\\')
			b.WriteByte('\\')
			b.WriteByte(next)
			i += 2
		}
	}
	return b.String()
}

func stripFences(s string) string {
	for _, prefix := range []string{"```json\n", "```json\r\n", "```\n", "```\r\n"} {
		if strings.HasPrefix(s, prefix) {
			s = strings.TrimPrefix(s, prefix)
			s = strings.TrimSuffix(strings.TrimSpace(s), "```")
			return strings.TrimSpace(s)
		}
	}
	return s
}
|
||||
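The truncation-recovery step in parse.go can be sketched on its own as follows. This is a simplified illustration under stated assumptions: `page` is a cut-down stand-in for `RawPage`, and `recoverTruncated` is a hypothetical helper name. It keeps everything from the first `[` up to the last `}` and then closes the array.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// page is a reduced, hypothetical version of RawPage for this sketch.
type page struct {
	Title string `json:"title"`
}

// recoverTruncated cuts the input after the last '}' and appends ']' so
// that complete leading objects can still be decoded.
func recoverTruncated(s string) ([]page, error) {
	start := strings.Index(s, "[")
	end := strings.LastIndex(s, "}")
	if start < 0 || end < start {
		return nil, fmt.Errorf("no complete JSON object found")
	}
	var pages []page
	if err := json.Unmarshal([]byte(s[start:end+1]+"]"), &pages); err != nil {
		return nil, err
	}
	return pages, nil
}

func main() {
	truncated := `[{"title":"Foo"},{"title":"Bar","content":"trunc`
	pages, err := recoverTruncated(truncated)
	fmt.Println(len(pages), err) // 1 <nil>
}
```

One limitation worth keeping in mind: the heuristic looks for the last `}` anywhere in the text, so a brace inside a cut-off string value would produce a candidate that still fails to parse. In that case the function reports an error rather than returning partial data.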
87
ingestion/internal/pipeline/parse_test.go
Normal file
@@ -0,0 +1,87 @@
|
||||
// ingestion/internal/pipeline/parse_test.go
|
||||
package pipeline
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stretchr/testify/require"
|
||||
)
|
||||
|
||||
func TestParseRawPages_ValidJSON(t *testing.T) {
|
||||
input := `[{"title":"Shape Up","type":"source","subtype":"book","domain":"product-strategy","content":"## Summary\n\nFoo."},{"title":"Betting","type":"concept","content":"## Definition\n\nA technique."}]`
|
||||
pages, warnings := ParseRawPages(input)
|
||||
require.Len(t, pages, 2)
|
||||
assert.Empty(t, warnings)
|
||||
assert.Equal(t, "Shape Up", pages[0].Title)
|
||||
assert.Equal(t, "source", pages[0].Type)
|
||||
assert.Equal(t, "book", pages[0].Subtype)
|
||||
assert.Equal(t, "product-strategy", pages[0].Domain)
|
||||
assert.Equal(t, "Betting", pages[1].Title)
|
||||
assert.Equal(t, "concept", pages[1].Type)
|
||||
assert.Empty(t, pages[1].Subtype)
|
||||
}
|
||||
|
||||
func TestParseRawPages_StripsFences(t *testing.T) {
|
||||
input := "```json\n[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"## Definition\\n\\nFoo.\"}]\n```"
|
||||
pages, warnings := ParseRawPages(input)
|
||||
require.Len(t, pages, 1)
|
||||
assert.Empty(t, warnings)
|
||||
assert.Equal(t, "Foo", pages[0].Title)
|
||||
}
|
||||
|
||||
func TestParseRawPages_TruncationRecovery(t *testing.T) {
|
||||
input := `[{"title":"Foo","type":"concept","content":"## Definition\n\nFoo."},{"title":"Bar","type":"concept","content":"trunc`
|
||||
pages, warnings := ParseRawPages(input)
|
||||
require.Len(t, pages, 1)
|
||||
assert.Equal(t, "Foo", pages[0].Title)
|
||||
assert.NotEmpty(t, warnings)
|
||||
}
|
||||
|
||||
func TestParseRawPages_EmptyInput(t *testing.T) {
|
||||
pages, warnings := ParseRawPages("")
|
||||
assert.Empty(t, pages)
|
||||
assert.NotEmpty(t, warnings)
|
||||
}
|
||||
|
||||
func TestParseRawPages_PlainFence(t *testing.T) {
|
||||
input := "```\n[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"ok\"}]\n```"
|
||||
pages, warnings := ParseRawPages(input)
|
||||
require.Len(t, pages, 1)
|
||||
assert.Empty(t, warnings)
|
||||
}
|
||||
|
||||
func TestParseRawPages_MissingTitle(t *testing.T) {
|
||||
// Missing title — still parsed, Title is empty string
|
||||
input := `[{"type":"concept","content":"## Definition\n\nFoo."}]`
|
||||
pages, warnings := ParseRawPages(input)
|
||||
require.Len(t, pages, 1)
|
||||
assert.Empty(t, warnings)
|
||||
assert.Empty(t, pages[0].Title)
|
||||
}
|
||||
|
||||
func TestParseRawPages_InvalidEscapeRepaired(t *testing.T) {
|
||||
// LLM copied markdown escaped list numbers (\.) into JSON — invalid escape
|
||||
raw := "[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"Step 4\\. Do it.\"}]"
|
||||
pages, warnings := ParseRawPages(raw)
|
||||
require.Len(t, pages, 1)
|
||||
assert.Equal(t, "Foo", pages[0].Title)
|
||||
assert.Contains(t, pages[0].Content, `4\.`)
|
||||
assert.NotEmpty(t, warnings) // repair warning
|
||||
}
|
||||
|
||||
func TestRepairJSON_FixesInvalidEscapes(t *testing.T) {
|
||||
cases := []struct {
|
||||
in string
|
||||
want string
|
||||
}{
|
||||
{`{"a":"foo\.bar"}`, `{"a":"foo\\.bar"}`},
|
||||
{`{"a":"\\n is fine"}`, `{"a":"\\n is fine"}`}, // valid \\ followed by n, left untouched
|
||||
{`{"a":"\d+ items"}`, `{"a":"\\d+ items"}`},
|
||||
{`{"a":"already \\ escaped"}`, `{"a":"already \\ escaped"}`}, // valid \\ untouched
|
||||
}
|
||||
for _, tc := range cases {
|
||||
got := repairJSON(tc.in)
|
||||
assert.Equal(t, tc.want, got, "input: %s", tc.in)
|
||||
}
|
||||
}
|
||||
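The escape-repair tests above rely on the fact that `\.` is not a legal JSON escape while `\\.` is. A small check with the standard library illustrates this; `decode` is a hypothetical helper written only for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decode unmarshals a JSON string literal and returns its Go value.
func decode(s string) (string, error) {
	var v string
	err := json.Unmarshal([]byte(s), &v)
	return v, err
}

func main() {
	// Backslash-dot is rejected: JSON only allows \" \\ \/ \b \f \n \r \t \uXXXX.
	_, err := decode(`"Step 4\. Do it."`)
	fmt.Println(err != nil) // true

	// After doubling the backslash the literal decodes, and the decoded
	// value still contains the single backslash from the Markdown source.
	v, err := decode(`"Step 4\\. Do it."`)
	fmt.Println(v, err) // Step 4\. Do it. <nil>
}
```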
146
ingestion/internal/pipeline/pipeline.go
Normal file
@@ -0,0 +1,146 @@
|
||||
// ingestion/internal/pipeline/pipeline.go
package pipeline

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

// CompleteFunc is the function signature for LLM calls.
type CompleteFunc func(ctx context.Context, system, user string) (string, error)

// Config holds pipeline configuration.
type Config struct {
	Complete  CompleteFunc
	ChunkSize int    // 0 = no chunking
	Schema    string // overrides brain/schema.md when set (useful in tests)
}

// Result is the outcome of a pipeline run.
type Result struct {
	Pages    []string // relative paths written (or would-be written in dry-run)
	Warnings []string
}

// Run ingests content and writes structured wiki pages to brainDir/wiki/.
// In dry-run mode, pages are returned but not written to disk.
func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryRun bool) (Result, error) {
	inventory, err := wiki.LoadInventory(brainDir)
	if err != nil {
		return Result{}, fmt.Errorf("load inventory: %w", err)
	}

	schema := cfg.Schema
	if schema == "" {
		schema = loadSchema(brainDir)
	}

	sourceSlug := wiki.Slug(source)
	date := time.Now().UTC().Format("2006-01-02")
	chunks := Chunk(content, cfg.ChunkSize)

	var allRaw []RawPage
	var allWarnings []string

	for _, chunk := range chunks {
		userPrompt := BuildPrompt(schema, source, chunk, inventory)
		output, err := cfg.Complete(ctx, systemPrompt, userPrompt)
		if err != nil {
			return Result{}, fmt.Errorf("LLM call: %w", err)
		}
		raw, warnings := ParseRawPages(output)
		allRaw = append(allRaw, raw...)
		allWarnings = append(allWarnings, warnings...)
	}

	return buildAndWrite(allRaw, sourceSlug, date, brainDir, source, inventory, allWarnings, dryRun)
}

// RunRaw runs the pipeline on pre-parsed RawPages, skipping the LLM extraction
// step. Use this when the caller has already produced the structured RawPage data
// (e.g. from a more capable model or manual curation).
func RunRaw(brainDir, source string, rawPages []RawPage, dryRun bool) (Result, error) {
	inventory, err := wiki.LoadInventory(brainDir)
	if err != nil {
		return Result{}, fmt.Errorf("load inventory: %w", err)
	}

	sourceSlug := wiki.Slug(source)
	date := time.Now().UTC().Format("2006-01-02")

	return buildAndWrite(rawPages, sourceSlug, date, brainDir, source, inventory, nil, dryRun)
}

// buildAndWrite runs BuildPages through write for both Run and RunRaw.
func buildAndWrite(rawPages []RawPage, sourceSlug, date, brainDir, source string, inventory map[wiki.PageType][]wiki.Entry, warnings []string, dryRun bool) (Result, error) {
	pages, buildWarnings := BuildPages(rawPages, sourceSlug, date)
	warnings = append(warnings, buildWarnings...)
	resolved := Resolve(pages, inventory)
	canonicalized, linkWarnings := CanonicalizeLinks(resolved, inventory)
	warnings = append(warnings, linkWarnings...)
	withRefs := injectSourceRefs(canonicalized, inventory, brainDir)
	merged := mergeAll(withRefs)

	var written []string
	for _, page := range merged {
		if !dryRun {
			dest := filepath.Join(brainDir, filepath.FromSlash(page.Path))
			if err := os.MkdirAll(filepath.Dir(dest), 0o755); err != nil {
				return Result{}, fmt.Errorf("mkdir for %s: %w", page.Path, err)
			}
			if err := os.WriteFile(dest, []byte(page.Content), 0o644); err != nil {
				return Result{}, fmt.Errorf("write %s: %w", page.Path, err)
			}
		}
		written = append(written, page.Path)
	}

	if !dryRun {
		if err := wiki.RebuildIndex(brainDir, date); err != nil {
			warnings = append(warnings, fmt.Sprintf("rebuild index: %v", err))
		}
		if err := wiki.AppendLog(brainDir, source, written, warnings, date); err != nil {
			warnings = append(warnings, fmt.Sprintf("append log: %v", err))
		}
	}

	return Result{Pages: written, Warnings: warnings}, nil
}

// mergeAll deduplicates pages by path, merging content from later occurrences.
func mergeAll(pages []wiki.Page) []wiki.Page {
	order := make([]string, 0, len(pages))
	byPath := make(map[string]wiki.Page, len(pages))
	for _, p := range pages {
		if _, seen := byPath[p.Path]; !seen {
			order = append(order, p.Path)
			byPath[p.Path] = p
		} else {
			byPath[p.Path] = wiki.Merge(byPath[p.Path], p)
		}
	}
	result := make([]wiki.Page, 0, len(order))
	for _, path := range order {
		result = append(result, byPath[path])
	}
	return result
}

const defaultSchema = `# Brain Wiki Schema
Three page types: wiki/sources/, wiki/concepts/, wiki/entities/.
See brain/schema.md for the full schema.
`

func loadSchema(brainDir string) string {
	b, err := os.ReadFile(filepath.Join(brainDir, "schema.md"))
	if err != nil {
		return defaultSchema
	}
	return strings.TrimSpace(string(b))
}
|
||||
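The order-preserving deduplication done by `mergeAll` can be illustrated without the wiki package. In the sketch below, `page`, `mergeByPath`, and `concat` are hypothetical names. `concat` is only a placeholder strategy; judging by the pipeline tests, the real `wiki.Merge` appears to merge section by section.

```go
package main

import "fmt"

// page is a minimal stand-in for wiki.Page.
type page struct {
	Path    string
	Content string
}

// mergeByPath keeps the first-seen order of paths and folds later pages
// with the same path into the earlier one using the supplied merge func.
func mergeByPath(pages []page, merge func(a, b page) page) []page {
	order := make([]string, 0, len(pages))
	byPath := make(map[string]page, len(pages))
	for _, p := range pages {
		if prev, seen := byPath[p.Path]; seen {
			byPath[p.Path] = merge(prev, p)
			continue
		}
		order = append(order, p.Path)
		byPath[p.Path] = p
	}
	out := make([]page, 0, len(order))
	for _, path := range order {
		out = append(out, byPath[path])
	}
	return out
}

// concat is a placeholder merge strategy that appends the later content.
func concat(a, b page) page {
	a.Content += b.Content
	return a
}

func main() {
	in := []page{{"a.md", "1"}, {"b.md", "2"}, {"a.md", "3"}}
	for _, p := range mergeByPath(in, concat) {
		fmt.Println(p.Path, p.Content)
	}
}
```

Tracking the order in a separate slice matters because Go map iteration order is unspecified; without it the list of written paths could differ from run to run.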
139
ingestion/internal/pipeline/pipeline_test.go
Normal file
@@ -0,0 +1,139 @@
|
||||
// ingestion/internal/pipeline/pipeline_test.go
|
||||
package pipeline
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stretchr/testify/require"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/llm"
|
||||
)
|
||||
|
||||
func TestRun_WritesPages(t *testing.T) {
|
||||
brainDir := t.TempDir()
|
||||
for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} {
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
||||
}
|
||||
|
||||
llmResponse := mustJSON([]RawPage{
|
||||
{
|
||||
Title: "Test Article",
|
||||
Type: "source",
|
||||
Subtype: "article",
|
||||
Domain: "software-engineering",
|
||||
Content: "## Summary\n\nA test article.\n\n## Key Claims\n\n- It tests things.\n\n## Concepts Introduced or Reinforced\n\n[[Testing]]\n\n## Entities Mentioned\n\n## Open Questions Raised\n",
|
||||
},
|
||||
{
|
||||
Title: "Testing",
|
||||
Type: "concept",
|
||||
Domain: "software-engineering",
|
||||
Content: "## Definition\n\nThe practice of verifying software.\n\n## Why It Matters\n\nCatches bugs.\n\n## Related Concepts\n\n## Related Entities\n\n## Sources\n\n## Evolving Notes\n",
|
||||
},
|
||||
})
|
||||
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
_ = json.NewEncoder(w).Encode(map[string]any{
|
||||
"choices": []map[string]any{
|
||||
{"message": map[string]any{"role": "assistant", "content": llmResponse}},
|
||||
},
|
||||
})
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
cfg := Config{
|
||||
Complete: llm.New(srv.URL, "", "test-model", 30*time.Second).Complete,
|
||||
ChunkSize: 0,
|
||||
}
|
||||
|
||||
result, err := Run(context.Background(), cfg, brainDir, "An article about testing.", "test-article", false)
|
||||
require.NoError(t, err)
|
||||
assert.Len(t, result.Pages, 2)
|
||||
|
||||
_, err = os.Stat(filepath.Join(brainDir, "wiki", "sources", "test-article.md"))
|
||||
require.NoError(t, err)
|
||||
_, err = os.Stat(filepath.Join(brainDir, "wiki", "concepts", "testing.md"))
|
||||
require.NoError(t, err)
|
||||
_, err = os.Stat(filepath.Join(brainDir, "wiki", "index.md"))
|
||||
require.NoError(t, err)
|
||||
_, err = os.Stat(filepath.Join(brainDir, "log.md"))
|
||||
require.NoError(t, err)
|
||||
}
|
||||
|
||||
func TestRun_DryRunDoesNotWrite(t *testing.T) {
|
||||
brainDir := t.TempDir()
|
||||
for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} {
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
||||
}
|
||||
|
||||
llmResponse := mustJSON([]RawPage{{
|
||||
Title: "Foo",
|
||||
Type: "source",
|
||||
Subtype: "article",
|
||||
Content: "## Summary\n\nFoo.\n",
|
||||
}})
|
||||
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
_ = json.NewEncoder(w).Encode(map[string]any{
|
||||
"choices": []map[string]any{{"message": map[string]any{"content": llmResponse}}},
|
||||
})
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
cfg := Config{Complete: llm.New(srv.URL, "", "m", 30*time.Second).Complete}
|
||||
result, err := Run(context.Background(), cfg, brainDir, "foo content", "foo", true)
|
||||
require.NoError(t, err)
|
||||
assert.Len(t, result.Pages, 1)
|
||||
|
||||
_, err = os.Stat(filepath.Join(brainDir, "wiki", "sources", "foo.md"))
|
||||
assert.True(t, os.IsNotExist(err))
|
||||
}
|
||||
|
||||
func TestRun_MergesDuplicatePaths(t *testing.T) {
|
||||
brainDir := t.TempDir()
|
||||
for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} {
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
||||
}
|
||||
|
||||
// LLM returns same title twice (simulates multi-chunk duplicate)
|
||||
llmResponse := mustJSON([]RawPage{
|
||||
{Title: "Foo", Type: "concept", Content: "## Definition\n\nFirst.\n\n## Related Concepts\n\n[[Bar]]\n"},
|
||||
{Title: "Foo", Type: "concept", Content: "## Definition\n\nSecond.\n\n## Related Concepts\n\n[[Baz]]\n"},
|
||||
})
|
||||
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
_ = json.NewEncoder(w).Encode(map[string]any{
|
||||
"choices": []map[string]any{{"message": map[string]any{"content": llmResponse}}},
|
||||
})
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
cfg := Config{Complete: llm.New(srv.URL, "", "m", 30*time.Second).Complete}
|
||||
result, err := Run(context.Background(), cfg, brainDir, "content", "foo", false)
|
||||
require.NoError(t, err)
|
||||
assert.Len(t, result.Pages, 1) // deduplicated
|
||||
|
||||
content, err := os.ReadFile(filepath.Join(brainDir, "wiki", "concepts", "foo.md"))
|
||||
require.NoError(t, err)
|
||||
// keep-first for Definition, union for Related Concepts
|
||||
assert.Contains(t, string(content), "First.")
|
||||
// Bar and Baz unknown in empty inventory → left as plain [[links]]
|
||||
assert.Contains(t, string(content), "[[Bar]]")
|
||||
assert.Contains(t, string(content), "[[Baz]]")
|
||||
}
|
||||
|
||||
func mustJSON(v any) string {
|
||||
b, err := json.Marshal(v)
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
return string(b)
|
||||
}
|
||||
63
ingestion/internal/pipeline/prompt.go
Normal file
@@ -0,0 +1,63 @@
|
||||
// ingestion/internal/pipeline/prompt.go
|
||||
package pipeline
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||||
)
|
||||
|
||||
const systemPrompt = `You are a wiki agent. Read the source material and produce structured wiki pages following the schema provided.
|
||||
|
||||
Output ONLY a valid JSON array — no markdown fences, no other text before or after.
|
||||
Each element must have exactly these fields:
|
||||
"title" — exact page title (e.g. "FinBERT", "Ryan Singer", "Shape Up")
|
||||
"type" — exactly one of: "source", "concept", "entity"
|
||||
"subtype" — for source: article|pdf|book|video|note|project; for entity: person|company|tool|model|framework|technology; omit for concept
|
||||
"domain" — one of the domains in the schema (omit if none fits)
|
||||
"content" — Markdown body only — NO frontmatter, NO path, NO slug
|
||||
|
||||
Wikilinks in content: [[Display Name]] — just the display name, no slug, no pipe separator.
|
||||
Only link to pages listed in the inventory or pages you are creating in this response.`
|
||||
|
||||
// BuildPrompt constructs the user prompt for a single chunk.
|
||||
func BuildPrompt(schema, source, content string, inventory map[wiki.PageType][]wiki.Entry) string {
|
||||
var sb strings.Builder
|
||||
|
||||
fmt.Fprintf(&sb, "Today's date is %s.\n\n", time.Now().UTC().Format("2006-01-02"))
|
||||
|
||||
sb.WriteString("## Schema\n\n")
|
||||
sb.WriteString(schema)
|
||||
sb.WriteString("\n\n")
|
||||
|
||||
sb.WriteString("## Existing wiki pages\n\n")
|
||||
sb.WriteString("Reference these pages by display name only — [[Display Name]] — in your content.\n\n")
|
||||
|
||||
for _, pt := range []wiki.PageType{wiki.PageTypeConcept, wiki.PageTypeEntity, wiki.PageTypeSource} {
|
||||
entries := inventory[pt]
|
||||
label := strings.ToUpper(string(pt)[:1]) + string(pt)[1:]
|
||||
if len(entries) == 0 {
|
||||
fmt.Fprintf(&sb, "%s — (none yet)\n\n", label)
|
||||
continue
|
||||
}
|
||||
fmt.Fprintf(&sb, "%s:\n", label)
|
||||
for _, e := range entries {
|
||||
fmt.Fprintf(&sb, " - %s\n", e.Title)
|
||||
}
|
||||
sb.WriteString("\n")
|
||||
}
|
||||
|
||||
sb.WriteString("## Non-negotiable rules\n\n")
|
||||
sb.WriteString("1. Output ONLY a valid JSON array — no prose, no fences.\n")
|
||||
sb.WriteString("2. Fields: title, type, subtype (if applicable), domain (if applicable), content.\n")
|
||||
sb.WriteString("3. Wikilinks: [[Display Name]] — no slug, no pipe. The pipeline handles slugs.\n")
|
||||
sb.WriteString("4. Section links must match their section type (Related Concepts → concepts only, etc.).\n")
|
||||
sb.WriteString("5. One source page per book — if inventory shows it exists, return it as an UPDATE.\n\n")
|
||||
|
||||
fmt.Fprintf(&sb, "## Source: %s\n\n", source)
|
||||
sb.WriteString(content)
|
||||
|
||||
return sb.String()
|
||||
}
|
||||
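The inventory listing that `BuildPrompt` writes into the prompt can be sketched in isolation. This version is an approximation under assumptions: `group` and `renderInventory` are hypothetical names, and the placeholder wording for empty groups is simplified rather than copied from the source.

```go
package main

import (
	"fmt"
	"strings"
)

// group is a hypothetical stand-in for one inventory bucket (label plus titles).
type group struct {
	Label  string
	Titles []string
}

// renderInventory lists page titles per group and marks empty groups
// explicitly, so the model can tell "no pages yet" from "section missing".
func renderInventory(groups []group) string {
	var sb strings.Builder
	for _, g := range groups {
		if len(g.Titles) == 0 {
			fmt.Fprintf(&sb, "%s: (none yet)\n\n", g.Label)
			continue
		}
		fmt.Fprintf(&sb, "%s:\n", g.Label)
		for _, t := range g.Titles {
			fmt.Fprintf(&sb, "  - %s\n", t)
		}
		sb.WriteString("\n")
	}
	return sb.String()
}

func main() {
	fmt.Print(renderInventory([]group{
		{Label: "Concepts", Titles: []string{"Betting"}},
		{Label: "Entities"},
	}))
}
```

Passing the groups as a slice, as `BuildPrompt` does with its fixed list of page types, keeps the prompt text deterministic. Ranging over the inventory map directly would not.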
115
ingestion/internal/pipeline/refs.go
Normal file
@@ -0,0 +1,115 @@
|
||||
// ingestion/internal/pipeline/refs.go
package pipeline

import (
	"os"
	"path/filepath"
	"regexp"
	"strings"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

var wikilinkRE = regexp.MustCompile(`\[\[([^|\]]+)\|`)

// injectSourceRefs finds the source page in the proposed batch, extracts its
// wikilinks, and injects a back-reference into every linked concept or entity page.
// Pages that exist on disk but are not in the current batch are loaded and
// appended so they will be updated on write.
func injectSourceRefs(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry, brainDir string) []wiki.Page {
	sourceSlug, sourceTitle, found := findSourcePage(pages)
	if !found {
		return pages
	}

	var sourceContent string
	for _, p := range pages {
		if strings.HasPrefix(p.Path, "wiki/sources/") &&
			strings.TrimSuffix(filepath.Base(p.Path), ".md") == sourceSlug {
			sourceContent = p.Content
			break
		}
	}

	linkedSlugs := extractWikilinks(sourceContent)
	sourceRef := "- [[" + sourceSlug + "|" + sourceTitle + "]]"

	bySlug := make(map[string]int, len(pages))
	for i, p := range pages {
		if !strings.HasPrefix(p.Path, "wiki/sources/") {
			bySlug[strings.TrimSuffix(filepath.Base(p.Path), ".md")] = i
		}
	}

	for slug := range linkedSlugs {
		if slug == sourceSlug {
			continue
		}
		if idx, ok := bySlug[slug]; ok {
			pages[idx] = addSourceRef(pages[idx], sourceRef)
			continue
		}
		pt, ok := findInInventory(slug, inventory)
		if !ok {
			continue
		}
		diskPath := filepath.Join(brainDir, "wiki", string(pt), slug+".md")
		b, err := os.ReadFile(diskPath)
		if err != nil {
			continue
		}
		page := wiki.Page{
			Path:    "wiki/" + string(pt) + "/" + slug + ".md",
			Content: string(b),
		}
		pages = append(pages, addSourceRef(page, sourceRef))
	}

	return pages
}

// addSourceRef injects sourceRef into the ## Sources bullet section of page
// using wiki.Merge, which deduplicates bullets automatically.
func addSourceRef(page wiki.Page, sourceRef string) wiki.Page {
	patch := wiki.Page{
		Path:    page.Path,
		Content: "\n## Sources\n\n" + sourceRef + "\n",
	}
	return wiki.Merge(page, patch)
}

// extractWikilinks returns the set of slugs referenced as [[slug|...]] in content.
func extractWikilinks(content string) map[string]bool {
	slugs := make(map[string]bool)
	for _, m := range wikilinkRE.FindAllStringSubmatch(content, -1) {
		slugs[m[1]] = true
	}
	return slugs
}

// findSourcePage returns the slug and title of the first wiki/sources/ page in pages.
func findSourcePage(pages []wiki.Page) (slug, title string, found bool) {
	for _, p := range pages {
		if strings.HasPrefix(p.Path, "wiki/sources/") {
			slug = strings.TrimSuffix(filepath.Base(p.Path), ".md")
			title = extractTitle(p.Content)
			if title == "" {
				title = slug
			}
			return slug, title, true
		}
	}
	return "", "", false
}

// findInInventory returns the PageType for a slug if it appears in the inventory.
func findInInventory(slug string, inventory map[wiki.PageType][]wiki.Entry) (wiki.PageType, bool) {
	for pt, entries := range inventory {
		for _, e := range entries {
			if e.Slug == slug {
				return pt, true
			}
		}
	}
	return "", false
}
|
||||
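Back-reference injection depends on reading slugs out of links that are already canonical. The following standalone sketch shows that extraction step under the same pattern as `wikilinkRE`; `canonicalLink` and `linkedSlugs` are illustrative names only.

```go
package main

import (
	"fmt"
	"regexp"
)

// canonicalLink captures the slug part of [[slug|Display Name]].
// A plain [[Display Name]] has no '|' and therefore does not match.
var canonicalLink = regexp.MustCompile(`\[\[([^|\]]+)\|`)

// linkedSlugs returns the set of distinct slugs linked from content.
func linkedSlugs(content string) map[string]bool {
	slugs := make(map[string]bool)
	for _, m := range canonicalLink.FindAllStringSubmatch(content, -1) {
		slugs[m[1]] = true
	}
	return slugs
}

func main() {
	body := "See [[shape-up|Shape Up]], [[Ghost]] and [[shape-up|the book]]."
	fmt.Println(linkedSlugs(body)) // map[shape-up:true]
}
```

Since only canonical links match, a link that canonicalization left untouched because its title was unknown produces no back-reference. That is consistent with the pipeline running canonicalization before this step.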
172
ingestion/internal/pipeline/refs_test.go
Normal file
@@ -0,0 +1,172 @@
|
||||
// ingestion/internal/pipeline/refs_test.go
|
||||
package pipeline
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"testing"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stretchr/testify/require"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||||
)
|
||||
|
||||
func makeInventory(concepts, entities []string) map[wiki.PageType][]wiki.Entry {
|
||||
inv := map[wiki.PageType][]wiki.Entry{
|
||||
wiki.PageTypeConcept: {},
|
||||
wiki.PageTypeEntity: {},
|
||||
wiki.PageTypeSource: {},
|
||||
}
|
||||
for _, slug := range concepts {
|
||||
inv[wiki.PageTypeConcept] = append(inv[wiki.PageTypeConcept], wiki.Entry{Slug: slug, Title: slug})
|
||||
}
|
||||
for _, slug := range entities {
|
||||
inv[wiki.PageTypeEntity] = append(inv[wiki.PageTypeEntity], wiki.Entry{Slug: slug, Title: slug})
|
||||
}
|
||||
return inv
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_NoSourcePage(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nFoo.\n"},
|
||||
}
|
||||
got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
|
||||
assert.Equal(t, pages, got)
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_InjectsIntoProposedConcept(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/my-article.md",
|
||||
Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[domain-driven-design|Domain Driven Design]].\n",
|
||||
},
|
||||
{
|
||||
Path: "wiki/concepts/domain-driven-design.md",
|
||||
Content: "---\ntitle: Domain Driven Design\n---\n\n## Definition\n\nA methodology.\n",
|
||||
},
|
||||
}
|
||||
|
||||
got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
|
||||
|
||||
require.Len(t, got, 2)
|
||||
assert.Contains(t, got[1].Content, "## Sources")
|
||||
assert.Contains(t, got[1].Content, "[[my-article|My Article]]")
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_LoadsConceptFromDisk(t *testing.T) {
|
||||
brainDir := t.TempDir()
|
||||
conceptDir := filepath.Join(brainDir, "wiki", "concepts")
|
||||
require.NoError(t, os.MkdirAll(conceptDir, 0o755))
|
||||
require.NoError(t, os.WriteFile(
|
||||
filepath.Join(conceptDir, "shape-up.md"),
|
||||
[]byte("---\ntitle: Shape Up\n---\n\n## Definition\n\nA methodology.\n"),
|
||||
0o644,
|
||||
))
|
||||
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/my-article.md",
|
||||
Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[shape-up|Shape Up]].\n",
|
||||
},
|
||||
}
|
||||
inv := makeInventory([]string{"shape-up"}, nil)
|
||||
|
||||
got := injectSourceRefs(pages, inv, brainDir)
|
||||
|
||||
require.Len(t, got, 2)
|
||||
var conceptPage wiki.Page
|
||||
for _, p := range got {
|
||||
if p.Path == "wiki/concepts/shape-up.md" {
|
||||
conceptPage = p
|
||||
}
|
||||
}
|
||||
assert.Contains(t, conceptPage.Content, "## Sources")
|
||||
assert.Contains(t, conceptPage.Content, "[[my-article|My Article]]")
|
||||
assert.Contains(t, conceptPage.Content, "## Definition")
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_NoSelfReference(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/my-article.md",
|
||||
Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSelf-link [[my-article|My Article]].\n",
|
||||
},
|
||||
}
|
||||
|
||||
got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
|
||||
assert.Len(t, got, 1)
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_DeduplicatesOnReingestion(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/my-article.md",
|
||||
Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[ddd|DDD]].\n",
|
||||
},
|
||||
{
|
||||
Path: "wiki/concepts/ddd.md",
|
||||
Content: "---\ntitle: DDD\n---\n\n## Definition\n\nA thing.\n\n## Sources\n\n- [[my-article|My Article]]\n",
|
||||
},
|
||||
}
|
||||
|
||||
got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
|
||||
|
||||
require.Len(t, got, 2)
|
||||
count := 0
|
||||
for _, line := range splitLines(got[1].Content) {
|
||||
if line == "- [[my-article|My Article]]" {
|
||||
count++
|
||||
}
|
||||
}
|
||||
assert.Equal(t, 1, count, "source ref should appear exactly once")
|
||||
}
|
||||
|
||||
func TestInjectSourceRefs_InjectsIntoEntity(t *testing.T) {
|
||||
pages := []wiki.Page{
|
||||
{
|
||||
Path: "wiki/sources/book.md",
|
||||
Content: "---\ntitle: Book\n---\n\n## Summary\n\nBy [[ryan-singer|Ryan Singer]].\n",
|
||||
},
|
||||
{
|
||||
Path: "wiki/entities/ryan-singer.md",
|
||||
Content: "---\ntitle: Ryan Singer\n---\n\n## Description\n\nA designer.\n",
|
||||
},
|
||||
}
|
||||
|
||||
got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
|
||||
|
||||
require.Len(t, got, 2)
|
||||
var entity wiki.Page
|
||||
for _, p := range got {
|
||||
if p.Path == "wiki/entities/ryan-singer.md" {
|
||||
entity = p
|
||||
}
|
||||
}
|
||||
assert.Contains(t, entity.Content, "[[book|Book]]")
|
||||
}
|
||||
|
||||
func TestExtractWikilinks(t *testing.T) {
|
||||
content := "See [[foo|Foo]] and [[bar|Bar]] and [[foo|Foo again]]."
|
||||
got := extractWikilinks(content)
|
||||
assert.True(t, got["foo"])
|
||||
assert.True(t, got["bar"])
|
||||
assert.Len(t, got, 2, "duplicate slugs should be deduplicated")
|
||||
}
|
||||
|
||||
func splitLines(s string) []string {
|
||||
var out []string
|
||||
start := 0
|
||||
for i := 0; i < len(s); i++ {
|
||||
if s[i] == '\n' {
|
||||
if line := s[start:i]; line != "" {
|
||||
out = append(out, line)
|
||||
}
|
||||
start = i + 1
|
||||
}
|
||||
}
|
||||
if last := s[start:]; last != "" {
|
||||
out = append(out, last)
|
||||
}
|
||||
return out
|
||||
}
|
||||
88
ingestion/internal/pipeline/resolve.go
Normal file
@@ -0,0 +1,88 @@
|
||||
// ingestion/internal/pipeline/resolve.go
|
||||
package pipeline
|
||||
|
||||
import (
|
||||
"path/filepath"
|
||||
"strings"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||||
)
|
||||
|
||||
// Resolve remaps proposed pages to existing slugs when the normalized title matches an inventory title or alias.
|
||||
// It only matches within the same page type (entities→entities, concepts→concepts).
|
||||
// Pages with no inventory match are returned unchanged.
|
||||
func Resolve(proposed []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) []wiki.Page {
|
||||
type key struct {
|
||||
pt wiki.PageType
|
||||
normalized string
|
||||
}
|
||||
lookup := make(map[key]string) // key → canonical slug
|
||||
for pt, entries := range inventory {
|
||||
for _, e := range entries {
|
||||
k := key{pt: pt, normalized: normalizeTitle(e.Title)}
|
||||
lookup[k] = e.Slug
|
||||
for _, alias := range e.Aliases {
|
||||
ak := key{pt: pt, normalized: normalizeTitle(alias)}
|
||||
if _, exists := lookup[ak]; !exists {
|
||||
lookup[ak] = e.Slug
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
out := make([]wiki.Page, 0, len(proposed))
|
||||
for _, page := range proposed {
|
||||
pt := pageTypeFromPath(page.Path)
|
||||
title := extractTitle(page.Content)
|
||||
k := key{pt: pt, normalized: normalizeTitle(title)}
|
||||
if canonicalSlug, ok := lookup[k]; ok {
|
||||
dir := filepath.Dir(page.Path)
|
||||
page.Path = dir + "/" + canonicalSlug + ".md"
|
||||
}
|
||||
out = append(out, page)
|
||||
}
|
||||
return out
|
||||
}
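The precedence between titles and aliases in the lookup loop above is easy to miss, so here is a minimal sketch of it for a single page type. `Entry` is a local stand-in for `wiki.Entry` (its fields are assumed from how the diff uses them), and `normalize` and `buildLookup` are hypothetical helpers, not functions from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a local stand-in for wiki.Entry; the field names are assumed
// from how Resolve and its tests use them.
type Entry struct {
	Slug    string
	Title   string
	Aliases []string
}

// normalize is a simplified placeholder for normalizeTitle:
// lowercase and collapse whitespace only.
func normalize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

// buildLookup mirrors the lookup-building loop in Resolve for one page type.
func buildLookup(entries []Entry) map[string]string {
	lookup := make(map[string]string)
	for _, e := range entries {
		// A title always claims its key, overwriting any earlier value.
		lookup[normalize(e.Title)] = e.Slug
		for _, alias := range e.Aliases {
			k := normalize(alias)
			// An alias only fills a key that is still free.
			if _, exists := lookup[k]; !exists {
				lookup[k] = e.Slug
			}
		}
	}
	return lookup
}

func main() {
	lookup := buildLookup([]Entry{
		{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: []string{"Singer"}},
		{Slug: "singer-sewing", Title: "Singer", Aliases: []string{"Ryan Singer"}},
	})
	fmt.Println(lookup["singer"])      // singer-sewing: the later title replaced the earlier alias
	fmt.Println(lookup["ryan singer"]) // ryan-singer: the later alias did not replace the title
}
```

Within one page type the outcome depends only on the order of the entries slice, so it is deterministic even though the outer loop in `Resolve` ranges over a map.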
|
||||
|
||||
// normalizeTitle lowercases, strips leading articles ("the ", "a ", "an ", each tried once in that order), replaces hyphens with spaces, and collapses whitespace.
|
||||
// "The Shape Up Method" → "shape up method"
|
||||
func normalizeTitle(s string) string {
|
||||
s = strings.ToLower(strings.TrimSpace(s))
|
||||
for _, article := range []string{"the ", "a ", "an "} {
|
||||
s = strings.TrimPrefix(s, article)
|
||||
}
|
||||
s = strings.ReplaceAll(s, "-", " ")
|
||||
return strings.Join(strings.Fields(s), " ")
|
||||
}
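To make the matching behaviour concrete, the sketch below copies `normalizeTitle` from this diff into a standalone program and runs it on a few inputs. The inputs are illustrative and not taken from the repository's tests. The third case shows that the article prefixes are stripped one after another, so a title starting with two articles loses both.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeTitle is copied verbatim from resolve.go in this diff.
func normalizeTitle(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	for _, article := range []string{"the ", "a ", "an "} {
		s = strings.TrimPrefix(s, article)
	}
	s = strings.ReplaceAll(s, "-", " ")
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	fmt.Println(normalizeTitle("The Shape Up Method")) // shape up method
	fmt.Println(normalizeTitle("shape-up-method"))     // shape up method
	fmt.Println(normalizeTitle("The A Team"))          // team ("the " and then "a " are both stripped)
	fmt.Println(normalizeTitle("the-shape-up"))        // the shape up (hyphenated article is kept)
}
```

Because hyphens are replaced only after the article check, a slug-style input such as `the-shape-up` keeps its leading article, while the spaced form does not.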
|
||||
|
||||
// pageTypeFromPath extracts the wiki.PageType from a path like "wiki/entities/foo.md".
|
||||
func pageTypeFromPath(path string) wiki.PageType {
|
||||
parts := strings.Split(filepath.ToSlash(path), "/")
|
||||
if len(parts) >= 2 {
|
||||
return wiki.PageType(parts[1])
|
||||
}
|
||||
return ""
|
||||
}
|
||||
|
||||
// extractTitle reads the title field from YAML frontmatter in content.
|
||||
// It inspects only the start of content (strings.SplitN with a limit of 30) and returns an empty string if no title is found.
|
||||
func extractTitle(content string) string {
|
||||
lines := strings.SplitN(content, "\n", 30)
|
||||
inFM := false
|
||||
for _, line := range lines {
|
||||
if strings.TrimSpace(line) == "---" {
|
||||
if !inFM {
|
||||
inFM = true
|
||||
continue
|
||||
}
|
||||
break
|
||||
}
|
||||
if inFM {
|
||||
key, val, ok := strings.Cut(line, ":")
|
||||
if ok && strings.TrimSpace(key) == "title" {
|
||||
return strings.Trim(strings.TrimSpace(val), `"'`)
|
||||
}
|
||||
}
|
||||
}
|
||||
return ""
|
||||
}
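As a quick illustration of what the frontmatter scan accepts, the sketch below copies `extractTitle` from this diff and feeds it a few hand-written documents. The sample contents are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// extractTitle is copied verbatim from resolve.go in this diff.
func extractTitle(content string) string {
	lines := strings.SplitN(content, "\n", 30)
	inFM := false
	for _, line := range lines {
		if strings.TrimSpace(line) == "---" {
			if !inFM {
				inFM = true
				continue
			}
			break
		}
		if inFM {
			key, val, ok := strings.Cut(line, ":")
			if ok && strings.TrimSpace(key) == "title" {
				return strings.Trim(strings.TrimSpace(val), `"'`)
			}
		}
	}
	return ""
}

func main() {
	// Surrounding quotes are trimmed.
	fmt.Println(extractTitle("---\ntitle: \"Ryan Singer\"\n---\n")) // Ryan Singer
	// Only the first colon splits key from value, so colons in the title survive.
	fmt.Println(extractTitle("---\ntitle: Go: The Language\n---\n")) // Go: The Language
	// A title line outside the frontmatter is ignored.
	fmt.Println(extractTitle("---\ndomain: software\n---\ntitle: Body\n") == "") // true
	// Without an opening "---" nothing is read at all.
	fmt.Println(extractTitle("title: Orphan\n") == "") // true
}
```

This is a line-based scan rather than a YAML parser, so multi-line or block-scalar titles would not be handled.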
|
||||
90
ingestion/internal/pipeline/resolve_test.go
Normal file
@@ -0,0 +1,90 @@
|
||||
// ingestion/internal/pipeline/resolve_test.go
|
||||
package pipeline
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||||
)
|
||||
|
||||
func TestResolve_NoMatch(t *testing.T) {
|
||||
proposed := []wiki.Page{
|
||||
{Path: "wiki/entities/new-person.md", Content: "---\ntitle: New Person\n---\n"},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{
|
||||
wiki.PageTypeEntity: {
|
||||
{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: []string{"Singer"}},
|
||||
},
|
||||
}
|
||||
got := Resolve(proposed, inventory)
|
||||
assert.Len(t, got, 1)
|
||||
assert.Equal(t, "wiki/entities/new-person.md", got[0].Path)
|
||||
}
|
||||
|
||||
func TestResolve_TitleMatchRedirectsSlug(t *testing.T) {
|
||||
proposed := []wiki.Page{
|
||||
{Path: "wiki/entities/ryan-singer-the-designer.md", Content: "---\ntitle: Ryan Singer\n---\n"},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{
|
||||
wiki.PageTypeEntity: {
|
||||
{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: nil},
|
||||
},
|
||||
}
|
||||
got := Resolve(proposed, inventory)
|
||||
assert.Len(t, got, 1)
|
||||
assert.Equal(t, "wiki/entities/ryan-singer.md", got[0].Path)
|
||||
}
|
||||
|
||||
func TestResolve_AliasMatchRedirectsSlug(t *testing.T) {
|
||||
proposed := []wiki.Page{
|
||||
{Path: "wiki/entities/singer.md", Content: "---\ntitle: Singer\n---\n"},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{
|
||||
wiki.PageTypeEntity: {
|
||||
{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: []string{"Singer", "R. Singer"}},
|
||||
},
|
||||
}
|
||||
got := Resolve(proposed, inventory)
|
||||
assert.Len(t, got, 1)
|
||||
assert.Equal(t, "wiki/entities/ryan-singer.md", got[0].Path)
|
||||
}
|
||||
|
||||
func TestResolve_NormalizationCaseAndArticles(t *testing.T) {
|
||||
proposed := []wiki.Page{
|
||||
{Path: "wiki/concepts/the-shape-up-method.md", Content: "---\ntitle: The Shape Up Method\n---\n"},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{
|
||||
wiki.PageTypeConcept: {
|
||||
{Slug: "shape-up-method", Title: "Shape Up Method", Aliases: nil},
|
||||
},
|
||||
}
|
||||
got := Resolve(proposed, inventory)
|
||||
assert.Len(t, got, 1)
|
||||
assert.Equal(t, "wiki/concepts/shape-up-method.md", got[0].Path)
|
||||
}
|
||||
|
||||
func TestResolve_OnlyMatchesSamePageType(t *testing.T) {
|
||||
proposed := []wiki.Page{
|
||||
{Path: "wiki/concepts/ryan-singer.md", Content: "---\ntitle: Ryan Singer\n---\n"},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{
|
||||
wiki.PageTypeEntity: {
|
||||
{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: nil},
|
||||
},
|
||||
wiki.PageTypeConcept: {},
|
||||
}
|
||||
got := Resolve(proposed, inventory)
|
||||
assert.Len(t, got, 1)
|
||||
assert.Equal(t, "wiki/concepts/ryan-singer.md", got[0].Path)
|
||||
}
|
||||
|
||||
func TestResolve_EmptyInventory(t *testing.T) {
|
||||
proposed := []wiki.Page{
|
||||
{Path: "wiki/entities/first.md", Content: "---\ntitle: First\n---\n"},
|
||||
}
|
||||
inventory := map[wiki.PageType][]wiki.Entry{}
|
||||
got := Resolve(proposed, inventory)
|
||||
assert.Equal(t, proposed, got)
|
||||
}
|
||||
@@ -33,46 +33,52 @@ func Query(brainDir, query string, limit int) ([]Result, error) {
|
||||
|
||||
var results []Result
|
||||
|
||||
err := filepath.WalkDir(filepath.Join(brainDir, "wiki"), func(path string, d os.DirEntry, err error) error {
|
||||
if err != nil {
|
||||
slog.Warn("search: skipping path", "path", path, "err", err)
|
||||
return nil
|
||||
}
|
||||
if d.IsDir() || !strings.HasSuffix(path, ".md") {
|
||||
return nil
|
||||
for _, subdir := range []string{"knowledge", "wiki"} {
|
||||
dir := filepath.Join(brainDir, subdir)
|
||||
if _, statErr := os.Stat(dir); os.IsNotExist(statErr) {
|
||||
continue
|
||||
}
|
||||
err := filepath.WalkDir(dir, func(path string, d os.DirEntry, err error) error {
|
||||
if err != nil {
|
||||
slog.Warn("search: skipping path", "path", path, "err", err)
|
||||
return nil
|
||||
}
|
||||
if d.IsDir() || !strings.HasSuffix(path, ".md") {
|
||||
return nil
|
||||
}
|
||||
|
||||
content, err := os.ReadFile(path)
|
||||
if err != nil {
|
||||
slog.Warn("search: skipping unreadable file", "path", path, "err", err)
|
||||
content, err := os.ReadFile(path)
|
||||
if err != nil {
|
||||
slog.Warn("search: skipping unreadable file", "path", path, "err", err)
|
||||
return nil
|
||||
}
|
||||
|
||||
lower := strings.ToLower(string(content))
|
||||
score := 0
|
||||
for _, term := range terms {
|
||||
score += strings.Count(lower, term)
|
||||
}
|
||||
if score == 0 {
|
||||
return nil
|
||||
}
|
||||
|
||||
rel, err := filepath.Rel(brainDir, path)
|
||||
if err != nil {
|
||||
return fmt.Errorf("rel path: %w", err)
|
||||
}
|
||||
rel = filepath.ToSlash(rel)
|
||||
|
||||
results = append(results, Result{
|
||||
Path: rel,
|
||||
Title: extractTitle(string(content), d.Name()),
|
||||
Excerpt: excerpt(string(content), 300),
|
||||
Score: score,
|
||||
})
|
||||
return nil
|
||||
}
|
||||
|
||||
lower := strings.ToLower(string(content))
|
||||
score := 0
|
||||
for _, term := range terms {
|
||||
score += strings.Count(lower, term)
|
||||
}
|
||||
if score == 0 {
|
||||
return nil
|
||||
}
|
||||
|
||||
rel, err := filepath.Rel(brainDir, path)
|
||||
if err != nil {
|
||||
return fmt.Errorf("rel path: %w", err)
|
||||
}
|
||||
rel = filepath.ToSlash(rel)
|
||||
|
||||
results = append(results, Result{
|
||||
Path: rel,
|
||||
Title: extractTitle(string(content), d.Name()),
|
||||
Excerpt: excerpt(string(content), 300),
|
||||
Score: score,
|
||||
})
|
||||
return nil
|
||||
})
|
||||
if err != nil {
|
||||
return nil, err
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
}
|
||||
|
||||
sort.Slice(results, func(i, j int) bool {
|
||||
|
||||
@@ -14,17 +14,15 @@ import (
|
||||
|
||||
func TestSearch_ReturnsMatchingPages(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "knowledge"), 0o755))
|
||||
|
||||
// Write a concept page mentioning "retry"
|
||||
require.NoError(t, os.WriteFile(
|
||||
filepath.Join(dir, "wiki", "concepts", "retry-logic.md"),
|
||||
filepath.Join(dir, "knowledge", "retry-logic.md"),
|
||||
[]byte("---\ntitle: Retry Logic\ndomain: software\n---\n\nRetry logic handles transient failures by re-attempting operations.\n"),
|
||||
0o644,
|
||||
))
|
||||
// Write an unrelated page
|
||||
require.NoError(t, os.WriteFile(
|
||||
filepath.Join(dir, "wiki", "concepts", "database.md"),
|
||||
filepath.Join(dir, "knowledge", "database.md"),
|
||||
[]byte("---\ntitle: Database\ndomain: software\n---\n\nA database stores structured data.\n"),
|
||||
0o644,
|
||||
))
|
||||
@@ -32,7 +30,7 @@ func TestSearch_ReturnsMatchingPages(t *testing.T) {
|
||||
results, err := search.Query(dir, "retry transient", 5)
|
||||
require.NoError(t, err)
|
||||
require.Len(t, results, 1)
|
||||
assert.Equal(t, "wiki/concepts/retry-logic.md", results[0].Path)
|
||||
assert.Equal(t, "knowledge/retry-logic.md", results[0].Path)
|
||||
assert.Equal(t, "Retry Logic", results[0].Title)
|
||||
assert.Greater(t, results[0].Score, 0)
|
||||
assert.Contains(t, results[0].Excerpt, "Retry")
|
||||
@@ -40,10 +38,10 @@ func TestSearch_ReturnsMatchingPages(t *testing.T) {
|
||||
|
||||
func TestSearch_RespectsLimit(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "knowledge"), 0o755))
|
||||
for i := 0; i < 5; i++ {
|
||||
require.NoError(t, os.WriteFile(
|
||||
filepath.Join(dir, "wiki", "concepts", fmt.Sprintf("page-%d.md", i)),
|
||||
filepath.Join(dir, "knowledge", fmt.Sprintf("page-%d.md", i)),
|
||||
[]byte(fmt.Sprintf("---\ntitle: Page %d\n---\n\nThis page mentions retry.\n", i)),
|
||||
0o644,
|
||||
))
|
||||
|
||||
210
ingestion/internal/watcher/watcher.go
Normal file
@@ -0,0 +1,210 @@
|
||||
// ingestion/internal/watcher/watcher.go
|
||||
package watcher
|
||||
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
"io"
|
||||
"log/slog"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"time"
|
||||
"unicode"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/extract"
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
|
||||
)
|
||||
|
||||
// Config holds watcher configuration.
|
||||
type Config struct {
|
||||
BrainDir string
|
||||
Interval time.Duration
|
||||
Pipeline pipeline.Config
|
||||
}
|
||||
|
||||
// Start launches the watcher in a background goroutine.
|
||||
// It returns immediately. The watcher stops when ctx is cancelled.
|
||||
func Start(ctx context.Context, cfg Config) {
|
||||
go func() {
|
||||
ticker := time.NewTicker(cfg.Interval)
|
||||
defer ticker.Stop()
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
return
|
||||
case <-ticker.C:
|
||||
date := time.Now().UTC().Format("2006-01-02")
|
||||
errs := processDir(ctx, cfg, date)
|
||||
for _, err := range errs {
|
||||
slog.Error("watcher: error processing file", "error", err)
|
||||
}
|
||||
}
|
||||
}
|
||||
}()
|
||||
}
|
||||
|
||||
// processDir walks brain/raw/, processes each eligible file, and returns any errors encountered.
|
||||
func processDir(ctx context.Context, cfg Config, date string) []error {
|
||||
rawDir := filepath.Join(cfg.BrainDir, "raw")
|
||||
|
||||
var errs []error
|
||||
err := filepath.WalkDir(rawDir, func(path string, d os.DirEntry, err error) error {
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Skip the root itself.
|
||||
if path == rawDir {
|
||||
return nil
|
||||
}
|
||||
|
||||
// Skip processed/ and failed/ subdirectories entirely.
|
||||
if d.IsDir() {
|
||||
name := d.Name()
|
||||
if name == "processed" || name == "failed" {
|
||||
return filepath.SkipDir
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// Only process supported extensions.
|
||||
ext := strings.ToLower(filepath.Ext(path))
|
||||
if ext != ".md" && ext != ".txt" && ext != ".pdf" {
|
||||
return nil
|
||||
}
|
||||
|
||||
// Skip files that have already been processed or permanently failed.
|
||||
if _, err := os.Stat(path + ".processed"); err == nil {
|
||||
return nil
|
||||
}
|
||||
if _, err := os.Stat(path + ".failed"); err == nil {
|
||||
return nil
|
||||
}
|
||||
|
||||
if err := processFile(ctx, cfg, path, date); err != nil {
|
||||
errs = append(errs, fmt.Errorf("process %s: %w", filepath.Base(path), err))
|
||||
}
|
||||
return nil
|
||||
})
|
||||
if err != nil {
|
||||
errs = append(errs, fmt.Errorf("walk raw dir: %w", err))
|
||||
}
|
||||
return errs
|
||||
}
|
||||
|
||||
// processFile reads a file, calls pipeline.Run, copies it to processed/ or failed/,
|
||||
// and writes a marker file next to the original so the watcher skips it next poll.
|
||||
// The original file is never deleted, keeping Syncthing-connected vaults (e.g. Obsidian) intact.
|
||||
func processFile(ctx context.Context, cfg Config, path, date string) error {
|
||||
filename := filepath.Base(path)
|
||||
source := deriveSource(filename)
|
||||
|
||||
content, err := extract.Text(path)
|
||||
if err != nil {
|
||||
return fmt.Errorf("extract text: %w", err)
|
||||
}
|
||||
|
||||
_, runErr := pipeline.Run(ctx, cfg.Pipeline, cfg.BrainDir, content, source, false)
|
||||
if runErr != nil {
|
||||
// Copy to failed/ and leave a .failed marker so we don't retry.
|
||||
failedDir := filepath.Join(cfg.BrainDir, "raw", "failed")
|
||||
if mkErr := os.MkdirAll(failedDir, 0o755); mkErr != nil {
|
||||
return fmt.Errorf("mkdir failed dir: %w", mkErr)
|
||||
}
|
||||
dest := filepath.Join(failedDir, filename)
|
||||
if cpErr := copyFile(path, dest); cpErr != nil {
|
||||
return fmt.Errorf("copy to failed: %w", cpErr)
|
||||
}
|
||||
if mkErr := os.WriteFile(path+".failed", []byte(runErr.Error()), 0o644); mkErr != nil {
|
||||
slog.Error("watcher: failed to write .failed marker", "error", mkErr)
|
||||
}
|
||||
|
||||
slog.Warn("watcher: file failed", "file", filename, "error", runErr)
|
||||
|
||||
if logErr := appendWatcherLog(cfg.BrainDir, filename, runErr, date); logErr != nil {
|
||||
slog.Error("watcher: failed to write log entry", "error", logErr)
|
||||
}
|
||||
// Return nil: quarantine succeeded; error already logged.
|
||||
return nil
|
||||
}
|
||||
|
||||
// Copy to processed/YYYY-MM-DD/ and leave a .processed marker so we don't re-ingest.
|
||||
processedDir := filepath.Join(cfg.BrainDir, "raw", "processed", date)
|
||||
if err := os.MkdirAll(processedDir, 0o755); err != nil {
|
||||
return fmt.Errorf("mkdir processed dir: %w", err)
|
||||
}
|
||||
dest := filepath.Join(processedDir, filename)
|
||||
if _, err := os.Stat(dest); err == nil {
|
||||
// Archive copy already exists; append timestamp to avoid overwriting.
|
||||
ext := filepath.Ext(filename)
|
||||
base := strings.TrimSuffix(filename, ext)
|
||||
dest = filepath.Join(processedDir, base+"-"+time.Now().UTC().Format("150405")+ext)
|
||||
}
|
||||
if err := copyFile(path, dest); err != nil {
|
||||
return fmt.Errorf("copy to processed: %w", err)
|
||||
}
|
||||
if err := os.WriteFile(path+".processed", []byte(date), 0o644); err != nil {
|
||||
slog.Error("watcher: failed to write .processed marker", "error", err)
|
||||
}
|
||||
|
||||
slog.Info("watcher: file processed", "file", filename, "source", source)
|
||||
return nil
|
||||
}
|
||||
|
||||
// copyFile copies src to dst, creating dst or truncating it if it already exists.
|
||||
func copyFile(src, dst string) error {
|
||||
in, err := os.Open(src)
|
||||
if err != nil {
|
||||
return fmt.Errorf("open src: %w", err)
|
||||
}
|
||||
defer in.Close() //nolint:errcheck
|
||||
|
||||
out, err := os.Create(dst)
|
||||
if err != nil {
|
||||
return fmt.Errorf("create dst: %w", err)
|
||||
}
|
||||
|
||||
if _, err := io.Copy(out, in); err != nil {
|
||||
out.Close() //nolint:errcheck
|
||||
return fmt.Errorf("copy: %w", err)
|
||||
}
|
||||
return out.Close()
|
||||
}
|
||||
|
||||
// deriveSource turns a filename into a human-readable source name.
|
||||
// "shape-up-book.md" → "Shape Up Book"
|
||||
func deriveSource(filename string) string {
|
||||
// Strip extension.
|
||||
name := strings.TrimSuffix(filename, filepath.Ext(filename))
|
||||
// Split on hyphens.
|
||||
words := strings.Split(name, "-")
|
||||
// Title-case each word.
|
||||
for i, w := range words {
|
||||
if w == "" {
|
||||
continue
|
||||
}
|
||||
runes := []rune(w)
|
||||
runes[0] = unicode.ToUpper(runes[0])
|
||||
words[i] = string(runes)
|
||||
}
|
||||
return strings.Join(words, " ")
|
||||
}
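For reference, the sketch below copies `deriveSource` from this diff and runs it on a few filenames that the table-driven test does not cover. The inputs are illustrative assumptions; they show that only hyphens are treated as word separators and only the last extension is stripped.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"unicode"
)

// deriveSource is copied from watcher.go in this diff (comments condensed).
func deriveSource(filename string) string {
	name := strings.TrimSuffix(filename, filepath.Ext(filename))
	words := strings.Split(name, "-")
	for i, w := range words {
		if w == "" {
			continue
		}
		runes := []rune(w)
		runes[0] = unicode.ToUpper(runes[0])
		words[i] = string(runes)
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Printf("%q\n", deriveSource("shape-up-book.md")) // "Shape Up Book"
	fmt.Printf("%q\n", deriveSource("my_note.md"))       // "My_note" (underscores are not split)
	fmt.Printf("%q\n", deriveSource("notes.tar.gz"))     // "Notes.tar" (only ".gz" is stripped)
	fmt.Printf("%q\n", deriveSource("a--b.txt"))         // "A  B" (empty word leaves a double space)
}
```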
|
||||
|
||||
// appendWatcherLog appends a watcher error entry to brain/log.md.
|
||||
func appendWatcherLog(brainDir, filename string, runErr error, date string) error {
|
||||
entry := fmt.Sprintf("## %s — watcher error\n\n- **File:** %s\n- **Error:** %s\n\n",
|
||||
date, filename, runErr.Error())
|
||||
|
||||
logPath := filepath.Join(brainDir, "log.md")
|
||||
f, err := os.OpenFile(logPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
|
||||
if err != nil {
|
||||
return fmt.Errorf("open log: %w", err)
|
||||
}
|
||||
|
||||
if _, err = f.WriteString(entry); err != nil {
|
||||
f.Close() //nolint:errcheck
|
||||
return fmt.Errorf("write log: %w", err)
|
||||
}
|
||||
return f.Close()
|
||||
}
|
||||
231
ingestion/internal/watcher/watcher_test.go
Normal file
@@ -0,0 +1,231 @@
|
||||
// ingestion/internal/watcher/watcher_test.go
|
||||
package watcher
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stretchr/testify/require"
|
||||
|
||||
"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
|
||||
)
|
||||
|
||||
// successComplete returns a CompleteFunc that answers every call with a JSON array containing raw.
|
||||
func successComplete(raw pipeline.RawPage) pipeline.CompleteFunc {
|
||||
return func(ctx context.Context, system, user string) (string, error) {
|
||||
b, err := json.Marshal([]pipeline.RawPage{raw})
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
return string(b), nil
|
||||
}
|
||||
}
|
||||
|
||||
// errorComplete always returns an error simulating an LLM failure.
|
||||
func errorComplete(_ context.Context, _, _ string) (string, error) {
|
||||
return "", fmt.Errorf("LLM unavailable")
|
||||
}
|
||||
|
||||
func setupBrainDir(t *testing.T) string {
|
||||
t.Helper()
|
||||
brainDir := t.TempDir()
|
||||
for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources", "raw"} {
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
||||
}
|
||||
return brainDir
|
||||
}
|
||||
|
||||
func TestStart_ProcessesFile(t *testing.T) {
|
||||
brainDir := setupBrainDir(t)
|
||||
|
||||
// Place a .md file in raw/.
|
||||
rawFile := filepath.Join(brainDir, "raw", "shape-up-book.md")
|
||||
require.NoError(t, os.WriteFile(rawFile, []byte("Content about Shape Up."), 0o644))
|
||||
|
||||
date := time.Now().UTC().Format("2006-01-02")
|
||||
rawPage := pipeline.RawPage{
|
||||
Title: "Shape Up Book",
|
||||
Type: "source",
|
||||
Subtype: "article",
|
||||
Domain: "product-management",
|
||||
Content: "## Summary\n\nA book about Shape Up.\n",
|
||||
}
|
||||
|
||||
cfg := Config{
|
||||
BrainDir: brainDir,
|
||||
Interval: 50 * time.Millisecond,
|
||||
Pipeline: pipeline.Config{
|
||||
Complete: successComplete(rawPage),
|
||||
ChunkSize: 0,
|
||||
Schema: "# Schema\nThree page types.",
|
||||
},
|
||||
}
|
||||
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
|
||||
defer cancel()
|
||||
|
||||
Start(ctx, cfg)
|
||||
|
||||
// Poll until the file is copied to processed/.
|
||||
processedPath := filepath.Join(brainDir, "raw", "processed", date, "shape-up-book.md")
|
||||
var found bool
|
||||
deadline := time.Now().Add(2 * time.Second)
|
||||
for time.Now().Before(deadline) {
|
||||
if _, err := os.Stat(processedPath); err == nil {
|
||||
found = true
|
||||
break
|
||||
}
|
||||
time.Sleep(20 * time.Millisecond)
|
||||
}
|
||||
require.True(t, found, "file should be copied to processed/")
|
||||
|
||||
// Original file should still exist (copy, not move — keeps Obsidian vault intact).
|
||||
_, err := os.Stat(rawFile)
|
||||
assert.NoError(t, err, "original file should remain in raw/")
|
||||
|
||||
// A .processed marker should exist next to the original.
|
||||
_, err = os.Stat(rawFile + ".processed")
|
||||
assert.NoError(t, err, ".processed marker should be written")
|
||||
|
||||
// Wiki page should exist.
|
||||
wikiPath := filepath.Join(brainDir, "wiki", "sources", "shape-up-book.md")
|
||||
_, err = os.Stat(wikiPath)
|
||||
assert.NoError(t, err, "wiki page should be written")
|
||||
|
||||
// log.md should contain an ingest record.
|
||||
logContent, err := os.ReadFile(filepath.Join(brainDir, "log.md"))
|
||||
require.NoError(t, err)
|
||||
assert.Contains(t, string(logContent), "— ingest")
|
||||
}
|
||||
|
||||
func TestStart_MovesToFailedOnError(t *testing.T) {
|
||||
brainDir := setupBrainDir(t)
|
||||
|
||||
rawFile := filepath.Join(brainDir, "raw", "bad-file.md")
|
||||
require.NoError(t, os.WriteFile(rawFile, []byte("Some content."), 0o644))
|
||||
|
||||
cfg := Config{
|
||||
BrainDir: brainDir,
|
||||
Interval: 50 * time.Millisecond,
|
||||
Pipeline: pipeline.Config{
|
||||
Complete: errorComplete,
|
||||
ChunkSize: 0,
|
||||
Schema: "# Schema\nThree page types.",
|
||||
},
|
||||
}
|
||||
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
|
||||
defer cancel()
|
||||
|
||||
Start(ctx, cfg)
|
||||
|
||||
// Poll until the file is copied to failed/.
|
||||
failedPath := filepath.Join(brainDir, "raw", "failed", "bad-file.md")
|
||||
var found bool
|
||||
deadline := time.Now().Add(2 * time.Second)
|
||||
for time.Now().Before(deadline) {
|
||||
if _, err := os.Stat(failedPath); err == nil {
|
||||
found = true
|
||||
break
|
||||
}
|
||||
time.Sleep(20 * time.Millisecond)
|
||||
}
|
||||
require.True(t, found, "file should be copied to failed/")
|
||||
|
||||
// Original file should still exist (copy, not move — keeps Obsidian vault intact).
|
||||
_, err := os.Stat(rawFile)
|
||||
assert.NoError(t, err, "original file should remain in raw/")
|
||||
|
||||
// A .failed marker should exist next to the original.
|
||||
_, err = os.Stat(rawFile + ".failed")
|
||||
assert.NoError(t, err, ".failed marker should be written")
|
||||
|
||||
// log.md should contain a watcher error entry.
|
||||
logContent, err := os.ReadFile(filepath.Join(brainDir, "log.md"))
|
||||
require.NoError(t, err)
|
||||
assert.Contains(t, string(logContent), "— watcher error")
|
||||
assert.Contains(t, string(logContent), "bad-file.md")
|
||||
}
|
||||
|
||||
func TestDeriveSource(t *testing.T) {
|
||||
tests := []struct {
|
||||
filename string
|
||||
want string
|
||||
}{
|
||||
{"shape-up-book.md", "Shape Up Book"},
|
||||
{"raft-consensus.txt", "Raft Consensus"},
|
||||
{"my-note.md", "My Note"},
|
||||
{"single.md", "Single"},
|
||||
{"no-extension", "No Extension"},
|
||||
}
|
||||
|
||||
for _, tc := range tests {
|
||||
t.Run(tc.filename, func(t *testing.T) {
|
||||
got := deriveSource(tc.filename)
|
||||
assert.Equal(t, tc.want, got)
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestProcessDir_SkipsSubdirs(t *testing.T) {
|
||||
brainDir := setupBrainDir(t)
|
||||
|
||||
// Create processed/ and failed/ subdirs with files inside.
|
||||
for _, sub := range []string{"processed/2026-04-22", "failed"} {
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, "raw", sub), 0o755))
|
||||
}
|
||||
|
||||
processedFile := filepath.Join(brainDir, "raw", "processed", "2026-04-22", "old-file.md")
|
||||
failedFile := filepath.Join(brainDir, "raw", "failed", "broken-file.md")
|
||||
require.NoError(t, os.WriteFile(processedFile, []byte("old"), 0o644))
|
||||
require.NoError(t, os.WriteFile(failedFile, []byte("broken"), 0o644))
|
||||
|
||||
// Also place a valid file in raw/ root that should be processed.
|
||||
validFile := filepath.Join(brainDir, "raw", "valid.md")
|
||||
require.NoError(t, os.WriteFile(validFile, []byte("valid content"), 0o644))
|
||||
|
||||
date := time.Now().UTC().Format("2006-01-02")
|
||||
|
||||
// Record each call to Complete so the number of processed files can be counted.
|
||||
var processedSources []string
|
||||
completeFn := func(ctx context.Context, system, user string) (string, error) {
|
||||
// Record that this was called; return a minimal valid RawPage.
|
||||
raw := pipeline.RawPage{
|
||||
Title: "Valid",
|
||||
Type: "source",
|
||||
Subtype: "article",
|
||||
Content: "## Summary\n\nValid.\n",
|
||||
}
|
||||
b, _ := json.Marshal([]pipeline.RawPage{raw})
|
||||
processedSources = append(processedSources, "called")
|
||||
return string(b), nil
|
||||
}
|
||||
|
||||
cfg := Config{
|
||||
BrainDir: brainDir,
|
||||
Interval: time.Hour, // not used; we call processDir directly
|
||||
Pipeline: pipeline.Config{
|
||||
Complete: completeFn,
|
||||
ChunkSize: 0,
|
||||
Schema: "# Schema\nThree page types.",
|
||||
},
|
||||
}
|
||||
|
||||
errs := processDir(context.Background(), cfg, date)
|
||||
assert.Empty(t, errs, "no errors expected")
|
||||
|
||||
// Complete should have been called exactly once (for valid.md, not for files in subdirs).
|
||||
assert.Len(t, processedSources, 1, "only the file in raw/ root should be processed")
|
||||
|
||||
// Files in processed/ and failed/ must remain untouched.
|
||||
_, err := os.Stat(processedFile)
|
||||
assert.NoError(t, err, "processed subdir file should be untouched")
|
||||
_, err = os.Stat(failedFile)
|
||||
assert.NoError(t, err, "failed subdir file should be untouched")
|
||||
}
|
||||
71
ingestion/internal/wiki/index.go
Normal file
@@ -0,0 +1,71 @@
|
||||
// ingestion/internal/wiki/index.go
|
||||
package wiki
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
)
|
||||
|
||||
// RebuildIndex writes brain/wiki/index.md from the current wiki contents.
|
||||
func RebuildIndex(brainDir, date string) error {
|
||||
inv, err := LoadInventory(brainDir)
|
||||
if err != nil {
|
||||
return fmt.Errorf("load inventory: %w", err)
|
||||
}
|
||||
|
||||
total := len(inv[PageTypeConcept]) + len(inv[PageTypeEntity]) + len(inv[PageTypeSource])
|
||||
var sb strings.Builder
|
||||
fmt.Fprintf(&sb, "# Wiki Index\n\n")
|
||||
fmt.Fprintf(&sb, "_Updated: %s — %d pages (%d concepts, %d entities, %d sources)_\n\n",
|
||||
date, total,
|
||||
len(inv[PageTypeConcept]),
|
||||
len(inv[PageTypeEntity]),
|
||||
len(inv[PageTypeSource]))
|
||||
|
||||
for _, pt := range []PageType{PageTypeConcept, PageTypeEntity, PageTypeSource} {
|
||||
entries := inv[pt]
|
||||
if len(entries) == 0 {
|
||||
continue
|
||||
}
|
||||
label := strings.ToUpper(string(pt)[:1]) + string(pt)[1:]
|
||||
fmt.Fprintf(&sb, "## %s\n\n", label)
|
||||
for _, e := range entries {
|
||||
summary := pageFirstSentence(brainDir, e)
|
||||
if summary != "" {
|
||||
fmt.Fprintf(&sb, "- [[%s|%s]] — %s\n", e.Slug, e.Title, summary)
|
||||
} else {
|
||||
fmt.Fprintf(&sb, "- [[%s|%s]]\n", e.Slug, e.Title)
|
||||
}
|
||||
}
|
||||
sb.WriteString("\n")
|
||||
}
|
||||
|
||||
dest := filepath.Join(brainDir, "wiki", "index.md")
|
||||
return os.WriteFile(dest, []byte(sb.String()), 0o644)
|
||||
}
|
||||
|
||||
func pageFirstSentence(brainDir string, e Entry) string {
|
||||
path := filepath.Join(brainDir, "wiki", string(e.Type), e.Slug+".md")
|
||||
content, err := os.ReadFile(path)
|
||||
if err != nil {
|
||||
return ""
|
||||
}
|
||||
parts := strings.SplitN(string(content), "---", 3)
|
||||
body := string(content)
|
||||
if len(parts) == 3 {
|
||||
body = parts[2]
|
||||
}
|
||||
for _, line := range strings.Split(body, "\n") {
|
||||
line = strings.TrimSpace(line)
|
||||
if line == "" || strings.HasPrefix(line, "#") {
|
||||
continue
|
||||
}
|
||||
if runes := []rune(line); len(runes) > 100 { // truncate by runes so a multi-byte UTF-8 character is never split
return string(runes[:100]) + "…"
|
||||
}
|
||||
return line
|
||||
}
|
||||
return ""
|
||||
}
|
||||
76
ingestion/internal/wiki/index_test.go
Normal file
@@ -0,0 +1,76 @@
|
||||
// ingestion/internal/wiki/index_test.go
|
||||
package wiki
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stretchr/testify/require"
|
||||
)
|
||||
|
||||
func setupWikiDir(t *testing.T) string {
|
||||
t.Helper()
|
||||
dir := t.TempDir()
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))
|
||||
require.NoError(t, os.WriteFile(
|
||||
filepath.Join(dir, "wiki", "concepts", "tdd.md"),
|
||||
[]byte("---\ntitle: TDD\n---\n\n## Definition\n\nTest-driven development is a discipline.\n"),
|
||||
0o644,
|
||||
))
|
||||
return dir
|
||||
}
|
||||
|
||||
func TestRebuildIndex(t *testing.T) {
|
||||
dir := setupWikiDir(t)
|
||||
require.NoError(t, RebuildIndex(dir, "2026-04-22"))
|
||||
|
||||
content, err := os.ReadFile(filepath.Join(dir, "wiki", "index.md"))
|
||||
require.NoError(t, err)
|
||||
s := string(content)
|
||||
assert.Contains(t, s, "# Wiki Index")
|
||||
assert.Contains(t, s, "2026-04-22")
|
||||
assert.Contains(t, s, "[[tdd|TDD]]")
|
||||
assert.Contains(t, s, "## Concepts")
|
||||
}
|
||||
|
||||
func TestRebuildIndex_EmptyWiki(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))
|
||||
|
||||
require.NoError(t, RebuildIndex(dir, "2026-04-22"))
|
||||
content, err := os.ReadFile(filepath.Join(dir, "wiki", "index.md"))
|
||||
require.NoError(t, err)
|
||||
assert.Contains(t, string(content), "# Wiki Index")
|
||||
}
|
||||
|
||||
func TestAppendLog(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
require.NoError(t, AppendLog(dir, "shape-up-book",
|
||||
[]string{"wiki/sources/shape-up.md", "wiki/concepts/betting-table.md"},
|
||||
nil, "2026-04-22"))
|
||||
|
||||
content, err := os.ReadFile(filepath.Join(dir, "log.md"))
|
||||
require.NoError(t, err)
|
||||
s := string(content)
|
||||
assert.Contains(t, s, "shape-up-book")
|
||||
assert.Contains(t, s, "wiki/sources/shape-up.md")
|
||||
assert.True(t, strings.HasPrefix(s, "## 2026-04-22"))
|
||||
}
|
||||
|
||||
func TestAppendLog_AppendsOnSecondCall(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
require.NoError(t, AppendLog(dir, "source-a", []string{"wiki/sources/a.md"}, nil, "2026-04-22"))
|
||||
require.NoError(t, AppendLog(dir, "source-b", []string{"wiki/sources/b.md"}, nil, "2026-04-22"))
|
||||
|
||||
content, err := os.ReadFile(filepath.Join(dir, "log.md"))
|
||||
require.NoError(t, err)
|
||||
assert.Contains(t, string(content), "source-a")
|
||||
assert.Contains(t, string(content), "source-b")
|
||||
}
|
||||
90
ingestion/internal/wiki/inventory.go
Normal file
@@ -0,0 +1,90 @@
|
||||
// ingestion/internal/wiki/inventory.go
|
||||
package wiki
|
||||
|
||||
import (
|
||||
"bufio"
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
)
|
||||
|
||||
// LoadInventory walks brain/wiki/ and returns all pages grouped by type.
|
||||
// Missing subdirectories are silently skipped.
|
||||
func LoadInventory(brainDir string) (map[PageType][]Entry, error) {
|
||||
result := map[PageType][]Entry{
|
||||
PageTypeConcept: {},
|
||||
PageTypeEntity: {},
|
||||
PageTypeSource: {},
|
||||
}
|
||||
for pt := range result {
|
||||
dir := filepath.Join(brainDir, "wiki", string(pt))
|
||||
entries, err := os.ReadDir(dir)
|
||||
if os.IsNotExist(err) {
|
||||
continue
|
||||
}
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("read dir %s: %w", dir, err)
|
||||
}
|
||||
for _, e := range entries {
|
||||
if e.IsDir() || !strings.HasSuffix(e.Name(), ".md") {
|
||||
continue
|
||||
}
|
||||
slug := strings.TrimSuffix(e.Name(), ".md")
|
||||
path := filepath.Join(dir, e.Name())
|
||||
title, aliases := readFrontmatter(path, slug)
|
||||
result[pt] = append(result[pt], Entry{Slug: slug, Title: title, Aliases: aliases, Type: pt})
|
||||
}
|
||||
}
|
||||
return result, nil
|
||||
}
|
||||
|
||||
// readFrontmatter extracts title and aliases from YAML frontmatter.
|
||||
// Falls back to slug for title and empty aliases on any error.
|
||||
func readFrontmatter(path, fallbackSlug string) (title string, aliases []string) {
|
||||
title = fallbackSlug
|
||||
f, err := os.Open(path)
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
defer f.Close() //nolint:errcheck
|
||||
|
||||
scanner := bufio.NewScanner(f)
|
||||
inFM := false
|
||||
inAliases := false
|
||||
for scanner.Scan() {
|
||||
line := scanner.Text()
|
||||
if strings.TrimSpace(line) == "---" {
|
||||
if !inFM {
|
||||
inFM = true
|
||||
continue
|
||||
}
|
||||
break // end of frontmatter
|
||||
}
|
||||
if !inFM {
|
||||
continue
|
||||
}
|
||||
|
||||
// Detect alias list items (lines starting with " - ").
|
||||
if inAliases {
|
||||
trimmed := strings.TrimSpace(line)
|
||||
if strings.HasPrefix(trimmed, "- ") {
|
||||
aliases = append(aliases, strings.TrimPrefix(trimmed, "- "))
|
||||
continue
|
||||
}
|
||||
inAliases = false // end of alias block
|
||||
}
|
||||
|
||||
key, val, ok := strings.Cut(line, ":")
|
||||
if !ok {
|
||||
continue
|
||||
}
|
||||
switch strings.TrimSpace(key) {
|
||||
case "title":
|
||||
title = strings.Trim(strings.TrimSpace(val), `"'`)
|
||||
case "aliases":
|
||||
inAliases = true
|
||||
}
|
||||
}
|
||||
return
|
||||
}
|
||||
83
ingestion/internal/wiki/inventory_test.go
Normal file
@@ -0,0 +1,83 @@
|
||||
// ingestion/internal/wiki/inventory_test.go
|
||||
package wiki
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"testing"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stretchr/testify/require"
|
||||
)
|
||||
|
||||
func TestLoadInventory(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))
|
||||
|
||||
require.NoError(t, os.WriteFile(
|
||||
filepath.Join(dir, "wiki", "concepts", "domain-driven-design.md"),
|
||||
[]byte("---\ntitle: Domain Driven Design\n---\n\n## Definition\n\nA thing.\n"),
|
||||
0o644,
|
||||
))
|
||||
require.NoError(t, os.WriteFile(
|
||||
filepath.Join(dir, "wiki", "entities", "ryan-singer.md"),
|
||||
[]byte("---\ntitle: Ryan Singer\n---\n\n## Description\n\nDesigner.\n"),
|
||||
0o644,
|
||||
))
|
||||
|
||||
inv, err := LoadInventory(dir)
|
||||
require.NoError(t, err)
|
||||
|
||||
assert.Len(t, inv[PageTypeConcept], 1)
|
||||
assert.Equal(t, "domain-driven-design", inv[PageTypeConcept][0].Slug)
|
||||
assert.Equal(t, "Domain Driven Design", inv[PageTypeConcept][0].Title)
|
||||
|
||||
assert.Len(t, inv[PageTypeEntity], 1)
|
||||
assert.Equal(t, "ryan-singer", inv[PageTypeEntity][0].Slug)
|
||||
|
||||
assert.Empty(t, inv[PageTypeSource])
|
||||
}
|
||||
|
||||
func TestLoadInventory_EmptyDirs(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))
|
||||
|
||||
inv, err := LoadInventory(dir)
|
||||
require.NoError(t, err)
|
||||
assert.Empty(t, inv[PageTypeConcept])
|
||||
assert.Empty(t, inv[PageTypeEntity])
|
||||
assert.Empty(t, inv[PageTypeSource])
|
||||
}
|
||||
|
||||
func TestLoadInventory_MissingDirsOk(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
// No wiki/ subdirs at all
|
||||
inv, err := LoadInventory(dir)
|
||||
require.NoError(t, err)
|
||||
assert.NotNil(t, inv)
|
||||
}
|
||||
|
||||
func TestLoadInventory_ReadsAliases(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
|
||||
require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))
|
||||
|
||||
require.NoError(t, os.WriteFile(
|
||||
filepath.Join(dir, "wiki", "entities", "ryan-singer.md"),
|
||||
[]byte("---\ntitle: Ryan Singer\naliases:\n - Singer\n - R. Singer\n---\n\n## Description\n\nDesigner.\n"),
|
||||
0o644,
|
||||
))
|
||||
|
||||
inv, err := LoadInventory(dir)
|
||||
require.NoError(t, err)
|
||||
|
||||
require.Len(t, inv[PageTypeEntity], 1)
|
||||
e := inv[PageTypeEntity][0]
|
||||
assert.Equal(t, "Ryan Singer", e.Title)
|
||||
assert.Equal(t, []string{"Singer", "R. Singer"}, e.Aliases)
|
||||
}
|
||||
40
ingestion/internal/wiki/log.go
Normal file
@@ -0,0 +1,40 @@
|
||||
// ingestion/internal/wiki/log.go
|
||||
package wiki
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
)
|
||||
|
||||
// AppendLog appends one ingestion record to brain/log.md.
|
||||
func AppendLog(brainDir, source string, pages, warnings []string, date string) error {
|
||||
var sb strings.Builder
|
||||
fmt.Fprintf(&sb, "## %s — ingest\n\n", date)
|
||||
fmt.Fprintf(&sb, "- **Source:** %s\n", source)
|
||||
if len(pages) > 0 {
|
||||
sb.WriteString("- **Pages written:**\n")
|
||||
for _, p := range pages {
|
||||
fmt.Fprintf(&sb, " - %s\n", p)
|
||||
}
|
||||
}
|
||||
if len(warnings) > 0 {
|
||||
sb.WriteString("- **Warnings:**\n")
|
||||
for _, w := range warnings {
|
||||
fmt.Fprintf(&sb, " - %s\n", w)
|
||||
}
|
||||
}
|
||||
sb.WriteString("\n")
|
||||
|
||||
logPath := filepath.Join(brainDir, "log.md")
|
||||
f, err := os.OpenFile(logPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
|
||||
if err != nil {
|
||||
return fmt.Errorf("open log: %w", err)
|
||||
}
|
||||
if _, err = f.WriteString(sb.String()); err != nil {
|
||||
f.Close() //nolint:errcheck
|
||||
return fmt.Errorf("write log: %w", err)
|
||||
}
|
||||
return f.Close()
|
||||
}
|
||||
120
ingestion/internal/wiki/merge.go
Normal file
@@ -0,0 +1,120 @@
|
||||
// ingestion/internal/wiki/merge.go
|
||||
package wiki
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"strings"
|
||||
)
|
||||
|
||||
var bulletSections = map[string]bool{
|
||||
"Related Concepts": true,
|
||||
"Related Entities": true,
|
||||
"Sources": true,
|
||||
"Key Claims": true,
|
||||
"Entities Mentioned": true,
|
||||
"Concepts Introduced or Reinforced": true,
|
||||
"Chapters": true,
|
||||
}
|
||||
|
||||
var appendSections = map[string]bool{
|
||||
"Evolving Notes": true,
|
||||
"Updates": true,
|
||||
"Open Questions Raised": true,
|
||||
"Open Questions": true,
|
||||
}
|
||||
|
||||
type section struct {
|
||||
heading string
|
||||
content string
|
||||
}
|
||||
|
||||
// Merge combines two Page values with the same path.
|
||||
// Frontmatter is taken from a. Sections are merged by strategy:
|
||||
// bullet sections union unique lines, append sections concatenate,
|
||||
// all others keep a's version. Sections in b not present in a are appended.
|
||||
func Merge(a, b Page) Page {
|
||||
fmA, secsA := parseSections(a.Content)
|
||||
_, secsB := parseSections(b.Content)
|
||||
|
||||
idx := make(map[string]int, len(secsA))
|
||||
for i, s := range secsA {
|
||||
idx[s.heading] = i
|
||||
}
|
||||
|
||||
for _, sB := range secsB {
|
||||
i, exists := idx[sB.heading]
|
||||
if !exists {
|
||||
idx[sB.heading] = len(secsA)
|
||||
secsA = append(secsA, sB)
|
||||
continue
|
||||
}
|
||||
sA := secsA[i]
|
||||
switch {
|
||||
case bulletSections[sB.heading]:
|
||||
secsA[i].content = mergeBullets(sA.content, sB.content)
|
||||
case appendSections[sB.heading]:
|
||||
secsA[i].content = strings.TrimRight(sA.content, "\n") + "\n\n" + strings.TrimLeft(sB.content, "\n")
|
||||
}
|
||||
}
|
||||
|
||||
return Page{Path: a.Path, Content: rebuildContent(fmA, secsA)}
|
||||
}
|
||||
|
||||
func parseSections(markdown string) (frontmatter string, sections []section) {
|
||||
lines := strings.Split(markdown, "\n")
|
||||
i := 0
|
||||
|
||||
if i < len(lines) && strings.TrimSpace(lines[i]) == "---" {
|
||||
i++
|
||||
var fmLines []string
|
||||
for i < len(lines) {
|
||||
if strings.TrimSpace(lines[i]) == "---" {
|
||||
i++
|
||||
break
|
||||
}
|
||||
fmLines = append(fmLines, lines[i])
|
||||
i++
|
||||
}
|
||||
frontmatter = fmt.Sprintf("---\n%s\n---\n", strings.Join(fmLines, "\n"))
|
||||
}
|
||||
|
||||
var cur *section
|
||||
for ; i < len(lines); i++ {
|
||||
line := lines[i]
|
||||
if strings.HasPrefix(line, "## ") {
|
||||
if cur != nil {
|
||||
sections = append(sections, *cur)
|
||||
}
|
||||
cur = &section{heading: strings.TrimPrefix(line, "## ")}
|
||||
} else if cur != nil {
|
||||
cur.content += line + "\n"
|
||||
}
|
||||
}
|
||||
if cur != nil {
|
||||
sections = append(sections, *cur)
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
func rebuildContent(frontmatter string, sections []section) string {
|
||||
var sb strings.Builder
|
||||
sb.WriteString(frontmatter)
|
||||
for _, sec := range sections {
|
||||
fmt.Fprintf(&sb, "\n## %s\n\n%s", sec.heading, sec.content)
|
||||
}
|
||||
return sb.String()
|
||||
}
|
||||
|
||||
func mergeBullets(a, b string) string {
|
||||
seen := make(map[string]bool)
|
||||
var lines []string
|
||||
for _, line := range strings.Split(a+b, "\n") {
|
||||
trimmed := strings.TrimSpace(line)
|
||||
if trimmed == "" || seen[trimmed] {
|
||||
continue
|
||||
}
|
||||
seen[trimmed] = true
|
||||
lines = append(lines, line)
|
||||
}
|
||||
return strings.Join(lines, "\n") + "\n"
|
||||
}
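The bullet-union strategy named in the Merge doc comment can be tried in isolation. The sketch below copies mergeBullets from merge.go into a standalone program; the `main` function and its sample inputs are illustrative and not part of the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeBullets is copied from merge.go: it keeps the first occurrence of each
// non-blank line (compared after trimming whitespace) and preserves order.
func mergeBullets(a, b string) string {
	seen := make(map[string]bool)
	var lines []string
	for _, line := range strings.Split(a+b, "\n") {
		trimmed := strings.TrimSpace(line)
		if trimmed == "" || seen[trimmed] {
			continue
		}
		seen[trimmed] = true
		lines = append(lines, line)
	}
	return strings.Join(lines, "\n") + "\n"
}

func main() {
	// Illustrative inputs, shaped like the section content parseSections yields.
	a := "- [[bar|Bar]]\n"
	b := "- [[bar|Bar]]\n- [[baz|Baz]]\n"
	fmt.Print(mergeBullets(a, b))
}
```

Because the two inputs are concatenated directly, a non-empty `a` is expected to end with a newline. Section content produced by parseSections satisfies this, since every content line is appended with a trailing "\n".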
|
||||
55
ingestion/internal/wiki/merge_test.go
Normal file
@@ -0,0 +1,55 @@
|
||||
// ingestion/internal/wiki/merge_test.go
|
||||
package wiki
|
||||
|
||||
import (
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
)
|
||||
|
||||
func TestMerge_BulletSectionsUnion(t *testing.T) {
|
||||
a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Related Concepts\n\n- [[bar|Bar]]\n"}
|
||||
b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Related Concepts\n\n- [[bar|Bar]]\n- [[baz|Baz]]\n"}
|
||||
|
||||
got := Merge(a, b)
|
||||
assert.Contains(t, got.Content, "[[bar|Bar]]")
|
||||
assert.Contains(t, got.Content, "[[baz|Baz]]")
|
||||
assert.Equal(t, 1, strings.Count(got.Content, "[[bar|Bar]]"))
|
||||
}
|
||||
|
||||
func TestMerge_AppendSections(t *testing.T) {
|
||||
a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Evolving Notes\n\nFirst note.\n"}
|
||||
b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Evolving Notes\n\nSecond note.\n"}
|
||||
|
||||
got := Merge(a, b)
|
||||
assert.Contains(t, got.Content, "First note.")
|
||||
assert.Contains(t, got.Content, "Second note.")
|
||||
}
|
||||
|
||||
func TestMerge_KeepFirstForOtherSections(t *testing.T) {
|
||||
a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nFirst definition.\n"}
|
||||
b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nSecond definition.\n"}
|
||||
|
||||
got := Merge(a, b)
|
||||
assert.Contains(t, got.Content, "First definition.")
|
||||
assert.NotContains(t, got.Content, "Second definition.")
|
||||
}
|
||||
|
||||
func TestMerge_NewSectionFromB(t *testing.T) {
|
||||
a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nA thing.\n"}
|
||||
b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Why It Matters\n\nBecause reasons.\n"}
|
||||
|
||||
got := Merge(a, b)
|
||||
assert.Contains(t, got.Content, "A thing.")
|
||||
assert.Contains(t, got.Content, "Because reasons.")
|
||||
}
|
||||
|
||||
func TestMerge_KeepsFrontmatterFromA(t *testing.T) {
|
||||
a := Page{Path: "p.md", Content: "---\ntitle: A\nlast_updated: 2026-01-01\n---\n\n## Definition\n\nA.\n"}
|
||||
b := Page{Path: "p.md", Content: "---\ntitle: B\nlast_updated: 2026-06-01\n---\n\n## Definition\n\nB.\n"}
|
||||
|
||||
got := Merge(a, b)
|
||||
assert.Contains(t, got.Content, "title: A")
|
||||
assert.NotContains(t, got.Content, "title: B")
|
||||
}
|
||||
28
ingestion/internal/wiki/slug.go
Normal file
@@ -0,0 +1,28 @@
|
||||
// ingestion/internal/wiki/slug.go
|
||||
package wiki
|
||||
|
||||
import (
|
||||
"strings"
|
||||
"unicode"
|
||||
)
|
||||
|
||||
// Slug converts a title to a kebab-case slug suitable for wiki filenames.
|
||||
// Rules: lowercase; runs of spaces, hyphens, and underscores collapse to a single hyphen;
// letters and digits are kept; everything else is dropped.
|
||||
func Slug(title string) string {
|
||||
var b strings.Builder
|
||||
prevHyphen := true // start true to trim leading hyphens
|
||||
for _, r := range strings.ToLower(title) {
|
||||
switch {
|
||||
case r == ' ' || r == '-' || r == '_':
|
||||
if !prevHyphen {
|
||||
b.WriteRune('-')
|
||||
prevHyphen = true
|
||||
}
|
||||
case unicode.IsLetter(r) || unicode.IsDigit(r):
|
||||
b.WriteRune(r)
|
||||
prevHyphen = false
|
||||
// all other characters (apostrophes, colons, dots, etc.) are dropped
|
||||
}
|
||||
}
|
||||
return strings.TrimRight(b.String(), "-")
|
||||
}
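A few inputs beyond those in slug_test.go show how the rules play out at the edges. The sketch below is a standalone copy of Slug from slug.go; the inputs in `main` are illustrative and not taken from the repository.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Slug is copied from slug.go so the example runs on its own.
func Slug(title string) string {
	var b strings.Builder
	prevHyphen := true // start true to trim leading hyphens
	for _, r := range strings.ToLower(title) {
		switch {
		case r == ' ' || r == '-' || r == '_':
			if !prevHyphen {
				b.WriteRune('-')
				prevHyphen = true
			}
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
			prevHyphen = false
			// all other characters are dropped
		}
	}
	return strings.TrimRight(b.String(), "-")
}

func main() {
	// Illustrative inputs chosen to exercise edge cases.
	for _, in := range []string{
		"snake_case_title", // underscores become hyphens
		"Café Society",     // non-ASCII letters are kept
		"a\tb",             // a tab is dropped, not treated as a separator
		"C++",              // punctuation-only suffix is stripped
		"---",              // nothing but separators gives an empty slug
	} {
		fmt.Printf("%q -> %q\n", in, Slug(in))
	}
}
```

Two consequences may be worth keeping in mind when slugs are used as filenames: distinct titles such as "C" and "C++" map to the same slug, and a title with no letters or digits maps to the empty string.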
|
||||
29
ingestion/internal/wiki/slug_test.go
Normal file
@@ -0,0 +1,29 @@
|
||||
// ingestion/internal/wiki/slug_test.go
|
||||
package wiki
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
)
|
||||
|
||||
func TestSlug(t *testing.T) {
|
||||
tests := []struct {
|
||||
input string
|
||||
want string
|
||||
}{
|
||||
{"Domain Driven Design", "domain-driven-design"},
|
||||
{"It's Complicated", "its-complicated"},
|
||||
{"gRPC", "grpc"},
|
||||
{"GPT-4o", "gpt-4o"},
|
||||
{"Property 1: It's Rough", "property-1-its-rough"},
|
||||
{" leading spaces ", "leading-spaces"},
|
||||
{"multiple spaces", "multiple-spaces"},
|
||||
{"already-kebab", "already-kebab"},
|
||||
}
|
||||
for _, tc := range tests {
|
||||
t.Run(tc.input, func(t *testing.T) {
|
||||
assert.Equal(t, tc.want, Slug(tc.input))
|
||||
})
|
||||
}
|
||||
}
|
||||
25
ingestion/internal/wiki/types.go
Normal file
@@ -0,0 +1,25 @@
|
||||
// ingestion/internal/wiki/types.go
|
||||
package wiki
|
||||
|
||||
// PageType identifies the wiki subdirectory for a page.
|
||||
type PageType string
|
||||
|
||||
const (
|
||||
PageTypeConcept PageType = "concepts"
|
||||
PageTypeEntity PageType = "entities"
|
||||
PageTypeSource PageType = "sources"
|
||||
)
|
||||
|
||||
// Page is a wiki page to be written to disk.
|
||||
type Page struct {
|
||||
Path string // relative to brainDir, e.g. "wiki/sources/foo.md"
|
||||
Content string // full markdown including YAML frontmatter
|
||||
}
|
||||
|
||||
// Entry is a summary of an existing wiki page used to build the inventory.
|
||||
type Entry struct {
|
||||
Slug string
|
||||
Title string
|
||||
Aliases []string
|
||||
Type PageType
|
||||
}
|
||||
76
internal/brain/client.go
Normal file
@@ -0,0 +1,76 @@
|
||||
// internal/brain/client.go
|
||||
// Package brain provides a lightweight client for querying the ingestion server.
|
||||
// Skill handlers call Query before spawning workers to inject relevant knowledge
|
||||
// from the brain into the task prompt. Errors are suppressed — the brain is
|
||||
// optional context; its absence must never block a skill invocation.
|
||||
package brain
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"context"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
"log/slog"
|
||||
"net/http"
|
||||
"strings"
|
||||
)
|
||||
|
||||
type queryResult struct {
|
||||
Path string `json:"path"`
|
||||
Title string `json:"title"`
|
||||
Excerpt string `json:"excerpt"`
|
||||
Score int `json:"score"`
|
||||
}
|
||||
|
||||
// Query calls the ingestion server and returns relevant knowledge as a
|
||||
// formatted string ready to prepend to a worker task prompt.
|
||||
// Returns empty string (no error) when baseURL or query is empty,
|
||||
// when the brain is unreachable, or when no results are found.
|
||||
func Query(ctx context.Context, baseURL, query string, limit int) (string, error) {
|
||||
if baseURL == "" || strings.TrimSpace(query) == "" {
|
||||
return "", nil
|
||||
}
|
||||
if limit <= 0 {
|
||||
limit = 3
|
||||
}
|
||||
|
||||
body, _ := json.Marshal(map[string]any{"query": query, "limit": limit})
|
||||
req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/query", bytes.NewReader(body))
|
||||
if err != nil {
|
||||
slog.Warn("brain: build request failed", "err", err)
|
||||
return "", nil
|
||||
}
|
||||
req.Header.Set("Content-Type", "application/json")
|
||||
|
||||
resp, err := http.DefaultClient.Do(req)
|
||||
if err != nil {
|
||||
slog.Warn("brain: ingestion server unreachable", "err", err)
|
||||
return "", nil
|
||||
}
|
||||
defer func() { _ = resp.Body.Close() }()
|
||||
|
||||
if resp.StatusCode != http.StatusOK {
|
||||
slog.Warn("brain: ingestion server returned non-OK", "status", resp.StatusCode)
|
||||
return "", nil
|
||||
}
|
||||
|
||||
out, _ := io.ReadAll(resp.Body)
|
||||
var result struct {
|
||||
Results []queryResult `json:"results"`
|
||||
}
|
||||
if err := json.Unmarshal(out, &result); err != nil || len(result.Results) == 0 {
|
||||
return "", nil
|
||||
}
|
||||
|
||||
var b strings.Builder
|
||||
b.WriteString("## Relevant knowledge\n\n")
|
||||
for _, r := range result.Results {
|
||||
title := r.Title
|
||||
if title == "" {
|
||||
title = r.Path
|
||||
}
|
||||
fmt.Fprintf(&b, "### %s\n%s\n\n", title, r.Excerpt)
|
||||
}
|
||||
return b.String(), nil
|
||||
}
|
||||
67
internal/brain/client_test.go
Normal file
@@ -0,0 +1,67 @@
|
||||
package brain_test
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"testing"
|
||||
|
||||
"github.com/mathiasbq/supervisor/internal/brain"
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stretchr/testify/require"
|
||||
)
|
||||
|
||||
func TestQueryEmptyBaseURL(t *testing.T) {
|
||||
result, err := brain.Query(context.Background(), "", "tdd patterns", 3)
|
||||
require.NoError(t, err)
|
||||
assert.Empty(t, result)
|
||||
}
|
||||
|
||||
func TestQueryEmptyQuery(t *testing.T) {
|
||||
result, err := brain.Query(context.Background(), "http://localhost:9999", "", 3)
|
||||
require.NoError(t, err)
|
||||
assert.Empty(t, result)
|
||||
}
|
||||
|
||||
func TestQueryFormatsResults(t *testing.T) {
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
assert.Equal(t, "/query", r.URL.Path)
|
||||
var req map[string]any
|
||||
require.NoError(t, json.NewDecoder(r.Body).Decode(&req))
|
||||
assert.Equal(t, "tdd patterns", req["query"])
|
||||
|
||||
json.NewEncoder(w).Encode(map[string]any{ //nolint:errcheck
|
||||
"results": []map[string]any{
|
||||
{"path": "knowledge/tdd.md", "title": "TDD Guide", "excerpt": "Always write tests first.", "score": 5},
|
||||
{"path": "knowledge/go.md", "title": "Go Conventions", "excerpt": "Use table-driven tests.", "score": 3},
|
||||
},
|
||||
})
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
result, err := brain.Query(context.Background(), srv.URL, "tdd patterns", 3)
|
||||
require.NoError(t, err)
|
||||
assert.Contains(t, result, "## Relevant knowledge")
|
||||
assert.Contains(t, result, "TDD Guide")
|
||||
assert.Contains(t, result, "Always write tests first.")
|
||||
assert.Contains(t, result, "Go Conventions")
|
||||
}
|
||||
|
||||
func TestQueryEmptyResults(t *testing.T) {
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
json.NewEncoder(w).Encode(map[string]any{"results": []any{}}) //nolint:errcheck
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
result, err := brain.Query(context.Background(), srv.URL, "obscure query", 3)
|
||||
require.NoError(t, err)
|
||||
assert.Empty(t, result)
|
||||
}
|
||||
|
||||
func TestQueryUnavailableServerReturnsEmpty(t *testing.T) {
|
||||
// Brain unavailable — should degrade gracefully, no error
|
||||
result, err := brain.Query(context.Background(), "http://127.0.0.1:19999", "query", 3)
|
||||
require.NoError(t, err)
|
||||
assert.Empty(t, result)
|
||||
}
|
||||
@@ -9,6 +9,8 @@ type Config struct {
|
||||
ConfigDir string // SUPERVISOR_CONFIG_DIR, default ./config/supervisor
|
||||
ModelsFile string // SUPERVISOR_MODELS_FILE, default <ConfigDir>/../models.yaml
|
||||
IngestBaseURL string // INGEST_BASE_URL, default http://localhost:3300
|
||||
IngestSvcURL string // INGEST_SVC_URL — base URL for brain_ingest (/ingest, /ingest-path)
|
||||
KBRetrievalURL string // KB_RETRIEVAL_URL — base URL for brain_search
|
||||
SessionsDir string // SUPERVISOR_SESSIONS_DIR, default ./brain/sessions
|
||||
BrainDir string // SUPERVISOR_BRAIN_DIR, default ./brain
|
||||
}
|
||||
@@ -22,6 +24,8 @@ func Load() (Config, error) {
|
||||
}
|
||||
cfg.ModelsFile = envOr("SUPERVISOR_MODELS_FILE", cfg.ConfigDir+"/../models.yaml")
|
||||
cfg.IngestBaseURL = envOr("INGEST_BASE_URL", "http://localhost:3300")
|
||||
cfg.IngestSvcURL = envOr("INGEST_SVC_URL", "")
|
||||
cfg.KBRetrievalURL = envOr("KB_RETRIEVAL_URL", "")
|
||||
cfg.SessionsDir = envOr("SUPERVISOR_SESSIONS_DIR", "./brain/sessions")
|
||||
cfg.BrainDir = envOr("SUPERVISOR_BRAIN_DIR", "./brain")
|
||||
return cfg, nil
|
||||
|
||||
@@ -7,9 +7,13 @@ import (
|
||||
"gopkg.in/yaml.v3"
|
||||
)
|
||||
|
||||
type skillChain struct {
|
||||
Chain []string `yaml:"chain"`
|
||||
}
|
||||
|
||||
type modelsFile struct {
|
||||
Default string `yaml:"default"`
|
||||
Skills map[string]string `yaml:"skills"`
|
||||
DefaultChain []string `yaml:"default_chain"`
|
||||
Skills map[string]skillChain `yaml:"skills"`
|
||||
}
|
||||
|
||||
type Models struct {
|
||||
@@ -28,16 +32,18 @@ func LoadModels(path string) (Models, error) {
|
||||
return Models{data: f}, nil
|
||||
}
|
||||
|
||||
// Resolve returns the model for a skill, respecting three-layer priority:
|
||||
// 1. override (from MCP call) — highest
|
||||
// 2. per-skill default from models.yaml
|
||||
// 3. global default
|
||||
func (m Models) Resolve(skill, override string) string {
|
||||
// ModelFor returns the primary model to use for a skill.
|
||||
// If override is non-empty, it is returned directly.
|
||||
// Falls back to default_chain[0] when the skill has no explicit entry.
|
||||
func (m Models) ModelFor(skill, override string) string {
|
||||
if override != "" {
|
||||
return override
|
||||
}
|
||||
if model, ok := m.data.Skills[skill]; ok {
|
||||
return model
|
||||
if sc, ok := m.data.Skills[skill]; ok && len(sc.Chain) > 0 {
|
||||
return sc.Chain[0]
|
||||
}
|
||||
return m.data.Default
|
||||
if len(m.data.DefaultChain) > 0 {
|
||||
return m.data.DefaultChain[0]
|
||||
}
|
||||
return ""
|
||||
}
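Since the hunk above shows removed and added lines without their +/- markers, the resulting lookup order can be hard to read off. The sketch below is one reading of the post-change behavior, written as a free function `modelFor` over a plain struct. That name, the exported-style field layout without YAML tags, and the sample data (borrowed from the test YAML in this change) are assumptions made for illustration.

```go
package main

import "fmt"

// skillChain and modelsFile mirror the structs that appear to remain after
// the change; YAML tags are omitted because nothing is parsed here.
type skillChain struct {
	Chain []string
}

type modelsFile struct {
	DefaultChain []string
	Skills       map[string]skillChain
}

// modelFor is a hypothetical free-function rendering of Models.ModelFor:
// caller override first, then the head of the skill's chain, then the head
// of the default chain, otherwise the empty string.
func modelFor(data modelsFile, skill, override string) string {
	if override != "" {
		return override
	}
	if sc, ok := data.Skills[skill]; ok && len(sc.Chain) > 0 {
		return sc.Chain[0]
	}
	if len(data.DefaultChain) > 0 {
		return data.DefaultChain[0]
	}
	return ""
}

func main() {
	data := modelsFile{
		DefaultChain: []string{"ollama/qwen3-coder-30b-tuned", "claude-sonnet-4-6"},
		Skills: map[string]skillChain{
			"review": {Chain: []string{"ollama/devstral-tuned", "ollama/gemma4", "claude-sonnet-4-6"}},
			"spec":   {Chain: []string{"ollama/phi4", "claude-opus-4-6"}},
		},
	}
	fmt.Println(modelFor(data, "review", ""))                // skill chain head
	fmt.Println(modelFor(data, "trainer", ""))               // default chain head
	fmt.Println(modelFor(data, "review", "claude-opus-4-6")) // caller override
}
```

Only the first element of each chain is consulted here; how the remaining entries are used as fallbacks is not shown in this hunk.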
|
||||
|
||||
@@ -10,35 +10,44 @@ import (
	"github.com/stretchr/testify/require"
)

func TestModelsResolve(t *testing.T) {
	yaml := `
default: ollama/default-model
const testYAML = `
default_chain:
  - ollama/qwen3-coder-30b-tuned
  - claude-sonnet-4-6

skills:
  tdd: ollama/qwen3-coder-30b-tuned
  review: ollama/devstral-tuned
  review:
    chain:
      - ollama/devstral-tuned
      - ollama/gemma4
      - claude-sonnet-4-6
  spec:
    chain:
      - ollama/phi4
      - claude-opus-4-6
`

func writeModels(t *testing.T, content string) string {
	t.Helper()
	f := filepath.Join(t.TempDir(), "models.yaml")
	require.NoError(t, os.WriteFile(f, []byte(yaml), 0644))

	m, err := config.LoadModels(f)
	require.NoError(t, err)

	assert.Equal(t, "ollama/qwen3-coder-30b-tuned", m.Resolve("tdd", ""))
	assert.Equal(t, "ollama/devstral-tuned", m.Resolve("review", ""))
	assert.Equal(t, "ollama/default-model", m.Resolve("unknown", ""))
	require.NoError(t, os.WriteFile(f, []byte(content), 0644))
	return f
}

func TestModelsOverride(t *testing.T) {
	yaml := `
default: ollama/default-model
skills:
  tdd: ollama/qwen3-coder-30b-tuned
`
	f := filepath.Join(t.TempDir(), "models.yaml")
	require.NoError(t, os.WriteFile(f, []byte(yaml), 0644))

	m, err := config.LoadModels(f)
func TestModelsModelForSkillWithEntry(t *testing.T) {
	m, err := config.LoadModels(writeModels(t, testYAML))
	require.NoError(t, err)

	assert.Equal(t, "anthropic/claude-sonnet-4-6", m.Resolve("tdd", "anthropic/claude-sonnet-4-6"))
	assert.Equal(t, "ollama/devstral-tuned", m.ModelFor("review", ""))
}

func TestModelsModelForDefaultFallback(t *testing.T) {
	m, err := config.LoadModels(writeModels(t, testYAML))
	require.NoError(t, err)
	assert.Equal(t, "ollama/qwen3-coder-30b-tuned", m.ModelFor("trainer", ""))
}

func TestModelsModelForCallerOverride(t *testing.T) {
	m, err := config.LoadModels(writeModels(t, testYAML))
	require.NoError(t, err)
	assert.Equal(t, "claude-opus-4-6", m.ModelFor("review", "claude-opus-4-6"))
}
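The three `ModelFor` tests above pin down a lookup order: a non-empty caller override wins, then the first entry of the skill's chain, then the first entry of `default_chain`. The following is a minimal sketch of that order only. The `models` and `skillEntry` types and the `sampleModels` helper are hypothetical stand-ins, since the `config` package's internals are not shown in this hunk.

```go
package main

// skillEntry and models are hypothetical stand-ins for the config package's
// internal types; only the lookup order is taken from the tests.
type skillEntry struct {
	Chain []string
}

type models struct {
	DefaultChain []string
	Skills       map[string]skillEntry
}

// modelFor returns the caller override if it is set, else the first model in
// the skill's chain, else the first model in the default chain, else "".
func (m models) modelFor(skill, override string) string {
	if override != "" {
		return override
	}
	if e, ok := m.Skills[skill]; ok && len(e.Chain) > 0 {
		return e.Chain[0]
	}
	if len(m.DefaultChain) > 0 {
		return m.DefaultChain[0]
	}
	return ""
}

// sampleModels mirrors the default_chain, review and spec entries of the
// testYAML fixture.
func sampleModels() models {
	return models{
		DefaultChain: []string{"ollama/qwen3-coder-30b-tuned", "claude-sonnet-4-6"},
		Skills: map[string]skillEntry{
			"review": {Chain: []string{"ollama/devstral-tuned", "ollama/gemma4", "claude-sonnet-4-6"}},
			"spec":   {Chain: []string{"ollama/phi4", "claude-opus-4-6"}},
		},
	}
}
```

The later entries in each chain are never returned by this lookup; presumably they serve as fallbacks elsewhere, but that logic is not part of this hunk.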
@@ -1,108 +0,0 @@
package exec

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// Config holds executor configuration.
type Config struct {
	ClaudeBinary   string        // path to claude binary, defaults to "claude"
	SystemPrompt   string        // contents of supervisor CLAUDE.md
	Timeout        time.Duration // per-invocation timeout, default 120s
	LiteLLMBaseURL string        // passed to Claude so it can delegate to Ollama
	LiteLLMAPIKey  string        // passed to Claude for LiteLLM auth
}

// Request is the input to a single supervisor invocation.
type Request struct {
	SkillPrompt string // skill-specific discipline (e.g. tdd.md contents)
	TaskPrompt  string // the specific task (phase, project_root, spec, model)
	Model       string // resolved model name, passed in task prompt
	Tools       string // comma-separated allowed tools, default "Bash,Read,Write"
}

// Executor spawns a claude instance and captures its structured JSON output.
type Executor struct {
	cfg Config
}

func New(cfg Config) *Executor {
	if cfg.ClaudeBinary == "" {
		cfg.ClaudeBinary = "claude"
	}
	if cfg.Timeout == 0 {
		cfg.Timeout = 120 * time.Second
	}
	return &Executor{cfg: cfg}
}

func (e *Executor) Run(ctx context.Context, req Request) (Result, error) {
	ctx, cancel := context.WithTimeout(ctx, e.cfg.Timeout)
	defer cancel()

	tools := req.Tools
	if tools == "" {
		tools = "Bash,Read,Write"
	}

	// Build the full prompt: system rules + skill rules + infra context + task.
	// LITELLM_API_KEY is injected as a subprocess env var, not in the prompt,
	// to prevent it appearing in error log output.
	litellmCtx := fmt.Sprintf("LITELLM_BASE_URL: %s", e.cfg.LiteLLMBaseURL)
	prompt := strings.Join([]string{
		e.cfg.SystemPrompt,
		"---",
		req.SkillPrompt,
		"---",
		litellmCtx,
		"---",
		req.TaskPrompt,
	}, "\n\n")

	args := []string{
		"--print",
		"--permission-mode", "bypassPermissions",
		"--tools", tools,
		"--json-schema", Schema,
		"--output-format", "json",
		prompt,
	}

	cmd := exec.CommandContext(ctx, e.cfg.ClaudeBinary, args...)
	cmd.Env = append(os.Environ(), "LITELLM_API_KEY="+e.cfg.LiteLLMAPIKey)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	if err := cmd.Run(); err != nil {
		if ctx.Err() != nil {
			return Result{}, fmt.Errorf("timeout after %s", e.cfg.Timeout)
		}
		return Result{}, fmt.Errorf("claude exited with error: %w — stderr: %s", err, stderr.String())
	}

	// --output-format json wraps the response in an envelope; structured output
	// from --json-schema is in the "structured_output" field.
	var envelope struct {
		StructuredOutput *Result `json:"structured_output"`
		IsError          bool    `json:"is_error"`
		Result           string  `json:"result"` // fallback text result for error messages
	}
	if err := json.Unmarshal(stdout.Bytes(), &envelope); err != nil {
		return Result{}, fmt.Errorf("parse envelope JSON: %w — raw: %s — stderr: %s", err, stdout.String(), stderr.String())
	}
	if envelope.StructuredOutput == nil {
		return Result{}, fmt.Errorf("no structured_output in response — result: %s — stderr: %s", envelope.Result, stderr.String())
	}
	if err := envelope.StructuredOutput.Validate(); err != nil {
		return Result{}, fmt.Errorf("invalid result: %w", err)
	}
	return *envelope.StructuredOutput, nil
}
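The envelope handling at the end of `Run` relies on a pointer field: `StructuredOutput` stays `nil` when the key is missing, which lets the caller tell "absent" apart from "present but zero-valued". A minimal sketch of just that step follows. The `result` type here is a hypothetical two-field subset of `Result`, and `decodeEnvelope` is an illustrative name, not a function from the removed file.

```go
package main

import (
	"encoding/json"
	"errors"
)

// result is a hypothetical two-field subset of the Result type, kept small
// for illustration.
type result struct {
	Status   string `json:"status"`
	Verified bool   `json:"verified"`
}

// decodeEnvelope extracts "structured_output" from a JSON envelope. The
// pointer field remains nil when the key is absent, so a missing payload is
// reported as an error instead of being mistaken for an empty result.
func decodeEnvelope(raw []byte) (result, error) {
	var env struct {
		StructuredOutput *result `json:"structured_output"`
		Result           string  `json:"result"`
	}
	if err := json.Unmarshal(raw, &env); err != nil {
		return result{}, err
	}
	if env.StructuredOutput == nil {
		return result{}, errors.New("no structured_output in response: " + env.Result)
	}
	return *env.StructuredOutput, nil
}
```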
@@ -1,77 +0,0 @@
package exec_test

import (
	"context"
	"os"
	"path/filepath"
	"testing"
	"time"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// fakeClaudePath writes a shell script that prints fixed output and returns its path.
func fakeClaudePath(t *testing.T, output string, exitCode int) string {
	t.Helper()
	dir := t.TempDir()
	script := filepath.Join(dir, "claude")
	var content string
	if exitCode != 0 {
		content = "#!/bin/sh\necho 'error' >&2\nexit 1\n"
	} else {
		content = "#!/bin/sh\necho '" + output + "'\n"
	}
	require.NoError(t, os.WriteFile(script, []byte(content), 0755))
	return script
}

func TestExecutorParsesValidResult(t *testing.T) {
	// Fake claude emits the --output-format json envelope that the real CLI produces.
	// The executor extracts the result from the "structured_output" field.
	envelope := `{"type":"result","subtype":"success","is_error":false,"structured_output":{"status":"pass","phase":"red","skill":"tdd","file_path":"/tmp/x_test.go","runner_output":"FAIL","verified":true,"model_used":"self","message":"ok"}}`
	claude := fakeClaudePath(t, envelope, 0)

	ex := iexec.New(iexec.Config{
		ClaudeBinary: claude,
		SystemPrompt: "you are a supervisor",
		Timeout:      5 * time.Second,
	})

	result, err := ex.Run(context.Background(), iexec.Request{
		SkillPrompt: "tdd rules",
		TaskPrompt:  "run red phase",
	})
	require.NoError(t, err)
	assert.Equal(t, "pass", result.Status)
	assert.True(t, result.Verified)
}

func TestExecutorReturnsErrorOnNonZeroExit(t *testing.T) {
	claude := fakeClaudePath(t, "", 1)

	ex := iexec.New(iexec.Config{
		ClaudeBinary: claude,
		SystemPrompt: "you are a supervisor",
		Timeout:      5 * time.Second,
	})

	_, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "fail"})
	assert.Error(t, err)
}

func TestExecutorTimesOut(t *testing.T) {
	dir := t.TempDir()
	script := filepath.Join(dir, "claude")
	require.NoError(t, os.WriteFile(script, []byte("#!/bin/sh\nsleep 60\n"), 0755))

	ex := iexec.New(iexec.Config{
		ClaudeBinary: script,
		SystemPrompt: "you are a supervisor",
		Timeout:      100 * time.Millisecond,
	})

	_, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "slow"})
	assert.ErrorContains(t, err, "timeout")
}
127
internal/exec/litellm.go
Normal file
@@ -0,0 +1,127 @@
package exec

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
	"time"
)

// LiteLLMExecutor calls a LiteLLM-compatible /v1/chat/completions endpoint
// and returns the raw assistant message text.
type LiteLLMExecutor struct {
	baseURL    string
	apiKey     string
	httpClient *http.Client
}

// NewLiteLLM creates a LiteLLMExecutor.
// timeout applies to the full HTTP round-trip per call.
func NewLiteLLM(baseURL, apiKey string, timeout time.Duration) *LiteLLMExecutor {
	if timeout == 0 {
		timeout = 120 * time.Second
	}
	return &LiteLLMExecutor{
		baseURL:    baseURL,
		apiKey:     apiKey,
		httpClient: &http.Client{Timeout: timeout},
	}
}

type litellmMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type litellmRequest struct {
	Model    string           `json:"model"`
	Messages []litellmMessage `json:"messages"`
}

type litellmChoice struct {
	Message litellmMessage `json:"message"`
}

type litellmResponse struct {
	Choices []litellmChoice `json:"choices"`
}

// Complete sends system+user messages to the given model and returns the raw
// assistant text along with the round-trip duration in milliseconds.
func (e *LiteLLMExecutor) Complete(ctx context.Context, model, system, user string) (string, int64, error) {
	body := litellmRequest{
		Model: model,
		Messages: []litellmMessage{
			{Role: "system", Content: system},
			{Role: "user", Content: user},
		},
	}

	bodyBytes, err := json.Marshal(body)
	if err != nil {
		return "", 0, fmt.Errorf("litellm: marshal request: %w", err)
	}

	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, e.baseURL+"/v1/chat/completions", bytes.NewReader(bodyBytes))
	if err != nil {
		return "", 0, fmt.Errorf("litellm: create request: %w", err)
	}
	httpReq.Header.Set("Content-Type", "application/json")
	if e.apiKey != "" {
		httpReq.Header.Set("Authorization", "Bearer "+e.apiKey)
	}

	t0 := time.Now()
	resp, err := e.httpClient.Do(httpReq)
	if err != nil {
		return "", 0, fmt.Errorf("litellm: request failed: %w", err)
	}
	defer resp.Body.Close() //nolint:errcheck
	durationMs := time.Since(t0).Milliseconds()

	if resp.StatusCode != http.StatusOK {
		return "", 0, fmt.Errorf("litellm: server returned status %d", resp.StatusCode)
	}

	var chatResp litellmResponse
	if err := json.NewDecoder(resp.Body).Decode(&chatResp); err != nil {
		return "", 0, fmt.Errorf("litellm: decode response: %w", err)
	}
	if len(chatResp.Choices) == 0 {
		return "", 0, fmt.Errorf("litellm: no choices in response")
	}

	return stripResultJSON(chatResp.Choices[0].Message.Content), durationMs, nil
}

// stripResultJSON removes trailing ```json blocks that match the old structured
// result schema (containing "status" and "phase" keys). Some local models produce
// correct markdown prose but then append the old JSON format out of habit.
func stripResultJSON(text string) string {
	const fence = "```json"
	idx := len(text) - 1
	// Walk backwards past trailing whitespace.
	for idx >= 0 && (text[idx] == '\n' || text[idx] == '\r' || text[idx] == ' ') {
		idx--
	}
	// Must end with closing fence.
	if idx < 2 || text[idx-2:idx+1] != "```" {
		return text
	}
	// Find the matching opening fence.
	start := len(text[:idx-2]) - 1
	for start >= 0 {
		if start+len(fence) <= len(text) && text[start:start+len(fence)] == fence {
			block := text[start : idx+1]
			if strings.Contains(block, `"status"`) && strings.Contains(block, `"phase"`) {
				return strings.TrimRight(text[:start], " \t\r\n")
			}
			break
		}
		start--
	}
	return text
}
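The backward scan in `stripResultJSON` can also be expressed with `strings` helpers, which may be easier to follow. The sketch below is a simplified variant written for illustration, not the code in this commit; `stripTrailingResultJSON` is a hypothetical name. It keeps the same rule: only a final fenced `json` block that mentions both `"status"` and `"phase"` is removed.

```go
package main

import "strings"

// stripTrailingResultJSON is a simplified, illustrative variant of
// stripResultJSON. It removes the final json-fenced block only when that
// block contains both a "status" and a "phase" key.
func stripTrailingResultJSON(text string) string {
	const fence = "```json"
	// Ignore the same trailing whitespace set the original walks past.
	trimmed := strings.TrimRight(text, " \r\n")
	if !strings.HasSuffix(trimmed, "```") {
		return text
	}
	// Look for the opening fence strictly before the closing fence.
	body := trimmed[:len(trimmed)-3]
	start := strings.LastIndex(body, fence)
	if start < 0 {
		return text
	}
	block := trimmed[start:]
	if strings.Contains(block, `"status"`) && strings.Contains(block, `"phase"`) {
		return strings.TrimRight(text[:start], " \t\r\n")
	}
	return text
}
```

Because the check is a substring match rather than a JSON parse, a legitimate answer that ends in a block containing both keys would also be stripped. The same trade-off applies to the original function.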
124
internal/exec/litellm_test.go
Normal file
@@ -0,0 +1,124 @@
package exec_test

import (
	"context"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func chatResponse(t *testing.T, content string) []byte {
	t.Helper()
	resp := map[string]any{
		"choices": []map[string]any{
			{"message": map[string]any{"role": "assistant", "content": content}},
		},
	}
	data, err := json.Marshal(resp)
	require.NoError(t, err)
	return data
}

func TestLiteLLMReturnsText(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		assert.Equal(t, "/v1/chat/completions", r.URL.Path)
		assert.Equal(t, "application/json", r.Header.Get("Content-Type"))
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write(chatResponse(t, "here is my analysis"))
	}))
	defer srv.Close()

	ex := iexec.NewLiteLLM(srv.URL, "", 5*time.Second)
	text, dur, err := ex.Complete(context.Background(), "ollama/devstral", "system prompt", "user prompt")
	require.NoError(t, err)
	assert.Equal(t, "here is my analysis", text)
	assert.GreaterOrEqual(t, dur, int64(0))
}

func TestLiteLLMSendsAuthHeader(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		assert.Equal(t, "Bearer secret", r.Header.Get("Authorization"))
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write(chatResponse(t, "ok"))
	}))
	defer srv.Close()

	ex := iexec.NewLiteLLM(srv.URL, "secret", 5*time.Second)
	_, _, err := ex.Complete(context.Background(), "model", "sys", "user")
	require.NoError(t, err)
}

func TestLiteLLMErrorOnNonOKStatus(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusServiceUnavailable)
	}))
	defer srv.Close()

	ex := iexec.NewLiteLLM(srv.URL, "", 5*time.Second)
	_, _, err := ex.Complete(context.Background(), "model", "sys", "user")
	assert.ErrorContains(t, err, "503")
}

func TestLiteLLMErrorOnEmptyChoices(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte(`{"choices":[]}`))
	}))
	defer srv.Close()

	ex := iexec.NewLiteLLM(srv.URL, "", 5*time.Second)
	_, _, err := ex.Complete(context.Background(), "model", "sys", "user")
	assert.ErrorContains(t, err, "no choices")
}

func TestLiteLLMStripsTrailingResultJSON(t *testing.T) {
	content := "## Hypotheses\n\n**H1 (high):** nil map access.\n\n```json\n{\n \"status\": \"pass\",\n \"phase\": \"debug\",\n \"skill\": \"debug\"\n}\n```"
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write(chatResponse(t, content))
	}))
	defer srv.Close()

	ex := iexec.NewLiteLLM(srv.URL, "", 5*time.Second)
	text, _, err := ex.Complete(context.Background(), "model", "sys", "user")
	require.NoError(t, err)
	assert.Contains(t, text, "nil map access")
	assert.NotContains(t, text, `"status"`)
	assert.NotContains(t, text, "```json")
}

func TestLiteLLMKeepsNonResultJSONFence(t *testing.T) {
	// A json block that is part of the actual answer (no status/phase) should be kept.
	content := "Use this config:\n\n```json\n{\"model\": \"koala/phi4\"}\n```"
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write(chatResponse(t, content))
	}))
	defer srv.Close()

	ex := iexec.NewLiteLLM(srv.URL, "", 5*time.Second)
	text, _, err := ex.Complete(context.Background(), "model", "sys", "user")
	require.NoError(t, err)
	assert.Contains(t, text, `"model"`)
	assert.Contains(t, text, "```json")
}

func TestLiteLLMRespectsContextCancellation(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()

	ex := iexec.NewLiteLLM("http://invalid.example.com", "", 1*time.Second)
	_, _, err := ex.Complete(ctx, "model", "sys", "user")
	assert.Error(t, err)
}
@@ -1,61 +0,0 @@
package exec

import (
	"errors"
	"strings"
)

// Result is the structured JSON output from every supervisor invocation.
// The JSON schema constant is passed to claude via --json-schema so Claude
// validates its own output before returning.
type Result struct {
	Status       string `json:"status"`        // pass | fail | error
	Phase        string `json:"phase"`         // red | green | refactor
	Skill        string `json:"skill"`         // tdd | review | ...
	FilePath     string `json:"file_path"`     // absolute path to generated file
	RunnerOutput string `json:"runner_output"` // raw stdout+stderr from test runner
	Verified     bool   `json:"verified"`      // based on exit code, never self-report
	ModelUsed    string `json:"model_used"`    // model name or "self"
	Message      string `json:"message"`       // one sentence summary
}

var validStatuses = map[string]bool{"pass": true, "fail": true, "error": true}
var validPhases = map[string]bool{
	"red":           true,
	"green":         true,
	"refactor":      true,
	"retrospective": true,
}

func (r Result) Validate() error {
	var errs []string
	if !validStatuses[r.Status] {
		errs = append(errs, "status must be pass|fail|error, got: "+r.Status)
	}
	if !validPhases[r.Phase] {
		errs = append(errs, "phase must be red|green|refactor, got: "+r.Phase)
	}
	if r.Skill == "" {
		errs = append(errs, "skill is required")
	}
	if len(errs) > 0 {
		return errors.New(strings.Join(errs, "; "))
	}
	return nil
}

// Schema is passed to claude --json-schema to enforce structured output.
const Schema = `{
  "type": "object",
  "required": ["status","phase","skill","file_path","runner_output","verified","model_used","message"],
  "properties": {
    "status": {"type": "string", "enum": ["pass","fail","error"]},
    "phase": {"type": "string", "enum": ["red","green","refactor"]},
    "skill": {"type": "string"},
    "file_path": {"type": "string"},
    "runner_output": {"type": "string"},
    "verified": {"type": "boolean"},
    "model_used": {"type": "string"},
    "message": {"type": "string"}
  }
}`
@@ -1,71 +0,0 @@
package exec_test

import (
	"encoding/json"
	"testing"

	"github.com/mathiasbq/supervisor/internal/exec"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestResultParsesValidJSON(t *testing.T) {
	raw := `{
		"status": "pass",
		"phase": "red",
		"skill": "tdd",
		"file_path": "/tmp/foo_test.go",
		"runner_output": "--- FAIL: TestFoo",
		"verified": true,
		"model_used": "self",
		"message": "test fails as expected"
	}`
	var r exec.Result
	require.NoError(t, json.Unmarshal([]byte(raw), &r))
	assert.Equal(t, "pass", r.Status)
	assert.Equal(t, "red", r.Phase)
	assert.True(t, r.Verified)
}

func TestResultValidation(t *testing.T) {
	tests := []struct {
		name    string
		result  exec.Result
		wantErr bool
	}{
		{
			name: "valid pass result",
			result: exec.Result{
				Status: "pass", Phase: "red", Skill: "tdd",
				FilePath: "/tmp/x_test.go", RunnerOutput: "FAIL",
				Verified: true, ModelUsed: "self", Message: "ok",
			},
			wantErr: false,
		},
		{
			name:    "empty status",
			result:  exec.Result{Phase: "red", Skill: "tdd"},
			wantErr: true,
		},
		{
			name:    "invalid status",
			result:  exec.Result{Status: "unknown", Phase: "red", Skill: "tdd"},
			wantErr: true,
		},
		{
			name:    "invalid phase",
			result:  exec.Result{Status: "pass", Phase: "bad", Skill: "tdd"},
			wantErr: true,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			err := tt.result.Validate()
			if tt.wantErr {
				assert.Error(t, err)
			} else {
				assert.NoError(t, err)
			}
		})
	}
}
@@ -43,6 +43,11 @@ func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
		return
	}

	// JSON-RPC 2.0 notifications (no id) must not receive a response.
	if req.ID == nil {
		return
	}

	var result any
	var rpcErr *rpcError

@@ -5,6 +5,7 @@ import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"

	"github.com/mathiasbq/supervisor/internal/mcp"
@@ -76,3 +77,39 @@ func TestMCPUnknownMethod(t *testing.T) {
	require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
	assert.NotNil(t, resp["error"])
}

func TestMCPNotificationKnownMethodGetsNoResponseBody(t *testing.T) {
	reg := registry.New()
	srv := mcp.NewServer(reg)

	// JSON-RPC 2.0 notification: "id" field absent. Per spec, server MUST NOT
	// reply. notifications/initialized is part of the standard MCP handshake.
	req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
		"jsonrpc": "2.0",
		"method":  "notifications/initialized",
	}))
	req.Header.Set("Content-Type", "application/json")
	rr := httptest.NewRecorder()
	srv.ServeHTTP(rr, req)

	assert.Equal(t, http.StatusOK, rr.Code)
	assert.Empty(t, strings.TrimSpace(rr.Body.String()),
		"notifications must not receive a response body")
}

func TestMCPNotificationUnknownMethodGetsNoResponseBody(t *testing.T) {
	reg := registry.New()
	srv := mcp.NewServer(reg)

	req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
		"jsonrpc": "2.0",
		"method":  "notifications/totally-unknown",
	}))
	req.Header.Set("Content-Type", "application/json")
	rr := httptest.NewRecorder()
	srv.ServeHTTP(rr, req)

	assert.Equal(t, http.StatusOK, rr.Code)
	assert.Empty(t, strings.TrimSpace(rr.Body.String()),
		"unknown notifications must also receive no response body")
}
56
internal/session/history.go
Normal file
@@ -0,0 +1,56 @@
// internal/session/history.go
package session

import (
	"fmt"
	"strings"
)

// FormatHistory formats prior session entries as a structured block for
// injection into a worker task prompt. Entries matching excludePhase are
// omitted (pass the current phase to avoid circular injection).
func FormatHistory(entries []Entry, excludePhase string) string {
	var filtered []Entry
	for _, e := range entries {
		if e.Phase != excludePhase {
			filtered = append(filtered, e)
		}
	}
	if len(filtered) == 0 {
		return ""
	}

	var b strings.Builder
	b.WriteString("## Session history\n\n")
	for _, e := range filtered {
		fmt.Fprintf(&b, "### Phase: %s\n", e.Phase)      //nolint:errcheck // strings.Builder never errors
		fmt.Fprintf(&b, "- Skill: %s\n", e.Skill)        //nolint:errcheck
		fmt.Fprintf(&b, "- Status: %s\n", e.FinalStatus) //nolint:errcheck
		if e.FilePath != "" {
			fmt.Fprintf(&b, "- File: %s\n", e.FilePath) //nolint:errcheck
		}
		if e.Message != "" {
			fmt.Fprintf(&b, "- Summary: %s\n", e.Message) //nolint:errcheck
		}
		b.WriteString("\n")
	}
	return b.String()
}

// PrependHistory reads the session log for sessionID and prepends a formatted
// history block to task. Returns task unchanged if sessionID or sessionsDir is
// empty, or if no prior entries exist.
func PrependHistory(sessionsDir, sessionID, currentPhase, task string) string {
	if sessionID == "" || sessionsDir == "" {
		return task
	}
	entries, err := Read(sessionsDir, sessionID)
	if err != nil || len(entries) == 0 {
		return task
	}
	history := FormatHistory(entries, currentPhase)
	if history == "" {
		return task
	}
	return history + "\n---\n\n" + task
}
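To make the resulting prompt layout concrete, here is a compact, file-free sketch of the two steps above. `entry` is a hypothetical subset of `session.Entry`, and `formatHistory` and `withHistory` are illustrative re-statements of `FormatHistory` and the final join in `PrependHistory`, written so the output can be checked without a session log on disk.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical subset of session.Entry with only the fields that
// the history block prints.
type entry struct {
	Skill, Phase, FinalStatus, FilePath, Message string
}

// formatHistory renders every entry whose phase differs from excludePhase,
// using the same headings and bullet labels as FormatHistory.
func formatHistory(entries []entry, excludePhase string) string {
	var b strings.Builder
	for _, e := range entries {
		if e.Phase == excludePhase {
			continue
		}
		if b.Len() == 0 {
			b.WriteString("## Session history\n\n")
		}
		fmt.Fprintf(&b, "### Phase: %s\n- Skill: %s\n- Status: %s\n", e.Phase, e.Skill, e.FinalStatus)
		if e.FilePath != "" {
			fmt.Fprintf(&b, "- File: %s\n", e.FilePath)
		}
		if e.Message != "" {
			fmt.Fprintf(&b, "- Summary: %s\n", e.Message)
		}
		b.WriteString("\n")
	}
	return b.String()
}

// withHistory joins a history block and a task with the same separator that
// PrependHistory uses; an empty history leaves the task unchanged.
func withHistory(history, task string) string {
	if history == "" {
		return task
	}
	return history + "\n---\n\n" + task
}
```

Each rendered entry ends with a blank line, so the `---` separator is preceded by two blank lines and the task always remains the suffix of the combined prompt.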
85
internal/session/history_test.go
Normal file
@@ -0,0 +1,85 @@
// internal/session/history_test.go
package session_test

import (
	"strings"
	"testing"
	"time"

	"github.com/mathiasbq/supervisor/internal/session"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestFormatHistoryEmpty(t *testing.T) {
	result := session.FormatHistory(nil, "")
	assert.Equal(t, "", result)
}

func TestFormatHistoryFormatsEntries(t *testing.T) {
	entries := []session.Entry{
		{
			Skill: "tdd", Phase: "red", FinalStatus: "pass",
			FilePath:  "internal/foo/foo_test.go",
			Message:   "wrote failing test for Foo",
			Timestamp: time.Now(),
		},
	}
	result := session.FormatHistory(entries, "")
	assert.Contains(t, result, "## Session history")
	assert.Contains(t, result, "Phase: red")
	assert.Contains(t, result, "wrote failing test for Foo")
	assert.Contains(t, result, "internal/foo/foo_test.go")
}

func TestFormatHistoryExcludesCurrentPhase(t *testing.T) {
	entries := []session.Entry{
		{Skill: "tdd", Phase: "red", Message: "red done", FinalStatus: "pass"},
		{Skill: "tdd", Phase: "green", Message: "green done", FinalStatus: "pass"},
	}
	result := session.FormatHistory(entries, "green")
	assert.Contains(t, result, "red done")
	assert.NotContains(t, result, "green done")
}

func TestPrependHistoryNoSessionID(t *testing.T) {
	result := session.PrependHistory("", "", "review", "do the task")
	assert.Equal(t, "do the task", result)
}

func TestPrependHistoryNoLog(t *testing.T) {
	dir := t.TempDir()
	result := session.PrependHistory(dir, "sess-abc", "review", "do the task")
	assert.Equal(t, "do the task", result)
}

func TestPrependHistoryPrependsHistory(t *testing.T) {
	dir := t.TempDir()
	entry := session.Entry{
		SessionID: "sess-abc", Skill: "tdd", Phase: "red",
		FinalStatus: "pass", Message: "wrote test",
		Timestamp: time.Now(),
	}
	require.NoError(t, session.Append(dir, "sess-abc", entry))

	result := session.PrependHistory(dir, "sess-abc", "review", "do the task")
	assert.Contains(t, result, "## Session history")
	assert.Contains(t, result, "wrote test")
	assert.True(t, strings.HasSuffix(result, "do the task"))
}

func TestPrependHistoryExcludesCurrentPhase(t *testing.T) {
	dir := t.TempDir()
	require.NoError(t, session.Append(dir, "sess-abc", session.Entry{
		SessionID: "sess-abc", Skill: "tdd", Phase: "red",
		FinalStatus: "pass", Message: "red done", Timestamp: time.Now(),
	}))
	require.NoError(t, session.Append(dir, "sess-abc", session.Entry{
		SessionID: "sess-abc", Skill: "tdd", Phase: "green",
		FinalStatus: "pass", Message: "green done", Timestamp: time.Now(),
	}))

	result := session.PrependHistory(dir, "sess-abc", "green", "do the task")
	assert.Contains(t, result, "red done")
	assert.NotContains(t, result, "green done")
}
@@ -32,9 +32,14 @@ type Entry struct {
type Attempt struct {
	Attempt       int    `json:"attempt"`
	Model         string `json:"model"`
	Tier          string `json:"tier"` // local | subagent | managed
	DurationMs    int64  `json:"duration_ms"`
	WarmStart     bool   `json:"warm_start"` // model already loaded in llama-swap
	Verified      bool   `json:"verified"`
	Verdict       string `json:"verdict,omitempty"`  // accept | escalate | error
	Feedback      string `json:"feedback,omitempty"` // verifier feedback on escalation
	OutputSummary string `json:"output_summary,omitempty"`
	RunnerOutput  string `json:"runner_output,omitempty"`
	Verified      bool   `json:"verified"`
}

// Append writes entry as a single JSON line to sessionsDir/{sessionID}.jsonl.

@@ -61,3 +61,22 @@ func TestRead_EmptyWhenNoFile(t *testing.T) {
	require.NoError(t, err)
	assert.Empty(t, entries)
}

func TestAttemptRoundTrip(t *testing.T) {
	a := session.Attempt{
		Attempt:    1,
		Model:      "ollama/devstral",
		Tier:       "local",
		DurationMs: 4200,
		WarmStart:  true,
		Verified:   false,
		Verdict:    "escalate",
		Feedback:   "missing line references",
	}
	data, err := json.Marshal(a)
	require.NoError(t, err)

	var got session.Attempt
	require.NoError(t, json.Unmarshal(data, &got))
	assert.Equal(t, a, got)
}

@@ -10,13 +10,19 @@ import (
	"net/http"
)

// Handle dispatches brain_query and brain_write tool calls.
// Handle dispatches brain tool calls.
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
	switch tool {
	case "brain_query":
		return s.query(ctx, args)
	case "brain_write":
		return s.write(ctx, args)
	case "brain_ingest_raw":
		return s.ingestRaw(ctx, args)
	case "brain_ingest":
		return s.ingest(ctx, args)
	case "brain_search":
		return s.search(ctx, args)
	default:
		return nil, fmt.Errorf("unknown brain tool: %s", tool)
	}
@@ -59,12 +65,101 @@ func (s *Skill) write(ctx context.Context, args json.RawMessage) (json.RawMessag
 	return s.post(ctx, "/write", a)
 }
 
+type ingestArgs struct {
+	Content string `json:"content,omitempty"`
+	Source  string `json:"source,omitempty"`
+	Path    string `json:"path,omitempty"`
+	DryRun  bool   `json:"dry_run,omitempty"`
+}
+
+func (s *Skill) ingest(ctx context.Context, args json.RawMessage) (json.RawMessage, error) {
+	var a ingestArgs
+	if err := json.Unmarshal(args, &a); err != nil {
+		return nil, fmt.Errorf("parse args: %w", err)
+	}
+	if s.cfg.IngestSvcURL == "" {
+		return nil, fmt.Errorf("brain_ingest: INGEST_SVC_URL not configured")
+	}
+	if a.Path != "" && a.Content != "" {
+		return nil, fmt.Errorf("path and content+source are mutually exclusive: provide one or the other")
+	}
+	if a.Path != "" {
+		return s.postTo(ctx, s.cfg.IngestSvcURL+"/ingest-path", map[string]any{
+			"path":    a.Path,
+			"source":  a.Source,
+			"dry_run": a.DryRun,
+		})
+	}
+	if a.Content != "" && a.Source != "" {
+		return s.postTo(ctx, s.cfg.IngestSvcURL+"/ingest", map[string]any{
+			"content": a.Content,
+			"source":  a.Source,
+			"dry_run": a.DryRun,
+		})
+	}
+	return nil, fmt.Errorf("either content+source or path is required")
+}
+
+type ingestRawArgs struct {
+	Source string `json:"source"`
+	Pages  []any  `json:"pages"`
+	DryRun bool   `json:"dry_run,omitempty"`
+}
+
+func (s *Skill) ingestRaw(ctx context.Context, args json.RawMessage) (json.RawMessage, error) {
+	var a ingestRawArgs
+	if err := json.Unmarshal(args, &a); err != nil {
+		return nil, fmt.Errorf("parse args: %w", err)
+	}
+	if s.cfg.IngestSvcURL == "" {
+		return nil, fmt.Errorf("brain_ingest_raw: INGEST_SVC_URL not configured")
+	}
+	if a.Source == "" {
+		return nil, fmt.Errorf("source is required")
+	}
+	if len(a.Pages) == 0 {
+		return nil, fmt.Errorf("pages is required and must be non-empty")
+	}
+	return s.postTo(ctx, s.cfg.IngestSvcURL+"/ingest-raw", map[string]any{
+		"source":  a.Source,
+		"pages":   a.Pages,
+		"dry_run": a.DryRun,
+	})
+}
+
+type searchArgs struct {
+	Query      string `json:"query"`
+	Collection string `json:"collection,omitempty"`
+	Limit      int    `json:"limit,omitempty"`
+}
+
+func (s *Skill) search(ctx context.Context, args json.RawMessage) (json.RawMessage, error) {
+	var a searchArgs
+	if err := json.Unmarshal(args, &a); err != nil {
+		return nil, fmt.Errorf("parse args: %w", err)
+	}
+	if a.Query == "" {
+		return nil, fmt.Errorf("query is required")
+	}
+	if a.Limit == 0 {
+		a.Limit = 5
+	}
+	if s.cfg.KBRetrievalURL == "" {
+		return nil, fmt.Errorf("brain_search: KB_RETRIEVAL_URL not configured")
+	}
+	return s.postTo(ctx, s.cfg.KBRetrievalURL+"/api/v1/search", a)
+}
+
 func (s *Skill) post(ctx context.Context, path string, body any) (json.RawMessage, error) {
+	return s.postTo(ctx, s.cfg.IngestBaseURL+path, body)
+}
+
+func (s *Skill) postTo(ctx context.Context, url string, body any) (json.RawMessage, error) {
 	b, err := json.Marshal(body)
 	if err != nil {
 		return nil, fmt.Errorf("marshal request: %w", err)
 	}
-	req, err := http.NewRequestWithContext(ctx, http.MethodPost, s.cfg.IngestBaseURL+path, bytes.NewReader(b))
+	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(b))
 	if err != nil {
 		return nil, fmt.Errorf("build request: %w", err)
 	}
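
The order of the checks in `ingest` decides which endpoint a `brain_ingest` call reaches. The sketch below restates that order as a pure function so the cases can be read side by side. `routeIngest` is a hypothetical helper that does not appear in the diff, and it leaves out the `INGEST_SVC_URL` check and the HTTP call. Two cases stand out: `path` plus `source` is accepted and routed to `/ingest-path`, while `content` without `source` falls through to the final error.

```go
package main

import (
	"errors"
	"fmt"
)

// ingestArgs mirrors the fields of the struct added in the hunk above.
type ingestArgs struct {
	Content string
	Source  string
	Path    string
	DryRun  bool
}

// routeIngest is a hypothetical helper (not part of the diff). It returns the
// endpoint suffix that ingest would post to, applying the checks in the same
// order as the hunk. The INGEST_SVC_URL check and the request itself are omitted.
func routeIngest(a ingestArgs) (string, error) {
	if a.Path != "" && a.Content != "" {
		return "", errors.New("path and content+source are mutually exclusive: provide one or the other")
	}
	if a.Path != "" {
		return "/ingest-path", nil
	}
	if a.Content != "" && a.Source != "" {
		return "/ingest", nil
	}
	return "", errors.New("either content+source or path is required")
}

func main() {
	cases := []ingestArgs{
		{Path: "/tmp/some-file.md"},
		{Path: "/tmp/some-file.md", Source: "label"},
		{Content: "some content", Source: "my-source.md"},
		{Content: "some content"}, // no source: falls through to the final error
		{Path: "/tmp/some-file.md", Content: "some content"},
		{},
	}
	for _, c := range cases {
		route, err := routeIngest(c)
		fmt.Printf("%+v -> route=%q err=%v\n", c, route, err)
	}
}
```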
@@ -63,3 +63,60 @@ func TestHandle_UnknownTool_ReturnsError(t *testing.T) {
 	_, err := s.Handle(context.Background(), "brain_unknown", nil)
 	assert.Error(t, err)
 }
+
+func TestIngest_RoutesToIngestPath(t *testing.T) {
+	var capturedPath string
+	var capturedBody map[string]any
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		capturedPath = r.URL.Path
+		require.NoError(t, json.NewDecoder(r.Body).Decode(&capturedBody))
+		_ = json.NewEncoder(w).Encode(map[string]any{"pages": []string{"wiki/foo.md"}})
+	}))
+	defer srv.Close()
+
+	s := brain.New(brain.Config{IngestSvcURL: srv.URL})
+	args, _ := json.Marshal(map[string]any{"path": "/tmp/some-file.md"})
+	out, err := s.Handle(context.Background(), "brain_ingest", args)
+	require.NoError(t, err)
+
+	assert.Equal(t, "/ingest-path", capturedPath)
+	assert.Equal(t, "/tmp/some-file.md", capturedBody["path"])
+
+	var result map[string]any
+	require.NoError(t, json.Unmarshal(out, &result))
+	pages := result["pages"].([]any)
+	assert.Len(t, pages, 1)
+}
+
+func TestIngest_RoutesToIngest(t *testing.T) {
+	var capturedPath string
+	var capturedBody map[string]any
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		capturedPath = r.URL.Path
+		require.NoError(t, json.NewDecoder(r.Body).Decode(&capturedBody))
+		_ = json.NewEncoder(w).Encode(map[string]any{"pages": []string{"wiki/bar.md"}})
+	}))
+	defer srv.Close()
+
+	s := brain.New(brain.Config{IngestSvcURL: srv.URL})
+	args, _ := json.Marshal(map[string]any{"content": "some content", "source": "my-source.md"})
+	out, err := s.Handle(context.Background(), "brain_ingest", args)
+	require.NoError(t, err)
+
+	assert.Equal(t, "/ingest", capturedPath)
+	assert.Equal(t, "some content", capturedBody["content"])
+	assert.Equal(t, "my-source.md", capturedBody["source"])
+
+	var result map[string]any
+	require.NoError(t, json.Unmarshal(out, &result))
+	pages := result["pages"].([]any)
+	assert.Len(t, pages, 1)
+}
+
+func TestIngest_MissingRequiredFields(t *testing.T) {
+	s := brain.New(brain.Config{IngestSvcURL: "http://localhost:3300"})
+	args, _ := json.Marshal(map[string]any{})
+	_, err := s.Handle(context.Background(), "brain_ingest", args)
+	require.Error(t, err)
+	assert.Contains(t, err.Error(), "either content+source or path is required")
+}
Some files were not shown because too many files have changed in this diff.