Wraps pipeline.RunRaw directly. Same dry-run semantics as the HTTP
/ingest-raw endpoint. Test exercises a single concept page; asserts
returned path and that no file is written under dry_run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
api.WriteNote captures the file-write logic that was previously inline
in Handler.Write. The existing HTTP endpoint now delegates to it; the
new MCP brain_write tool reuses the same function. Path-traversal
guard is strengthened to explicitly reject filenames containing path
separators or "..", so the rejection is surfaced before filepath.Base
strips the suspicious component (the previous defense-in-depth prefix
check became unreachable for these inputs after Base normalisation).
HTTP error code for caller-input errors shifts from 500 to 400, which
is semantically correct and not exercised by any existing test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps the existing search.Query function. Same BM25 over
brain/knowledge/ and brain/wiki/ that the HTTP /query endpoint serves.
Plan note: handleCall switch replaces the single-line stub from Task 1
— no unknownToolError type to remove since Task 1 inlined the error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy of internal/session from the supervisor module — the ingestion
service needs it for the upcoming session_log MCP tool. The supervisor
copy will be removed in the supervisor-retirement plan; until then
the two packages are intentionally identical and pinned (no edits).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an MCP HTTP handler under ingestion/internal/mcp. Implements
initialize, tools/list, and the JSON-RPC notification skip from prior
work. Tool dispatch is stubbed (returns unknown-tool error) and will be
filled in by subsequent tasks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allows callers to provide pre-structured RawPage data directly, bypassing the
LLM extraction step. The pipeline still handles slug computation, frontmatter,
link canonicalization, source back-references, and dedup — only the extraction
is skipped. Useful when a more capable model or manual curation produces the
structured data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Change prompt to reflect new output format: title, type, subtype, domain, content
- Remove slug/path generation responsibility from LLM — pipeline now handles it
- Wikilinks change from [[slug|Display Name]] to [[Display Name]] only
- LLM no longer includes frontmatter or paths in output
docs(schema): update LLM output format and wikilink convention for Level 3
- Specify JSON schema: title, type, subtype, domain, content fields
- Remove frontmatter requirements from schema output (handled by pipeline)
- Simplify wikilink format to [[Display Name]] — no slug or pipe
- Pipeline now responsible for slug generation and frontmatter construction
These changes shift slug/frontmatter generation from LLM to pipeline,
reducing cognitive load on the model and improving control over output.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Strips slug authority from the LLM. The new RawPage type carries only
{title, type, subtype, domain, content} — no paths or frontmatter.
Pipeline will derive slugs deterministically (Task 4).
pipeline.go gets a temporary bridge stub (TODO task4) to keep the
package compiling between tasks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New extract package: Text() dispatcher for .md/.txt passthrough and
PDF extraction via pdftotext subprocess
- wiki.Entry gains Aliases []string, loaded from YAML frontmatter
- Fuzzy entity resolution in pipeline: normalizes titles (lowercase,
strip articles, collapse hyphens) and matches proposed pages against
existing inventory slugs and aliases to prevent proliferation
- Watcher and API handler now use extract.Text() instead of os.ReadFile
- Dockerfile: apk add poppler-utils in Alpine runtime stage
Files dropped into brain/raw/ are now copied to processed/ or failed/ rather
than moved. A .processed or .failed marker is written next to the original so
the watcher skips it on subsequent polls without deleting it. This keeps
Syncthing-synced Obsidian vaults intact after ingestion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Start background watcher on startup when INGEST_WATCH_INTERVAL > 0
- Procfile: add INGEST_WATCH_INTERVAL=30 and INGEST_SVC_URL for supervisor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CLAUDE.md has a specific meaning in the Claude Code ecosystem (agent
instructions). The wiki schema for the ingestion pipeline should live
in schema.md to avoid confusion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wires pipeline.Run into the HTTP layer so callers can ingest raw text
or files/directories without touching the filesystem directly. Rewrites
main.go to parse LLM and watcher env vars and build pipeline.Config.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Polls brain/raw/ on a configurable ticker, derives human-readable source
names from filenames, runs the pipeline, and moves files to
processed/YYYY-MM-DD/ on success or failed/ on error with a log.md entry.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds prompt.go (BuildPrompt + systemPrompt) and pipeline.go (Run, Config,
Result, mergeAll) that wire chunking, LLM calls, parse, merge, index rebuild,
and log append into a single ingestion pipeline. Includes integration tests
covering write, dry-run, and duplicate-path merge scenarios.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
brain_write with a custom filename omitted the .md extension, causing
search to skip the file (search.go filters on HasSuffix .md).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ingestion server is a pure-Go HTTP binary — alpine runtime, no node.js.
CD now builds both supervisor and ingestion images on every push,
updates both deployment.yaml files in the infra repo.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Empty or whitespace-only queries would silently pass through to search,
returning meaningless results. Also removed the Domain field from
queryRequest — it was accepted but silently ignored since search.Query
has no domain parameter, which would confuse callers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements POST /query (BM25 search via internal/search) and POST /write
(raw file persistence to brain/raw/) as an api.Handler struct. Filename
is auto-generated when absent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both walk-level errors and ReadFile failures now use best-effort
semantics (warn via slog, continue) instead of mixed abort/silent-skip.
filepath.Rel error is now propagated from the callback instead of
discarded.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements search.Query which walks brainDir/wiki/**/*.md, scores files
by term-frequency across query tokens, and returns results sorted by
score descending. Uses only stdlib — no external search deps.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>