hyperguild/ingestion/cmd/server/main.go
Mathias 57462b52ff
feat(brain): hybrid BM25 + pgvector retrieval (opt-in)
Wires nomic-embed-text (iguana ollama) + pgvector on the shared
postgres18 into brain_query / brain_answer via Reciprocal Rank Fusion.
Pure BM25 stays the default; setting BRAIN_PG_DSN and BRAIN_EMBED_URL
together opts in. Setting one without the other is misconfiguration →
exit 1.
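
The opt-in rule above can be sketched as a small pure function, which makes the three cases easy to test in isolation. This is a minimal sketch, not the code in main.go; the name hybridMode is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// hybridMode is a hypothetical helper. It reports whether hybrid retrieval
// should be enabled for a pair of settings: both empty means BM25-only,
// both set means hybrid, exactly one set is a configuration error.
func hybridMode(pgDSN, embedURL string) (bool, error) {
	switch {
	case pgDSN != "" && embedURL != "":
		return true, nil
	case pgDSN == "" && embedURL == "":
		return false, nil
	default:
		return false, errors.New("BRAIN_PG_DSN and BRAIN_EMBED_URL must be set together")
	}
}

func main() {
	// Only the embed URL is set, so this reports a configuration error.
	enabled, err := hybridMode("", "http://embed.example:11434")
	fmt.Println(enabled, err)
}
```

A caller would exit with status 1 on a non-nil error, matching the behaviour described above.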

New packages:

- internal/embed
  Client.Embed(ctx, text) → []float32 via POST {URL}/api/embed.
  Defaults to nomic-embed-text:latest (768 dim). nil-on-empty-URL so
  callers gate on a single nil check.
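
As a rough illustration of the client described above, the sketch below posts to {URL}/api/embed and decodes the first vector. It assumes the Ollama-style request body ({"model", "input"}) and response body ({"embeddings": [[...]]}); the type and helper names are hypothetical and the real internal/embed package may differ. A fake upstream stands in for the embedding server so the sketch runs on its own.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Client is a hypothetical stand-in for embed.Client.
type Client struct {
	url, model string
}

// New returns nil for an empty URL so callers can gate on one nil check.
func New(url, model string) *Client {
	if url == "" {
		return nil
	}
	return &Client{url: url, model: model}
}

// Embed posts text to {URL}/api/embed and returns the first embedding.
func (c *Client) Embed(ctx context.Context, text string) ([]float32, error) {
	if text == "" {
		return nil, errors.New("embed: empty text")
	}
	body, err := json.Marshal(map[string]string{"model": c.model, "input": text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.url+"/api/embed", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("embed: upstream status %d", resp.StatusCode)
	}
	var out struct {
		Embeddings [][]float32 `json:"embeddings"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Embeddings) == 0 {
		return nil, errors.New("embed: no embedding in response")
	}
	return out.Embeddings[0], nil
}

// embedAgainstFake runs Embed against an in-process fake upstream that
// always answers with a three-dimensional vector.
func embedAgainstFake(text string) ([]float32, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"embeddings":[[0.1,0.2,0.3]]}`)
	}))
	defer srv.Close()
	return New(srv.URL, "nomic-embed-text:latest").Embed(context.Background(), text)
}

func main() {
	vec, err := embedAgainstFake("hello")
	fmt.Println(len(vec), err)
}
```

The real client would additionally check that the returned vector has the expected 768 dimensions; the fake here returns three to keep the example short.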

- internal/vectorstore
  PGStore wraps a pgxpool against postgres18. Init creates
  brain_embeddings(path PK, vector(768), updated_at) + HNSW cosine
  index idempotently. Upsert / Delete / Search / KnownPaths.
  Sync(brainDir, store, embedder) diffs brain/wiki/ against the store
  and upserts new files / deletes removed ones; StartSync runs it on
  a ticker (default 300s). Integration tests gated by BRAIN_PG_TEST_DSN.
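
The diff step of Sync can be pictured as two set differences. The sketch below covers only what the message describes (add new paths, delete vanished ones, skip _index.md); diffPaths is a hypothetical helper, not the actual vectorstore code, and it does not detect modified files.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// diffPaths is a hypothetical helper. It compares the paths found on disk
// with the paths already in the store and returns the paths to embed and
// upsert (toAdd) and the paths to delete (toDelete), both sorted.
func diffPaths(onDisk, known []string) (toAdd, toDelete []string) {
	disk := make(map[string]bool, len(onDisk))
	for _, p := range onDisk {
		if path.Base(p) == "_index.md" {
			continue // index pages are never embedded
		}
		disk[p] = true
	}
	stored := make(map[string]bool, len(known))
	for _, p := range known {
		stored[p] = true
	}
	for p := range disk {
		if !stored[p] {
			toAdd = append(toAdd, p)
		}
	}
	for p := range stored {
		if !disk[p] {
			toDelete = append(toDelete, p)
		}
	}
	sort.Strings(toAdd)
	sort.Strings(toDelete)
	return toAdd, toDelete
}

func main() {
	add, del := diffPaths(
		[]string{"wiki/a.md", "wiki/b.md", "wiki/_index.md"},
		[]string{"wiki/b.md", "wiki/gone.md"},
	)
	fmt.Println(add, del) // prints "[wiki/a.md] [wiki/gone.md]"
}
```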

- scripts/brain-embeddings-init.sql
  One-time DBA setup: brain DB, brain_app role, vector extension,
  GRANTs. Idempotent.

Search layer:

- search.QueryOptions gains Vector + Embedder fields.
- QueryContext is the cancellable variant; Query stays for existing
  callers.
- When both are set, BM25 (top-N) and pgvector (top-4N) candidates
  merge via Reciprocal Rank Fusion (k=60, Cormack et al. 2009 — no
  tuning knob, robust to scale differences between rankers).
- Vector-only hits are hydrated from disk so callers see uniform
  Result records (path, title, excerpt, wing, hall, score).
- Wing/hall filters still apply to vector candidates via path-prefix.
- On embedder/vector errors the search falls back to BM25 — embedding
  outage degrades quality but doesn't take the brain offline.
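
For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the merge step. Each ranked list contributes 1/(k + rank) to a document's score, with rank starting at 1. The function name fuseRRF and the sample paths are illustrative; the sketch assumes each input list contains a path at most once and is not the code in the search package.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges ranked lists of document paths with Reciprocal Rank
// Fusion. Only ranks matter, so raw BM25 scores and cosine distances
// never need to be put on a common scale.
func fuseRRF(k int, lists ...[]string) []string {
	scores := make(map[string]float64)
	for _, list := range lists {
		for i, p := range list {
			scores[p] += 1.0 / float64(k+i+1) // i is 0-based, rank is i+1
		}
	}
	out := make([]string, 0, len(scores))
	for p := range scores {
		out = append(out, p)
	}
	// Sort by descending fused score; break ties by path for determinism.
	sort.Slice(out, func(a, b int) bool {
		if scores[out[a]] != scores[out[b]] {
			return scores[out[a]] > scores[out[b]]
		}
		return out[a] < out[b]
	})
	return out
}

func main() {
	bm25 := []string{"wiki/a.md", "wiki/b.md"}
	vector := []string{"wiki/b.md", "wiki/c.md", "wiki/a.md"}
	fmt.Println(fuseRRF(60, bm25, vector)) // prints "[wiki/b.md wiki/a.md wiki/c.md]"
}
```

With k=60, wiki/b.md scores 1/62 + 1/61 and edges out wiki/a.md at 1/61 + 1/63, while the vector-only hit wiki/c.md still makes the list with 1/62.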

MCP wiring:

- mcp.Server.WithHybridRetrieval(v, e) opt-in setter, same shape as
  WithReranker.
- brainQuery and brainAnswer pass the wired vector/embedder through
  to search.QueryContext.

REST:

- POST /backfill-embeddings drives Sync synchronously. Returns
  {added, deleted, errors[]}. 503 when feature is unconfigured.
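
The endpoint's contract might look roughly like the handler below. This is a sketch under assumptions: the field types (integer counts, string errors), the type syncResult, and the helpers backfillHandler and statusFor are all hypothetical, not the code in internal/api.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// syncResult mirrors the response shape named above. The field types are
// assumptions: counts for added/deleted, strings for errors.
type syncResult struct {
	Added   int      `json:"added"`
	Deleted int      `json:"deleted"`
	Errors  []string `json:"errors"`
}

// backfillHandler answers 503 when no sync function is wired; otherwise
// it runs the sync synchronously and writes the result as JSON.
func backfillHandler(run func() syncResult) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if run == nil {
			http.Error(w, "hybrid retrieval not configured", http.StatusServiceUnavailable)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(run())
	}
}

// statusFor invokes a handler in-process and returns the HTTP status.
func statusFor(h http.HandlerFunc) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodPost, "/backfill-embeddings", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(backfillHandler(nil)))
	fmt.Println(statusFor(backfillHandler(func() syncResult {
		return syncResult{Added: 2, Errors: []string{}}
	})))
}
```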

cmd/server/main.go:

- BRAIN_PG_DSN + BRAIN_EMBED_URL together enable hybrid; one alone
  → exit 1.
- vectorAdapter bridges *PGStore (returns []Hit) to
  search.VectorSearcher (which returns []VectorHit) without either
  package importing the other.
- BRAIN_EMBED_SYNC_INTERVAL (default 300s) controls the background
  Sync ticker.

Backend pivot from Qdrant to pgvector recorded in DECISIONS.md
2026-05-18 (supersedes 2026-04-08): postgres18 already runs in
databases/ ns, Qdrant was never deployed, one engine beats two.

Dependency: github.com/jackc/pgx/v5 (modern; talks to pgvector natively
via parameterized vector literals).

Tests:
- embed.Client: empty-URL nil, request shape, dimension, upstream
  error propagation, empty-text rejection.
- vectorstore.PGStore: dimension validation (unit); upsert/search/
  KnownPaths (integration, BRAIN_PG_TEST_DSN-gated).
- vectorstore.Sync: adds new files, skips known, deletes
  disappeared, skips _index.md, no-op when nil, collects embedder
  errors.
- search.Query: hybrid promotes vector-only hits via RRF; falls
  back to BM25 on embedder error.

Closes hyperguild#8.
2026-05-18 23:11:25 +02:00


// ingestion/cmd/server/main.go
package main

import (
	"context"
	"fmt"
	"log/slog"
	"net/http"
	"os"
	"strconv"
	"strings"
	"time"

	"github.com/mathiasbq/hyperguild/ingestion/internal/api"
	"github.com/mathiasbq/hyperguild/ingestion/internal/auth"
	"github.com/mathiasbq/hyperguild/ingestion/internal/embed"
	"github.com/mathiasbq/hyperguild/ingestion/internal/llm"
	"github.com/mathiasbq/hyperguild/ingestion/internal/mcp"
	"github.com/mathiasbq/hyperguild/ingestion/internal/oauth"
	"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
	"github.com/mathiasbq/hyperguild/ingestion/internal/reranker"
	"github.com/mathiasbq/hyperguild/ingestion/internal/search"
	"github.com/mathiasbq/hyperguild/ingestion/internal/vectorstore"
	"github.com/mathiasbq/hyperguild/ingestion/internal/watcher"
)

// vectorAdapter bridges *vectorstore.PGStore (returns []vectorstore.Hit)
// to the search.VectorSearcher interface (which uses []search.VectorHit).
// Kept here, not in either package, so neither has to import the other.
type vectorAdapter struct{ s *vectorstore.PGStore }

func (a vectorAdapter) Search(ctx context.Context, q []float32, limit int) ([]search.VectorHit, error) {
	hits, err := a.s.Search(ctx, q, limit)
	if err != nil {
		return nil, err
	}
	out := make([]search.VectorHit, len(hits))
	for i, h := range hits {
		out[i] = search.VectorHit{Path: h.Path, Distance: h.Distance}
	}
	return out, nil
}
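
// envOr returns the value of key, or fallback when it is unset or empty.
func envOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// envInt returns key parsed as an int, or fallback when it is unset,
// empty, or not a valid integer.
func envInt(key string, fallback int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return fallback
}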

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	brainDir := envOr("INGEST_BRAIN_DIR", "../brain")
	port := envOr("INGEST_PORT", "3300")
	llmURL := envOr("INGEST_LLM_URL", "http://iguana:4000/v1")
	llmKey := os.Getenv("INGEST_LLM_KEY")
	llmModel := envOr("INGEST_LLM_MODEL", "koala/qwen35-9b-fast")
	llmTimeoutMins := envInt("INGEST_LLM_TIMEOUT", 15)
	chunkSize := envInt("INGEST_CHUNK_SIZE", 6000)
	watchInterval := envInt("INGEST_WATCH_INTERVAL", 30)

	llmClient := llm.New(llmURL, llmKey, llmModel, time.Duration(llmTimeoutMins)*time.Minute)
	pipelineCfg := pipeline.Config{
		Complete:  llmClient.Complete,
		ChunkSize: chunkSize,
	}
	h := api.NewHandler(brainDir, logger, pipelineCfg)

	var answerComplete pipeline.CompleteFunc
	if primaryURL := os.Getenv("BRAIN_LLM_PRIMARY_URL"); primaryURL != "" {
		primaryModel := envOr("BRAIN_LLM_PRIMARY_MODEL", "gemma4:31b")
		primaryKey := os.Getenv("BERGET_API_KEY")
		timeoutMS := envInt("BRAIN_LLM_TIMEOUT_MS", 10000)
		timeout := time.Duration(timeoutMS) * time.Millisecond
		primary := llm.New(primaryURL, primaryKey, primaryModel, timeout)
		router := &llm.Router{Primary: primary}
		if fallbackURL := os.Getenv("BRAIN_LLM_FALLBACK_URL"); fallbackURL != "" {
			fallbackModel := envOr("BRAIN_LLM_FALLBACK_MODEL", "gemma4:31b")
			router.Fallback = llm.New(fallbackURL, "", fallbackModel, timeout)
		}
		answerComplete = router.Complete
		logger.Info("brain answer LLM configured", "primary", primaryURL, "model", primaryModel)
	}

	mcpSrv := mcp.NewServer(brainDir, &pipelineCfg, llmClient.Complete, answerComplete)
	if rerankURL := os.Getenv("BRAIN_RERANKER_URL"); rerankURL != "" {
		rerankModel := envOr("BRAIN_RERANKER_MODEL", "dengcao/Qwen3-Reranker-0.6B:F16")
		mcpSrv = mcpSrv.WithReranker(reranker.New(rerankURL, rerankModel))
		logger.Info("brain reranker configured", "url", rerankURL, "model", rerankModel)
	}

	// Hybrid retrieval (pgvector + nomic-embed-text). Both env vars must
	// be set together for the path to wire on; otherwise BM25-only.
	var vectorStore *vectorstore.PGStore
	pgDSN := os.Getenv("BRAIN_PG_DSN")
	embedURL := os.Getenv("BRAIN_EMBED_URL")
	switch {
	case pgDSN != "" && embedURL != "":
		embedModel := envOr("BRAIN_EMBED_MODEL", "nomic-embed-text:latest")
		store, err := vectorstore.New(context.Background(), pgDSN)
		if err != nil {
			logger.Error("vector store init", "err", err)
			os.Exit(1)
		}
		if err := store.Init(context.Background()); err != nil {
			logger.Error("vector store migrate", "err", err)
			os.Exit(1)
		}
		vectorStore = store
		embedder := embed.New(embedURL, embedModel)
		mcpSrv = mcpSrv.WithHybridRetrieval(vectorAdapter{s: store}, embedder)
		h.WithEmbedSync(store, embedder)
		// Crude redaction: log only what follows the last '@' (host/db).
		// In a URL-style DSN the part before '@' holds user:password, so
		// logging that prefix would leak the credentials. A DSN without
		// '@' is still logged unchanged.
		logger.Info("brain hybrid retrieval enabled",
			"pg", pgDSN[strings.LastIndexByte(pgDSN, '@')+1:],
			"embed_url", embedURL, "embed_model", embedModel)
	case pgDSN == "" && embedURL == "":
		// disabled — fine
	default:
		logger.Error("BRAIN_PG_DSN and BRAIN_EMBED_URL must be set together")
		os.Exit(1)
	}

	mcpToken := os.Getenv("BRAIN_MCP_TOKEN")
	if mcpToken == "" {
		logger.Error("BRAIN_MCP_TOKEN not set")
		os.Exit(1)
	}

	ctx := context.Background()
	if watchInterval > 0 {
		watcher.Start(ctx, watcher.Config{
			BrainDir: brainDir,
			Interval: time.Duration(watchInterval) * time.Second,
			Pipeline: pipelineCfg,
		})
	}
	if vectorStore != nil {
		embedSyncInterval := envInt("BRAIN_EMBED_SYNC_INTERVAL", 300)
		vectorstore.StartSync(ctx, brainDir, vectorStore,
			embed.New(os.Getenv("BRAIN_EMBED_URL"),
				envOr("BRAIN_EMBED_MODEL", "nomic-embed-text:latest")),
			time.Duration(embedSyncInterval)*time.Second)
		logger.Info("embed sync started", "interval_s", embedSyncInterval)
	}

	mux := http.NewServeMux()
	mux.HandleFunc("POST /query", h.Query)
	mux.HandleFunc("POST /write", h.Write)
	mux.HandleFunc("POST /index", h.Index)
	mux.HandleFunc("POST /ingest", h.Ingest)
	mux.HandleFunc("POST /ingest-path", h.IngestPath)
	mux.HandleFunc("POST /ingest-raw", h.IngestRaw)
	mux.HandleFunc("POST /backfill-refs", h.BackfillRefs)
	mux.HandleFunc("POST /backfill-embeddings", h.BackfillEmbeddings)
	mux.HandleFunc("GET /pass-rate", h.PassRate)

	var jwtValidator *auth.Validator
	if dexURL := os.Getenv("DEX_ISSUER_URL"); dexURL != "" {
		audience := os.Getenv("MCP_AUDIENCE")
		v, err := auth.NewValidator(dexURL, audience)
		if err != nil {
			logger.Error("build jwt validator", "err", err)
			os.Exit(1)
		}
		jwtValidator = v
		logger.Info("jwt auth enabled", "issuer", dexURL)
	}

	// Resource-metadata URL is only emitted on 401 when Dex OAuth is
	// configured. Static-Bearer-only deployments leave this empty so
	// clients never see an OAuth challenge.
	var resourceMetadataURL string
	if dexURL := os.Getenv("DEX_ISSUER_URL"); dexURL != "" {
		resourceURL := os.Getenv("MCP_RESOURCE_URL")
		mux.HandleFunc("GET /.well-known/oauth-protected-resource",
			auth.ProtectedResourceHandler(resourceURL, dexURL))
		if resourceURL != "" {
			resourceMetadataURL = strings.TrimRight(resourceURL, "/") + "/.well-known/oauth-protected-resource"
		}
	}
	mux.Handle("/mcp", mcp.BearerAuth(mcpToken, jwtValidator, resourceMetadataURL, mcpSrv))

	// Opt-in OAuth 2.0 client_credentials flow for claude.ai's custom-MCP
	// integration UI, which has no static-Bearer field. Setting both
	// OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET enables the token exchange;
	// setting only one is misconfiguration → fail fast.
	oauthID := os.Getenv("OAUTH_CLIENT_ID")
	oauthSecret := os.Getenv("OAUTH_CLIENT_SECRET")
	switch {
	case oauthID != "" && oauthSecret != "":
		issuer := os.Getenv("MCP_RESOURCE_URL")
		if issuer == "" {
			logger.Error("OAUTH_CLIENT_ID/SECRET set but MCP_RESOURCE_URL is empty; cannot derive issuer")
			os.Exit(1)
		}
		mux.HandleFunc("GET /.well-known/oauth-authorization-server",
			oauth.MetadataHandler(issuer))
		mux.HandleFunc("POST /oauth/token", oauth.TokenHandler(oauth.TokenConfig{
			ClientID:     oauthID,
			ClientSecret: oauthSecret,
			AccessToken:  mcpToken,
		}))
		logger.Info("oauth client_credentials enabled", "issuer", strings.TrimRight(issuer, "/"))
	case oauthID == "" && oauthSecret == "":
		// disabled — that's fine
	default:
		logger.Error("OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET must be set together")
		os.Exit(1)
	}

	addr := ":" + port
	watchIntervalLog := "disabled"
	if watchInterval > 0 {
		watchIntervalLog = fmt.Sprintf("%ds", watchInterval)
	}
	logger.Info("ingestion server starting",
		"addr", addr,
		"brain_dir", brainDir,
		"llm_url", llmURL,
		"llm_model", llmModel,
		"chunk_size", chunkSize,
		"watch_interval", watchIntervalLog,
		"mcp_enabled", true,
	)
	if err := http.ListenAndServe(addr, mux); err != nil {
		logger.Error("server stopped", "err", err)
		os.Exit(1)
	}
}
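
The startup log in main.go trims BRAIN_PG_DSN by string slicing before logging it. A sturdier alternative, sketched here as a suggestion rather than as the project's code, is to parse URL-style DSNs with net/url and use URL.Redacted, which masks only the password. The helper name redactDSN and the sample password are hypothetical. Keyword/value DSNs are replaced wholesale, and a password passed as a query parameter would not be masked by this approach.

```go
package main

import (
	"fmt"
	"net/url"
)

// redactDSN hides the password of a URL-style DSN for logging.
// Anything that does not parse as a URL with a scheme and a host
// (for example "host=... password=...") is replaced entirely rather
// than risk logging a secret.
func redactDSN(dsn string) string {
	u, err := url.Parse(dsn)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return "[redacted]"
	}
	return u.Redacted()
}

func main() {
	fmt.Println(redactDSN("postgres://brain_app:s3cret@postgres18:5432/brain"))
	// prints "postgres://brain_app:xxxxx@postgres18:5432/brain"
	fmt.Println(redactDSN("host=postgres18 password=s3cret"))
	// prints "[redacted]"
}
```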