feat(brain_answer): Qwen3-Reranker cross-encoder filter (opt-in)

Adds an opt-in cross-encoder rerank step between BM25 retrieval and LLM
synthesis. With BRAIN_RERANKER_URL set, brain_answer retrieves BM25
top-20, scores each excerpt against the query via Qwen3-Reranker on
Ollama, drops the "no" answers, and forwards up to 5 surviving sources
to the LLM. Unset, behaviour is unchanged (BM25 top-10 → LLM).

The reranker is a *filter*, not a re-ranker: Qwen3-Reranker emits a
binary yes/no token under its native chat template, so every surviving
source scores the same. The "yes" set therefore keeps its BM25 order:
what got retrieved first stays ahead.
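
A minimal sketch of that filter step, not the code from this commit. The
Source type, the filterRelevant name, and the yes=1 / no=0 score encoding
are assumptions made for illustration; only the "keep BM25 order, cap at
5" behaviour comes from the description above.

```go
package main

import "fmt"

// Source is a hypothetical stand-in for one BM25 hit.
type Source struct {
	Path    string
	Excerpt string
}

// filterRelevant keeps the candidates whose score is positive ("yes"),
// preserving the incoming BM25 order, and stops once max sources survive.
// Scores are assumed to be 1 for "yes" and 0 for "no" or ambiguous output.
func filterRelevant(candidates []Source, scores []float64, max int) []Source {
	if max <= 0 {
		return nil
	}
	kept := make([]Source, 0, max)
	for i, c := range candidates {
		if i >= len(scores) {
			break // no score for this candidate: treat as not evaluated
		}
		if scores[i] <= 0 {
			continue // "no" answers are dropped, never reordered
		}
		kept = append(kept, c)
		if len(kept) == max {
			break
		}
	}
	return kept
}

func main() {
	candidates := []Source{{Path: "a.md"}, {Path: "noise.md"}, {Path: "b.md"}}
	scores := []float64{1, 0, 1}
	for _, s := range filterRelevant(candidates, scores, 5) {
		fmt.Println(s.Path)
	}
}
```

Because the loop never sorts, the BM25 rank is the only ordering signal
that reaches the LLM.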

New package ingestion/internal/reranker:
- Client with URL, Model, HTTP fields.
- New(url, model) returns nil on empty url so callers can treat
  "feature disabled" as a single nil check.
- Score(ctx, query, docs) issues one /api/generate call per doc using
  the Qwen3-Reranker yes/no chat template (verbatim, because the model
  was trained on this exact wording). Parses the first non-think token.
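
The nil-on-empty-url constructor and the verdict parsing could look
roughly like the sketch below. It is an illustration under assumptions,
not the package's actual code: the HTTP field is assumed to be an
*http.Client, the think block is assumed to be delimited by
<think>...</think>, and parseVerdict is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Client mirrors the three fields named in the commit message.
// The *http.Client type for HTTP is an assumption.
type Client struct {
	URL   string
	Model string
	HTTP  *http.Client
}

// New returns nil when url is empty, so "feature disabled" is a nil check.
func New(url, model string) *Client {
	if url == "" {
		return nil
	}
	return &Client{URL: url, Model: model, HTTP: http.DefaultClient}
}

// parseVerdict (hypothetical) skips any <think>...</think> preamble and
// maps the first remaining token to a score: "yes" is 1, anything else
// (including "no" and ambiguous output) is 0.
func parseVerdict(response string) float64 {
	if i := strings.LastIndex(response, "</think>"); i >= 0 {
		response = response[i+len("</think>"):]
	}
	fields := strings.Fields(strings.ToLower(response))
	if len(fields) == 0 {
		return 0
	}
	if strings.Trim(fields[0], ".,!\"'") == "yes" {
		return 1
	}
	return 0
}

func main() {
	if New("", "any-model") == nil {
		fmt.Println("reranker disabled")
	}
	fmt.Println(parseVerdict("<think>\n\n</think>\n\nyes"))
	fmt.Println(parseVerdict("maybe"))
}
```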

Wiring:
- mcp.Server gains a WithReranker fluent setter to keep NewServer
  signature stable.
- brain_answer's BM25 limit jumps to 20 only when a reranker is wired,
  to give the filter something to do.
- cmd/server/main.go reads BRAIN_RERANKER_URL (+ optional
  BRAIN_RERANKER_MODEL, default dengcao/Qwen3-Reranker-0.6B:F16).
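
As a rough sketch of that wiring, assuming helper names that do not
appear in the commit (rerankerSettings, loadRerankerSettings, bm25Limit).
Only the environment variable names, the default model, and the 10/20
limits are taken from the text above.

```go
package main

import (
	"fmt"
	"os"
)

// defaultRerankerModel is the default named in the commit message.
const defaultRerankerModel = "dengcao/Qwen3-Reranker-0.6B:F16"

// rerankerSettings is a hypothetical holder for the two env values.
type rerankerSettings struct {
	URL   string
	Model string
}

// loadRerankerSettings reads the opt-in configuration through getenv
// (os.Getenv in production, a fake in tests). The bool reports whether
// the feature is enabled, i.e. whether BRAIN_RERANKER_URL is set.
func loadRerankerSettings(getenv func(string) string) (rerankerSettings, bool) {
	url := getenv("BRAIN_RERANKER_URL")
	if url == "" {
		return rerankerSettings{}, false
	}
	model := getenv("BRAIN_RERANKER_MODEL")
	if model == "" {
		model = defaultRerankerModel
	}
	return rerankerSettings{URL: url, Model: model}, true
}

// bm25Limit widens retrieval only when a reranker is wired in.
func bm25Limit(rerankerEnabled bool) int {
	if rerankerEnabled {
		return 20
	}
	return 10
}

func main() {
	settings, enabled := loadRerankerSettings(os.Getenv)
	fmt.Println(enabled, settings.Model, bm25Limit(enabled))
}
```

Passing getenv as a parameter keeps the lookup testable without
touching the process environment.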

Tests cover: nil-on-empty-url, ordered yes/no scoring, request shape
(model, prompt contents, yes/no template), ambiguous response → 0,
empty doc slice, upstream-error propagation, plus an end-to-end
brain_answer integration test showing that only the relevant note
reaches the LLM when noise.md is rejected.
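
A sketch of how ordered scoring and upstream-error propagation can be
exercised against a fake Ollama endpoint using net/http/httptest. The
names scoreDocs and runAgainstFake are hypothetical, and the prompt is a
placeholder: the real client sends the verbatim Qwen3-Reranker template,
which is not reproduced here. The request and response fields follow
Ollama's /api/generate JSON shape.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type generateResponse struct {
	Response string `json:"response"`
}

// scoreDocs issues one /api/generate call per doc and returns scores in
// input order: 1 for "yes", 0 otherwise. Any upstream failure aborts.
func scoreDocs(ctx context.Context, hc *http.Client, baseURL, model, query string, docs []string) ([]float64, error) {
	scores := make([]float64, 0, len(docs))
	for _, doc := range docs {
		payload, err := json.Marshal(generateRequest{
			Model:  model,
			Prompt: "Query: " + query + "\nDocument: " + doc, // placeholder prompt
			Stream: false,
		})
		if err != nil {
			return nil, err
		}
		req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/api/generate", bytes.NewReader(payload))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Content-Type", "application/json")
		resp, err := hc.Do(req)
		if err != nil {
			return nil, err
		}
		var out generateResponse
		decodeErr := json.NewDecoder(resp.Body).Decode(&out)
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return nil, fmt.Errorf("reranker upstream returned status %d", resp.StatusCode)
		}
		if decodeErr != nil {
			return nil, decodeErr
		}
		score := 0.0
		if strings.HasPrefix(strings.ToLower(strings.TrimSpace(out.Response)), "yes") {
			score = 1
		}
		scores = append(scores, score)
	}
	return scores, nil
}

// runAgainstFake scores two docs against a fake upstream that answers
// "yes" only when the prompt mentions "reranker", or fails with status.
func runAgainstFake(status int) ([]float64, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if status != http.StatusOK {
			http.Error(w, "boom", status)
			return
		}
		var in generateRequest
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		answer := "no"
		if strings.Contains(in.Prompt, "reranker") {
			answer = "yes"
		}
		json.NewEncoder(w).Encode(generateResponse{Response: answer})
	}))
	defer srv.Close()
	docs := []string{"notes on the reranker", "grocery list"}
	return scoreDocs(context.Background(), srv.Client(), srv.URL, "fake-model", "how does filtering work?", docs)
}

func main() {
	fmt.Println(runAgainstFake(http.StatusOK))
}
```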

Closes hyperguild#7.
Mathias
2026-05-18 22:55:46 +02:00
parent 58c57412a9
commit a56a4db963
6 changed files with 346 additions and 1 deletions


@@ -10,6 +10,7 @@ import (
 	"net/http"
 
 	"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
+	"github.com/mathiasbq/hyperguild/ingestion/internal/reranker"
 )
 
 type request struct {
@@ -37,6 +38,7 @@ type Server struct {
 	pipeline  pipeline.Config
 	llm       pipeline.CompleteFunc
 	answerLLM pipeline.CompleteFunc // nil = brain_answer and brain_classify unavailable
+	reranker  *reranker.Client      // nil = no rerank, BM25 top-10 → LLM
 }
 
 // NewServer constructs a Server bound to brainDir. pipelineCfg supplies the
@@ -50,6 +52,15 @@ func NewServer(brainDir string, pipelineCfg *pipeline.Config, llm pipeline.Compl
 	return &Server{brainDir: brainDir, pipeline: cfg, llm: llm, answerLLM: answerLLM}
 }
 
+// WithReranker installs an opt-in cross-encoder reranker. When set,
+// brain_answer retrieves a wider BM25 candidate set and prunes it to
+// the relevant ones before LLM synthesis. Returns the server for
+// fluent chaining.
+func (s *Server) WithReranker(r *reranker.Client) *Server {
+	s.reranker = r
+	return s
+}
+
 func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
 	// MCP streamable HTTP: GET establishes the SSE stream for server-to-client events.
 	if r.Method == http.MethodGet {