feat(brain): add Qwen3-Reranker to brain_answer for improved RAG quality #7

Closed
opened 2026-05-12 15:34:18 +00:00 by mathias · 1 comment
Owner

Context

brain_answer currently does BM25 top-10 → LLM synthesis. BM25 recall is decent but ranking is keyword-frequency based — semantically relevant chunks can rank low if they don't share exact terms with the query.

Qwen3-Reranker is available on iguana (cross-encoder, runs via Ollama). Adding a rerank step between retrieval and synthesis should improve answer quality with no change to the LLM call.

Proposed change

  1. After BM25 retrieval (top-10 or top-20), call Qwen3-Reranker on iguana to score each chunk against the query
  2. Take top-5 by reranker score, pass those to LLM synthesis
  3. Add BRAIN_RERANKER_URL env var (opt-in, same pattern as BRAIN_LLM_PRIMARY_URL)
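
The proposed flow can be sketched as below. This is only an illustration of "score each chunk, keep the top-5": `Chunk`, `ScoreFunc`, and `rerankTopK` are hypothetical names, and the scorer is a stand-in for whatever call to Qwen3-Reranker ends up being available.

```go
package main

import (
	"fmt"
	"sort"
)

// Chunk is a hypothetical retrieval result; the real type is not shown in this issue.
type Chunk struct {
	Path string
	Text string
}

// ScoreFunc is an assumed stand-in for one reranker call on a (query, chunk) pair.
type ScoreFunc func(query string, c Chunk) float64

// rerankTopK scores every candidate and returns the k highest-scoring chunks.
// The stable sort keeps the incoming (BM25) order for equal scores.
func rerankTopK(query string, candidates []Chunk, score ScoreFunc, k int) []Chunk {
	type scored struct {
		chunk Chunk
		score float64
	}
	all := make([]scored, 0, len(candidates))
	for _, c := range candidates {
		all = append(all, scored{chunk: c, score: score(query, c)})
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].score > all[j].score })

	if k < 0 {
		k = 0
	}
	if k > len(all) {
		k = len(all)
	}
	out := make([]Chunk, 0, k)
	for _, s := range all[:k] {
		out = append(out, s.chunk)
	}
	return out
}

func main() {
	// Toy scores standing in for cross-encoder output.
	scores := map[string]float64{"a.md": 0.1, "b.md": 0.9, "c.md": 0.5}
	candidates := []Chunk{{Path: "a.md"}, {Path: "b.md"}, {Path: "c.md"}}
	top := rerankTopK("query", candidates, func(_ string, c Chunk) float64 {
		return scores[c.Path]
	}, 2)
	for _, c := range top {
		fmt.Println(c.Path)
	}
}
```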

Why deferred

Needs Qwen3-Reranker confirmed running on iguana and a clean HTTP API to call it (Ollama /api/rerank or custom wrapper). Verify model availability before building.

Acceptance criteria

  • BRAIN_RERANKER_URL unset → behaviour unchanged (BM25 top-10 direct to LLM)
  • BRAIN_RERANKER_URL set → BM25 top-20 → rerank → top-5 → LLM
  • Unit tests cover both paths
  • task check passes
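
A minimal sketch of the two paths in the criteria above, assuming the reranker is injected as an optional dependency. `Retriever`, `Reranker`, and `chunksForLLM` are names invented for this example; a nil reranker models `BRAIN_RERANKER_URL` being unset.

```go
package main

import "fmt"

// Chunk is a hypothetical stand-in for a retrieved excerpt.
type Chunk struct{ Path string }

// Retriever and Reranker are assumed function types, named for illustration only.
type Retriever func(query string, k int) []Chunk
type Reranker func(query string, candidates []Chunk, keep int) []Chunk

const (
	legacyTopK  = 10 // BM25 results sent straight to the LLM when reranking is off
	rerankPoolK = 20 // wider BM25 pool when reranking is on
	rerankKeepK = 5  // chunks kept after reranking
)

// chunksForLLM returns the chunks to hand to LLM synthesis.
// With a nil reranker the behaviour is the unchanged BM25 top-10 path.
func chunksForLLM(query string, retrieve Retriever, rerank Reranker) []Chunk {
	if rerank == nil {
		return retrieve(query, legacyTopK)
	}
	return rerank(query, retrieve(query, rerankPoolK), rerankKeepK)
}

func main() {
	retrieve := func(_ string, k int) []Chunk {
		out := make([]Chunk, k)
		for i := range out {
			out[i] = Chunk{Path: fmt.Sprintf("doc-%02d.md", i)}
		}
		return out
	}
	// Trivial reranker that keeps the first `keep` candidates.
	keepFirst := func(_ string, c []Chunk, keep int) []Chunk {
		if keep > len(c) {
			keep = len(c)
		}
		return c[:keep]
	}
	fmt.Println(len(chunksForLLM("q", retrieve, nil)))       // 10
	fmt.Println(len(chunksForLLM("q", retrieve, keepFirst))) // 5
}
```

Passing the reranker in as a value keeps the disabled path free of any network dependency, which is what lets a unit test cover both branches with fakes.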
Author
Owner

Shipped in a56a4db.

Design note (deviation from spec)

The issue asked for a reranker "score" and a top-5 cut by that score. Qwen3-Reranker as published on Ollama (no native /api/rerank in v0.21.1) returns a single yes/no token under its trained chat template, so there is no logprob surface from which to extract a fine-grained float per pair.

Implementation choice: treat the reranker as a filter rather than a ranker.

  • BM25 retrieves top-20 (was top-10 unconditionally)
  • For each candidate, the cross-encoder yes/no template runs against the excerpt
  • "no" candidates are dropped
  • Surviving "yes" candidates all tie, so they keep their BM25 order and the first 5 go to the LLM
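
The filter behaviour in the list above could look roughly like this. It is a sketch, not the code in a56a4db: `Verdict` and `filterRelevant` are hypothetical names, and the verdict callback stands in for one yes/no call to the cross-encoder.

```go
package main

import "fmt"

// Chunk is a hypothetical candidate excerpt, already ordered by BM25 rank.
type Chunk struct {
	Path    string
	Excerpt string
}

// Verdict is an assumed callback wrapping one yes/no reranker call.
type Verdict func(query string, c Chunk) bool

// filterRelevant walks candidates in BM25 order, drops the "no" ones,
// and stops as soon as `limit` survivors have been collected.
func filterRelevant(query string, ranked []Chunk, relevant Verdict, limit int) []Chunk {
	if limit <= 0 {
		return nil
	}
	kept := make([]Chunk, 0, limit)
	for _, c := range ranked {
		if len(kept) >= limit {
			break
		}
		if relevant(query, c) {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	ranked := []Chunk{
		{Path: "setup.md", Excerpt: "reranker setup"},
		{Path: "noise.md", Excerpt: "unrelated text"},
		{Path: "env.md", Excerpt: "reranker env vars"},
	}
	// Toy verdict: everything except noise.md counts as relevant.
	verdict := func(_ string, c Chunk) bool { return c.Path != "noise.md" }
	for _, c := range filterRelevant("reranker", ranked, verdict, 5) {
		fmt.Println(c.Path)
	}
}
```

Stopping early once five survivors are found is a choice made in this sketch to save reranker calls; the comment itself only says the template runs for each candidate.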

Net effect matches the spec's intent (better RAG quality, fewer irrelevant chunks to LLM) with the API actually available today. If /api/rerank or a logprob path lands in Ollama, swap parseYesNo for a float decoder; the rest of the wiring is stable.
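
For illustration, a yes/no decoder of the kind parseYesNo presumably is might look like the sketch below. The real function is not shown in this thread, so `decodeVerdict` and `errAmbiguous` are hypothetical names and the normalisation rules are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errAmbiguous is returned when the model output is neither "yes" nor "no".
var errAmbiguous = errors.New("reranker verdict is neither yes nor no")

// decodeVerdict maps raw model output to a relevance decision.
// Only the first word counts, ignoring case and surrounding punctuation.
func decodeVerdict(raw string) (bool, error) {
	fields := strings.Fields(strings.ToLower(raw))
	if len(fields) == 0 {
		return false, errAmbiguous
	}
	switch strings.Trim(fields[0], ".,!:;\"'") {
	case "yes":
		return true, nil
	case "no":
		return false, nil
	}
	return false, errAmbiguous
}

func main() {
	for _, raw := range []string{"yes", " No.\n", "maybe", ""} {
		relevant, err := decodeVerdict(raw)
		fmt.Printf("%q -> relevant=%v err=%v\n", raw, relevant, err)
	}
}
```

A float decoder would keep the same input (the raw model response) and change only the return type, which is why the swap described above should leave the rest of the wiring alone.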

Acceptance criteria

  • BRAIN_RERANKER_URL unset → behaviour unchanged (BM25 top-10 → LLM) — TestBrainAnswer_Synthesizes still green with no reranker injected
  • BRAIN_RERANKER_URL set → BM25 top-20 → rerank → ≤5 → LLM — TestBrainAnswer_RerankerFiltersBeforeLLM proves noise.md is dropped before the LLM call
  • Unit tests cover both paths plus reranker client edge cases (empty docs, ambiguous response, upstream 5xx)
  • task check clean
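
The upstream-5xx edge case can be exercised without a live Ollama by pointing the client at a fake server. The sketch below assumes the client talks to Ollama's non-streaming `/api/generate` endpoint; the thread does not say which endpoint a56a4db uses. `Client`, `Judge`, and `judgeAgainstFake` are hypothetical names, and the prompt text is a placeholder, not the model's trained template.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Only the /api/generate fields this sketch needs.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type generateResponse struct {
	Response string `json:"response"`
}

// Client is a hypothetical reranker client.
type Client struct {
	BaseURL string
	Model   string
	HTTP    *http.Client
}

// Judge sends one (query, excerpt) pair and returns the raw model text.
func (c *Client) Judge(ctx context.Context, query, excerpt string) (string, error) {
	// Placeholder prompt: an assumption, not the real chat template.
	prompt := fmt.Sprintf("Query: %s\nDocument: %s\nRelevant? Answer yes or no.", query, excerpt)
	body, err := json.Marshal(generateRequest{Model: c.Model, Prompt: prompt, Stream: false})
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.BaseURL+"/api/generate", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.HTTP.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("reranker returned HTTP %d", resp.StatusCode)
	}
	var out generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return strings.TrimSpace(out.Response), nil
}

// judgeAgainstFake runs one Judge call against a fake upstream that
// answers with the given status and reply.
func judgeAgainstFake(status int, reply string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if status != http.StatusOK {
			w.WriteHeader(status)
			return
		}
		json.NewEncoder(w).Encode(generateResponse{Response: reply})
	}))
	defer srv.Close()
	c := &Client{BaseURL: srv.URL, Model: "dengcao/Qwen3-Reranker-0.6B:F16", HTTP: srv.Client()}
	return c.Judge(context.Background(), "query", "excerpt")
}

func main() {
	fmt.Println(judgeAgainstFake(http.StatusOK, "yes"))
	fmt.Println(judgeAgainstFake(http.StatusServiceUnavailable, ""))
}
```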

New env vars

| Var | Default | Purpose |
|-----|---------|---------|
| BRAIN_RERANKER_URL | "" (disabled) | Ollama-compatible base URL, e.g. http://iguana:11434 |
| BRAIN_RERANKER_MODEL | dengcao/Qwen3-Reranker-0.6B:F16 | Model tag for the cross-encoder |

Verified that dengcao/Qwen3-Reranker-0.6B:F16 was already loaded in Ollama on iguana before coding (per the issue's "verify model availability" gate).
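
Reading the two variables with their defaults could be sketched as follows. `RerankerConfig` and `loadRerankerConfig` are hypothetical names, and trimming whitespace and a trailing slash from the URL is an assumption of this example rather than documented behaviour.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Default model tag from the table above.
const defaultRerankerModel = "dengcao/Qwen3-Reranker-0.6B:F16"

// RerankerConfig is a hypothetical holder for the two settings.
type RerankerConfig struct {
	URL   string
	Model string
}

// Enabled reports whether reranking is switched on (URL set).
func (c RerankerConfig) Enabled() bool { return c.URL != "" }

// loadRerankerConfig reads the settings through getenv so tests can
// inject a fake environment instead of mutating the process env.
func loadRerankerConfig(getenv func(string) string) RerankerConfig {
	cfg := RerankerConfig{
		URL:   strings.TrimRight(strings.TrimSpace(getenv("BRAIN_RERANKER_URL")), "/"),
		Model: strings.TrimSpace(getenv("BRAIN_RERANKER_MODEL")),
	}
	if cfg.Model == "" {
		cfg.Model = defaultRerankerModel
	}
	return cfg
}

func main() {
	cfg := loadRerankerConfig(os.Getenv)
	fmt.Printf("enabled=%v url=%q model=%s\n", cfg.Enabled(), cfg.URL, cfg.Model)
}
```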

Deploy

CI/CD auto-rebuilds the ingestion image. Add BRAIN_RERANKER_URL=http://iguana:11434 to the supervisor pod env to flip it on once the image rolls out.

Closing.

Reference: mathias/hyperguild#7