feat(brain): add Qwen3-Reranker to brain_answer for improved RAG quality #7

mathias · 2026-05-12T15:34:18Z

mathias commented

2026-05-12 15:34:18 +00:00

Context

`brain_answer` currently does BM25 top-10 → LLM synthesis. BM25 recall is decent, but ranking is based on keyword frequency, so semantically relevant chunks can rank low if they don't share exact terms with the query.

Qwen3-Reranker is available on iguana (cross-encoder, runs via Ollama). Adding a rerank step between retrieval and synthesis should improve answer quality with no change to the LLM call.

Proposed change

1. After BM25 retrieval (top-10 or top-20), call Qwen3-Reranker on iguana to score each chunk against the query
2. Take top-5 by reranker score, pass those to LLM synthesis
3. Add `BRAIN_RERANKER_URL` env var (opt-in, same pattern as `BRAIN_LLM_PRIMARY_URL`)

Why deferred

Needs Qwen3-Reranker confirmed running on iguana and a clean HTTP API to call it (Ollama /api/rerank or custom wrapper). Verify model availability before building.

Acceptance criteria

- [ ] `BRAIN_RERANKER_URL` unset → behaviour unchanged (BM25 top-10 direct to LLM)
- [ ] `BRAIN_RERANKER_URL` set → BM25 top-20 → rerank → top-5 → LLM
- [ ] Unit tests cover both paths
- [ ] `task check` passes
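
The two paths in these criteria can be sketched in Go as a single branch on whether a reranker is configured. This is a minimal illustration, not the code in the repository: `Searcher`, `Reranker`, `selectChunks`, and the constant names are hypothetical, and only the depths (10, 20, 5) come from the criteria above.

```go
package main

// Searcher returns up to n chunks for a query, best BM25 match first.
// (Hypothetical type; stands in for the existing BM25 retrieval.)
type Searcher func(query string, n int) []string

// Reranker returns the chunks worth keeping, in the order they should
// reach the LLM. (Hypothetical type; stands in for the Qwen3-Reranker call.)
type Reranker func(query string, chunks []string) ([]string, error)

const (
	plainDepth  = 10 // BM25 depth when no reranker is configured
	rerankDepth = 20 // wider BM25 net when a reranker will prune it
	rerankKeep  = 5  // chunks passed to LLM synthesis after reranking
)

// selectChunks picks the chunks handed to LLM synthesis.
// A nil rerank models BRAIN_RERANKER_URL being unset.
func selectChunks(query string, search Searcher, rerank Reranker) ([]string, error) {
	if rerank == nil {
		// Opt-out path: behaviour unchanged, BM25 top-10 goes straight through.
		return search(query, plainDepth), nil
	}
	kept, err := rerank(query, search(query, rerankDepth))
	if err != nil {
		return nil, err
	}
	if len(kept) > rerankKeep {
		kept = kept[:rerankKeep]
	}
	return kept, nil
}
```

Keeping the branch in one function makes both acceptance paths testable with fakes: pass `nil` for the unset case and a stub `Reranker` for the set case.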

mathias commented

2026-05-18 20:56:08 +00:00

Shipped in a56a4db.

Design note (deviation from spec)

The issue asked for a reranker "score" and a `top-5 by reranker score` cut. Qwen3-Reranker as published on Ollama (which has no native `/api/rerank` in `v0.21.1`) returns a single yes/no token under its trained chat template, so there is no logprob surface from which to extract a fine-grained float per pair.

Implementation choice: treat the reranker as a filter rather than a ranker.

- BM25 retrieves top-20 (was top-10 unconditionally)
- For each candidate, the cross-encoder yes/no template runs against the excerpt
- "no" candidates are dropped
- Every surviving "yes" candidate ties on reranker output, so survivors keep their BM25 order and the first 5 go to the LLM
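
The filter step above can be sketched in Go as follows. This is an illustration under assumptions rather than the shipped code: `Candidate`, `Judge`, and `filterTopK` are hypothetical names, and returning the first judge error is a choice made for the sketch; the issue does not say how the real code handles a failed reranker call.

```go
package main

// Candidate is a retrieved chunk. Field names are illustrative only.
type Candidate struct {
	Path    string
	Excerpt string
}

// Judge reports whether a candidate is relevant to the query.
// In a real client it would wrap the yes/no reranker call.
type Judge func(query string, c Candidate) (bool, error)

// filterTopK expects candidates in BM25 order. It drops every "no",
// preserves BM25 order among the "yes" survivors, and keeps the first k.
func filterTopK(query string, ranked []Candidate, judge Judge, k int) ([]Candidate, error) {
	kept := make([]Candidate, 0, len(ranked))
	for _, c := range ranked {
		ok, err := judge(query, c)
		if err != nil {
			// Assumption: surface the error and let the caller decide
			// whether to fall back to plain BM25.
			return nil, err
		}
		if ok {
			kept = append(kept, c)
		}
	}
	if len(kept) > k {
		kept = kept[:k]
	}
	return kept, nil
}
```

Because the judge only answers yes or no, the output order is entirely inherited from BM25; the reranker removes noise but never promotes a chunk.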

The net effect matches the spec's intent (better RAG quality, fewer irrelevant chunks to LLM) using the API that is actually available today. If `/api/rerank` or a logprob path lands in Ollama, swap `parseYesNo` for a float decoder; the rest of the wiring is stable.
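
For readers who have not seen the commit, a yes/no decoder of the kind mentioned above might look roughly like this. The real `parseYesNo` is not shown in this thread, so the name `parseYesNoSketch`, its signature, and the normalisation rules are all assumptions.

```go
package main

import (
	"errors"
	"strings"
)

// errAmbiguous is returned when the model output is neither "yes" nor "no".
var errAmbiguous = errors.New("reranker: ambiguous response")

// parseYesNoSketch maps the model's single-token answer to a boolean.
// Assumption: surrounding whitespace, letter case, quotes, and trailing
// punctuation are ignored; anything else is reported as ambiguous so the
// caller can decide whether to keep or drop the candidate.
func parseYesNoSketch(raw string) (bool, error) {
	s := strings.ToLower(strings.TrimSpace(raw))
	s = strings.Trim(s, ".!\"'")
	switch s {
	case "yes":
		return true, nil
	case "no":
		return false, nil
	}
	return false, errAmbiguous
}
```

A float decoder would keep the same call site but return a score instead of a boolean, at which point the filter could become a true ranker.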

Acceptance criteria

- [x] `BRAIN_RERANKER_URL` unset → behaviour unchanged (BM25 top-10 → LLM): `TestBrainAnswer_Synthesizes` still green with no reranker injected
- [x] `BRAIN_RERANKER_URL` set → BM25 top-20 → rerank → ≤5 → LLM: `TestBrainAnswer_RerankerFiltersBeforeLLM` proves `noise.md` is dropped before the LLM call
- [x] Unit tests cover both paths plus reranker client edge cases (empty docs, ambiguous response, upstream 5xx)
- [x] `task check` clean

New env vars

| Var | Default | Purpose |
|-----|---------|---------|
| `BRAIN_RERANKER_URL` | "" (disabled) | Ollama-compatible base URL, e.g. `http://iguana:11434` |
| `BRAIN_RERANKER_MODEL` | `dengcao/Qwen3-Reranker-0.6B:F16` | Model tag for the cross-encoder |

Verified that `dengcao/Qwen3-Reranker-0.6B:F16` was already loaded in Ollama on iguana before coding (per the issue's "verify model availability" gate).
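
One way to read these two variables in Go is sketched below. `RerankerConfig`, `loadRerankerConfig`, and the trailing-slash trimming are assumptions for illustration; only the variable names and the default model tag come from the table above.

```go
package main

import "strings"

// defaultRerankerModel mirrors the default listed in the env var table.
const defaultRerankerModel = "dengcao/Qwen3-Reranker-0.6B:F16"

// RerankerConfig is a hypothetical holder for the two settings.
type RerankerConfig struct {
	URL   string // empty means the reranker is disabled
	Model string
}

// Enabled reports whether the opt-in URL was provided.
func (c RerankerConfig) Enabled() bool { return c.URL != "" }

// loadRerankerConfig reads the settings through getenv so tests can
// substitute a map lookup; production code would pass os.Getenv.
func loadRerankerConfig(getenv func(string) string) RerankerConfig {
	cfg := RerankerConfig{
		// Assumption: tolerate a trailing slash on the base URL.
		URL:   strings.TrimRight(getenv("BRAIN_RERANKER_URL"), "/"),
		Model: getenv("BRAIN_RERANKER_MODEL"),
	}
	if cfg.Model == "" {
		cfg.Model = defaultRerankerModel
	}
	return cfg
}
```

With this shape, `loadRerankerConfig(os.Getenv).Enabled()` is false on any deployment that has not set the URL, which keeps the feature opt-in.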

Deploy

CI/CD auto-rebuilds the ingestion image. Add BRAIN_RERANKER_URL=http://iguana:11434 to the supervisor pod env to flip it on once the image rolls out.

Closing.

mathias closed this issue

2026-05-18 20:56:11 +00:00


Reference: mathias/hyperguild#7