Reranker
Pass a query and a list of passages. Get back relevance scores per passage.
Retrieval gets the top-k passages close to the query. A reranker looks at each one carefully and reorders by true relevance. Often the single highest-leverage component in a RAG pipeline. Runs on-device with the same SDK, plugs in after any retriever.
Recall comes from the retriever. Precision comes from the reranker.
Vector search, BM25, hybrid, custom. Just feed it the top-k results.
Vector similarity is fast and approximate. BM25 is fast and lexical. Both surface candidate passages, but neither reads the query carefully and judges each passage on its actual merits. A reranker does. The cost is one extra inference pass on the top-k; the benefit is significantly higher precision at the top of the list.
01 · Fixes false positives
Dense embeddings group passages by topical similarity. A passage about "Spring Framework" can rank highly for "spring season" because the embedding conflates the two senses of "spring". The reranker reads both and demotes the wrong one.
02 · Surfaces buried answers
Often the best passage exists in the top-20 from vector search but is not at position 1. A reranker promotes it. This is the single most consistent quality lift in RAG benchmarks.
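As a rough illustration of how that lift is usually measured, the sketch below computes mean reciprocal rank (MRR) before and after reranking over a toy evaluation set. The passage IDs, orderings, and relevance labels are all hypothetical example data, not LM-Kit API output; only standard .NET is used.

using System;
using System.Linq;

class MrrDemo
{
    // Reciprocal rank of the relevant item (1-based), 0 if it is absent.
    static double ReciprocalRank(string[] ranking, string relevant)
    {
        int idx = Array.IndexOf(ranking, relevant);
        return idx < 0 ? 0.0 : 1.0 / (idx + 1);
    }

    static void Main()
    {
        // Hypothetical orderings for two queries.
        var retrieved = new[]
        {
            new[] { "p3", "p1", "p7" }, // relevant p7 buried at rank 3
            new[] { "p2", "p5", "p9" }, // relevant p5 at rank 2
        };
        var reranked = new[]
        {
            new[] { "p7", "p3", "p1" }, // reranker promotes p7 to rank 1
            new[] { "p5", "p2", "p9" },
        };
        var relevant = new[] { "p7", "p5" };

        double mrrBefore = retrieved.Select((r, i) => ReciprocalRank(r, relevant[i])).Average();
        double mrrAfter  = reranked.Select((r, i) => ReciprocalRank(r, relevant[i])).Average();
        Console.WriteLine($"MRR before rerank: {mrrBefore:F3}"); // (1/3 + 1/2) / 2 ≈ 0.417
        Console.WriteLine($"MRR after rerank:  {mrrAfter:F3}");  // (1 + 1) / 2 = 1.000
    }
}

The same harness works for precision@k: replace ReciprocalRank with a check that the relevant passage appears in the first k positions.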
03 · Composable
Plug it after VectorRetrievalStrategy, Bm25RetrievalStrategy, HybridRetrievalStrategy, or your own retriever. Reranker only needs a query and a list of candidate passages.
04 · Local + bounded cost
The reranker scores top-k passages (typically 20-100) in a single batched pass. Latency is bounded, predictable, and stays inside your process. No per-call cloud cost.
The simplest pattern: you already have candidate passages from any source (a file-system scan, a SQL query, a prior search). Score them against a query, sort, and take the top-N.
using LMKit.Model;
using LMKit.Embeddings;

var reranker = new Reranker(LM.LoadFromModelID("bge-m3-reranker"));

var candidates = new[]
{
    "Spring framework is a popular Java web framework.",
    "In spring, trees regrow leaves and birds return.",
    "Hibernate is a Java ORM.",
};

var scores = await reranker.RerankAsync("when does spring start", candidates);

// Sort by descending relevance.
foreach (var hit in scores.OrderByDescending(s => s.Score))
{
    Console.WriteLine($"{hit.Score:F3} {hit.Passage}");
}
Drop the reranker right after a vector search. Retrieve a wide top-k (50-100), rerank, take the top-5 to feed to the LLM. Wider initial retrieval costs almost nothing because ANN is fast; the precision lift compounds.
// 1. Wide vector search.
float[] qvec = embedder.GetEmbeddings(query);
var wide = await store.SearchAsync(qvec, topK: 50);

// 2. Rerank those 50 passages with the dedicated model.
var passages = wide.Select(h => h.Text).ToList();
var reranked = await reranker.RerankAsync(query, passages);

// 3. Take the precision-multiplied top-5.
var top5 = reranked.OrderByDescending(s => s.Score).Take(5);
Inside a full RAG pipeline, the reranker is wrapped by RagReranker and called automatically when you opt in. Same model, less plumbing.
using LMKit.Retrieval;

var rag = new DocumentRag(chatModel, embedModel)
{
    Retriever = new HybridRetrievalStrategy(),
    Reranker = new RagReranker(rerankerModel),
    TopK = 50, // wide retrieval...
    TopN = 5,  // ...narrow rerank output.
};

var result = await rag.QueryAsync("What were Q3 revenue figures?");
Console.WriteLine(result.Answer);
foreach (var src in result.SourceReferences)
    Console.WriteLine($"  {src.Name}, page {src.PageNumber}");
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
How-to guide: drop a reranker after any retriever, measure the lift. Read the guide →
How-to guide: hybrid BM25 + vector retrieval, then reranker on top. Read the guide →
API reference: the Reranker class. Open the reference →
The seven pillars of LM-Kit.NET, plus the local runtime they share. Highlighted card is where you are now.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU, AVX2, CUDA 12/13, Vulkan or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.