Reranker
Pass a query and a list of passages. Get back relevance scores per passage.
Retrieval gets the top-k passages close to the query. A reranker looks at each one carefully and reorders by true relevance. Often the single highest-leverage component in a RAG pipeline. Runs on-device with the same SDK, plugs in after any retriever.
Recall comes from the retriever. Precision comes from the reranker.
Vector search, BM25, hybrid, custom. Just feed it the top-k results.
Vector similarity is fast and approximate. BM25 is fast and lexical. Both surface candidate passages, but neither reads the query carefully and judges each passage on its actual merits. A reranker does. The cost is one extra inference pass on the top-k; the benefit is significantly higher precision at the top of the list.
01 · Fixes false positives
Dense embeddings group passages by topical similarity. A passage about "Spring Framework" can rank highly for "spring season" because the embedding conflates the two senses of "spring". The reranker reads both and demotes the wrong one.
02 · Surfaces buried answers
Often the best passage exists in the top-20 from vector search but is not at position 1. A reranker promotes it. This is the single most consistent quality lift in RAG benchmarks.
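As a rough illustration of how that lift is usually measured, the sketch below computes mean reciprocal rank (MRR) before and after reranking over a toy evaluation set. The passage IDs, orderings, and relevance labels are all hypothetical example data, not LM-Kit API output; only standard .NET is used.

using System;
using System.Linq;

class MrrDemo
{
    // Reciprocal rank of the relevant item (1-based), 0 if it is absent.
    static double ReciprocalRank(string[] ranking, string relevant)
    {
        int idx = Array.IndexOf(ranking, relevant);
        return idx < 0 ? 0.0 : 1.0 / (idx + 1);
    }

    static void Main()
    {
        // Hypothetical orderings for two queries.
        var retrieved = new[]
        {
            new[] { "p3", "p1", "p7" }, // relevant p7 buried at rank 3
            new[] { "p2", "p5", "p9" }, // relevant p5 at rank 2
        };
        var reranked = new[]
        {
            new[] { "p7", "p3", "p1" }, // reranker promotes p7 to rank 1
            new[] { "p5", "p2", "p9" },
        };
        var relevant = new[] { "p7", "p5" };

        double mrrBefore = retrieved.Select((r, i) => ReciprocalRank(r, relevant[i])).Average();
        double mrrAfter  = reranked.Select((r, i) => ReciprocalRank(r, relevant[i])).Average();
        Console.WriteLine($"MRR before rerank: {mrrBefore:F3}"); // (1/3 + 1/2) / 2 ≈ 0.417
        Console.WriteLine($"MRR after rerank:  {mrrAfter:F3}");  // (1 + 1) / 2 = 1.000
    }
}

The same harness works for precision@k: replace ReciprocalRank with a check that the relevant passage appears in the first k positions.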
03 · Composable
Plug it after VectorRetrievalStrategy, Bm25RetrievalStrategy, HybridRetrievalStrategy, or your own retriever. Reranker only needs a query and a list of candidate passages.
04 · Local + bounded cost
The reranker scores top-k passages (typically 20-100) in a single batched pass. Latency is bounded, predictable, and stays inside your process. No per-call cloud cost.
The simplest pattern: you already have candidate passages from any source (a file-system scan, a SQL query, a prior search). Score them against a query, sort, and take the top-N.
using LMKit.Model;
using LMKit.Embeddings;

var reranker = new Reranker(LM.LoadFromModelID("bge-m3-reranker"));

var candidates = new[]
{
    "Spring framework is a popular Java web framework.",
    "In spring, trees regrow leaves and birds return.",
    "Hibernate is a Java ORM.",
};

var scores = await reranker.RerankAsync("when does spring start", candidates);

// Sort by descending relevance.
foreach (var hit in scores.OrderByDescending(s => s.Score))
{
    Console.WriteLine($"{hit.Score:F3} {hit.Passage}");
}
Drop the reranker right after a vector search. Retrieve a wide top-k (50-100), rerank, take the top-5 to feed to the LLM. Wider initial retrieval costs almost nothing because ANN is fast; the precision lift compounds.
// 1. Wide vector search.
float[] qvec = embedder.GetEmbeddings(query);
var wide = await store.SearchAsync(qvec, topK: 50);

// 2. Rerank those 50 passages with the dedicated model.
var passages = wide.Select(h => h.Text).ToList();
var reranked = await reranker.RerankAsync(query, passages);

// 3. Take the precision-multiplied top-5.
var top5 = reranked.OrderByDescending(s => s.Score).Take(5);
Inside a full RAG pipeline, the reranker is wrapped by RagReranker and called automatically when you opt in. Same model, less plumbing.
using LMKit.Retrieval;

var rag = new DocumentRag(chatModel, embedModel)
{
    Retriever = new HybridRetrievalStrategy(),
    Reranker = new RagReranker(rerankerModel),
    TopK = 50, // wide retrieval...
    TopN = 5,  // ...narrow rerank output.
};

var result = await rag.QueryAsync("What were Q3 revenue figures?");
Console.WriteLine(result.Answer);
foreach (var src in result.SourceReferences)
    Console.WriteLine($"  {src.Name}, page {src.PageNumber}");
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
How-to guide: drop a reranker after any retriever, measure the lift. Read the guide →
How-to guide: hybrid BM25 + vector retrieval, then reranker on top. Read the guide →
API reference: the Reranker class. Open the reference →
The seven pillars of LM-Kit.NET, plus the local runtime they share. Highlighted card is where you are now.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU, AVX2, CUDA 12/13, Vulkan or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.