HydeRetriever
Generate a hypothetical answer, embed THAT, retrieve.
Users ask "how do I cancel" when the docs say "subscription termination workflow." Vector search struggles when the vocabulary gap is wide. Query expansion uses the LLM to rewrite the query into language the corpus actually uses, runs retrieval against each rewrite, and merges the hits. Three named strategies ship in LM-Kit.NET.
HydeRetriever: Generate a hypothetical answer, embed THAT, retrieve.
MultiQueryRetriever: Ask the LLM for N rewrites. Retrieve for each. Merge.
QueryContextualizer: Rewrite the current question using prior chat turns.
Embeddings are good at semantic similarity but they are not magic. The closer the query vocabulary is to the corpus vocabulary, the higher the recall. When users speak casually and your corpus is technical, when they ask in one language and your corpus is in another, when their question is short and the answer requires multi-step framing, retrieval drops. Query expansion repairs that gap.
HydeRetriever asks the LLM to generate the answer it thinks the user wants (a hypothetical document). It then embeds that synthetic answer and uses the resulting vector for retrieval. The hypothetical answer sits in the same semantic neighbourhood as the real ones in the corpus, so recall jumps.
using LMKit.Retrieval;

var hyde = new HydeRetriever(chatModel, embedder, store)
{
    Options = new HydeOptions
    {
        HypotheticalAnswerMaxTokens = 256, // cap the synthetic answer's length
        TopK = 20,                         // passages to return
    }
};

var hits = await hyde.RetrieveAsync("how do I cancel");
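Conceptually, HyDE reduces to generate, embed, search. The sketch below makes that explicit; GenerateAsync is a hypothetical chat-model call, while GetEmbeddings and SearchAsync mirror the contextualizer example further down, so treat the whole snippet as illustrative rather than the retriever's actual internals.

// Illustrative only: roughly what HydeRetriever does on your behalf.
string hypothetical = await chatModel.GenerateAsync(        // hypothetical API
    "Write a short passage that answers: how do I cancel");

// Embed the synthetic answer, NOT the raw query, then search as usual.
var hits = await store.SearchAsync(embedder.GetEmbeddings(hypothetical), topK: 20);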
MultiQueryRetriever asks the LLM for N rewrites of the original query (rephrasings, synonyms, different angles). It runs retrieval for each rewrite, deduplicates the hits, and returns the merged set. Recall scales sublinearly with N; precision tends to hold or improve.
var mq = new MultiQueryRetriever(chatModel, baseRetriever)
{
    Options = new MultiQueryOptions
    {
        QueryCount = 4,         // generate 4 rewrites
        TopK = 10,              // per rewrite
        DeduplicateOnId = true, // collapse hits found by several rewrites
    }
};

var hits = await mq.RetrieveAsync("is the trial renewable");
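The merge step is worth understanding even though the library performs it for you. A hand-rolled equivalent looks roughly like this; Hit with an Id and Score is a hypothetical type for illustration, not LM-Kit.NET's result model.

using System.Collections.Generic;
using System.Linq;

record Hit(string Id, double Score, string Text);

// Flatten the per-rewrite result lists, keep the best-scoring copy of each
// passage, and return the union ordered by score.
static IEnumerable<Hit> Merge(IEnumerable<IEnumerable<Hit>> perRewriteHits) =>
    perRewriteHits
        .SelectMany(hits => hits)
        .GroupBy(h => h.Id)
        .Select(g => g.OrderByDescending(h => h.Score).First())
        .OrderByDescending(h => h.Score);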
QueryContextualizer rewrites the current turn of a multi-turn conversation into a standalone query that the retriever can use without seeing the earlier history. Essential when the user says "what about for enterprise?" after a paragraph about pricing.
var ctx = new QueryContextualizer(chatModel)
{
    Options = new QueryContextualizationOptions
    {
        IncludeLastNTurns = 4,
    }
};

// User turn N: "what about for enterprise?"
// History   : ["pricing for the team plan", "answer paragraph..."]
string standalone = await ctx.ContextualiseAsync(history, "what about for enterprise?");
// -> "What is the pricing for the enterprise plan?"

// Now feed the standalone query into any retriever.
var hits = await store.SearchAsync(embedder.GetEmbeddings(standalone), topK: 5);
The strongest RAG pipelines stack expansion strategies. A typical production setup contextualises the conversational turn, expands via multi-query, retrieves from a hybrid index, and reranks. Each stage tightens precision or widens recall at the right point; a code sketch of the full stack follows the step list.
Step 1: QueryContextualizer rewrites the user's latest turn into a standalone query using the conversation history.
Step 2: Either MultiQueryRetriever (N rewrites) or HydeRetriever (hypothetical answer) widens recall.
Step 3: HybridRetrievalStrategy combines BM25 lexical and dense vector retrieval against the expanded query set.
Step 4: Reranker scores each candidate against the original (or contextualised) query and orders by precision.
Step 5: DocumentRag, RagChat, or PdfChat grounds the answer in the top-N passages with citations.
Step 6: A/B test each stage independently. Multi-query alone often gives the biggest single lift on conversational corpora; reranking gives the biggest on technical corpora.
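Wired together, the stack might look like the sketch below. The QueryContextualizer and MultiQueryRetriever calls mirror the snippets above; hybridRetriever, reranker.RankAsync, and ragChat.AnswerAsync are illustrative assumptions, not confirmed LM-Kit.NET signatures.

// Step 1: collapse the latest turn into a standalone query.
string standalone = await ctx.ContextualiseAsync(history, "what about for enterprise?");

// Steps 2-3: widen recall with N rewrites, each retrieved via the hybrid index
// (hybridRetriever is assumed to wrap a HybridRetrievalStrategy).
var mq = new MultiQueryRetriever(chatModel, hybridRetriever)
{
    Options = new MultiQueryOptions { QueryCount = 4, TopK = 10, DeduplicateOnId = true }
};
var candidates = await mq.RetrieveAsync(standalone);

// Step 4: rerank candidates against the contextualised query (assumed API).
var top = await reranker.RankAsync(standalone, candidates, topN: 5);

// Step 5: ground the answer in the top passages with citations (assumed API).
var answer = await ragChat.AnswerAsync(standalone, top);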
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
How-to guide: generate alternative queries with the LLM, retrieve in parallel, merge.
How-to guide: foundational guide; query expansion plugs in at the retrieval stage.
API reference: the HyDE retrieval strategy.