HydeRetriever
Generate a hypothetical answer, embed THAT, retrieve.
Users ask "how do I cancel" when the docs say "subscription termination workflow." Vector search struggles when the vocabulary gap is wide. Query expansion uses the LLM to rewrite the query into language the corpus actually uses, runs retrieval against each rewrite, and merges the hits. Three named strategies ship in LM-Kit.NET.
HydeRetriever: Generate a hypothetical answer, embed THAT, retrieve.
MultiQueryRetriever: Ask the LLM for N rewrites. Retrieve for each. Merge.
QueryContextualizer: Rewrite the current question using prior chat turns.
Embeddings are good at semantic similarity but they are not magic. The closer the query vocabulary is to the corpus vocabulary, the higher the recall. When users speak casually and your corpus is technical, when they ask in one language and your corpus is in another, when their question is short and the answer requires multi-step framing, retrieval drops. Query expansion repairs that gap.
HydeRetriever asks the LLM to generate the answer it thinks the user wants (a hypothetical document). It then embeds that synthetic answer and uses the resulting vector for retrieval. The hypothetical answer sits in the same semantic neighbourhood as the real ones in the corpus, so recall jumps.
using LMKit.Retrieval;

var hyde = new HydeRetriever(chatModel, embedder, store)
{
    Options = new HydeOptions
    {
        HypotheticalAnswerMaxTokens = 256, // cap the synthetic answer's length
        TopK = 20,                         // passages to return
    }
};

var hits = await hyde.RetrieveAsync("how do I cancel");
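Conceptually, HyDE reduces to generate, embed, search. The sketch below makes that explicit; GenerateAsync is a hypothetical chat-model call, while GetEmbeddings and SearchAsync mirror the contextualizer example further down, so treat the whole snippet as illustrative rather than the retriever's actual internals.

// Illustrative only: roughly what HydeRetriever does on your behalf.
string hypothetical = await chatModel.GenerateAsync(        // hypothetical API
    "Write a short passage that answers: how do I cancel");

// Embed the synthetic answer, NOT the raw query, then search as usual.
var hits = await store.SearchAsync(embedder.GetEmbeddings(hypothetical), topK: 20);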
MultiQueryRetriever asks the LLM for N rewrites of the original query (rephrasings, synonyms, different angles). It runs retrieval for each rewrite, deduplicates the hits, and returns the merged set. Recall scales sublinearly with N; precision tends to hold or improve.
var mq = new MultiQueryRetriever(chatModel, baseRetriever)
{
    Options = new MultiQueryOptions
    {
        QueryCount = 4,         // generate 4 rewrites
        TopK = 10,              // per rewrite
        DeduplicateOnId = true, // collapse hits found by several rewrites
    }
};

var hits = await mq.RetrieveAsync("is the trial renewable");
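The merge step is worth understanding even though the library performs it for you. A hand-rolled equivalent looks roughly like this; Hit with an Id and Score is a hypothetical type for illustration, not LM-Kit.NET's result model.

using System.Collections.Generic;
using System.Linq;

record Hit(string Id, double Score, string Text);

// Flatten the per-rewrite result lists, keep the best-scoring copy of each
// passage, and return the union ordered by score.
static IEnumerable<Hit> Merge(IEnumerable<IEnumerable<Hit>> perRewriteHits) =>
    perRewriteHits
        .SelectMany(hits => hits)
        .GroupBy(h => h.Id)
        .Select(g => g.OrderByDescending(h => h.Score).First())
        .OrderByDescending(h => h.Score);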
QueryContextualizer rewrites the current turn of a multi-turn conversation into a standalone query that the retriever can use without seeing the earlier history. Essential when the user says "what about for enterprise?" after a paragraph about pricing.
var ctx = new QueryContextualizer(chatModel)
{
    Options = new QueryContextualizationOptions
    {
        IncludeLastNTurns = 4,
    }
};

// User turn N: "what about for enterprise?"
// History   : ["pricing for the team plan", "answer paragraph..."]
string standalone = await ctx.ContextualiseAsync(history, "what about for enterprise?");
// -> "What is the pricing for the enterprise plan?"

// Now feed the standalone query into any retriever.
var hits = await store.SearchAsync(embedder.GetEmbeddings(standalone), topK: 5);
The strongest RAG pipelines stack expansion strategies. A typical production setup contextualises the conversational turn, expands via multi-query, retrieves from a hybrid index, and reranks. Each stage tightens precision or widens recall at the right point; a code sketch of the full stack follows the step list.
Step 1: QueryContextualizer rewrites the user's latest turn into a standalone query using the conversation history.
Step 2: Either MultiQueryRetriever (N rewrites) or HydeRetriever (hypothetical answer) widens recall.
Step 3: HybridRetrievalStrategy combines BM25 lexical and dense vector retrieval against the expanded query set.
Step 4: Reranker scores each candidate against the original (or contextualised) query and orders by precision.
Step 5: DocumentRag, RagChat, or PdfChat grounds the answer in the top-N passages with citations.
Step 6: A/B test each stage independently. Multi-query alone often gives the biggest single lift on conversational corpora; reranking gives the biggest on technical corpora.
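Wired together, the stack might look like the sketch below. The QueryContextualizer and MultiQueryRetriever calls mirror the snippets above; hybridRetriever, reranker.RankAsync, and ragChat.AnswerAsync are illustrative assumptions, not confirmed LM-Kit.NET signatures.

// Step 1: collapse the latest turn into a standalone query.
string standalone = await ctx.ContextualiseAsync(history, "what about for enterprise?");

// Steps 2-3: widen recall with N rewrites, each retrieved via the hybrid index
// (hybridRetriever is assumed to wrap a HybridRetrievalStrategy).
var mq = new MultiQueryRetriever(chatModel, hybridRetriever)
{
    Options = new MultiQueryOptions { QueryCount = 4, TopK = 10, DeduplicateOnId = true }
};
var candidates = await mq.RetrieveAsync(standalone);

// Step 4: rerank candidates against the contextualised query (assumed API).
var top = await reranker.RankAsync(standalone, candidates, topN: 5);

// Step 5: ground the answer in the top passages with citations (assumed API).
var answer = await ragChat.AnswerAsync(standalone, top);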
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
How-to guide: generate alternative queries with the LLM, retrieve in parallel, merge.
How-to guide: foundational guide; query expansion plugs in at the retrieval stage.
API reference: the HyDE retrieval strategy.