
When the user's wording is far from the corpus.

Users ask "how do I cancel" when the docs say "subscription termination workflow." Vector search struggles when the vocabulary gap is wide. Query expansion uses the LLM to rewrite the query into language the corpus actually uses, runs retrieval against each rewrite, and merges the hits. Three named strategies ship in LM-Kit.NET.

  • HydeRetriever: generate a hypothetical answer, embed that answer, retrieve.
  • MultiQueryRetriever: ask the LLM for N rewrites, retrieve for each, merge.
  • QueryContextualizer: rewrite the current question using prior chat turns.

Why expand the query

Vocabulary mismatch is the most common retrieval failure mode.

Embeddings are good at semantic similarity, but they are not magic. The closer the query vocabulary sits to the corpus vocabulary, the higher the recall. Retrieval drops when users speak casually and the corpus is technical, when they ask in one language and the corpus is written in another, or when the question is short and the answer needs multi-step framing. Query expansion closes that gap.

Three strategies

Pick the one that fits your failure mode.

HydeRetriever asks the LLM to generate the answer it thinks the user wants (a hypothetical document), embeds that synthetic answer, and uses the resulting vector for retrieval. The hypothetical answer sits in the same semantic neighbourhood as the real documents in the corpus, so recall improves.

  • Best for: short user queries; technical corpora; "how do I X" questions where the answer paragraph reads very differently from the question
  • Cost: one extra LLM completion per query (the hypothetical answer)
  • Combine with: vector retrieval, reranker
HyDE.cs
using LMKit.Retrieval;

// Generate a hypothetical answer with the chat model, embed it,
// and retrieve the nearest real passages from the store.
var hyde = new HydeRetriever(chatModel, embedder, store)
{
    Options = new HydeOptions
    {
        HypotheticalAnswerMaxTokens = 256, // cap on the synthetic answer length
        TopK = 20,                         // candidates returned per query
    }
};

var hits = await hyde.RetrieveAsync("how do I cancel");
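MultiQueryRetriever takes a different tack: instead of drafting an answer, it asks the LLM for several alternative phrasings of the question, retrieves for each, and merges the deduplicated hits. The sketch below mirrors the HyDE snippet above; the option type and property names (MultiQueryOptions, RewriteCount) are assumptions by analogy, not confirmed API.

MultiQuery.cs
using LMKit.Retrieval;

// Ask the LLM for N rewrites of the query, retrieve for each rewrite,
// and merge the result sets. Option names are assumed by analogy with
// HydeOptions and may differ in the actual LM-Kit.NET API.
var multi = new MultiQueryRetriever(chatModel, embedder, store)
{
    Options = new MultiQueryOptions
    {
        RewriteCount = 4, // number of alternative phrasings to generate
        TopK = 20,        // candidates kept per rewrite before merging
    }
};

var hits = await multi.RetrieveAsync("how do I cancel");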
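QueryContextualizer earns its keep in multi-turn chat: it folds the conversation history into the latest turn so that a follow-up like "and on annual plans?" becomes a standalone, retrievable query. Another sketch; the ContextualizeAsync method and its history parameter are illustrative assumptions, not confirmed API.

Contextualize.cs
using LMKit.Retrieval;

// Rewrite the latest turn into a standalone query using prior chat turns.
// Method name and signature are assumptions for illustration only.
var contextualizer = new QueryContextualizer(chatModel);

var history = new[]
{
    "user: how do I cancel my subscription?",
    "assistant: Open Billing > Plan and choose Cancel.",
};

var standalone = await contextualizer.ContextualizeAsync(
    "and on annual plans?", history);
// e.g. "how do I cancel an annual subscription plan?"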
Combining them

All three compose.

The strongest RAG pipelines stack expansion strategies. A typical production setup contextualises the conversational turn, expands via multi-query, retrieves from a hybrid index, and reranks. Each stage tightens precision or widens recall at the right point.

Step 1 · Contextualise: QueryContextualizer rewrites the user's latest turn into a standalone query using the conversation history.

Step 2 · Expand: either MultiQueryRetriever (N rewrites) or HydeRetriever (hypothetical answer) widens recall.

Step 3 · Retrieve: HybridRetrievalStrategy combines BM25 lexical and dense vector retrieval against the expanded query set.

Step 4 · Rerank: Reranker scores each candidate against the original (or contextualised) query and orders by precision.

Step 5 · Generate: DocumentRag, RagChat, or PdfChat ground the answer in the top-N passages with citations.

Step 6 · Measure: A/B test each stage independently. Multi-query alone often gives the biggest single lift on conversational corpora; reranking gives the biggest on technical corpora.
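Put together, the six steps read as one short pipeline. The sketch below reuses the class names from the steps above (QueryContextualizer, MultiQueryRetriever, HybridRetrievalStrategy, Reranker, RagChat); the method names, option shapes, and the Strategy hook are assumptions for illustration, not confirmed API.

Pipeline.cs
using System.Linq;
using LMKit.Retrieval;

// Step 1: contextualise the latest turn into a standalone query.
var query = await contextualizer.ContextualizeAsync(latestTurn, history);

// Steps 2-3: expand via multi-query, retrieve from a hybrid index.
// The Strategy property is an assumed extension point.
var retriever = new MultiQueryRetriever(chatModel, embedder, store)
{
    Options = new MultiQueryOptions
    {
        RewriteCount = 4,
        TopK = 30,                                // wide recall before reranking
        Strategy = new HybridRetrievalStrategy(), // BM25 + dense vectors
    }
};
var candidates = await retriever.RetrieveAsync(query);

// Step 4: rerank candidates against the contextualised query.
var reranked = await reranker.RankAsync(query, candidates);

// Step 5: ground the answer in the top passages, with citations.
var answer = await ragChat.AnswerAsync(query, reranked.Take(8));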

LM-Kit.NET pillars

Seven pillars, one foundation.

The seven pillars of LM-Kit.NET, plus the local runtime they share.

The foundation

Every capability above runs on this runtime.

Local Inference

The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR, and classifiers, accelerated on CPU (AVX2), CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.

Explore the foundation

Bridge the vocabulary gap.

Start in 5 minutes · RAG & Knowledge hub