Multi-turn conversation
Full conversation history with automatic context carry-over. Follow-up questions resolve pronouns and references seamlessly.
Single-shot retrieval misses context. Follow-up questions fail because the
retriever forgets what was asked before. LM-Kit's RagChat
combines multi-turn conversation, advanced query reformulation, and grounded
retrieval in a single class that runs entirely on your hardware.
Four retrieval strategies: Original, Contextual, MultiQuery, and HyDE. Each one optimized for different query complexity levels.
Register tools, built-in functions, and Agent Skills. Extend Q&A with web search, calculations, or any custom operation.
RagChat is LM-Kit.NET's turnkey solution for multi-turn conversational
question-answering over custom knowledge bases. Unlike document-centric RAG,
which manages the full document lifecycle, RagChat operates on a
pre-populated RagEngine that you own and manage. This makes it ideal
for custom corpora, multi-source knowledge bases, and enterprise data that comes
from heterogeneous systems.
A single SubmitAsync() call orchestrates the full pipeline: query
reformulation, semantic retrieval, prompt construction, and grounded response
generation, all with full conversation history preserved across turns.
Complete ownership of your data. RagChat runs 100% on-device. No
API calls, no cloud dependencies, no data leaving your infrastructure. Suited
by design to GDPR and HIPAA requirements and to air-gapped environments.
```csharp
// Build your knowledge base
var ragEngine = new RagEngine(embeddingModel);
ragEngine.ImportText(corporateKnowledge);
ragEngine.ImportText(productDocs);

// Start a multi-turn conversation
using var chat = new RagChat(ragEngine, chatModel);
chat.QueryGenerationMode = QueryGenerationMode.Contextual;

// Ask questions naturally
var r1 = await chat.SubmitAsync("What is our refund policy?");

// Follow-up: context is preserved
var r2 = await chat.SubmitAsync("Does that apply to digital products?");

// Access retrieved partitions
foreach (var p in r2.RetrievedPartitions)
    Console.WriteLine($"Source: {p.DataSourceIdentifier}");
```
Choose the optimal retrieval strategy for your query complexity. Each mode trades off between speed, recall, and precision.
Mode 01: Original
The user's question is sent directly to the retrieval engine. No reformulation, no overhead. The fastest path from question to answer.
Best for: Direct, self-contained queries
Mode 02: Contextual
Rewrites the user's follow-up question into a self-contained query using conversation history. Resolves pronouns, ellipsis, and implicit references automatically.
Best for: Multi-turn conversations
Mode 03: MultiQuery
Generates multiple query variants and searches independently, then fuses results using Reciprocal Rank Fusion (RRF). Maximizes recall for complex or ambiguous questions.
Best for: Complex, multi-faceted queries
Mode 04: HyDE (HypotheticalAnswer)
Generates a hypothetical answer first, then retrieves passages similar to that answer. Bridges the vocabulary gap between questions and documents.
Best for: Technical, domain-specific questions
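The four modes above correspond to the QueryGenerationMode enum. A minimal sketch of switching between them, reusing the ragEngine and chatModel instances from the earlier example (the enum member names follow the values listed in this page's API reference):

```csharp
// Sketch: choosing a retrieval strategy per conversation style.
using var chat = new RagChat(ragEngine, chatModel);

// Direct, self-contained questions: no reformulation, lowest latency.
chat.QueryGenerationMode = QueryGenerationMode.Original;

// Multi-turn chat: rewrite follow-ups into standalone queries.
chat.QueryGenerationMode = QueryGenerationMode.Contextual;

// Complex or ambiguous questions: fan out variants, fuse with RRF.
chat.QueryGenerationMode = QueryGenerationMode.MultiQuery;

// Technical, domain-specific questions: retrieve against a hypothetical answer (HyDE).
chat.QueryGenerationMode = QueryGenerationMode.HypotheticalAnswer;
```

The mode can be changed between turns, so an application could start with Original for the opening question and switch to Contextual once follow-ups begin.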
RagChat orchestrates a five-stage pipeline for every question. Each
stage is observable via events and fully configurable.
Stage 01: Query reformulation
Follow-up questions are rewritten into self-contained queries using conversation history.
Stage 02: Retrieval
Semantic search across your RagEngine's data sources. Respects MinRelevanceScore and MaxRetrievedPartitions.
Stage 03: Ranking
Results ordered by source, section, and partition index. Optional reranking refines relevance scores.
Stage 04: Prompt construction
Retrieved context injected into your PromptTemplate via @context and @query placeholders.
Stage 05: Generation
Grounded response generation with streaming via AfterTextCompletion. Supports tool calls, skills, and memory.
RagChat implements IMultiTurnConversation, giving it
the full power of an AI agent combined with grounded retrieval.
Tool calling & skills
Register custom tools, built-in tools, and Agent Skills that the model can invoke during conversation. Combine knowledge retrieval with live computations, web searches, database lookups, or any external operation.
Tools
Register any ITool implementation. The model decides when and how to invoke tools during RAG conversations.
Skills
Define complex capabilities via SKILL.md files. Skills combine system prompts, tools, and behavioral rules into reusable packages.
Approval
ToolApprovalRequired event for human-in-the-loop control. Approve or deny tool invocations before they execute.
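The approval hook above can gate every tool invocation behind a human decision. A sketch of the pattern, where the event-args shape (ToolName, Approved) is an illustrative assumption rather than the documented signature:

```csharp
// Sketch: human-in-the-loop control over tool calls.
using var chat = new RagChat(ragEngine, chatModel);

// Register your ITool implementations here (see the tools
// documentation for the exact registration API).

// Hypothetical event-args members shown for illustration.
chat.ToolApprovalRequired += (sender, e) =>
{
    Console.Write($"Allow tool '{e.ToolName}'? (y/n) ");
    e.Approved = Console.ReadLine()?.Trim().ToLowerInvariant() == "y";
};
```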
Connect AgentMemory for long-term knowledge that persists across
sessions. Every stage of the pipeline emits events for complete observability.
Memory
AgentMemory: RAG-backed persistent memory that survives across sessions. Recall relevant facts from past conversations automatically.
Events
RetrievalCompleted event: Fires after partition retrieval with full details: query used, partitions found, count requested, and elapsed time.
Streaming
AfterTextCompletion event streams tokens in real time for responsive user experiences. Build interactive chat UIs effortlessly.
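Streaming via AfterTextCompletion can be sketched as follows; the event-args Text property is an illustrative assumption, so check the API reference for the actual member name:

```csharp
// Sketch: real-time token streaming during response generation.
using var chat = new RagChat(ragEngine, chatModel);

// Hypothetical event-args member shown for illustration.
chat.AfterTextCompletion += (sender, e) =>
{
    Console.Write(e.Text); // print each chunk as it arrives
};

await chat.SubmitAsync("Summarize our escalation policy.");
```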
RagChat vs PdfChat. Both provide conversational RAG, but they serve different use cases. Choose the one that matches your data ownership model.
Custom corpora
RagChat: Operates on a pre-populated RagEngine you manage. Full control over data sources, chunking, and the retrieval pipeline. The caller owns the engine lifecycle.
Best for: Custom corpora, enterprise knowledge, multi-source data
Document Q&A
PdfChat: Manages the full document lifecycle: import, chunk, embed, cache, and query. Optimized for quick document Q&A with automatic context management.
Best for: Document Q&A, PDF chat, single-document focus
Fine-grained control over every aspect of the retrieval and generation pipeline.
Reranking
Cross-encoder rerankers refine initial retrieval for significantly higher precision. Access raw and reranked scores on every partition.
Filtering
Set MinRelevanceScore (0.0 to 1.0) and MaxRetrievedPartitions to control quality and quantity of retrieved context.
Templates
Configure how context reaches the model with @context and @query placeholders. Optimize prompts for your domain.
Reasoning
Adjust ReasoningLevel (None, Medium, High) for models that support extended thinking. Balance quality with latency for your use case.
MMR
Configure Maximal Marginal Relevance on the underlying RagEngine to balance relevance with diversity in retrieved results.
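The knobs described above can be combined on a single instance. A sketch using only the property names stated on this page (MinRelevanceScore, MaxRetrievedPartitions, ReasoningLevel); the chosen values are illustrative, and defaults and exact types may differ:

```csharp
// Sketch: tuning retrieval quality and reasoning depth.
using var chat = new RagChat(ragEngine, chatModel);

chat.MinRelevanceScore = 0.55f;   // drop weakly related partitions (range 0.0 to 1.0)
chat.MaxRetrievedPartitions = 6;  // cap how much context reaches the prompt
chat.ReasoningLevel = ReasoningLevel.Medium; // trade answer quality against latency
```

A higher MinRelevanceScore with a lower MaxRetrievedPartitions favors precision; relaxing both favors recall, at the cost of a longer prompt.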
Privacy
Zero cloud dependencies. Your knowledge base, embeddings, and conversations stay on your hardware. Supports GDPR and HIPAA compliance by design.
Storage
Works with any IVectorStore backend: in-memory, built-in SQLite, Qdrant, or your own custom implementation.
APIs
Both SubmitAsync and Submit methods are available. Use async to build responsive UIs, or the synchronous API for batch processing and console applications.
Sessions
Full ChatHistory access. ClearHistory() resets conversation state without affecting the knowledge base. Start fresh conversations at any time.
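The two entry points and session reset described above can be sketched as follows, reusing the ragEngine and chatModel instances from the opening example:

```csharp
// Sketch: sync and async entry points plus session reset.
using var chat = new RagChat(ragEngine, chatModel);

// Async for responsive UIs...
var answer = await chat.SubmitAsync("What changed in the latest release?");

// ...or synchronous for batch jobs and console tools.
var batchAnswer = chat.Submit("List the supported storage backends.");

// Start a fresh conversation; the knowledge base is untouched.
chat.ClearHistory();
```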
Complete API documentation for building conversational RAG applications.
RagChat: Multi-turn conversational Q&A over a user-managed RagEngine with tool calling and memory.
RagEngine: Core retrieval engine managing data sources, embeddings, and similarity search.
RagQueryResult: Response container with generated text and retrieved partition references.
QueryGenerationMode: Enum for retrieval strategies: Original, Contextual, MultiQuery, HypotheticalAnswer.
DataSource: Content repository for text partitions and sections with metadata support.
PartitionSimilarity: Retrieved partition with similarity scores, source metadata, and embeddings.
Embedder: Generates text and image embeddings for semantic similarity search.
AgentMemory: RAG-backed persistent memory for long-term context across conversation sessions.
IVectorStore: Interface for pluggable vector storage backends (in-memory, SQLite, Qdrant, custom).
Clone working examples from our GitHub repository and start building conversational RAG applications.
Multi-turn RAG conversation using RagChat with query contextualization and streaming responses.
Compare query generation modes, relevance thresholds, and reranking strategies to optimize retrieval quality.
Build a help desk assistant with RagChat, multi-source FAQ data, and real-time response streaming.
Conversational Q&A over PDF documents with PdfChat for comparison with RagChat's approach.
Agent with RAG-backed long-term memory that remembers context across conversation sessions.
All samples on GitHub
Browse the complete collection of RAG and conversational AI samples.
Explore the techniques that power RagChat's retrieval and generation pipeline.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Turn your knowledge base into an intelligent, multi-turn conversation. One NuGet package, zero cloud dependencies, complete data ownership.