Conversational RAG

Turn Any Knowledge Base Into a Multi-Turn Conversation.

Single-shot retrieval misses context. Follow-up questions fail because the retriever forgets what was asked before. LM-Kit's RagChat combines multi-turn conversation, advanced query reformulation, and grounded retrieval in a single class that runs entirely on your hardware.

  • Follow-up questions lose context from earlier turns
  • Ambiguous queries retrieve irrelevant passages
  • RAG pipelines need manual orchestration of retrieval, prompt building, and generation
  • Cloud RAG APIs expose sensitive corporate knowledge
100% On-Device · 4 Query Modes · Tool Calling Built In

Multi-Turn Conversation

Full conversation history with automatic context carry-over. Follow-up questions resolve pronouns and references seamlessly.

Core

Smart Query Reformulation

Four retrieval strategies: Original, Contextual, MultiQuery, and HyDE. Each one optimized for different query complexity levels.

Advanced

Tool Calling and Skills

Register tools, built-in functions, and Agent Skills. Extend Q&A with web search, calculations, or any custom operation.

Extensible
4 Query Modes · 0 Cloud Dependencies · 6+ Events for Observability

Conversational Q&A Over Any Knowledge Base

RagChat is LM-Kit.NET's turnkey solution for multi-turn conversational question-answering over custom knowledge bases. Unlike document-centric RAG, which manages the full document lifecycle, RagChat operates on a pre-populated RagEngine that you own and manage. This makes it ideal for custom corpora, multi-source knowledge bases, and enterprise data that comes from heterogeneous systems.

A single SubmitAsync() call orchestrates the full pipeline: query reformulation, semantic retrieval, prompt construction, and grounded response generation, all with full conversation history preserved across turns.

Complete ownership of your data. RagChat runs 100% on-device: no API calls, no cloud dependencies, no data leaving your infrastructure. Designed for GDPR and HIPAA compliance and for air-gapped environments.

RagChat.cs
// Build your knowledge base
var ragEngine = new RagEngine(embeddingModel);
ragEngine.ImportText(corporateKnowledge);
ragEngine.ImportText(productDocs);

// Start a multi-turn conversation
using var chat = new RagChat(ragEngine, chatModel);
chat.QueryGenerationMode =
    QueryGenerationMode.Contextual;

// Ask questions naturally
var r1 = await chat.SubmitAsync(
    "What is our refund policy?");

// Follow-up: context is preserved
var r2 = await chat.SubmitAsync(
    "Does that apply to digital products?");

// Access retrieved partitions
foreach (var p in r2.RetrievedPartitions)
    Console.WriteLine(
        $"Source: {p.DataSourceIdentifier}");

Four Query Generation Strategies

Choose the optimal retrieval strategy for your query complexity. Each mode trades off between speed, recall, and precision.

Original

The user's question is sent directly to the retrieval engine. No reformulation, no overhead. The fastest path from question to answer.

  • Zero additional latency
  • Best for self-contained questions
  • Ideal for single-turn interactions
Best for: Direct, self-contained queries

Contextual

Rewrites the user's follow-up question into a self-contained query using conversation history. Resolves pronouns, ellipsis, and implicit references automatically.

  • Pronoun and coreference resolution
  • Configurable history depth
  • Essential for multi-turn dialogue
Best for: Multi-turn conversations

MultiQuery

Generates multiple query variants, runs each search independently, then fuses the results using Reciprocal Rank Fusion (RRF). Maximizes recall for complex or ambiguous questions.

  • Configurable variant count
  • RRF result fusion
  • Higher recall on ambiguous queries
Best for: Complex, multi-faceted queries

HyDE (Hypothetical Answer)

Generates a hypothetical answer first, then retrieves passages similar to that answer. Bridges the vocabulary gap between questions and documents.

  • Bridges query/document vocabulary gap
  • Best precision for factual retrieval
  • Effective on technical corpora
Best for: Technical, domain-specific questions
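
Under the API shown in the earlier code card, switching strategies is a one-line property change. A minimal sketch, assuming the ragEngine and chatModel instances from that example:

```csharp
// Sketch: choosing a retrieval strategy per question.
// Assumes the ragEngine and chatModel instances from the earlier example.
using var chat = new RagChat(ragEngine, chatModel);

// Self-contained question: skip reformulation entirely.
chat.QueryGenerationMode = QueryGenerationMode.Original;
var direct = await chat.SubmitAsync("List the supported export formats.");

// Ambiguous, multi-faceted question: fan out variants and fuse with RRF.
chat.QueryGenerationMode = QueryGenerationMode.MultiQuery;
var broad = await chat.SubmitAsync("How do licensing tiers affect deployment options?");

// Vocabulary gap between question and corpus: retrieve via a hypothetical answer.
chat.QueryGenerationMode = QueryGenerationMode.HypotheticalAnswer;
var precise = await chat.SubmitAsync("Which cipher suites does the gateway negotiate?");
```

Modes can be changed between turns on the same RagChat instance, so an application can start a session on Original and escalate to MultiQuery or HypotheticalAnswer only when a question warrants the extra latency.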

What Happens When You Call SubmitAsync

RagChat orchestrates a five-stage pipeline for every question. Each stage is observable via events and fully configurable.

1. Contextualize

   Follow-up questions are rewritten into self-contained queries using conversation history.

2. Retrieve

   Semantic search across your RagEngine's data sources. Respects MinRelevanceScore and MaxRetrievedPartitions.

3. Rank & Order

   Results ordered by source, section, and partition index. Optional reranking refines relevance scores.

4. Build Prompt

   Retrieved context is injected into your PromptTemplate via the @context and @query placeholders.

5. Generate

   Grounded response generation with streaming via AfterTextCompletion. Supports tool calls, skills, and memory.
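
Each stage surfaces through events. A hedged sketch of wiring the two events named on this page; the event names come from the page itself, but the event-argument members shown in comments are assumptions, so consult the API reference for exact signatures:

```csharp
// Sketch: observing retrieval and streaming generation.
// Assumes a configured RagChat instance named chat.
chat.RetrievalCompleted += (sender, e) =>
{
    // Fires after the Retrieve stage with the query used, partitions
    // found, count requested, and elapsed time (per this page).
    Console.WriteLine("Retrieval completed.");
};

chat.AfterTextCompletion += (sender, e) =>
{
    // Fires as tokens stream during the Generate stage.
    Console.Write(e.Text); // member name is an assumption
};

var answer = await chat.SubmitAsync("Summarize the escalation policy.");
```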

Beyond Simple Q&A

RagChat implements IMultiTurnConversation, giving it the full power of an AI agent combined with grounded retrieval.

Tool Calling & Skills

Extend RAG With Actions

Register custom tools, built-in tools, and Agent Skills that the model can invoke during conversation. Combine knowledge retrieval with live computations, web searches, database lookups, or any external operation.

  • ToolRegistry Integration

    Register any ITool implementation. The model decides when and how to invoke tools during RAG conversations.

  • Agent Skills

    Define complex capabilities via SKILL.md files. Skills combine system prompts, tools, and behavioral rules into reusable packages.

  • Tool Approval Workflow

    ToolApprovalRequired event for human-in-the-loop control. Approve or deny tool invocations before they execute.
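
As a sketch of the human-in-the-loop workflow: the ToolApprovalRequired event name comes from this page, but the shape of its event arguments is an assumption, so treat the handler body as illustrative only:

```csharp
// Sketch: gating tool invocations behind a human decision.
chat.ToolApprovalRequired += (sender, e) =>
{
    Console.Write("The model requested a tool invocation. Approve? (y/n) ");
    bool approved = Console.ReadLine()?.Trim().ToLowerInvariant() == "y";
    // How the decision is reported back (e.g. an Approved flag or
    // Approve()/Deny() methods on the event args) is an assumption;
    // check the API reference for the exact mechanism.
};
```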

Memory & Observability

Persistent Context and Full Visibility

Connect AgentMemory for long-term knowledge that persists across sessions. Every stage of the pipeline emits events for complete observability.

  • AgentMemory

    RAG-backed persistent memory that survives across sessions. Recall relevant facts from past conversations automatically.

    Learn about Agent Memory →
  • RetrievalCompleted Event

    Fires after partition retrieval with full details: query used, partitions found, count requested, and elapsed time.

  • Streaming Generation

    AfterTextCompletion event streams tokens in real time for responsive user experiences. Build interactive chat UIs effortlessly.

RagChat vs PdfChat

Both provide conversational RAG, but they serve different use cases. Choose the one that matches your data ownership model.

RagChat

Operates on a pre-populated RagEngine you manage. Full control over data sources, chunking, and the retrieval pipeline. The caller owns the engine lifecycle.

  • Bring your own RagEngine
  • Multi-source knowledge bases
  • 4 query generation modes
  • Tool calling, skills, memory
  • Custom chunking and storage
Best for: Custom corpora, enterprise knowledge, multi-source data

PdfChat

Manages the full document lifecycle: import, chunk, embed, cache, and query. Optimized for quick document Q&A with automatic context management.

  • Automatic document lifecycle
  • Built-in caching
  • Smart context loading
  • OCR and VLM integration
  • Page-level source references
Best for: Document Q&A, PDF chat, single-document focus

Everything You Need for Enterprise RAG

Fine-grained control over every aspect of the retrieval and generation pipeline.

Semantic Reranking

Cross-encoder rerankers refine initial retrieval for significantly higher precision. Access raw and reranked scores on every partition.

Relevance Filtering

Set MinRelevanceScore (0.0 to 1.0) and MaxRetrievedPartitions to control quality and quantity of retrieved context.
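
A minimal sketch of the two knobs named above, with illustrative values (the property names come from this page; the numeric types are assumptions):

```csharp
// Sketch: trading context quantity for quality on a RagChat instance.
chat.MinRelevanceScore = 0.65f;  // discard weakly related partitions (range 0.0–1.0)
chat.MaxRetrievedPartitions = 5; // cap how much context enters the prompt
```

Raising MinRelevanceScore favors precision; raising MaxRetrievedPartitions favors recall at the cost of a longer prompt.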

Custom Prompt Templates

Configure how context reaches the model with @context and @query placeholders. Optimize prompts for your domain. Learn more about prompt templates.
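
For example, a domain-tuned template might instruct the model to answer only from retrieved context. The PromptTemplate property name and the @context/@query placeholders follow this page; the exact template grammar is an assumption:

```csharp
// Sketch: a grounded-answering template using the documented placeholders.
chat.PromptTemplate =
    "You are a support assistant. Answer strictly from the context below.\n" +
    "If the context does not contain the answer, say so.\n\n" +
    "Context:\n@context\n\nQuestion: @query";
```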

Reasoning Control

Adjust ReasoningLevel (None, Medium, High) for models that support extended thinking. Balance quality with latency for your use case.

MMR Diversity

Configure Maximal Marginal Relevance on the underlying RagEngine to balance relevance with diversity in retrieved results.

100% On-Device

Zero cloud dependencies. Your knowledge base, embeddings, and conversations stay on your hardware. GDPR and HIPAA compliant by design.

Flexible Vector Storage

Works with any IVectorStore backend: in-memory, built-in SQLite, Qdrant, or your own custom implementation.

Async and Sync APIs

Both SubmitAsync and Submit methods are available. Build responsive UIs with async, or use the synchronous API for batch processing and console applications.

Conversation Management

Full ChatHistory access. ClearHistory() resets conversation state without affecting the knowledge base. Start fresh conversations at any time.
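
A short sketch of the reset behavior described above, assuming a configured RagChat instance named chat:

```csharp
// Sketch: resetting dialogue state without rebuilding the knowledge base.
var first = await chat.SubmitAsync("What changed in the v2 release?");

chat.ClearHistory(); // conversation resets; the RagEngine keeps its data

// The next question opens a new conversation, so references to earlier
// turns (like the "And" below) have no prior context to resolve against.
var fresh = await chat.SubmitAsync("And what about v3?");
```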

Core Classes

Complete API documentation for building conversational RAG applications.

RagChat
Multi-turn conversational Q&A over a user-managed RagEngine with tool calling and memory.
View Documentation
RagEngine
Core retrieval engine managing data sources, embeddings, and similarity search.
View Documentation
RagQueryResult
Response container with generated text and retrieved partition references.
View Documentation
QueryGenerationMode
Enum for retrieval strategies: Original, Contextual, MultiQuery, HypotheticalAnswer.
View Documentation
DataSource
Content repository for text partitions and sections with metadata support.
View Documentation
PartitionSimilarity
Retrieved partition with similarity scores, source metadata, and embeddings.
View Documentation
Embedder
Generate text and image embeddings for semantic similarity search.
View Documentation
AgentMemory
RAG-backed persistent memory for long-term context across conversation sessions.
View Documentation
IVectorStore
Interface for pluggable vector storage backends (in-memory, SQLite, Qdrant, custom).
View Documentation

Get Started in Minutes

Clone working examples from our GitHub repository and start building conversational RAG applications.

Conversational RAG

Multi-turn RAG conversation using RagChat with query contextualization and streaming responses.

View Sample

Retrieval Quality Tuning

Compare query generation modes, relevance thresholds, and reranking strategies to optimize retrieval quality.

View Sample

Help Desk Knowledge Base

Build a help desk assistant with RagChat, multi-source FAQ data, and real-time response streaming.

View Sample

PDF Chat Demo

Conversational Q&A over PDF documents with PdfChat for comparison with RagChat's approach.

View Sample

Persistent Memory Assistant

Agent with RAG-backed long-term memory that remembers context across conversation sessions.

View Sample

All Samples on GitHub

Browse the complete collection of RAG and conversational AI samples.

View Repository

Build Conversational RAG Today

Turn your knowledge base into an intelligent, multi-turn conversation. One NuGet package, zero cloud dependencies, complete data ownership.