Solutions · RAG & Knowledge

Your AI keeps making things up. Ground it in real data.

LLMs hallucinate. They confidently cite documents that don't exist and invent facts that sound plausible. LM-Kit's RAG engine solves this by grounding every response in your actual documents, with page-level citations you can verify.

Typical document Q&A pipelines break down in familiar ways:

  • PDFs with tables and complex layouts lose structure
  • Scanned documents are completely ignored
  • Cloud RAG services see your confidential data
  • No way to verify which page the answer came from
100% on-device · OCR plus VLM support · Page-level citations
Core

DocumentRag

Multi-page processing with OCR and VLM-based document understanding. Preserves layout, tables, and structure.

Ready

PdfChat

Conversational Q&A over documents. Multi-turn dialogue with automatic context management and caching.

New

RagChat

Multi-turn conversational Q&A over custom knowledge bases. 4 query generation modes, tool calling, and agent memory.

Flexible

IVectorStore

4 storage backends: in-memory, built-in DB, Qdrant, or custom. Switch without changing code.

0
Cloud dependencies
4
Vector backends
4
Query modes

Ground your AI in real data.

Traditional LLMs hallucinate. LM-Kit.NET's RAG engine grounds every response in your actual documents, databases, and knowledge bases. Semantic retrieval finds the most relevant passages, then generation synthesizes accurate, cited answers.

From simple text files to complex multi-page PDFs with tables, forms, and scanned content, LM-Kit.NET handles it all with intelligent document understanding, OCR, and vision-based parsing. Need multi-turn conversations? RagChat adds conversational Q&A with four query generation modes over any knowledge base.

100% on-device processing. Your documents never leave your infrastructure. Meet GDPR, HIPAA, and data residency requirements by design.

DocumentRag.cs
// Document-centric RAG with full lifecycle management.
// Assumes 'embeddingModel', 'visionModel', 'attachment' (the loaded
// document), 'matches', and 'conversation' are defined elsewhere
// in your application.
var docRag = new DocumentRag(embeddingModel);

// Enable OCR for scanned documents
docRag.OcrEngine = new OcrEngine();

// Enable VLM for complex layouts
docRag.VisionParser = new VlmOcr(visionModel);

// Import with metadata for lifecycle tracking
var metadata = new DocumentMetadata(attachment, id: "report-2024-q4");
await docRag.ImportDocumentAsync(attachment, metadata, "reports");

// Query with source references
var result = await docRag.QueryPartitionsAsync(
    "What was Q4 revenue?", matches, conversation);

foreach (var reference in result.SourceReferences)
    Console.WriteLine($"Page {reference.PageNumber}");
Document Intelligence

DocumentRag: beyond simple text retrieval.

Multi-page document processing with OCR, vision-based understanding, and complete document lifecycle management.

DocumentRag Class

Intelligent document processing

DocumentRag extends RagEngine with specialized handling for multi-page documents. It automatically extracts text page-by-page, handles mixed content types, and maintains document structure for accurate retrieval.

Multi-page

Multi-page processing

Automatic page-by-page extraction with structure preservation. Handles PDFs, images, and multi-page formats seamlessly.

OCR

OCR integration

Built-in OCR engine extracts text from scanned documents and image-based pages. No external dependencies required.

VLM

Vision-based understanding

VisionParser uses VLMs for advanced document understanding, preserving layout and structure as markdown for complex documents.

Lifecycle

Document lifecycle management

Import, update, and delete documents with explicit IDs. Track document versions and manage your knowledge base programmatically.

Source References

Grounded answers with citations.

Every response includes source references with document names and page numbers. Build trust with your users by showing exactly where information comes from.

Page-level

Page-level attribution

Know exactly which page contains the source information. Enable users to verify and explore original documents.

Events

Progress events

Monitor document import with real-time progress callbacks. Track page processing, embedding generation, and indexing status.

Filtering

Metadata filtering

Attach custom metadata to documents and filter queries by category, date, author, or any custom attribute.

Processing strategies

Three intelligent processing modes.

Choose the optimal strategy for your document types, or let Auto mode select the best approach per page.

Fast

Text extraction

Traditional text extraction with optional OCR for image-based pages. Fast and efficient for text-heavy documents.

  • Fastest processing speed
  • OCR for scanned content
  • Low resource usage
  • Best for simple layouts

VLM

Document understanding

Vision language models for advanced parsing. Preserves layout, tables, and structure as markdown.

  • VLM-powered analysis
  • Layout preservation
  • Table structure extraction
  • Complex document handling
Conversational document Q&A

PdfChat: chat with your documents.

A complete conversational interface for document question-answering. Multi-turn dialogue, automatic context management, and intelligent retrieval in one class.
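By way of illustration, a multi-turn session might look like the sketch below. Apart from the PdfChat class name, every method name here is an assumption based on the patterns shown elsewhere on this page, not the documented API:

```csharp
// Hypothetical sketch: method names are assumptions, not the documented
// PdfChat API. 'chatModel' is a language model loaded elsewhere.
var pdfChat = new PdfChat(chatModel);
await pdfChat.LoadDocumentAsync("annual-report.pdf");                    // assumed
var answer = await pdfChat.SubmitAsync("What was Q4 revenue?");          // assumed
var follow = await pdfChat.SubmitAsync("How does that compare to Q3?");  // multi-turn follow-up
```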

Multi-turn

Multi-turn conversation

Maintain context across questions. Follow-up queries understand conversation history for natural dialogue flow.

Context

Smart context management

Small documents load in full for complete context. Large documents use passage retrieval to inject only relevant excerpts.

Cache

Document caching

Vector store caching for fast subsequent queries. Load a document once, query it indefinitely.

Tools

Tool calling support

Register custom tools the model can invoke during conversation. Extend document Q&A with calculations, lookups, or external APIs.

Reranking

Semantic reranking

Optional reranker refines passage retrieval results for higher precision. Get the most relevant content every time.

Reasoning

Reasoning control

Adjust reasoning depth for models that support extended thinking. Balance response quality with latency.

Memory

Agent memory integration

Connect to AgentMemory for RAG-backed persistent context that survives across conversation sessions.

Events

Comprehensive events

CacheAccessed, PassageRetrievalCompleted, ResponseGenerationStarted, and more. Full observability into the RAG pipeline.

Conversational RAG

RagChat: multi-turn Q&A over any knowledge base.

Turn any pre-populated RagEngine into a conversational interface with automatic query contextualization, tool calling, and agent memory.
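As a rough sketch of that flow, the snippet below wires a pre-populated engine into a conversation. The RagChat and RagEngine class names come from this page; the constructor shape and method names are assumptions, not the documented signatures:

```csharp
// Hypothetical sketch: constructor and method names are assumptions.
// 'ragEngine' is a pre-populated RagEngine; 'chatModel' a loaded LLM.
var ragChat = new RagChat(chatModel, ragEngine);
var first  = await ragChat.SubmitAsync("What does the warranty cover?"); // assumed
var follow = await ragChat.SubmitAsync("And for how long?");             // contextualized from history
```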

Mode 01

Original mode

Send the user's question directly to retrieval. Zero overhead, fastest path.

  • Self-contained questions
  • Single-turn interactions

Mode 03

MultiQuery mode

Generates multiple query variants and fuses results via Reciprocal Rank Fusion.

  • Higher recall
  • Complex queries
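The fusion step named above, Reciprocal Rank Fusion, can be sketched in a few lines. This is a generic illustration, not LM-Kit's internal implementation; k = 60 is the constant commonly used in the literature:

```csharp
using System.Collections.Generic;

// Reciprocal Rank Fusion: a document's fused score is the sum of
// 1 / (k + rank) over every ranked list it appears in, so documents
// that rank well across several query variants rise to the top.
static Dictionary<string, double> FuseRrf(
    IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (var list in rankedLists)
        for (int rank = 0; rank < list.Count; rank++)
        {
            scores.TryGetValue(list[rank], out var current);
            scores[list[rank]] = current + 1.0 / (k + rank + 1); // ranks are 1-based
        }
    return scores;
}
```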

Mode 04

HyDE (Hypothetical Document Embeddings)

Generates a hypothetical answer first, then retrieves passages similar to that answer. Bridges the vocabulary gap between questions and documents.

  • Best precision for factual retrieval
  • Effective on technical corpora
Explore RagChat in depth
Vector storage

Four flexible storage strategies.

Choose the storage that fits your application's lifecycle. Switch between backends seamlessly via the IVectorStore interface.

In-memory

In-memory

Fast prototyping, live classification, and immediate feedback. Embeddings stored in RAM with optional serialization to disk.

  • Zero setup required
  • Instant feedback
  • Serializable to disk
  • Best for: Prototypes, testing

Qdrant

Qdrant integration

Enterprise-scale vector search. HNSW indexing, automatic sharding, and distributed deployment via an open-source connector.

  • Billions of vectors
  • Cloud or local Docker
  • Sub-second search
  • Best for: Distributed, production

Custom

Custom via IVectorStore

Implement the IVectorStore interface to connect any proprietary database, internal API, or specialized storage system.

  • Full backend control
  • Custom storage logic
  • Future-proof architecture
  • Best for: Proprietary systems
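To make the idea concrete, here is a naive in-memory backend built around cosine similarity. The interface below is illustrative only; LM-Kit's actual IVectorStore members are not reproduced here and may differ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-in for a vector-store contract; the real LM-Kit
// IVectorStore interface may define different members.
interface IIllustrativeVectorStore
{
    void Upsert(string id, float[] embedding);
    IEnumerable<(string Id, double Score)> Search(float[] query, int topK);
}

// Naive backend: brute-force cosine similarity over vectors held in RAM.
sealed class InMemoryStore : IIllustrativeVectorStore
{
    private readonly Dictionary<string, float[]> _vectors = new();

    public void Upsert(string id, float[] embedding) => _vectors[id] = embedding;

    public IEnumerable<(string Id, double Score)> Search(float[] query, int topK) =>
        _vectors.Select(kv => (kv.Key, Cosine(query, kv.Value)))
                .OrderByDescending(t => t.Item2)
                .Take(topK);

    private static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-12);
    }
}
```

A production implementation would add persistence and an index (e.g., HNSW) instead of the linear scan, but the contract stays the same, which is what lets backends swap without application changes.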
Learn more about vector storage options
Advanced capabilities

Production-ready RAG features.

Everything you need to build enterprise-grade retrieval systems.

Reranking

Semantic reranking

Cross-encoder rerankers refine initial retrieval results for significantly higher precision.

Learn how →

Chunking

Advanced chunking

Markdown-aware, semantic, and layout-based chunking strategies. IChunking interface for custom implementations.

Multimodal

Multimodal RAG

Retrieve relevant content from both text and images. Image embeddings enable visual similarity search.

Learn how →

Filtering

Metadata filtering

Attach custom metadata to partitions. Filter queries by category, date range, author, or any attribute.

Memory

Agent memory

RAG-backed persistent memory for conversational agents. Store and recall context across sessions.

Learn how →

Privacy

Data privacy

100% on-device processing. Documents never leave your infrastructure. GDPR- and HIPAA-compliant by design.

APIs

Async/Sync APIs

Every method available in both synchronous and asynchronous variants. Build responsive UIs or batch processes.

Streaming

Streaming responses

Real-time token streaming for responsive user experiences. AfterTextCompletion event for incremental updates.

Templates

Custom prompt templates

Configure how retrieved context is presented to the model. Optimize prompts for your specific use case.

API reference

Core RAG classes.

Comprehensive API documentation for building custom RAG pipelines.

DocumentRag

Document-centric RAG with OCR, VLM parsing, and lifecycle management.

View documentation

PdfChat

Conversational Q&A over PDFs with multi-turn dialogue and caching.

View documentation

RagChat

Multi-turn conversational Q&A over custom knowledge bases with 4 query modes.

View documentation

RagEngine

Core retrieval-augmented generation engine with data source management.

View documentation

DataSource

Content repository for text partitions with section organization.

View documentation

TextChunking

Recursive text partitioning with configurable strategies.

View documentation

QdrantEmbeddingStore

Qdrant vector database integration via IVectorStore.

View documentation

Code samples

Get started in minutes.

Clone working examples from our GitHub repository and customize for your use case.

Conversational RAG

Multi-turn RAG with RagChat, query contextualization, and streaming responses.

View sample

Single-turn RAG

Build a knowledge-grounded Q&A system using RagEngine with file-based persistence.

View sample

RAG with Qdrant vector store

Enterprise-scale RAG using Qdrant for vector storage and search.

View sample

Help desk knowledge base

Production-grade RAG system with category-scoped search and markdown-aware chunking.

View sample

Retrieval quality tuning

Compare Vector, BM25, and Hybrid search with reranking and MMR diversity filtering.

View sample

PDF chat demo

Conversational Q&A over PDF documents with PdfChat class.

View sample

Image similarity search

Multimodal RAG with image embeddings for visual content retrieval.

View sample

All RAG samples on GitHub

Browse the complete collection of RAG samples and examples.

View repository

LM-Kit.NET pillars

Seven pillars, one foundation.

The seven pillars of LM-Kit.NET, plus the local runtime they share. The highlighted card shows where you are now.

The foundation

Every capability above runs on this runtime.

Foundation

Local Inference

The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR, and classifiers, accelerated on CPU (AVX2), CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.

Explore the foundation

Build context-aware AI today.

Add retrieval-augmented generation to your .NET application with a single NuGet package. No cloud dependencies. No external services.

Get started · Install from NuGet