Solutions · RAG & Knowledge

Your AI keeps making things up. Ground it in real data.

LLMs hallucinate. They confidently cite documents that don't exist and invent facts that sound plausible. LM-Kit's RAG engine solves this by grounding every response in your actual documents, with page-level citations you can verify.

Typical document Q&A pipelines break down in familiar ways:

  • PDFs with tables and complex layouts lose structure
  • Scanned documents are completely ignored
  • Cloud RAG services see your confidential data
  • No way to verify which page the answer came from
100% on-device · OCR plus VLM support · Page-level citations
Core

DocumentRag

Multi-page processing with OCR and VLM-based document understanding. Preserves layout, tables, and structure.

Ready

PdfChat

Conversational Q&A over documents. Multi-turn dialogue with automatic context management and caching.

New

RagChat

Multi-turn conversational Q&A over custom knowledge bases. 4 query generation modes, tool calling, and agent memory.

Flexible

IVectorStore

4 storage backends: in-memory, built-in DB, Qdrant, or custom. Switch without changing code.

0
Cloud dependencies
4
Vector backends
4
Query modes

Ground your AI in real data.

Traditional LLMs hallucinate. LM-Kit.NET's RAG engine grounds every response in your actual documents, databases, and knowledge bases. Semantic retrieval finds the most relevant passages, then generation synthesizes accurate, cited answers.

From simple text files to complex multi-page PDFs with tables, forms, and scanned content, LM-Kit.NET handles it all with intelligent document understanding, OCR, and vision-based parsing. Need multi-turn conversations? RagChat adds conversational Q&A with four query generation modes over any knowledge base.

100% on-device processing. Your documents never leave your infrastructure. Meet GDPR, HIPAA, and data residency requirements by design.

DocumentRag.cs
// Document-centric RAG with full lifecycle management.
// Assumes 'embeddingModel', 'visionModel', 'attachment' (the loaded
// document), 'matches', and 'conversation' are defined elsewhere
// in your application.
var docRag = new DocumentRag(embeddingModel);

// Enable OCR for scanned documents
docRag.OcrEngine = new OcrEngine();

// Enable VLM for complex layouts
docRag.VisionParser = new VlmOcr(visionModel);

// Import with metadata for lifecycle tracking
var metadata = new DocumentMetadata(attachment, id: "report-2024-q4");
await docRag.ImportDocumentAsync(attachment, metadata, "reports");

// Query with source references
var result = await docRag.QueryPartitionsAsync(
    "What was Q4 revenue?", matches, conversation);

foreach (var reference in result.SourceReferences)
    Console.WriteLine($"Page {reference.PageNumber}");
Document Intelligence

DocumentRag: beyond simple text retrieval.

Multi-page document processing with OCR, vision-based understanding, and complete document lifecycle management.

DocumentRag Class

Intelligent document processing

DocumentRag extends RagEngine with specialized handling for multi-page documents. It automatically extracts text page-by-page, handles mixed content types, and maintains document structure for accurate retrieval.

Multi-page

Multi-page processing

Automatic page-by-page extraction with structure preservation. Handles PDFs, images, and multi-page formats seamlessly.

OCR

OCR integration

Built-in OCR engine extracts text from scanned documents and image-based pages. No external dependencies required.

VLM

Vision-based understanding

VisionParser uses VLMs for advanced document understanding, preserving layout and structure as markdown for complex documents.

Lifecycle

Document lifecycle management

Import, update, and delete documents with explicit IDs. Track document versions and manage your knowledge base programmatically.

Source References

Grounded answers with citations.

Every response includes source references with document names and page numbers. Build trust with your users by showing exactly where information comes from.

Page-level

Page-level attribution

Know exactly which page contains the source information. Enable users to verify and explore original documents.

Events

Progress events

Monitor document import with real-time progress callbacks. Track page processing, embedding generation, and indexing status.

Filtering

Metadata filtering

Attach custom metadata to documents and filter queries by category, date, author, or any custom attribute.

Processing strategies

Three intelligent processing modes.

Choose the optimal strategy for your document types, or let Auto mode select the best approach per page.

Fast

Text extraction

Traditional text extraction with optional OCR for image-based pages. Fast and efficient for text-heavy documents.

  • Fastest processing speed
  • OCR for scanned content
  • Low resource usage
  • Best for simple layouts

VLM

Document understanding

Vision language models for advanced parsing. Preserves layout, tables, and structure as markdown.

  • VLM-powered analysis
  • Layout preservation
  • Table structure extraction
  • Complex document handling
Conversational document Q&A

PdfChat: chat with your documents.

A complete conversational interface for document question-answering. Multi-turn dialogue, automatic context management, and intelligent retrieval in one class.
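By way of illustration, a multi-turn session might look like the sketch below. Apart from the PdfChat class name, every method name here is an assumption based on the patterns shown elsewhere on this page, not the documented API:

```csharp
// Hypothetical sketch: method names are assumptions, not the documented
// PdfChat API. 'chatModel' is a language model loaded elsewhere.
var pdfChat = new PdfChat(chatModel);
await pdfChat.LoadDocumentAsync("annual-report.pdf");                    // assumed
var answer = await pdfChat.SubmitAsync("What was Q4 revenue?");          // assumed
var follow = await pdfChat.SubmitAsync("How does that compare to Q3?");  // multi-turn follow-up
```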

Multi-turn

Multi-turn conversation

Maintain context across questions. Follow-up queries understand conversation history for natural dialogue flow.

Context

Smart context management

Small documents load in full for complete context. Large documents use passage retrieval to inject only relevant excerpts.

Cache

Document caching

Vector store caching for fast subsequent queries. Load a document once, query it indefinitely.

Tools

Tool calling support

Register custom tools the model can invoke during conversation. Extend document Q&A with calculations, lookups, or external APIs.

Reranking

Semantic reranking

Optional reranker refines passage retrieval results for higher precision. Get the most relevant content every time.

Reasoning

Reasoning control

Adjust reasoning depth for models that support extended thinking. Balance response quality with latency.

Memory

Agent memory integration

Connect to AgentMemory for RAG-backed persistent context that survives across conversation sessions.

Events

Comprehensive events

CacheAccessed, PassageRetrievalCompleted, ResponseGenerationStarted, and more. Full observability into the RAG pipeline.

Conversational RAG

RagChat: multi-turn Q&A over any knowledge base.

Turn any pre-populated RagEngine into a conversational interface with automatic query contextualization, tool calling, and agent memory.
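As a rough sketch of that flow, the snippet below wires a pre-populated engine into a conversation. The RagChat and RagEngine class names come from this page; the constructor shape and method names are assumptions, not the documented signatures:

```csharp
// Hypothetical sketch: constructor and method names are assumptions.
// 'ragEngine' is a pre-populated RagEngine; 'chatModel' a loaded LLM.
var ragChat = new RagChat(chatModel, ragEngine);
var first  = await ragChat.SubmitAsync("What does the warranty cover?"); // assumed
var follow = await ragChat.SubmitAsync("And for how long?");             // contextualized from history
```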

Mode 01

Original mode

Send the user's question directly to retrieval. Zero overhead, fastest path.

  • Self-contained questions
  • Single-turn interactions

Mode 03

MultiQuery mode

Generates multiple query variants and fuses results via Reciprocal Rank Fusion.

  • Higher recall
  • Complex queries
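The fusion step named above, Reciprocal Rank Fusion, can be sketched in a few lines. This is a generic illustration, not LM-Kit's internal implementation; k = 60 is the constant commonly used in the literature:

```csharp
using System.Collections.Generic;

// Reciprocal Rank Fusion: a document's fused score is the sum of
// 1 / (k + rank) over every ranked list it appears in, so documents
// that rank well across several query variants rise to the top.
static Dictionary<string, double> FuseRrf(
    IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (var list in rankedLists)
        for (int rank = 0; rank < list.Count; rank++)
        {
            scores.TryGetValue(list[rank], out var current);
            scores[list[rank]] = current + 1.0 / (k + rank + 1); // ranks are 1-based
        }
    return scores;
}
```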

Mode 04

HyDE (Hypothetical Document Embeddings)

Generates a hypothetical answer first, then retrieves passages similar to that answer. Bridges the vocabulary gap between questions and documents.

  • Best precision for factual retrieval
  • Effective on technical corpora
Explore RagChat in depth
Vector storage

Four flexible storage strategies.

Choose the storage that fits your application's lifecycle. Switch between backends seamlessly via the IVectorStore interface.

In-memory

In-memory

Fast prototyping, live classification, and immediate feedback. Embeddings stored in RAM with optional serialization to disk.

  • Zero setup required
  • Instant feedback
  • Serializable to disk
  • Best for: Prototypes, testing

Qdrant

Qdrant integration

Enterprise-scale vector search. HNSW indexing, automatic sharding, and distributed deployment via an open-source connector.

  • Billions of vectors
  • Cloud or local Docker
  • Sub-second search
  • Best for: Distributed, production

Custom

Custom via IVectorStore

Implement the IVectorStore interface to connect any proprietary database, internal API, or specialized storage system.

  • Full backend control
  • Custom storage logic
  • Future-proof architecture
  • Best for: Proprietary systems
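To make the idea concrete, here is a naive in-memory backend built around cosine similarity. The interface below is illustrative only; LM-Kit's actual IVectorStore members are not reproduced here and may differ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-in for a vector-store contract; the real LM-Kit
// IVectorStore interface may define different members.
interface IIllustrativeVectorStore
{
    void Upsert(string id, float[] embedding);
    IEnumerable<(string Id, double Score)> Search(float[] query, int topK);
}

// Naive backend: brute-force cosine similarity over vectors held in RAM.
sealed class InMemoryStore : IIllustrativeVectorStore
{
    private readonly Dictionary<string, float[]> _vectors = new();

    public void Upsert(string id, float[] embedding) => _vectors[id] = embedding;

    public IEnumerable<(string Id, double Score)> Search(float[] query, int topK) =>
        _vectors.Select(kv => (kv.Key, Cosine(query, kv.Value)))
                .OrderByDescending(t => t.Item2)
                .Take(topK);

    private static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-12);
    }
}
```

A production implementation would add persistence and an index (e.g., HNSW) instead of the linear scan, but the contract stays the same, which is what lets backends swap without application changes.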
Learn more about vector storage options
Advanced capabilities

Production-ready RAG features.

Everything you need to build enterprise-grade retrieval systems.

Reranking

Semantic reranking

Cross-encoder rerankers refine initial retrieval results for significantly higher precision.

Learn how →

Chunking

Advanced chunking

Markdown-aware, semantic, and layout-based chunking strategies. IChunking interface for custom implementations.

Multimodal

Multimodal RAG

Retrieve relevant content from both text and images. Image embeddings enable visual similarity search.

Learn how →

Filtering

Metadata filtering

Attach custom metadata to partitions. Filter queries by category, date range, author, or any attribute.

Memory

Agent memory

RAG-backed persistent memory for conversational agents. Store and recall context across sessions.

Learn how →

Privacy

Data privacy

100% on-device processing. Documents never leave your infrastructure. GDPR- and HIPAA-compliant by design.

APIs

Async/Sync APIs

Every method available in both synchronous and asynchronous variants. Build responsive UIs or batch processes.

Streaming

Streaming responses

Real-time token streaming for responsive user experiences. AfterTextCompletion event for incremental updates.

Templates

Custom prompt templates

Configure how retrieved context is presented to the model. Optimize prompts for your specific use case.

API reference

Core RAG classes.

Comprehensive API documentation for building custom RAG pipelines.

DocumentRag

Document-centric RAG with OCR, VLM parsing, and lifecycle management.

View documentation

PdfChat

Conversational Q&A over PDFs with multi-turn dialogue and caching.

View documentation

RagChat

Multi-turn conversational Q&A over custom knowledge bases with 4 query modes.

View documentation

RagEngine

Core retrieval-augmented generation engine with data source management.

View documentation

DataSource

Content repository for text partitions with section organization.

View documentation

TextChunking

Recursive text partitioning with configurable strategies.

View documentation

QdrantEmbeddingStore

Qdrant vector database integration via IVectorStore.

View documentation

Code samples

Get started in minutes.

Clone working examples from our GitHub repository and customize for your use case.

Conversational RAG

Multi-turn RAG with RagChat, query contextualization, and streaming responses.

View sample

Single-turn RAG

Build a knowledge-grounded Q&A system using RagEngine with file-based persistence.

View sample

RAG with Qdrant vector store

Enterprise-scale RAG using Qdrant for vector storage and search.

View sample

Help desk knowledge base

Production-grade RAG system with category-scoped search and markdown-aware chunking.

View sample

Retrieval quality tuning

Compare Vector, BM25, and Hybrid search with reranking and MMR diversity filtering.

View sample

PDF chat demo

Conversational Q&A over PDF documents with PdfChat class.

View sample

Image similarity search

Multimodal RAG with image embeddings for visual content retrieval.

View sample

All RAG samples on GitHub

Browse the complete collection of RAG samples and examples.

View repository

LM-Kit.NET pillars

Seven pillars, one foundation.

The seven pillars of LM-Kit.NET, plus the local runtime they share. The highlighted card shows where you are now.

The foundation

Every capability above runs on this runtime.

Foundation

Local Inference

The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR, and classifiers, accelerated on CPU (AVX2), CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.

Explore the foundation

Build context-aware AI today.

Add retrieval-augmented generation to your .NET application with a single NuGet package. No cloud dependencies. No external services.

Get started · Install from NuGet