Your AI Keeps Making Things Up. Ground It in Real Data.
LLMs hallucinate. They confidently cite documents that don't exist and invent facts that sound plausible. LM-Kit.NET's RAG engine addresses this by grounding each response in your own documents, with page-level citations you can verify.
DocumentRag
Multi-page processing with OCR and VLM-based document understanding. Preserves layout, tables, and structure.
PdfChat
Conversational Q&A over documents. Multi-turn dialogue with automatic context management and caching.
IVectorStore
Four storage backends: in-memory, built-in vector DB, Qdrant, or a custom implementation. All share one interface, so switching backends leaves your retrieval code unchanged.
Ground Your AI in Real Data
Traditional LLMs hallucinate. LM-Kit.NET's RAG engine grounds every response in your actual documents, databases, and knowledge bases. Semantic retrieval finds the most relevant passages, then generation synthesizes accurate, cited answers.
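The retrieval step can be illustrated independently of LM-Kit.NET. The following is a minimal sketch, assuming passages have already been turned into embedding vectors: it ranks passages by cosine similarity to the query embedding and keeps the top k. The toy three-dimensional vectors and the `Cosine` and `TopK` helpers are hypothetical and are not part of the LM-Kit.NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy passage embeddings (hypothetical values; real ones come from an embedding model).
var passages = new List<(string Text, float[] Vector)>
{
    ("Q4 revenue grew 12%.",      new float[] { 0.9f, 0.1f, 0.0f }),
    ("The office moved to Lyon.", new float[] { 0.0f, 0.2f, 0.9f }),
    ("Q4 operating costs fell.",  new float[] { 0.7f, 0.3f, 0.1f }),
};

// Embedding of the user's question (hypothetical).
float[] query = { 1.0f, 0.0f, 0.0f };

var best = TopK(query, passages, k: 2);
foreach (var (text, score) in best)
    Console.WriteLine($"{score:F3}  {text}");

// Cosine similarity: dot(a, b) / (|a| * |b|).
static double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical helper (not an LM-Kit.NET API): rank passages by similarity, keep the k best.
static List<(string Text, double Score)> TopK(
    float[] q, List<(string Text, float[] Vector)> items, int k) =>
    items.Select(p => (Text: p.Text, Score: Cosine(q, p.Vector)))
         .OrderByDescending(x => x.Score)
         .Take(k)
         .ToList();
```

In a full pipeline, the vectors come from an embedding model and the selected passages are handed to the generator as context for the answer.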
From simple text files to complex multi-page PDFs with tables, forms, and scanned content, LM-Kit.NET handles it all with intelligent document understanding, OCR, and vision-based parsing.
100% on-device processing. Your documents never leave your infrastructure, which helps you meet GDPR, HIPAA, and data residency requirements by design.
```csharp
// Document-centric RAG with full lifecycle management.
// embeddingModel, visionModel and attachment are assumed to have been loaded earlier.
var docRag = new DocumentRag(embeddingModel);

// Enable OCR for scanned documents
docRag.OcrEngine = new OcrEngine();

// Enable VLM-based parsing for complex layouts
docRag.VisionParser = new VlmOcr(visionModel);

// Import with metadata for lifecycle tracking
var metadata = new DocumentMetadata(attachment, id: "report-2024-q4");
await docRag.ImportDocumentAsync(attachment, metadata, "reports");

// Query with source references.
// matches (partitions retrieved for this query) and conversation (an existing chat
// conversation) are assumed to have been created beforehand.
var result = await docRag.QueryPartitionsAsync("What was Q4 revenue?", matches, conversation);

foreach (var reference in result.SourceReferences)
    Console.WriteLine($"Page {reference.PageNumber}");
```
DocumentRag: Beyond Simple Text Retrieval
Multi-page document processing with OCR, vision-based understanding, and complete document lifecycle management.
Intelligent Document Processing
DocumentRag extends RagEngine with specialized handling for multi-page documents. It automatically extracts text page-by-page, handles mixed content types, and maintains document structure for accurate retrieval.
Multi-Page Processing
Automatic page-by-page extraction with structure preservation. Handles PDFs, images, and multi-page formats seamlessly.
OCR Integration
Built-in OCR engine extracts text from scanned documents and image-based pages. No external dependencies required.
Vision-Based Understanding
VisionParser uses VLMs for advanced document understanding, preserving the layout and structure of complex documents as Markdown.
Document Lifecycle Management
Import, update, and delete documents with explicit IDs. Track document versions and manage your knowledge base programmatically.
Grounded Answers with Citations
Every response includes source references with document names and page numbers. Build trust with your users by showing exactly where information comes from.
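Conceptually, a citation only requires that each retrieved passage keep track of where it came from. The sketch below assumes passages are stored as (document, page, text) tuples and collapses them into one citation per source page; the tuple layout and the `FormatCitations` helper are hypothetical and do not reflect the shape of LM-Kit.NET's `SourceReferences`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Retrieved passages tagged with their origin (hypothetical data).
var retrieved = new List<(string Document, int Page, string Text)>
{
    ("annual-report.pdf", 14, "Q4 revenue reached 4.2M."),
    ("annual-report.pdf", 14, "Revenue growth was driven by services."),
    ("board-minutes.pdf", 3,  "The board approved the Q4 figures."),
};

string citations = FormatCitations(retrieved);
Console.WriteLine(citations);

// Hypothetical helper (not an LM-Kit.NET API): collapse passages to distinct
// (document, page) pairs so each source page is cited exactly once.
static string FormatCitations(IEnumerable<(string Document, int Page, string Text)> passages) =>
    string.Join("; ", passages
        .Select(p => (p.Document, p.Page))
        .Distinct()
        .OrderBy(p => p.Document, StringComparer.Ordinal)
        .ThenBy(p => p.Page)
        .Select(p => $"{p.Document}, p. {p.Page}"));
```

Because the page number travels with every passage from import to answer, a user can open the cited page and check the claim directly.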
Page-Level Attribution
Know exactly which page contains the source information. Enable users to verify and explore original documents.
Progress Events
Monitor document import with real-time progress callbacks. Track page processing, embedding generation, and indexing status.
Metadata Filtering
Attach custom metadata to documents and filter queries by category, date, author, or any custom attribute.
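Metadata filtering typically narrows the candidate set before similarity ranking. As a library-independent sketch, assuming each partition carries a string-to-string metadata dictionary and the filter is an exact match on every requested key, the logic looks like this; the data and the `Matches` helper are hypothetical, and LM-Kit.NET's own filter API may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Partitions with free-form metadata (hypothetical data and layout).
var partitions = new List<(string Text, Dictionary<string, string> Meta)>
{
    ("Q4 2024 revenue summary", new Dictionary<string, string> { ["category"] = "finance", ["year"] = "2024" }),
    ("2024 hiring plan",        new Dictionary<string, string> { ["category"] = "hr",      ["year"] = "2024" }),
    ("Q4 2023 revenue summary", new Dictionary<string, string> { ["category"] = "finance", ["year"] = "2023" }),
};

// Only partitions matching every key/value pair go on to similarity ranking.
var filter = new Dictionary<string, string> { ["category"] = "finance", ["year"] = "2024" };
var eligible = partitions.Where(x => Matches(x.Meta, filter)).ToList();

foreach (var item in eligible)
    Console.WriteLine(item.Text);

// Hypothetical helper (not an LM-Kit.NET API): exact match on every required key.
static bool Matches(Dictionary<string, string> meta, Dictionary<string, string> required) =>
    required.All(kv => meta.TryGetValue(kv.Key, out var value) && value == kv.Value);
```

Filtering first keeps off-topic or out-of-date partitions from competing with relevant ones during ranking.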
Three Intelligent Processing Modes
Choose the optimal strategy for your document types, or let Auto mode select the best approach per page.
Auto Mode
Intelligent per-page selection. Automatically chooses the best processing strategy based on content type and available engines.
- Detects text vs. image-based pages
- Falls back gracefully
- Optimal quality/speed balance
- Recommended for mixed documents
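The per-page decision behind Auto mode can be pictured as a small rule set. The sketch below is an illustration only: the 20-character threshold, the `hasComplexLayout` flag, and the `ChooseStrategy` helper are assumptions, not LM-Kit.NET's actual selection logic.

```csharp
using System;

// Hypothetical per-page rule set (not LM-Kit.NET's actual Auto-mode logic).
static string ChooseStrategy(int textLayerChars, bool hasComplexLayout, bool vlmAvailable, bool ocrAvailable)
{
    // Assumption: fewer than 20 embedded characters means the page is image-based (e.g. a scan).
    bool imageBased = textLayerChars < 20;

    if ((imageBased || hasComplexLayout) && vlmAvailable)
        return "DocumentUnderstanding";   // VLM parses layout and tables
    if (imageBased && ocrAvailable)
        return "TextExtraction+OCR";      // OCR recovers text from the page image
    return "TextExtraction";              // native text layer, or fallback when no engine fits
}

Console.WriteLine(ChooseStrategy(1500, hasComplexLayout: false, vlmAvailable: true, ocrAvailable: true));
Console.WriteLine(ChooseStrategy(0, hasComplexLayout: false, vlmAvailable: false, ocrAvailable: true));
```

The ordering encodes the fallback chain: use the most capable engine a page needs, and degrade to cheaper strategies when that engine is not configured.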
Text Extraction
Traditional text extraction with optional OCR for image-based pages. Fast and efficient for text-heavy documents.
- Fastest processing speed
- OCR for scanned content
- Low resource usage
- Best for simple layouts
Document Understanding
Vision language models for advanced parsing. Preserves layout, tables, and structure as Markdown.
- VLM-powered analysis
- Layout preservation
- Table structure extraction
- Complex document handling
PdfChat: Chat With Your Documents
A complete conversational interface for document question-answering. Multi-turn dialogue, automatic context management, and intelligent retrieval in one class.
Multi-Turn Conversation
Maintain context across questions. Follow-up queries understand conversation history for natural dialogue flow.
Smart Context Management
Small documents load in full for complete context. Large documents use passage retrieval to inject only relevant excerpts.
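The full-load versus retrieval decision can be sketched as a token-budget check. This minimal illustration assumes roughly four characters per token and a caller-supplied list of excerpts already ranked by relevance; `EstimateTokens`, `BuildContext`, and the budget values are hypothetical and are not PdfChat's API.

```csharp
using System;
using System.Collections.Generic;

// Assumption: about 4 characters per token (rough heuristic for English text).
static int EstimateTokens(string text) => (text.Length + 3) / 4;

// Hypothetical helper (not PdfChat's API): inject the whole document when it fits the
// budget; otherwise inject the top-ranked excerpts, in order, until the budget is used.
static string BuildContext(string fullText, IReadOnlyList<string> rankedExcerpts, int tokenBudget)
{
    if (EstimateTokens(fullText) <= tokenBudget)
        return fullText;

    var chosen = new List<string>();
    int used = 0;
    foreach (var excerpt in rankedExcerpts)
    {
        int cost = EstimateTokens(excerpt);
        if (used + cost > tokenBudget) break;
        chosen.Add(excerpt);
        used += cost;
    }
    return string.Join("\n---\n", chosen);
}

string shortDoc = "Revenue grew 12% in Q4.";
string longDoc = new string('x', 40_000);
var excerpts = new List<string> { new string('a', 2000), new string('b', 2000), new string('c', 2000) };

string small = BuildContext(shortDoc, excerpts, tokenBudget: 1000);
string large = BuildContext(longDoc, excerpts, tokenBudget: 1000);
Console.WriteLine($"small doc loaded in full: {small == shortDoc}");
Console.WriteLine($"large doc context length: {large.Length}");
```

A real implementation would count tokens with the model's tokenizer rather than a character heuristic.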
Document Caching
Vector store caching for fast subsequent queries. Load a document once, query it indefinitely.
Tool Calling Support
Register custom tools the model can invoke during conversation. Extend document Q&A with calculations, lookups, or external APIs.
Semantic Reranking
Optional reranker refines passage retrieval results for higher precision, moving the most relevant passages to the top.
Reasoning Control
Adjust reasoning depth for models that support extended thinking. Balance response quality with latency.
Agent Memory Integration
Connect to AgentMemory for RAG-backed persistent context that survives across conversation sessions.
Comprehensive Events
CacheAccessed, PassageRetrievalCompleted, ResponseGenerationStarted, and more. Full observability into the RAG pipeline.
Four Flexible Storage Strategies
Choose the storage that fits your application's lifecycle. Switch between backends seamlessly via the IVectorStore interface.
In-Memory
Fast prototyping, live classification, and immediate feedback. Embeddings stored in RAM with optional serialization to disk.
- Zero setup required
- Instant feedback
- Serializable to disk
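The idea of keeping embeddings in RAM and persisting them to disk can be sketched with System.Text.Json. The dictionary layout (partition ID to vector) and the JSON file name and format are assumptions for illustration; LM-Kit.NET's in-memory store uses its own serialization.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

// Hypothetical in-memory store: partition ID -> embedding vector.
var store = new Dictionary<string, float[]>
{
    ["doc1#p1"] = new[] { 0.12f, -0.5f, 0.33f },
    ["doc1#p2"] = new[] { 0.9f, 0.01f, -0.2f },
};

// Persist to disk as JSON (file name and format are assumptions)...
string path = Path.Combine(Path.GetTempPath(), "embeddings.json");
File.WriteAllText(path, JsonSerializer.Serialize(store));

// ...and reload later without recomputing any embeddings.
var reloaded = JsonSerializer.Deserialize<Dictionary<string, float[]>>(File.ReadAllText(path))!;
Console.WriteLine($"Reloaded {reloaded.Count} vectors");
```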
Built-in Vector DB
SQLite for vectors. File-based persistence with zero external dependencies. Handles millions of vectors on standard hardware.
- No infrastructure needed
- Portable and shareable
- Millions of vectors
Qdrant Integration
Enterprise-scale vector search. HNSW indexing, automatic sharding, and distributed deployment via an open-source connector.
- Billions of vectors
- Cloud or local Docker
- Sub-second search
Custom via IVectorStore
Implement the IVectorStore interface to connect any proprietary database, internal API, or specialized storage system.
- Full backend control
- Custom storage logic
- Future-proof architecture
Production-Ready RAG Features
Everything you need to build enterprise-grade retrieval systems.
Semantic Reranking
Cross-encoder rerankers refine initial retrieval results for significantly higher precision on complex queries.
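A two-stage retrieve-then-rerank pipeline can be sketched as follows. The first-pass scores are hypothetical stand-ins for embedding similarity, and `RerankScore` is a deliberately simple term-overlap scorer standing in for a cross-encoder, which would instead score each (query, passage) pair jointly with a neural model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stage 1 output: candidates with first-pass similarity scores (hypothetical values).
var candidates = new List<(string Text, double FirstPass)>
{
    ("Revenue in Q3 was flat.",          0.91),
    ("Q4 revenue rose to 4.2M.",         0.88),
    ("Marketing spend in Q4 increased.", 0.86),
};

string query = "Q4 revenue";

// Stage 2: re-score every candidate against the query and reorder by the new score.
var reranked = candidates
    .Select(c => (c.Text, Score: RerankScore(query, c.Text)))
    .OrderByDescending(c => c.Score)
    .ToList();

Console.WriteLine(reranked[0].Text);

// Hypothetical stand-in for a cross-encoder (not an LM-Kit.NET API):
// fraction of query terms that appear in the passage.
static double RerankScore(string q, string passage)
{
    var terms = q.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var words = new HashSet<string>(
        passage.ToLowerInvariant().Split(new[] { ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries));
    return terms.Count(t => words.Contains(t)) / (double)terms.Length;
}
```

The design choice is cost: the cheap first pass scans the whole index, while the more expensive reranker only looks at the short candidate list.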
Advanced Chunking
Markdown-aware, semantic, and layout-based chunking strategies. IChunking interface for custom implementations.
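As an illustration of Markdown-aware chunking, the sketch below splits text at heading lines so that no chunk crosses a section boundary, then packs paragraphs into chunks of at most `maxChars` characters. The `MarkdownChunks` helper and its size rule are hypothetical; it does not implement LM-Kit.NET's `IChunking` interface.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not LM-Kit.NET's IChunking): heading-bounded, size-capped chunks.
static List<string> MarkdownChunks(string markdown, int maxChars)
{
    // 1. Group lines into sections, starting a new section at every heading line.
    var sections = new List<List<string>>();
    foreach (var line in markdown.Replace("\r\n", "\n").Split('\n'))
    {
        if (sections.Count == 0 || line.StartsWith('#'))
            sections.Add(new List<string>());
        sections[^1].Add(line);
    }

    // 2. Pack each section's paragraphs into chunks of at most maxChars
    //    (a single paragraph longer than maxChars becomes its own chunk).
    var chunks = new List<string>();
    foreach (var section in sections)
    {
        var paragraphs = string.Join("\n", section)
            .Split("\n\n", StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0);
        var current = "";
        foreach (var para in paragraphs)
        {
            if (current.Length > 0 && current.Length + 2 + para.Length > maxChars)
            {
                chunks.Add(current);
                current = "";
            }
            current = current.Length == 0 ? para : current + "\n\n" + para;
        }
        if (current.Length > 0) chunks.Add(current);
    }
    return chunks;
}

string doc = "# Revenue\nQ4 revenue rose.\n\nServices drove growth.\n# Costs\nCosts fell.";
var result = MarkdownChunks(doc, maxChars: 30);
foreach (var c in result)
    Console.WriteLine($"[{c}]");
```

Keeping headings as hard boundaries means a retrieved chunk never mixes content from two unrelated sections.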
Multimodal RAG
Retrieve relevant content from both text and images. Image embeddings enable visual similarity search.
Metadata Filtering
Attach custom metadata to partitions. Filter queries by category, date range, author, or any attribute.
Agent Memory
RAG-backed persistent memory for conversational agents. Store and recall context across sessions.
Data Privacy
100% on-device processing. Documents never leave your infrastructure, supporting GDPR and HIPAA compliance by design.
Async/Sync APIs
Every method available in both synchronous and asynchronous variants. Build responsive UIs or batch processes.
Streaming Responses
Real-time token streaming for responsive user experiences. AfterTextCompletion event for incremental updates.
Custom Prompt Templates
Configure how retrieved context is presented to the model. Optimize prompts for your specific use case.
Core RAG Classes
Comprehensive API documentation for building custom RAG pipelines.
Get Started in Minutes
Clone working examples from our GitHub repository and customize for your use case.
Custom Chatbot with RAG
Build a knowledge-grounded chatbot using RagEngine with multiple data sources.
RAG with Qdrant Vector Store
Enterprise-scale RAG using Qdrant for vector storage and search.
Image Similarity Search
Multimodal RAG with image embeddings for visual content retrieval.
Build Context-Aware AI Today
Add retrieval-augmented generation to your .NET application with a single NuGet package. No cloud dependencies. No external services.