100+ pre-configured model cards, plus any GGUF from Hugging Face.
Seven capability pillars on one adaptive inference engine. Agents, document intelligence, vision, RAG, text analysis, speech, generation. One NuGet, zero cloud calls, full control of your data, your latency, and your bill.
LM-Kit.NET is the complete in-process AI runtime for .NET. No Python sidecar, no Docker, no HTTP service. The same NuGet that loads an LLM also runs OCR, speech-to-text, vision chat, structured extraction, agents with tools, RAG pipelines, classifiers, and embeddings.
Agents, Docs, Vision, RAG, Text, Speech, Generation.
Atomic, security-first tools. Constantly growing catalog.
CPU, AVX2, CUDA 12/13, Vulkan, Metal. Same code path.
Every model runs on your hardware. No data leaves the box.
In-process SDK. No Python runtime, no Docker, no daemons.
In-memory, built-in file DB, Qdrant, bring-your-own.
Whisper-family STT with VAD and hallucination suppression.
The seven pillars of LM-Kit.NET, plus the local runtime they share.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU, AVX2, CUDA 12/13, Vulkan or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.
Underneath every LM-Kit call sits an adaptive inference engine that steers each token in real time using structural awareness, contextual signals, and grammar-aligned validation. It is the reason a 4B local model can match fine-tuned cloud behaviour on extraction, classification, function calling, and structured generation. Always on, model-agnostic, no retraining required.
Pillar A
Dynamic grammar guarantees JSON, schemas, and tool-call shapes always parse. A novel hybrid path runs roughly twice as fast as classical grammar sampling.
Pillar B
Per-token contextual perplexity, semantic memory for codes and identifiers, structural rejection of malformed runs. Hallucinations drop, recoveries happen in place.
Pillar C
No architecture coupling, no fine-tuning, no per-model adapter. Drop in a new open-weight release and the layer keeps working from day one.
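To make the idea behind grammar-constrained sampling concrete, here is a toy sketch in plain C#. It is not LM-Kit's engine and uses no LM-Kit API: the class name, the fixed list of valid outputs, and the fake token scores are all assumptions made for illustration. At each step, any candidate token that would break the required shape is masked out, and the best remaining candidate is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; none of these names come from LM-Kit.
public static class ConstrainedDecodingSketch
{
    // Toy "grammar": the only complete outputs we are willing to accept.
    private static readonly string[] ValidOutputs =
    {
        "{\"sentiment\":\"positive\"}",
        "{\"sentiment\":\"negative\"}",
    };

    // A candidate token is allowed only if the text so far plus the token
    // is still a prefix of at least one valid output.
    public static bool IsAllowed(string soFar, string token) =>
        ValidOutputs.Any(v => v.StartsWith(soFar + token, StringComparison.Ordinal));

    // Greedy choice: the highest-scoring allowed token, or null if none fits.
    public static string PickToken(string soFar, IReadOnlyDictionary<string, double> scores) =>
        scores.Where(kv => kv.Key.Length > 0 && IsAllowed(soFar, kv.Key))
              .OrderByDescending(kv => kv.Value)
              .Select(kv => kv.Key)
              .FirstOrDefault();

    // Decode loop: "model" stands in for a real scorer and maps the text so far
    // to candidate tokens with scores.
    public static string Decode(
        Func<string, IReadOnlyDictionary<string, double>> model, int maxSteps = 32)
    {
        var text = "";
        for (var i = 0; i < maxSteps && !ValidOutputs.Contains(text); i++)
        {
            var next = PickToken(text, model(text));
            if (next == null) break; // nothing valid to emit
            text += next;
        }
        return text;
    }
}
```

With a fake scorer that always prefers the token `Sure! ` over `{"sentiment":"`, `positive`, `negative`, and `"}`, the loop still returns `{"sentiment":"positive"}`, because the preferred token is never a valid prefix. A real engine works over a tokenizer vocabulary and a proper grammar rather than a list of strings.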
LM-Kit ships a growing catalog of agent tools across eight categories. Each tool performs exactly one operation, exposes rich metadata, and integrates with the permission policy system for enterprise-grade access control. One tool, one feature. Compose freely.
01 · Data
JSON, XML, CSV, YAML, HTML, Markdown, databases, spreadsheets, QR codes. Predictable, typed I/O.
02 · Document
PDF manipulation, image preprocessing, OCR, format conversion between Markdown, EML, HTML, DOCX.
03 · Text
Diff, regex, templating, encoding, slugification, fuzzy matching, phonetics. The stuff prompts cannot do.
04 · Numeric
Calculator, unit conversion, statistics, financial math, random, expression evaluation.
05 · Security
Hashing, encryption, JWT, validation, password generation, checksums. Audit-friendly defaults.
06 · Utility
Date and time, cron, URLs, colors, locales, MIME types, paths, scheduling, time zones.
07 · IO
File system, process execution, compression, clipboard, environment, file watching.
08 · Net
HTTP verbs, FTP, web search (DuckDuckGo, Brave, Tavily, Serper, SearXNG), SMTP, RSS feeds, diagnostics.
Permission policies
Every tool implements IToolMetadata with an explicit risk level, side-effect kind (LocalRead / LocalWrite / NetworkRead / NetworkWrite / Irreversible), default approval mode, and read-only flag. Pair it with ToolPermissionPolicy for centralized allow/deny rules, wildcard patterns, and approval gates. Production-safe out of the box.
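The sketch below shows one way such a policy could be evaluated. It is a stand-alone illustration, not the ToolPermissionPolicy API: `SketchPolicy`, `ToolInfo`, `RiskLevel`, `Decision`, the rule order, and the tool names are all hypothetical. Only the side-effect kinds are taken from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical types for illustration; they do not mirror LM-Kit's classes.
public enum RiskLevel { Low, Medium, High }
public enum SideEffect { LocalRead, LocalWrite, NetworkRead, NetworkWrite, Irreversible }
public enum Decision { Allow, RequireApproval, Deny }

public sealed record ToolInfo(string Name, RiskLevel Risk, SideEffect Effect);

public sealed class SketchPolicy
{
    private readonly List<string> _allow = new List<string>();
    private readonly List<string> _deny = new List<string>();

    // Tools above this risk level are rejected outright.
    public RiskLevel MaxRisk { get; set; } = RiskLevel.Medium;

    public SketchPolicy Allow(string pattern) { _allow.Add(pattern); return this; }
    public SketchPolicy Deny(string pattern) { _deny.Add(pattern); return this; }

    // Translate a "*" wildcard pattern into an anchored regular expression.
    private static bool Matches(string pattern, string name) =>
        Regex.IsMatch(name,
            "^" + Regex.Escape(pattern).Replace("\\*", ".*") + "$",
            RegexOptions.IgnoreCase);

    public Decision Evaluate(ToolInfo tool)
    {
        // Assumed order: deny rules win, then the risk ceiling, then the allow list.
        if (_deny.Any(p => Matches(p, tool.Name))) return Decision.Deny;
        if (tool.Risk > MaxRisk) return Decision.Deny;
        if (!_allow.Any(p => Matches(p, tool.Name))) return Decision.Deny;

        // Allowed tools with outward or irreversible effects go through an approval gate.
        return tool.Effect == SideEffect.NetworkWrite || tool.Effect == SideEffect.Irreversible
            ? Decision.RequireApproval
            : Decision.Allow;
    }
}
```

For example, a policy built with `Allow("filesystem_*")`, `Allow("http_*")`, and `Deny("filesystem_delete")` allows a low-risk `filesystem_read`, denies `filesystem_delete`, and asks for approval before a medium-risk `http_post` that writes to the network.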
A constantly growing catalog of open-weight models covering text generation, vision, embeddings, OCR, and speech. Load any of them by ID, or point at any GGUF on Hugging Face.
Text LLMs
Gemma 3 (1B / 4B / 12B / 27B), Qwen 3 (0.6B to 14B), Llama 3.1, Phi-4, GLM 4.7 Flash, GPT-OSS 20B. Chat, reasoning, tool use, multilingual.
Vision
Qwen 2/2.5/3 VL, Gemma 3 VL, GLM-V 4.6 Flash. Dedicated OCR via PaddleOCR-VL and GLM-OCR. Drop an Attachment into any conversation.
Embeddings
EmbeddingGemma 300M, Qwen3-Embedding 0.6B / 4B / 8B, BGE-M3, Nomic-Embed-Text and Nomic-Embed-Vision. Multilingual, cross-modal.
Speech
Tiny through Large V3 and Large Turbo V3. 100+ languages, real-time translation to English, Voice Activity Detection.
Task models
Sentiment-analysis 2.0 and lmkit-tasks variants for fast on-device classification, NER, and PII work.
Bring-your-own
Point new LM(uri) at any GGUF on Hugging Face or your own storage. The catalog is curation, not constraint.
A complete document understanding and retrieval stack. PDF text and table extraction, OCR that beats commercial engines, layout-aware parsing, typed field extraction, document splitting, multi-document chat, and RAG pipelines with page-level citations.
Doc chat
PdfChat loads several documents into one conversation. Multi-turn, cited answers, automatic context management.
Document RAG
DocumentRag with OCR plus VLM document understanding. Multi-page processing, auto-detection of text vs scanned, source references on every answer.
Extraction
Schema-driven extraction with grammar-constrained generation. Invoices, contracts, forms, ID cards. Output parses every time.
Splitting
Intelligent document splitting finds logical boundaries in long PDFs. Cut a batch into per-document files automatically.
OCR
CPU-efficient native engine plus VLM OCR (PaddleOCR-VL, GLM-OCR). SOTA benchmark accuracy, fast on a single core, no commercial licence to budget for.
RAG pipeline
End-to-end RAG: vector and BM25 retrieval, hybrid search, MMR diversity, cross-encoder reranking, multiple query generation strategies.
Vector storage
In-memory, built-in file DB, Qdrant connector, bring-your-own via IVectorStore. Switch backend without changing code.
Embeddings
Unified Embedder for text and images, batch-friendly, async-first. Cross-modal similarity out of the box.
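The MMR diversity step named in the RAG pipeline card can be sketched with plain arrays. This is a generic maximal marginal relevance loop over cosine similarity, written without any LM-Kit types. The class name, the `lambda` default, and the sample vectors are assumptions, and the SDK's own implementation may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Generic MMR illustration; not LM-Kit's retrieval code.
public static class MmrSketch
{
    // Cosine similarity between two vectors of equal length.
    public static double Cosine(double[] a, double[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // Returns the indices of up to k documents. Each pick maximises
    // lambda * relevance - (1 - lambda) * similarity to the closest document already picked.
    public static List<int> Select(
        double[] query, IReadOnlyList<double[]> docs, int k, double lambda = 0.5)
    {
        var selected = new List<int>();
        var remaining = Enumerable.Range(0, docs.Count).ToList();

        while (selected.Count < k && remaining.Count > 0)
        {
            var best = remaining
                .OrderByDescending(i =>
                    lambda * Cosine(query, docs[i]) -
                    (1 - lambda) * (selected.Count == 0
                        ? 0
                        : selected.Max(j => Cosine(docs[i], docs[j]))))
                .First();

            selected.Add(best);
            remaining.Remove(best); // removes by value, not by index
        }
        return selected;
    }
}
```

With a query of `[1, 1]` and documents `[1, 0.9]`, `[1, 0.8]`, and `[0.2, 1]`, asking for two results at `lambda = 0.5` picks documents 0 and 2, skipping the near-duplicate document 1. At `lambda = 1.0` the ranking falls back to pure relevance and returns documents 0 and 1.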
A strongly-typed agent class with system prompts, planning strategies, tool registries, persistent memory, MCP clients, multi-agent orchestration, and production-grade observability. Compose freely, ship confidently.
Agent runtime
First-class Agent with builders, identity, tools, planning strategy, retry policies, streaming, and event-level observability.
Planning
Pick the planning strategy per agent. Bounded steps, recoverable errors, deterministic by configuration.
Orchestration
Sequential pipelines, parallel fan-out, supervisor delegation. Compose agents into deterministic graphs.
Graph
Sequential, Parallel, and Conditional nodes. Thread-safe context, channel-based streaming, compose any workflow shape.
Memory
RAG-backed long-term memory that survives sessions. Recall, summarisation, semantic retrieval out of the box.
Skills
Drop SKILL.md files into a folder. The registry picks them up. Versionable, testable, swappable.
MCP
Connect to any MCP-compatible server. The built-in tool catalog and external MCP tools coexist in the same registry.
Guardrails
Centralized policy: allow/deny patterns, max risk level, approval requirements per tool. Audit-friendly, production-safe.
Observability
Events for plan, tool call, model decision, retry, error. Integrate with your existing telemetry pipeline.
LM-Kit.NET is not just an LLM library. The same NuGet ships state-of-the-art vision-language models, speech-to-text with Voice Activity Detection, sentiment and emotion classifiers, NER and PII extractors, multilingual translation, and grammar-aware text correction.
Vision
Drop an Attachment into MultiTurnConversation. Multiple images per turn, streaming tokens, tool calls. Image understanding, classification, labeling, background removal.
Speech
Whisper-family STT. 100+ languages, real-time translation, Voice Activity Detection, hallucination suppression, voice-command dictation formatting.
Sentiment
Multilingual polarity and emotion classification with neutral support. LoRA-fine-tunable on your domain.
NER
Built-in entity types plus custom EntityDefinition. Character offsets on every mention for precise downstream use.
PII
Emails, phones, IDs, addresses, custom domain labels. Batch processing for high-volume document scanning.
Generation
Single-turn, multi-turn, and stateless primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
Beyond the headline features, LM-Kit.NET ships the production controls a team needs once a prototype meets real workloads: memory hibernation, encrypted model loading, multi-GPU split, LoRA, fine-tuning, quantization, and sampling levers.
Hibernation
Serialize an entire conversation context (KV-cache and all session state) to disk. Free RAM/VRAM on demand. Rehydrate transparently on the next call.
Encryption
Stream-decrypted GGUF. No plaintext model files on disk at any point. Ship the model, keep the secret.
Multi-GPU
Split big models across multiple GPUs. Per-tensor placement overrides for fine-grained control on heterogeneous hardware.
Sampling
Per-turn sampling levers composed on top of Dynamic Sampling. Switch from creative to deterministic without rebuilding the conversation.
Fine-tuning
Bring data, build a LoRA, or fine-tune the full model. The training loop runs in the same NuGet that runs inference.
LoRA
Compose multiple LoRA adapters at load time. Swap personas, domains, or tasks without reloading the base model.
Quantization
Quantize models to fit the hardware budget without leaving the SDK. From Q8 down to Q2_K, with knobs for precision-critical layers.
Backends
Same C# call dispatches to the fastest available backend. Deploy once, run on anything from a Raspberry Pi to an 8-GPU server.
LM-Kit.NET plugs into the Microsoft AI ecosystem without requiring you to rewrite your existing orchestration code. Existing IChatClient and Semantic Kernel pipelines keep working with a local model behind them.
Bridge
Stream tokens, call functions, embed text. Every IChatClient, IEmbeddingGenerator, and middleware-aware abstraction you wrote against Microsoft.Extensions.AI keeps working with LM-Kit as the local backend.
Bridge
Use LM-Kit.NET as a Semantic Kernel connector. Plug local chat completion, embeddings, and function-calling into existing SK plans, planners, and skills.
Protocol
MCP clients ship in the agent runtime. Connect to any MCP server; built-in tools and external MCP tools coexist in a unified registry with the same permission policy.
The same LM.LoadFromModelID call dispatches to the fastest backend available on the host. No environment branches, no per-platform builds. Deploy once, run on every developer machine and every production target.
CUDA
Tensor-core acceleration with multi-GPU split for big models. The dependency package is pulled in transitively; one package, one switch.
Vulkan
AMD, Intel, and any GPU with a Vulkan driver. One backend, every vendor, no per-platform code.
Metal
Native Metal on M1, M2, M3, and beyond. Unified-memory routing for laptops; no extra runtime.
CPU / AVX2
Optimised SSE 4.1/4.2 and AVX2 kernels. The same model runs without a GPU; smaller models stay fast on a laptop CPU.
Same NuGet, same API surface, every supported target. Targets .NET Standard 2.0 so it slots into existing .NET Framework 4.6.2+ codebases too.
Three honest comparisons with the alternatives a .NET team actually weighs. No straw men, no invented numbers.
Compare · Local vs Cloud
No per-token bill. No data leaving your network. Latency you can predict. Inference cost equals the cost of compute you already own. Works offline by design.
Compare · LM-Kit vs LangChain
No FastAPI sidecar, no HTTP shim, no two-runtime tax. LM-Kit links into your .NET process, picks up the right native acceleration, and stays out of the way. Async/await all the way down.
Compare · LM-Kit vs LlamaSharp
Most alternatives ship inference only. LM-Kit ships the full runtime: agents, RAG, OCR, structured extraction, speech, vision, classifiers, embeddings, plus the symbolic layer that makes small models behave.
LM-Kit.NET is built for the .NET applications that cannot send data to a cloud endpoint, cannot rely on a network connection, or cannot afford per-token costs at scale.
Regulated
HIPAA, GDPR, and data-residency requirements satisfied by design. Patient records, claims, and citizen data stay on the box.
Enterprise
RAG over policies, runbooks, wikis, contracts, support tickets. Cited answers without sending source material to a third party.
Edge
Field laptops, rugged kiosks, vehicle telemetry, manufacturing floors. Inference works without connectivity.
Cost
Batch document processing, classification pipelines, customer support analysis. Marginal cost is compute, not token bills.
Product
Wrap LM-Kit in a Windows/macOS desktop product. Customers run inference on their own hardware. No backend to operate.
Speed
Voice assistants, code editors, interactive analysis. Local inference removes the round-trip; first-token times in milliseconds.
Add a single package to your .csproj. The runtime, native binaries for every supported backend, and the entire AI stack come with it. No Python runtime, no Docker, no daemons.
# 1. Add LM-Kit.NET to your project
$ dotnet add package LM-Kit.NET
# 2. (Optional) Plug in a GPU backend
# The dependency package is pulled in transitively.
$ dotnet add package LM-Kit.NET.Backend.Cuda13.Windows
# or:
$ dotnet add package LM-Kit.NET.Backend.Cuda13.Linux
using LMKit.Model;
using LMKit.TextGeneration;
var model = LM.LoadFromModelID("qwen3.5:4b");
var chat = new MultiTurnConversation(model);
var reply = await chat.SubmitAsync("Hello, LM-Kit.");
Console.WriteLine(reply.Text);
Run the full SDK on your own hardware at no cost. Buy a commercial license when LM-Kit becomes part of a product you sell.
Free, forever
Full SDK access for any company or individual. Build and deploy non-commercial applications, or evaluate LM-Kit end to end before shipping.
Custom, per project
For products that ship LM-Kit to customers. Pricing scaled to deployment size and value. Includes dedicated support and direct roadmap input.
No. Everything runs in-process inside your .NET application. No Python, no Docker, no daemons, no HTTP service. One NuGet, one process.
NVIDIA via CUDA 12/13, Apple Silicon via Metal, AMD and Intel via Vulkan. CPU and AVX2 act as a fallback. The same C# code dispatches to whichever backend is fastest on the host.
Yes. LM-Kit targets .NET Standard 2.0, so it slots into .NET Framework 4.6.2+ codebases alongside .NET 8 / 9 / 10.
Point new LM(new Uri("...")) at any GGUF on Hugging Face or your own storage. The catalog is a curated set, not a constraint.
Never, unless you explicitly call an external tool like WebSearch. Model inference, RAG, OCR, speech, and embeddings all run on your hardware.
Yes. LM-Kit ships in-process LoRA training and full fine-tuning. The same NuGet that runs inference runs the training loop.
Both are first-class bridges. Existing IChatClient pipelines and SK connectors work unchanged with LM-Kit as the local backend.
Yes. The agent runtime includes Model Context Protocol clients. Built-in tools and external MCP tools coexist in the same registry under the same permission policy.
Install the NuGet, load a model, ship the feature. The free Community Edition is enough to evaluate the entire surface.