Seven capability pillars on one adaptive inference engine. Agents, document intelligence, vision, RAG, text analysis, speech, generation. One NuGet, zero cloud calls, full control of your data and your latency.
Most "local LLM" tools are inference engines. LM-Kit.NET is the runtime that sits on top: agents that reason and call tools, RAG with page-level citations, OCR that holds its own against commercial engines, structured extraction that emits typed C# objects, multilingual speech-to-text, image understanding, embeddings, and a growing catalog of built-in tools. Every capability ships in the same NuGet, runs on the same model graph, and respects the same adaptive sampling layer underneath.
LM-Kit.NET ships seven pillars and the local runtime they all sit on. Use the parts you need, ignore the rest.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
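To make the orchestration concrete without claiming LM-Kit's own Agent API: the sketch below hand-rolls a single custom tool over the standard Microsoft.Extensions.AI IChatClient abstraction, which the bridge section further down backs with a local LM-Kit model. LM-Kit's supervisors and orchestrators sit above exactly this kind of loop.

```csharp
using System;
using System.ComponentModel;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

class AgentToolSketch
{
    [Description("Returns the current UTC time in ISO-8601 format.")]
    static string GetUtcTime() => DateTime.UtcNow.ToString("O");

    // `backend` is any IChatClient; with the bridge described later on this
    // page, that can be a local LM-Kit-backed model.
    static async Task RunAsync(IChatClient backend)
    {
        IChatClient client = new ChatClientBuilder(backend)
            .UseFunctionInvocation() // auto-executes the tool calls the model emits
            .Build();

        var options = new ChatOptions
        {
            Tools = [AIFunctionFactory.Create(GetUtcTime)]
        };

        ChatResponse response = await client.GetResponseAsync(
            "What time is it in UTC right now?", options);

        Console.WriteLine(response.Text);
    }
}
```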
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
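"Structured field extraction" as a contract: text in, typed C# object out. The sketch below stays on the generic IChatClient abstraction rather than LM-Kit's own extractor classes, and the Invoice record and prompt are illustrative; under grammar-constrained generation the final deserialize is guaranteed to succeed, where a plain prompt only makes it likely.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Illustrative target shape; grammar-constrained generation guarantees the
// model's output deserializes into a type like this.
record Invoice(string InvoiceNumber, DateOnly Date, decimal Total);

class ExtractionSketch
{
    static async Task<Invoice?> ExtractAsync(IChatClient client, string documentText)
    {
        var options = new ChatOptions { ResponseFormat = ChatResponseFormat.Json };

        ChatResponse response = await client.GetResponseAsync(
            $"""
            Extract invoice_number, date (yyyy-MM-dd) and total from the text below.
            Respond with a single JSON object using exactly those keys.

            {documentText}
            """, options);

        // snake_case keys in the JSON map onto the PascalCase record properties.
        return JsonSerializer.Deserialize<Invoice>(response.Text,
            new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower });
    }
}
```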
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
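"Same conversation surface as LLMs" in practice: an image rides along inside an ordinary chat message. This sketch uses the standard Microsoft.Extensions.AI content types and assumes the backing client wraps a vision-language model.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

class VisionSketch
{
    // `client` must be backed by a vision-language model for image input to work.
    static async Task DescribeAsync(IChatClient client, string imagePath)
    {
        var message = new ChatMessage(ChatRole.User,
        [
            new TextContent("Describe this image in one sentence."),
            new DataContent(await File.ReadAllBytesAsync(imagePath), "image/png")
        ]);

        ChatResponse response = await client.GetResponseAsync([message]);
        Console.WriteLine(response.Text);
    }
}
```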
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
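Retrieval reduced to its essentials: embed the chunks, embed the query, rank by cosine similarity. In production the built-in vector store, chunking, and citations replace all of this; the sketch only shows the math, over the standard IEmbeddingGenerator abstraction.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

class RetrievalSketch
{
    static float Cosine(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
    }

    static async Task<string> NearestChunkAsync(
        IEmbeddingGenerator<string, Embedding<float>> embedder,
        string[] chunks, string query)
    {
        var chunkEmbeddings = await embedder.GenerateAsync(chunks);
        var queryEmbedding  = (await embedder.GenerateAsync(new[] { query }))[0];

        // Brute-force scan; a vector store does this with an index instead.
        return chunks
            .Select((text, i) => (text, score: Cosine(
                queryEmbedding.Vector.Span, chunkEmbeddings[i].Vector.Span)))
            .OrderByDescending(x => x.score)
            .First().text;
    }
}
```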
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
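At its smallest, that looks like classification landing in a .NET type. The plain prompt plus Enum.Parse below can still throw on a stray token; the built-in classifiers constrain sampling so the label is always one of the allowed values.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

enum Sentiment { Positive, Negative, Neutral }

class SentimentSketch
{
    static async Task<Sentiment> ClassifyAsync(IChatClient client, string text)
    {
        ChatResponse response = await client.GetResponseAsync(
            "Classify the sentiment of the following text as exactly one word: " +
            $"Positive, Negative or Neutral.\n\n{text}");

        // Constrained sampling makes this parse safe; a raw prompt does not.
        return Enum.Parse<Sentiment>(response.Text.Trim(), ignoreCase: true);
    }
}
```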
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
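Streamed multi-turn chat over the generic IChatClient surface, standing in here for LM-Kit's own conversation primitives: tokens print as they arrive, and the finished turn goes back into history for the next request.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

class ChatSketch
{
    static async Task RunAsync(IChatClient client)
    {
        List<ChatMessage> history =
        [
            new(ChatRole.System, "You are a concise assistant."),
            new(ChatRole.User, "Summarise what a vector store does.")
        ];

        var assistantText = "";
        await foreach (ChatResponseUpdate update in
                       client.GetStreamingResponseAsync(history))
        {
            Console.Write(update.Text); // tokens arrive as they are generated
            assistantText += update.Text;
        }

        // Keep the turn so the next request sees the full conversation.
        history.Add(new(ChatRole.Assistant, assistantText));
    }
}
```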
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR, and classifiers, accelerated on CPU via AVX2 and on GPU via CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.
Dynamic Sampling is the reason a 4B local model can match fine-tuned cloud behaviour on extraction, classification, and structured generation: an adaptive inference engine that sits underneath every LM-Kit call, steering each token with structural awareness, contextual signals, and grammar-aligned validation. Always on, model-agnostic, no retraining.
Pillar A
Constrained output
Dynamic grammar guarantees JSON, schemas, and tool-call shapes always parse. A novel hybrid path runs roughly twice as fast as classical grammar sampling.
Pillar B
Per-token contextual perplexity, semantic memory for codes and identifiers, structural rejection of malformed runs. Hallucinations drop, recoveries happen in place.
Pillar C
No architecture coupling, no fine-tuning, no per-model adapter. Drop in a new open-weight release and the layer keeps working from day one.
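What Pillar A buys at the call site, sketched with Microsoft.Extensions.AI's response-format API rather than LM-Kit's own grammar classes; the ForJsonSchema helper is that package's mechanism and is assumed here, so check its current surface before copying.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

class ConstrainedOutputSketch
{
    static async Task RunAsync(IChatClient client)
    {
        // Declare the exact shape the model must emit.
        JsonElement schema = JsonSerializer.Deserialize<JsonElement>("""
        {
          "type": "object",
          "properties": {
            "city":    { "type": "string" },
            "country": { "type": "string" }
          },
          "required": ["city", "country"]
        }
        """);

        var options = new ChatOptions
        {
            ResponseFormat = ChatResponseFormat.ForJsonSchema(schema)
        };

        ChatResponse response = await client.GetResponseAsync(
            "Where is the Eiffel Tower? Answer as JSON.", options);

        // With grammar-constrained decoding underneath, this parse cannot fail;
        // without it, it merely usually succeeds.
        using JsonDocument doc = JsonDocument.Parse(response.Text);
        Console.WriteLine(doc.RootElement.GetProperty("city").GetString());
    }
}
```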
LM-Kit.NET is the answer when the alternatives don't fit. Three honest comparisons, no straw men.
Compare
No per-token bill. No data leaving your network. Latency you can predict. Inference cost equals the cost of compute you already own. Works offline by design.
Local vs Cloud, in depth
Compare
No FastAPI sidecar, no HTTP shim, no two-runtime tax. LM-Kit links into your .NET process, picks up the right native acceleration, and stays out of the way. Async/await all the way down.
LM-Kit vs LangChain
Compare
Most ship inference only. LM-Kit ships the full runtime: agents, RAG, OCR, structured extraction, speech, vision, classifiers, embeddings, plus the symbolic layer that makes small models behave.
LM-Kit vs LlamaSharp
Existing IChatClient pipelines work unchanged. LM-Kit becomes the local backend behind code you already wrote.
Bridge
Stream tokens, call functions, embed text. Every IChatClient, IEmbeddingGenerator, and middleware-aware abstraction you wrote against the official package keeps working with a local model behind it.
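Concretely: nothing downstream mentions LM-Kit. Constructing the adapter is the only LM-Kit-specific line, and it is deliberately left out of this sketch; everything below is the official abstraction.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

class BridgeSketch
{
    // `localClient` is assumed to be the LM-Kit-backed IChatClient;
    // the concrete adapter type comes from the bridge documentation.
    static async Task RunAsync(IChatClient localClient)
    {
        ChatResponse response = await localClient.GetResponseAsync(
            "One sentence on why local inference helps with data sovereignty.");

        Console.WriteLine(response.Text);
    }
}
```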
Bridge
Use LM-Kit.NET as a Semantic Kernel connector. Plug local chat completion, embeddings, and function-calling into existing SK plans, planners, and skills without rewriting the orchestration layer.
Open the Semantic Kernel bridge
No sidecar service, no special runtime. LM-Kit links into your application, picks up the right native acceleration for the host, and gets out of the way.
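The registration pattern, in standard Semantic Kernel terms; the IChatCompletionService instance is assumed to come from the LM-Kit connector, whose concrete type name is omitted here.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

class SemanticKernelSketch
{
    // `lmkitChat` is assumed to be supplied by the LM-Kit SK connector.
    static async Task RunAsync(IChatCompletionService lmkitChat)
    {
        var builder = Kernel.CreateBuilder();
        builder.Services.AddSingleton(lmkitChat);
        Kernel kernel = builder.Build();

        // Existing SK orchestration keeps working; only the backend moved on-device.
        FunctionResult result = await kernel.InvokePromptAsync(
            "List three risks of sending customer documents to a third-party API.");

        Console.WriteLine(result);
    }
}
```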
Run the full SDK on your own hardware at no cost. Buy a commercial license when LM-Kit is part of a product you sell to customers.
Free · forever
Full SDK access for any company or individual. Build and deploy non-commercial applications, or evaluate LM-Kit before shipping.
Custom · per project
For products that ship LM-Kit to customers. Pricing is scaled to deployment size and value. Includes dedicated support and roadmap input.
Get started