LM-Kit.NET · The .NET SDK for local AI

The complete local AI runtime for .NET.

Seven capability pillars on one adaptive inference engine. Agents, document intelligence, vision, RAG, text analysis, speech, generation. One NuGet, zero cloud calls, full control of your data, your latency, and your bill.

NuGet: LM-Kit.NET
Targets: .NET Standard 2.0 · .NET 8 / 9 / 10
Platforms: Windows · Linux · macOS

What ships in the box

One package. The whole AI stack.

LM-Kit.NET is the complete in-process AI runtime for .NET. No Python sidecar, no Docker, no HTTP service. The same NuGet that loads an LLM also runs OCR, speech-to-text, vision chat, structured extraction, agents with tools, RAG pipelines, classifiers, and embeddings.

Models

100+

Pre-configured cards plus any GGUF from Hugging Face.

Pillars

7

Agents, Docs, Vision, RAG, Text, Speech, Generation.

Built-in tools

8 categories

Atomic, security-first tools. Constantly growing catalog.

Backends

5

CPU, AVX2, CUDA 12/13, Vulkan, Metal. Same code path.

Cloud calls

0

Every model runs on your hardware. No data leaves the box.

External services

0

In-process SDK. No Python runtime, no Docker, no daemons.

Vector backends

4

In-memory, built-in file DB, Qdrant, bring-your-own.

Speech languages

100+

Whisper-family STT with VAD and hallucination suppression.

LM-Kit.NET pillars

Seven pillars, one foundation.

The seven pillars of LM-Kit.NET, plus the local runtime they share.

The foundation

Every capability above runs on this runtime.

Foundation

Local Inference

The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU, AVX2, CUDA 12/13, Vulkan or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.

Core technology

Dynamic Sampling, the symbolic layer.

Underneath every LM-Kit call sits an adaptive inference engine that steers each token in real time using structural awareness, contextual signals, and grammar-aligned validation. It is the reason a 4B local model can match fine-tuned cloud behaviour on extraction, classification, function calling, and structured generation. Always on, model-agnostic, no retraining required.

Pillar A

Constrained output

Dynamic grammar guarantees JSON, schemas, and tool-call shapes always parse. A novel hybrid path runs roughly twice as fast as classical grammar sampling.
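
Output that always parses matters most to the code that consumes it, which can deserialize straight into a typed object without retry logic. The following is a minimal sketch of that consuming side using only System.Text.Json. The JSON string is a stand-in for a model response and the Invoice record is hypothetical; neither comes from the LM-Kit API.

```csharp
using System;
using System.Text.Json;

// Stand-in for a schema-constrained model response (assumption: in a real
// application this string would be produced by the SDK, not hard-coded).
string modelOutput = "{\"vendor\":\"Contoso\",\"total\":129.5,\"currency\":\"EUR\"}";

// Match JSON property names to record properties regardless of casing.
var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

// Because the shape is guaranteed, deserialization is a single call.
var invoice = JsonSerializer.Deserialize<Invoice>(modelOutput, options)
              ?? throw new InvalidOperationException("Model returned JSON null.");

Console.WriteLine($"{invoice.Vendor}: {invoice.Total} {invoice.Currency}");

// Hypothetical target type for the extraction.
record Invoice(string Vendor, decimal Total, string Currency);
```

Without a parse guarantee, the same call would need a try/catch around JsonException plus a re-prompt loop.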

Pillar C

Model-agnostic

No architecture coupling, no fine-tuning, no per-model adapter. Drop in a new open-weight release and the layer keeps working from day one.


Built-in tools

Eight categories of atomic, security-first tools.

LM-Kit ships a growing catalog of agent tools across eight categories. Each tool performs exactly one operation, exposes rich metadata, and integrates with the permission policy system for enterprise-grade access control. One tool, one feature. Compose freely.

01 · Data

Parse and transform

JSON, XML, CSV, YAML, HTML, Markdown, databases, spreadsheets, QR codes. Predictable, typed I/O.

02 · Document

PDF, OCR, format conversion

PDF manipulation, image preprocessing, OCR, format conversion between Markdown, EML, HTML, DOCX.

03 · Text

String operations

Diff, regex, templating, encoding, slugification, fuzzy matching, phonetics. The stuff prompts cannot do.

04 · Numeric

Compute primitives

Calculator, unit conversion, statistics, financial math, random, expression evaluation.

05 · Security

Hashing, JWT, validation

Hashing, encryption, JWT, validation, password generation, checksums. Audit-friendly defaults.

06 · Utility

Date, URL, locale, MIME

Date and time, cron, URLs, colors, locales, MIME types, paths, scheduling, time zones.

07 · IO

Filesystem, processes, files

File system, process execution, compression, clipboard, environment, file watching.

08 · Net

HTTP, web search, RSS

HTTP verbs, FTP, web search (DuckDuckGo, Brave, Tavily, Serper, SearXNG), SMTP, RSS feeds, diagnostics.

Permission policies

Every tool implements IToolMetadata with explicit risk level, side effect kind (LocalRead / LocalWrite / NetworkRead / NetworkWrite / Irreversible), default approval mode, and read-only flag. Pair with ToolPermissionPolicy for centralized allow/deny rules, wildcard patterns, and approval gates. Production-safe out of the box.
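
To make the allow/deny idea concrete, here is a minimal sketch of wildcard-based rule evaluation in plain C#. It does not use the LM-Kit API: PolicySketch, Decision, the tool names, and the "deny wins, then approval gate, then allow, else deny" ordering are all assumptions made for illustration, and the real ToolPermissionPolicy may behave differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical rule set; tool names are invented for this sketch.
var policy = new PolicySketch(
    allow: new[] { "file_*", "http_get" },
    deny: new[] { "file_delete" },
    requireApproval: new[] { "process_*" });

Console.WriteLine(policy.Decide("file_read"));    // Allow
Console.WriteLine(policy.Decide("file_delete"));  // Deny
Console.WriteLine(policy.Decide("process_run"));  // AskUser
Console.WriteLine(policy.Decide("smtp_send"));    // Deny (no rule matched)

enum Decision { Allow, Deny, AskUser }

// Illustrative stand-in, NOT the LM-Kit ToolPermissionPolicy type.
sealed class PolicySketch
{
    private readonly Regex[] _allow, _deny, _approval;

    public PolicySketch(IEnumerable<string> allow,
                        IEnumerable<string> deny,
                        IEnumerable<string> requireApproval)
    {
        _allow = allow.Select(ToRegex).ToArray();
        _deny = deny.Select(ToRegex).ToArray();
        _approval = requireApproval.Select(ToRegex).ToArray();
    }

    // Translate a '*' wildcard pattern into an anchored regular expression.
    private static Regex ToRegex(string pattern) =>
        new Regex("^" + Regex.Escape(pattern).Replace("\\*", ".*") + "$",
                  RegexOptions.IgnoreCase);

    public Decision Decide(string toolName)
    {
        // Assumed precedence: explicit deny, then approval gate, then allow.
        if (_deny.Any(r => r.IsMatch(toolName))) return Decision.Deny;
        if (_approval.Any(r => r.IsMatch(toolName))) return Decision.AskUser;
        if (_allow.Any(r => r.IsMatch(toolName))) return Decision.Allow;
        return Decision.Deny; // default-deny when nothing matches
    }
}
```

Default-deny is chosen here so that a newly registered tool stays blocked until a rule names it.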

Model catalog

Open-weight models, curated and ready.

A constantly growing catalog of open-weight models covering text generation, vision, embeddings, OCR, and speech. Load any of them by ID, or point at any GGUF on Hugging Face.

Text LLMs

From 0.6B to 30B-class MoE

Gemma 3 (1B / 4B / 12B / 27B), Qwen 3 (0.6B to 14B), Llama 3.1, Phi-4, GLM 4.7 Flash, GPT-OSS 20B. Chat, reasoning, tool use, multilingual.

Vision

VLMs for chat and OCR

Qwen 2/2.5/3 VL, Gemma 3 VL, GLM-V 4.6 Flash. Dedicated OCR via PaddleOCR-VL and GLM-OCR. Drop an Attachment into any conversation.

Embeddings

Text and image vectors

EmbeddingGemma 300M, Qwen3-Embedding 0.6B / 4B / 8B, BGE-M3, Nomic-Embed-Text and Nomic-Embed-Vision. Multilingual, cross-modal.

Speech

Whisper family STT

Tiny through Large V3 and Large Turbo V3. 100+ languages, real-time translation to English, Voice Activity Detection.

Task models

Purpose-trained classifiers

Sentiment-analysis 2.0 and lmkit-tasks variants for fast on-device classification, NER, and PII work.

Bring-your-own

Any GGUF, any URI

Point new LM(uri) at any GGUF on Hugging Face or your own storage. The catalog is curation, not constraint.

Document intelligence & RAG

Read every PDF, ground every answer, cite every source.

A complete document understanding and retrieval stack. PDF text and table extraction, OCR that beats commercial engines, layout-aware parsing, typed field extraction, document splitting, multi-document chat, and RAG pipelines with page-level citations.
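
The core of retrieval with page-level citations can be sketched in a few lines: each chunk keeps its source file and page number next to its embedding, and the top matches are returned with that metadata attached. This is an illustrative in-memory version in plain C#, not the LM-Kit RAG API. Chunk, Retriever, the sample texts, and the toy three-dimensional vectors are all assumptions; real embeddings would come from an embedding model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy 3-dimensional vectors stand in for real embeddings (assumption).
var index = new List<Chunk>
{
    new Chunk("policy.pdf", 3, "Refunds are issued within 30 days.", new[] { 0.9f, 0.1f, 0.0f }),
    new Chunk("policy.pdf", 7, "Shipping takes five business days.", new[] { 0.1f, 0.9f, 0.1f }),
    new Chunk("handbook.pdf", 12, "Employees accrue leave monthly.", new[] { 0.0f, 0.2f, 0.9f }),
};

// Pretend embedding of the question "How long do refunds take?"
float[] query = { 0.8f, 0.2f, 0.1f };

// Print each hit with its citation so an answer can point back to the page.
foreach (var hit in Retriever.TopK(index, query, 2))
    Console.WriteLine($"[{hit.Source}, p. {hit.Page}] {hit.Text}");

// A chunk of text plus the metadata needed to cite it.
record Chunk(string Source, int Page, string Text, float[] Vector);

static class Retriever
{
    // Cosine similarity between two vectors of equal length.
    public static double Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Rank chunks by similarity to the query and keep the best k.
    public static IEnumerable<Chunk> TopK(IEnumerable<Chunk> chunks, float[] query, int k) =>
        chunks.OrderByDescending(c => Cosine(c.Vector, query)).Take(k);
}
```

With these toy vectors the refunds chunk on page 3 ranks first and the shipping chunk on page 7 second. A production pipeline would swap the linear scan for one of the vector backends listed earlier.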

Agents & orchestration

Production agent patterns, not toy demos.

A strongly-typed agent class with system prompts, planning strategies, tool registries, persistent memory, MCP clients, multi-agent orchestration, and production-grade observability. Compose freely, ship confidently.

Production-grade

The boring parts that ship.

Beyond the headline features, LM-Kit.NET ships the production controls a team needs once a prototype meets real workloads: memory hibernation, encrypted model loading, multi-GPU split, LoRA, fine-tuning, quantization, and sampling levers.

Runs where your code already runs

Cross-platform by default, not by accident.

Same NuGet, same API surface, every supported target. Targets .NET Standard 2.0 so it slots into existing .NET Framework 4.6.2+ codebases too.

Runtime: .NET Standard 2.0 · .NET 8 · 9 · 10
Operating systems: Windows 10+ · Linux x64 & ARM64 · macOS Universal
GPU acceleration: CUDA 12 · CUDA 13 · Vulkan · Metal
CPU acceleration: SSE 4.1 / 4.2 · AVX · AVX2
Models: Gemma 3, Qwen 3, Llama, Phi-4, GLM 4.7, GPT OSS, Whisper, embeddings, OCR, VLMs
Storage: In-memory · built-in file DB · Qdrant · bring-your-own
Languages: C# · F# · VB.NET · any .NET-compatible language
Bridges: Microsoft.Extensions.AI · Semantic Kernel · MCP clients

Where teams ship LM-Kit

Workflows where local AI actually wins.

LM-Kit.NET is built for the .NET applications that cannot send data to a cloud endpoint, cannot rely on a network connection, or cannot afford per-token costs at scale.

Regulated

Healthcare, finance, government

HIPAA, GDPR, and data-residency requirements are easier to meet when inference is local by design. Patient records, claims, and citizen data stay on the box.

Enterprise

Internal copilots

RAG over policies, runbooks, wikis, contracts, support tickets. Cited answers without sending source material to a third party.

Edge

Offline and air-gapped

Field laptops, rugged kiosks, vehicle telemetry, manufacturing floors. Inference works without connectivity.

Cost

High-volume workloads

Batch document processing, classification pipelines, customer support analysis. Marginal cost is compute, not token bills.

Product

Shipping AI in a desktop app

Wrap LM-Kit in a Windows/macOS desktop product. Customers run inference on their own hardware. No backend to operate.

Speed

Latency-critical UIs

Voice assistants, code editors, interactive analysis. Local inference removes the network round-trip, so first-token times can be measured in milliseconds.

Install

Zero dependencies. One NuGet.

Add a single package to your .csproj. The runtime, native binaries for every supported backend, and the entire AI stack come with it. No Python runtime, no Docker, no daemons.

terminal
# 1. Add LM-Kit.NET to your project
$ dotnet add package LM-Kit.NET

# 2. (Optional) Plug in a GPU backend
#    The dependency package is pulled in transitively.
$ dotnet add package LM-Kit.NET.Backend.Cuda13.Windows
# or:
$ dotnet add package LM-Kit.NET.Backend.Cuda13.Linux
Program.cs
using System;
using LMKit.Model;
using LMKit.TextGeneration;

// Load a catalog model by its ID.
var model = LM.LoadFromModelID("qwen3.5:4b");

// A conversation object keeps chat history across turns.
var chat = new MultiTurnConversation(model);

// Send one message and print the model's reply.
var reply = await chat.SubmitAsync("Hello, LM-Kit.");
Console.WriteLine(reply.Text);

Licensing

Free for builders. Commercial when you ship.

Run the full SDK on your own hardware at no cost. Buy a commercial license when LM-Kit becomes part of a product you sell.

Community

Free forever

Full SDK access for any company or individual. Build and deploy non-commercial applications, or evaluate LM-Kit end to end before shipping.

  • Full feature surface; no capability gates
  • Deployment: development, internal tools, OSS
  • Platforms: Windows, Linux, macOS
  • Community support on GitHub

Professional

Custom per project

For products that ship LM-Kit to customers. Pricing scaled to deployment size and value. Includes dedicated support and direct roadmap input.

  • Commercial redistribution rights
  • Dedicated technical support
  • Unlimited developers and end users
  • Direct relationship with the engineering team

FAQ

Questions we hear often.

Does LM-Kit require a Python runtime?

No. Everything runs in-process inside your .NET application. No Python, no Docker, no daemons, no HTTP service. One NuGet, one process.

Which GPUs work out of the box?

NVIDIA via CUDA 12/13, Apple Silicon via Metal, AMD and Intel via Vulkan. The CPU and AVX2 backends act as fallbacks. The same C# code dispatches to whichever backend is fastest on the host.

Can I run on .NET Framework?

Yes. LM-Kit targets .NET Standard 2.0, so it slots into .NET Framework 4.6.2+ codebases alongside .NET 8 / 9 / 10.

How do I add a model that is not in the catalog?

Point new LM(new Uri("...")) at any GGUF on Hugging Face or your own storage. The catalog is a curated set, not a constraint.

Does data ever leave the box?

Never, unless you explicitly call an external tool like WebSearch. Model inference, RAG, OCR, speech, and embeddings all run on your hardware.

Can I fine-tune a model?

Yes. LM-Kit ships in-process LoRA training and full fine-tuning. The same NuGet that runs inference runs the training loop.

What about Microsoft.Extensions.AI and Semantic Kernel?

Both are first-class bridges. Existing IChatClient pipelines and SK connectors work unchanged with LM-Kit as the local backend.

Is there an MCP client?

Yes. The agent runtime includes Model Context Protocol clients. Built-in tools and external MCP tools coexist in the same registry under the same permission policy.

Ready when you are

Ship local AI this sprint.

Install the NuGet, load a model, ship the feature. The free Community Edition is enough to evaluate the entire surface.
