LM-Kit.NET · The .NET SDK for local AI

The complete local AI runtime for .NET.

Seven capability pillars on one adaptive inference engine. Agents, document intelligence, vision, RAG, text analysis, speech, generation. One NuGet, zero cloud calls, full control of your data, your latency, and your bill.

NuGet: LM-Kit.NET
Targets: .NET Standard 2.0 · .NET 8 / 9 / 10
Platforms: Windows · Linux · macOS

Agent.cs

using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;
using LMKit.Model;

var model = LM.LoadFromModelID("qwen3.5:9b");
var agent = Agent.CreateBuilder(model)
    .WithTools(t =>
    {
        t.Register(BuiltInTools.WebSearch);
        t.Register(BuiltInTools.CalcArithmetic);
    })
    .Build();

var result = await agent.RunAsync(
    "Current BTC price in EUR, rounded to the nearest euro?");

Console.WriteLine(result.Text);

ExtractInvoice.cs

using LMKit.Data;
using LMKit.Extraction;
using LMKit.Model;

var model     = LM.LoadFromModelID("qwen3.5:4b");
var extractor = new TextExtraction(model);

extractor.Elements = new List<TextExtractionElement>
{
    new("invoice_number", ElementType.String),
    new("invoice_date",   ElementType.Date),
    new("total_amount",   ElementType.Double),
};

extractor.SetContent(new Attachment("invoice.pdf"));
var data = extractor.Parse();

Console.WriteLine(data.Json);

VisionChat.cs

using LMKit.Data;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

var model = LM.LoadFromModelID("qwen3.5:4b");
var chat  = new MultiTurnConversation(model);

var message = new ChatHistory.Message(
    "Identify the missing capacitor.",
    new Attachment("schematic.png"));

var reply = await chat.SubmitAsync(message);
Console.WriteLine(reply.Text);

PdfChat.cs

using LMKit.Model;
using LMKit.Retrieval;

var chatModel  = LM.LoadFromModelID("qwen3.5:9b");
var embedModel = LM.LoadFromModelID("embeddinggemma-300m");

using var chat = new PdfChat(chatModel, embedModel);

await chat.LoadDocumentAsync("policies.pdf");
await chat.LoadDocumentAsync("pricing.pdf");

var answer = await chat.SubmitAsync(
    "What is our enterprise refund window?");

Console.WriteLine(answer.Response.Completion);

RAG.cs

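using LMKit.Data;
using LMKit.Data.Storage.Qdrant;
using LMKit.Model;
using LMKit.Retrieval;

// Assumes `qdrantClient` is an already-configured Qdrant client created elsewhere.
var embedModel = LM.LoadFromModelID("embeddinggemma-300m");

var store  = new QdrantEmbeddingStore(qdrantClient);
var engine = new RagEngine(embedModel);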

engine.AddDataSource(await DataSource.LoadFromStoreAsync(store, "knowledge-base",    embedModel));
engine.AddDataSource(await DataSource.LoadFromStoreAsync(store, "support-tickets",   embedModel));
engine.AddDataSource(await DataSource.LoadFromStoreAsync(store, "engineering-wiki", embedModel));

var matches = await engine.FindMatchingPartitionsAsync(
    "What is our enterprise refund window?", topK: 8);

foreach (var m in matches)
    Console.WriteLine($"[{m.Score:P0}] {m.Section}: {m.Partition.Text}");

Ner.cs

using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("lmkit-tasks:4b-preview");
var ner   = new NamedEntityRecognition(model);

var entities = ner.Recognize(
    "Satya Nadella heads Microsoft from Redmond.");

foreach (var e in entities)
    Console.WriteLine($"[{e.Type}] {e.Value}");

Transcribe.cs

using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;

var model = LM.LoadFromModelID("whisper-large-turbo3");
var stt   = new SpeechToText(model)
{
    EnableVoiceActivityDetection = true,
    SuppressHallucinations       = true,
};

stt.OnNewSegment += (_, e) =>
    Console.WriteLine($"[{e.Segment.Start:mm\\:ss}] {e.Segment.Text}");

await stt.TranscribeAsync(new WaveFile("meeting.wav"));

StreamingChat.cs

using LMKit.Model;
using LMKit.TextGeneration;

var model = LM.LoadFromModelID("gemma4:e4b");
var chat  = new MultiTurnConversation(model)
{
    SystemPrompt = "You answer in one sentence.",
};

chat.AfterTokenSampling += (_, e) =>
    Console.Write(e.Token.Text);

await chat.SubmitAsync(
    "What ports does Kestrel bind by default?");

What ships in the box

One package. The whole AI stack.

LM-Kit.NET is the complete in-process AI runtime for .NET. No Python sidecar, no Docker, no HTTP service. The same NuGet that loads an LLM also runs OCR, speech-to-text, vision chat, structured extraction, agents with tools, RAG pipelines, classifiers, and embeddings.

Models

100+

Pre-configured cards plus any GGUF from Hugging Face.

Pillars

7

Agents, Docs, Vision, RAG, Text, Speech, Generation.

Built-in tools

8 categories

Atomic, security-first tools. Constantly growing catalog.

Backends

5

CPU, AVX2, CUDA 12/13, Vulkan, Metal. Same code path.

Cloud calls

0

Every model runs on your hardware. No data leaves the box.

External services

0

In-process SDK. No Python runtime, no Docker, no daemons.

Vector backends

5

In-memory, built-in file DB, Qdrant, PostgreSQL (pgvector), bring-your-own.

Speech languages

100+

Whisper-family STT with VAD and hallucination suppression.

LM-Kit.NET pillars

Seven pillars, one foundation.

The seven pillars of LM-Kit.NET, plus the local runtime they share.

01 · AI Agents

Orchestration patterns

ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.

AI Agents

02 · Document Intelligence

Parse PDFs, images, EML

PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.

Document Intelligence

03 · Vision & Multimodal

VLMs, image classification, chat with image

Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.

Vision & Multimodal

04 · RAG & Knowledge

Vector search and retrieval

Built-in vector store, Qdrant and pgvector connectors, embeddings, hybrid retrieval, document chunking, source citations.

RAG & Knowledge

05 · Text Analysis

Classification, NER, PII, sentiment

Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.

Text Analysis

06 · Speech & Audio

Audio transcription, STT

A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.

Speech & Audio

07 · Text Generation

Conversations, rewriting, summaries

Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.

Text Generation

The foundation

Every capability above runs on this runtime.

Foundation

Local Inference

The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU, AVX2, CUDA 12/13, Vulkan or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.

Explore the foundation

Core technology

Dynamic Sampling, the symbolic layer.

Underneath every LM-Kit call sits an adaptive inference engine that steers each token in real time using structural awareness, contextual signals, and grammar-aligned validation. It is the reason a 4B local model can match fine-tuned cloud behaviour on extraction, classification, function calling, and structured generation. Always on, model-agnostic, no retraining required.

Pillar A

Constrained output

Dynamic grammar guarantees that JSON, schemas, and tool-call shapes always parse. A novel hybrid path runs roughly twice as fast as classical grammar sampling.

Pillar B

Adaptive guidance

Per-token contextual perplexity, semantic memory for codes and identifiers, structural rejection of malformed runs. Hallucinations drop, recoveries happen in place.

Pillar C

Model-agnostic

No architecture coupling, no fine-tuning, no per-model adapter. Drop in a new open-weight release and the layer keeps working from day one.

Open the Dynamic Sampling deep dive →

Built-in tools

Eight categories of atomic, security-first tools.

LM-Kit ships a growing catalog of agent tools across eight categories. Each tool performs exactly one operation, exposes rich metadata, and integrates with the permission policy system for enterprise-grade access control. One tool, one feature. Compose freely.

01 · Data

Parse and transform

JSON, XML, CSV, YAML, HTML, Markdown, databases, spreadsheets, QR codes. Predictable, typed I/O.

02 · Document

PDF, OCR, format conversion

PDF manipulation, image preprocessing, OCR, format conversion between Markdown, EML, HTML, DOCX.

03 · Text

String operations

Diff, regex, templating, encoding, slugification, fuzzy matching, phonetics. The stuff prompts cannot do.

04 · Numeric

Compute primitives

Calculator, unit conversion, statistics, financial math, random, expression evaluation.

05 · Security

Hashing, JWT, validation

Hashing, encryption, JWT, validation, password generation, checksums. Audit-friendly defaults.

06 · Utility

Date, URL, locale, MIME

Date and time, cron, URLs, colors, locales, MIME types, paths, scheduling, time zones.

07 · IO

Filesystem, processes, files

File system, process execution, compression, clipboard, environment, file watching.

08 · Net

HTTP, web search, RSS

HTTP verbs, FTP, web search (DuckDuckGo, Brave, Tavily, Serper, SearXNG), SMTP, RSS feeds, diagnostics.

Permission policies

Every tool implements IToolMetadata with explicit risk level, side effect kind (LocalRead / LocalWrite / NetworkRead / NetworkWrite / Irreversible), default approval mode, and read-only flag. Pair with ToolPermissionPolicy for centralized allow/deny rules, wildcard patterns, and approval gates. Production-safe out of the box.
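
To make the allow/deny idea concrete, here is a minimal sketch of how a centralized policy with wildcard patterns, a maximum risk level, and an approval gate could be evaluated. It does not use LM-Kit's actual ToolPermissionPolicy or IToolMetadata API: SketchPolicy, ToolInfo, RiskLevel, Decision, and the tool names are all hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical policy and tool names; not LM-Kit's API.
var policy = new SketchPolicy { MaxRisk = RiskLevel.Medium }
    .Allow("text.*")
    .Allow("net.*")
    .Deny("net.smtp*");

Console.WriteLine(policy.Evaluate(new ToolInfo("text.regex", RiskLevel.Low, ReadOnly: true)));      // Allow
Console.WriteLine(policy.Evaluate(new ToolInfo("net.http_post", RiskLevel.Medium, ReadOnly: false))); // RequireApproval
Console.WriteLine(policy.Evaluate(new ToolInfo("net.smtp_send", RiskLevel.Medium, ReadOnly: false))); // Deny
Console.WriteLine(policy.Evaluate(new ToolInfo("io.process_run", RiskLevel.High, ReadOnly: false)));  // Deny

public enum RiskLevel { Low, Medium, High }
public enum Decision { Allow, RequireApproval, Deny }

public sealed record ToolInfo(string Name, RiskLevel Risk, bool ReadOnly);

public sealed class SketchPolicy
{
    private readonly List<string> _allow = new();
    private readonly List<string> _deny = new();

    public RiskLevel MaxRisk { get; init; } = RiskLevel.Medium;

    public SketchPolicy Allow(string pattern) { _allow.Add(pattern); return this; }
    public SketchPolicy Deny(string pattern) { _deny.Add(pattern); return this; }

    // Deny rules win over allow rules; anything not explicitly allowed is denied.
    public Decision Evaluate(ToolInfo tool)
    {
        if (_deny.Any(p => Matches(p, tool.Name))) return Decision.Deny;
        if (!_allow.Any(p => Matches(p, tool.Name))) return Decision.Deny;
        if (tool.Risk > MaxRisk) return Decision.Deny;

        // Tools with side effects go through an approval gate.
        return tool.ReadOnly ? Decision.Allow : Decision.RequireApproval;
    }

    // Turn a '*' wildcard pattern into an anchored, case-insensitive regex.
    private static bool Matches(string pattern, string name)
    {
        var regex = "^" + Regex.Escape(pattern).Replace("\\*", ".*") + "$";
        return Regex.IsMatch(name, regex, RegexOptions.IgnoreCase);
    }
}
```

Putting deny rules first and denying by default is a design choice of this sketch: a tool nobody has thought about yet cannot run until someone adds an allow rule for it.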

Model catalog

Open-weight models, curated and ready.

A constantly growing catalog of open-weight models covering text generation, vision, embeddings, OCR, and speech. Load any of them by ID, or point at any GGUF on Hugging Face.

Text LLMs

From 0.6B to 30B-class MoE

Gemma 3 (1B / 4B / 12B / 27B), Qwen 3 (0.6B to 14B), Llama 3.1, Phi-4, GLM 4.7 Flash, GPT-OSS 20B. Chat, reasoning, tool use, multilingual.

Vision

VLMs for chat and OCR

Qwen 2/2.5/3 VL, Gemma 3 VL, GLM-V 4.6 Flash. Dedicated OCR via PaddleOCR-VL and GLM-OCR. Drop an Attachment into any conversation.

Embeddings

Text and image vectors

EmbeddingGemma 300M, Qwen3-Embedding 0.6B / 4B / 8B, BGE-M3, Nomic-Embed-Text and Nomic-Embed-Vision. Multilingual, cross-modal.

Speech

Whisper family STT

Tiny through Large V3 and Large Turbo V3. 100+ languages, real-time translation to English, Voice Activity Detection.

Task models

Purpose-trained classifiers

Sentiment-analysis 2.0 and lmkit-tasks variants for fast on-device classification, NER, and PII work.

Bring-your-own

Any GGUF, any URI

Point new LM(uri) at any GGUF on Hugging Face or your own storage. The catalog is curation, not constraint.

Document intelligence & RAG

Read every PDF, ground every answer, cite every source.

A complete document understanding and retrieval stack. PDF text and table extraction, OCR that beats commercial engines, layout-aware parsing, typed field extraction, document splitting, multi-document chat, and RAG pipelines with page-level citations.

Doc chat

Multi-document conversation

PdfChat loads several documents into one conversation. Multi-turn, cited answers, automatic context management.

Chat with PDF

Document RAG

Page-level citations

DocumentRag with OCR plus VLM document understanding. Multi-page processing, auto-detection of text vs scanned, source references on every answer.

Document RAG

Extraction

Typed fields from messy documents

Schema-driven extraction with grammar-constrained generation. Invoices, contracts, forms, ID cards. Output parses every time.

Structured extraction
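
Once extraction returns JSON, the fields can be bound to a typed C# object. The sketch below uses only System.Text.Json and needs C# 11 or later for the raw string literal. The Invoice class and the sample payload are hypothetical: the field names are borrowed from the ExtractInvoice.cs example, and the exact JSON shape the extractor emits may differ.

```csharp
using System;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical payload standing in for the JSON string an extractor returns.
const string json = """
{
  "invoice_number": "INV-0042",
  "invoice_date": "2025-03-14",
  "total_amount": 1299.5
}
""";

var invoice = JsonSerializer.Deserialize<Invoice>(json)
    ?? throw new InvalidOperationException("Payload was JSON null.");

Console.WriteLine(invoice.Number);                                                    // INV-0042
Console.WriteLine(invoice.Date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2025-03-14
Console.WriteLine(invoice.Total.ToString("F2", CultureInfo.InvariantCulture));        // 1299.50

// Hypothetical target type; attribute names mirror the extraction element names.
public sealed class Invoice
{
    [JsonPropertyName("invoice_number")] public string Number { get; init; } = "";
    [JsonPropertyName("invoice_date")]   public DateTime Date { get; init; }
    [JsonPropertyName("total_amount")]   public double Total { get; init; }
}
```

Because the schema constrains generation, the deserialization step should not need defensive parsing; keeping it in a typed class still gives downstream code compile-time checks on field names and types.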

Splitting

Detect boundaries in 1000-page batches

Intelligent document splitting finds logical boundaries in long PDFs. Cut a batch into per-document files automatically.

Document splitting

OCR

Native and VLM-driven OCR

CPU-efficient native engine plus VLM OCR (PaddleOCR-VL, GLM-OCR). SOTA benchmark accuracy, fast on a single core, no commercial licence to budget for.

OCR

RAG pipeline

RagEngine, RagChat, HyDE, multi-query

End-to-end RAG: vector and BM25 retrieval, hybrid search, MMR diversity, cross-encoder reranking, multiple query generation strategies.

RAG & knowledge
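
MMR (maximal marginal relevance) is one of the reranking steps named above. The sketch below is a generic, library-independent version of the standard formula, score = λ · sim(query, d) − (1 − λ) · max sim(d, selected), over plain float vectors. Mmr.Select, Mmr.Cosine, and the toy vectors are illustrative names and data, not part of LM-Kit's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy 2-D embeddings: candidates 0 and 1 are near-duplicates, 2 is distinct.
var query = new[] { 1f, 0f };
var candidates = new[]
{
    new[] { 0.9f, 0.1f },
    new[] { 0.9f, 0.12f },
    new[] { 0.7f, -0.7f },
};

var picked = Mmr.Select(query, candidates, k: 2, lambda: 0.5f);
Console.WriteLine(string.Join(", ", picked)); // 0, 2

public static class Mmr
{
    // Greedily pick k documents, trading relevance to the query against
    // similarity to documents that were already picked.
    public static List<int> Select(float[] query, IReadOnlyList<float[]> docs, int k, float lambda)
    {
        var selected = new List<int>();
        var remaining = Enumerable.Range(0, docs.Count).ToList();

        while (selected.Count < k && remaining.Count > 0)
        {
            int best = remaining[0];
            float bestScore = float.NegativeInfinity;

            foreach (int i in remaining)
            {
                float relevance = Cosine(query, docs[i]);
                float redundancy = selected.Count == 0
                    ? 0f
                    : selected.Max(j => Cosine(docs[i], docs[j]));
                float score = lambda * relevance - (1 - lambda) * redundancy;
                if (score > bestScore) { bestScore = score; best = i; }
            }

            selected.Add(best);
            remaining.Remove(best);
        }

        return selected;
    }

    // Cosine similarity of two equal-length, non-zero vectors.
    public static float Cosine(float[] a, float[] b)
    {
        float dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
    }
}
```

With lambda set to 1 the redundancy term vanishes and the same call returns the two most relevant candidates, 0 and 1, even though they are near-duplicates.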

Vector storage

Five pluggable backends

In-memory, built-in file DB, Qdrant connector, PostgreSQL (pgvector) connector, bring-your-own via IVectorStore. Switch backend without changing code.

Vector database

Embeddings

Text and image, batched

Unified Embedder for text and images, batch-friendly, async-first. Cross-modal similarity out of the box.

Embeddings

Agents & orchestration

Production agent patterns, not toy demos.

A strongly-typed agent class with system prompts, planning strategies, tool registries, persistent memory, MCP clients, multi-agent orchestration, and production-grade observability. Compose freely, ship confidently.

Agent runtime

Strongly-typed agents

First-class Agent with builders, identity, tools, planning strategy, retry policies, streaming, and event-level observability.

AI Agents

Planning

ReAct, single-call, plan-then-execute

Pick the planning strategy per agent. Bounded steps, recoverable errors, deterministic by configuration.

Agent reasoning

Orchestration

Pipeline, parallel, supervisor

Sequential pipelines, parallel fan-out, supervisor delegation. Compose agents into deterministic graphs.

Multi-agent workflows
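
The two simplest shapes, a sequential pipeline and a parallel fan-out, can be sketched with nothing but async delegates. This is a conceptual illustration, not LM-Kit's orchestrator API: the Orchestration class and the three stand-in "agents" (plain functions from text to text) are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Stand-ins for agents: each is just an async function from text to text.
Func<string, Task<string>> upper   = s => Task.FromResult(s.ToUpperInvariant());
Func<string, Task<string>> exclaim = s => Task.FromResult(s + "!");
Func<string, Task<string>> length  = s => Task.FromResult(s.Length.ToString());

Console.WriteLine(await Orchestration.PipelineAsync("hello", upper, exclaim));
// HELLO!

Console.WriteLine(string.Join(" | ", await Orchestration.FanOutAsync("hello", upper, length)));
// HELLO | 5

public static class Orchestration
{
    // Sequential pipeline: each step receives the previous step's output.
    public static async Task<string> PipelineAsync(
        string input, params Func<string, Task<string>>[] steps)
    {
        var current = input;
        foreach (var step in steps)
            current = await step(current);
        return current;
    }

    // Parallel fan-out: every worker gets the same input.
    // Task.WhenAll returns results in worker order, not completion order.
    public static Task<string[]> FanOutAsync(
        string input, params Func<string, Task<string>>[] workers)
        => Task.WhenAll(workers.Select(w => w(input)));
}
```

A supervisor can be viewed as a third function that decides which of these two shapes to invoke and with which workers.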

Graph

Graph orchestration

Sequential, Parallel, and Conditional nodes. Thread-safe context, channel-based streaming, compose any workflow shape.

Graph orchestration

Memory

Persistent agent memory

RAG-backed long-term memory that survives sessions. Recall, summarisation, semantic retrieval out of the box.

Agent memory

Skills

Reusable SKILL.md files

Drop SKILL.md files into a folder. The registry picks them up. Versionable, testable, swappable.

Agent skills

MCP

Model Context Protocol

Connect to any MCP-compatible server. The built-in tool catalog and external MCP tools coexist in the same registry.

MCP integration

Guardrails

Permissions and approval gates

Centralized policy: allow/deny patterns, max risk level, approval requirements per tool. Audit-friendly, production-safe.

Permissions

Observability

Trace every step

Events for plan, tool call, model decision, retry, error. Integrate with your existing telemetry pipeline.

Observability

Beyond LLMs

Vision, speech, and content intelligence, same runtime.

LM-Kit.NET is not just an LLM library. The same NuGet ships state-of-the-art vision-language models, speech-to-text with Voice Activity Detection, sentiment and emotion classifiers, NER and PII extractors, multilingual translation, and grammar-aware text correction.

Vision

Chat with images

Drop an Attachment into MultiTurnConversation. Multiple images per turn, streaming tokens, tool calls. Image understanding, classification, labeling, background removal.

Vision & multimodal

Speech

Transcription, translation, dictation

Whisper-family STT. 100+ languages, real-time translation, Voice Activity Detection, hallucination suppression, voice-command dictation formatting.

Speech & audio

Sentiment

Sentiment & emotion

Multilingual polarity and emotion classification with neutral support. LoRA-fine-tunable on your domain.

Sentiment

NER

Custom entity recognition

Built-in entity types plus custom EntityDefinition. Character offsets on every mention for precise downstream use.

NER

PII

PII detection & redaction

Emails, phones, IDs, addresses, custom domain labels. Batch processing for high-volume document scanning.

PII extraction

Generation

Conversations, summaries, rewriting

Single-turn, multi-turn, and stateless primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.

Text generation

Production-grade

The boring parts that ship.

Beyond the headline features, LM-Kit.NET ships the production controls a team needs once a prototype meets real workloads: memory hibernation, encrypted model loading, multi-GPU split, LoRA, fine-tuning, quantization, and sampling levers.

Hibernation

Context hibernation

Serialize an entire conversation context (KV-cache and all session state) to disk. Free RAM/VRAM on demand. Rehydrate transparently on the next call.

Context hibernation

Encryption

Encrypted model loading

Stream-decrypted GGUF. No plaintext model files on disk at any point. Ship the model, keep the secret.

Encrypted models

Multi-GPU

Multi-GPU and tensor overrides

Split big models across multiple GPUs. Per-tensor placement overrides for fine-grained control on heterogeneous hardware.

Multi-GPU

Sampling

Temperature, top-k, top-p, grammar

Per-turn sampling levers composed on top of Dynamic Sampling. Switch from creative to deterministic without rebuilding the conversation.

Sampling controls
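
For readers new to these levers, the sketch below shows the conventional meaning of temperature, top-k, and top-p as a filter over raw logits. It is a textbook illustration, not LM-Kit's implementation and not Dynamic Sampling itself; Sampling.Filter and the toy logits are hypothetical, and temperature is assumed to be greater than zero.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

float[] logits = { 2.0f, 1.0f, 0.5f, -1.0f };

var creative = Sampling.Filter(logits, temperature: 1.0f, topK: 3, topP: 0.8f);
var focused  = Sampling.Filter(logits, temperature: 0.1f, topK: 3, topP: 0.8f);

Console.WriteLine(string.Join(", ", creative.Select(c => c.Token))); // 0, 1
Console.WriteLine(string.Join(", ", focused.Select(c => c.Token)));  // 0

public static class Sampling
{
    // Returns the tokens that survive filtering, with renormalised probabilities.
    public static List<(int Token, double Prob)> Filter(
        float[] logits, float temperature, int topK, float topP)
    {
        // Temperature-scaled softmax, shifted by the max for numerical stability.
        double max = logits.Max() / temperature;
        var exps = logits.Select(l => Math.Exp(l / temperature - max)).ToArray();
        double sum = exps.Sum();

        // Top-k: keep only the k most probable tokens.
        var ranked = exps
            .Select((e, i) => (Token: i, Prob: e / sum))
            .OrderByDescending(c => c.Prob)
            .Take(topK)
            .ToList();

        // Top-p: keep the smallest prefix whose renormalised mass reaches topP.
        double keptMass = ranked.Sum(c => c.Prob);
        var nucleus = new List<(int Token, double Prob)>();
        double cumulative = 0;
        foreach (var c in ranked)
        {
            nucleus.Add(c);
            cumulative += c.Prob / keptMass;
            if (cumulative >= topP) break;
        }

        double total = nucleus.Sum(c => c.Prob);
        return nucleus.Select(c => (Token: c.Token, Prob: c.Prob / total)).ToList();
    }
}
```

Lowering the temperature sharpens the distribution until a single token carries almost all the mass, which is why the second call keeps only token 0: the same conversation moves from creative to near-deterministic by changing one parameter.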

Fine-tuning

In-process LLM fine-tuning

Bring data, build a LoRA, or fine-tune the full model. The training loop runs in the same NuGet that runs inference.

LLM fine-tuning

LoRA

LoRA adapter loading

Compose multiple LoRA adapters at load time. Swap personas, domains, or tasks without reloading the base model.

LoRA integration

Quantization

In-process quantization

Quantize models to fit the hardware budget without leaving the SDK. From Q8 down to Q2_K, with knobs for precision-critical layers.

Model quantization

Backends

CPU, AVX2, CUDA, Vulkan, Metal

Same C# call dispatches to the fastest available backend. Deploy once, run on anything from a Raspberry Pi to an 8-GPU server.

Hardware backends

Integrations

Drop into the Microsoft AI plumbing you already use.

LM-Kit.NET plugs into the Microsoft AI ecosystem without rewriting your existing orchestration code. Existing IChatClient and Semantic Kernel pipelines keep working with a local model behind them.

Bridge

Microsoft.Extensions.AI

Stream tokens, call functions, embed text. Every IChatClient, IEmbeddingGenerator, and middleware-aware abstraction you wrote against the official package keeps working with LM-Kit as the local backend.

Extensions.AI bridge

Bridge

Semantic Kernel

Use LM-Kit.NET as a Semantic Kernel connector. Plug local chat completion, embeddings, and function-calling into existing SK plans, planners, and skills.

Semantic Kernel bridge

Protocol

Model Context Protocol

MCP clients ship in the agent runtime. Connect to any MCP server; built-in tools and external MCP tools coexist in a unified registry with the same permission policy.

MCP integration

Performance & hardware

One code path, five backends.

The same LM.LoadFromModelID call dispatches to the fastest backend available on the host. No environment branches, no per-platform builds. Deploy once, run on every developer machine and every production target.

CUDA

NVIDIA (CUDA 12 / 13)

Tensor-core acceleration with multi-GPU split for big models. The dependency package is pulled in transitively; one package, one switch.

Vulkan

Cross-vendor GPUs

AMD, Intel, and any GPU with a Vulkan driver. One backend, every vendor, no per-platform code.

Metal

Apple Silicon

Native Metal on M1, M2, M3, and beyond. Unified-memory routing for laptops; no extra runtime.

CPU / AVX2

Always-on fallback

Optimised SSE 4.1/4.2 and AVX2 kernels. The same model runs without a GPU; smaller models stay fast on a laptop CPU.

Runs where your code already runs

Cross-platform by default, not by accident.

Same NuGet, same API surface, every supported target. It targets .NET Standard 2.0, so it also slots into existing .NET Framework 4.6.2+ codebases.

Runtime: .NET Standard 2.0 · .NET 8 · 9 · 10
Operating systems: Windows 10+ · Linux x64 & ARM64 · macOS Universal
GPU acceleration: CUDA 12 · CUDA 13 · Vulkan · Metal
CPU acceleration: SSE 4.1 / 4.2 · AVX · AVX2
Models: Gemma 3, Qwen 3, Llama, Phi-4, GLM 4.7, GPT-OSS, Whisper, embeddings, OCR, VLMs
Storage: In-memory · built-in file DB · Qdrant · pgvector · bring-your-own
Languages: C# · F# · VB.NET · any .NET-compatible language
Bridges: Microsoft.Extensions.AI · Semantic Kernel · MCP clients

The decision

Pick the right tradeoff for your stack.

Three honest comparisons with the alternatives a .NET team actually weighs. No straw men, no invented numbers.

Compare

vs. wrapping cloud APIs

No per-token bill. No data leaving your network. Latency you can predict. Inference cost equals the cost of compute you already own. Works offline by design.

Local vs Cloud, in depth

Compare

vs. a Python service

No FastAPI sidecar, no HTTP shim, no two-runtime tax. LM-Kit links into your .NET process, picks up the right native acceleration, and stays out of the way. Async/await all the way down.

LM-Kit vs LangChain

Compare

vs. other .NET local stacks

Most ship inference only. LM-Kit ships the full runtime: agents, RAG, OCR, structured extraction, speech, vision, classifiers, embeddings, plus the symbolic layer that makes small models behave.

LM-Kit vs LlamaSharp

Where teams ship LM-Kit

Workflows where local AI actually wins.

LM-Kit.NET is built for the .NET applications that cannot send data to a cloud endpoint, cannot rely on a network connection, or cannot afford per-token costs at scale.

Regulated

Healthcare, finance, government

Local inference keeps patient records, claims, and citizen data on hardware you control, which simplifies meeting HIPAA, GDPR, and data-residency requirements.

Enterprise

Internal copilots

RAG over policies, runbooks, wikis, contracts, support tickets. Cited answers without sending source material to a third party.

Edge

Offline and air-gapped

Field laptops, rugged kiosks, vehicle telemetry, manufacturing floors. Inference works without connectivity.

Cost

High-volume workloads

Batch document processing, classification pipelines, customer support analysis. Marginal cost is compute, not token bills.

Product

Shipping AI in a desktop app

Wrap LM-Kit in a Windows/macOS desktop product. Customers run inference on their own hardware. No backend to operate.

Speed

Latency-critical UIs

Voice assistants, code editors, interactive analysis. Local inference removes the round-trip; first-token times in milliseconds.

Install

Zero dependencies. One NuGet.

Add a single package to your .csproj. The runtime, native binaries for every supported backend, and the entire AI stack come with it. No Python runtime, no Docker, no daemons.

terminal

# 1. Add LM-Kit.NET to your project
$ dotnet add package LM-Kit.NET

# 2. (Optional) Plug in a GPU backend
#    The dependency package is pulled in transitively.
$ dotnet add package LM-Kit.NET.Backend.Cuda13.Windows
# or:
$ dotnet add package LM-Kit.NET.Backend.Cuda13.Linux

Program.cs

using LMKit.Model;
using LMKit.TextGeneration;

var model = LM.LoadFromModelID("qwen3.5:4b");
var chat  = new MultiTurnConversation(model);

var reply = await chat.SubmitAsync("Hello, LM-Kit.");
Console.WriteLine(reply.Text);

Licensing

Free for builders. Commercial when you ship.

Run the full SDK on your own hardware at no cost. Buy a commercial license when LM-Kit becomes part of a product you sell.

Community

Free forever

Full SDK access for any company or individual. Build and deploy non-commercial applications, or evaluate LM-Kit end to end before shipping.

Full feature surface; no capability gates
Deployment: development, internal tools, OSS
Platforms: Windows, Linux, macOS
Community support on GitHub

Community Edition

Professional

Custom per project

For products that ship LM-Kit to customers. Pricing scaled to deployment size and value. Includes dedicated support and direct roadmap input.

Commercial redistribution rights
Dedicated technical support
Unlimited developers and end users
Direct relationship with the engineering team

Talk to us

FAQ

Questions we hear often.

Does LM-Kit require a Python runtime?

No. Everything runs in-process inside your .NET application. No Python, no Docker, no daemons, no HTTP service. One NuGet, one process.

Which GPUs work out of the box?

NVIDIA via CUDA 12/13, Apple Silicon via Metal, AMD and Intel via Vulkan. CPU and AVX2 act as a fallback. The same C# code dispatches to whichever backend is fastest on the host.

Can I run on .NET Framework?

Yes. LM-Kit targets .NET Standard 2.0, so it slots into .NET Framework 4.6.2+ codebases alongside .NET 8 / 9 / 10.

How do I add a model that is not in the catalog?

Point new LM(new Uri("...")) at any GGUF on Hugging Face or your own storage. The catalog is a curated set, not a constraint.

Does data ever leave the box?

Not unless you explicitly call an external tool such as WebSearch. Model inference, RAG, OCR, speech, and embeddings all run on your hardware.

Can I fine-tune a model?

Yes. LM-Kit ships in-process LoRA training and full fine-tuning. The same NuGet that runs inference runs the training loop.

What about Microsoft.Extensions.AI and Semantic Kernel?

Both are first-class bridges. Existing IChatClient pipelines and SK connectors work unchanged with LM-Kit as the local backend.

Is there an MCP client?

Yes. The agent runtime includes Model Context Protocol clients. Built-in tools and external MCP tools coexist in the same registry under the same permission policy.

Ready when you are

Ship local AI this sprint.

Install the NuGet, load a model, ship the feature. The free Community Edition is enough to evaluate the entire surface.
