Chat service
Implements IChatCompletionService backed by an LM-Kit LM. Drop into any kernel.
The LM-Kit.NET.SemanticKernel package implements
Semantic Kernel's IChatCompletionService and memory
store on top of LM-Kit. Existing kernels, plugins, planners, prompt
functions, and memory connectors continue to work, with inference
running locally instead of against a hosted endpoint. Same kernel
builder, same plugin model, same prompt files.
SK memory backed by LM-Kit embeddings and the built-in vector store. Local recall, semantic queries.
Existing SK plugins, prompt functions, and planners run unchanged. The kernel does not know it is running locally.
Teams building on Semantic Kernel have invested in plugins, planners, and prompt-function libraries. Switching to a local stack should not mean rewriting that investment. The bridge implements the contracts Semantic Kernel expects: chat completion, embeddings, memory. Existing code keeps composing the same way; only the inference backend moves on-device.
IChatCompletionService: register it through the kernel builder. Plugins and planners that depend on it resolve to the local implementation.
SK memory uses LM-Kit embeddings and the built-in vector store under the hood. Queries hit local indexes; nothing leaves the box.
Existing .skprompt files and prompt configs work as written. The bridge consumes them through the same pipeline SK uses.
Action planners, sequential planners, function-calling planners run on the local model. Tool-call signatures stay the same.
Register multiple chat services with different IDs. Route per request: local for sensitive data, cloud for bulk traffic. Both inside the same kernel; a sketch follows this list.
SK's telemetry events emit as they always have. Inference runs locally; the rest of the observability story is untouched.
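A minimal sketch of that routing pattern. It assumes AddLMKitChatCompletion accepts a serviceId parameter, mirroring SK's other connector registrations; model, apiKey, and containsSensitiveData stand in for application state.

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using LMKit.Integrations.SemanticKernel;

// Assumption: AddLMKitChatCompletion takes a serviceId like other SK connectors.
// AddOpenAIChatCompletion comes from Microsoft.SemanticKernel.Connectors.OpenAI.
var kernel = Kernel.CreateBuilder()
    .AddLMKitChatCompletion(model, serviceId: "local")
    .AddOpenAIChatCompletion("gpt-4o-mini", apiKey, serviceId: "cloud")
    .Build();

// Route per request by resolving the chat service by ID.
var chat = containsSensitiveData
    ? kernel.GetRequiredService<IChatCompletionService>("local")
    : kernel.GetRequiredService<IChatCompletionService>("cloud");

var reply = await chat.GetChatMessageContentAsync("Summarize this contract clause.");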
Register LM-Kit chat and embedding services on an existing kernel, then import prompts and plugins as usual.
using Microsoft.SemanticKernel;
using LMKit.Integrations.SemanticKernel;

var model = LM.LoadFromModelID("qwen3.5:4b");
var embedder = LM.LoadFromModelID("embedding-model-id"); // substitute an embedding-capable model ID

var kernel = Kernel.CreateBuilder()
    .AddLMKitChatCompletion(model)              // IChatCompletionService
    .AddLMKitTextEmbeddingGeneration(embedder)  // embedding service
    .Build();

// Existing plugins. No changes.
kernel.ImportPluginFromType<CalendarPlugin>();
kernel.ImportPluginFromPromptDirectory(@"prompts/support");

// Invoke a prompt function. customerName and issueDescription come from the application.
var answer = await kernel.InvokeAsync("Support", "DraftReply", new()
{
    ["customer"] = customerName,
    ["issue"] = issueDescription
});
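The prompt directory above follows SK's standard layout: one subdirectory per function, each holding an skprompt.txt template and a config.json. A hypothetical DraftReply function might look like this (contents illustrative):

prompts/support/DraftReply/skprompt.txt:

Write a courteous support reply to {{$customer}} about the following issue:
{{$issue}}

prompts/support/DraftReply/config.json:

{
  "schema": 1,
  "description": "Drafts a reply to a customer support issue.",
  "input_variables": [
    { "name": "customer", "description": "Customer name", "is_required": true },
    { "name": "issue", "description": "Issue summary", "is_required": true }
  ]
}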
Run an SK function-calling planner against the local model, with tools routed through registered kernel plugins.
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Existing function-calling planner. Tools resolve through the kernel.
var settings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

var result = await kernel.InvokePromptAsync(
    "Schedule a 30-minute review with Loic for tomorrow afternoon.",
    new(settings));

// Inference runs on the local model. Tool calls invoke kernel plugins.
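CalendarPlugin is referenced but not defined on this page. A minimal sketch of what such a plugin could look like, using SK's standard [KernelFunction] attributes (names and parameters are illustrative):

using System.ComponentModel;
using Microsoft.SemanticKernel;

public class CalendarPlugin
{
    [KernelFunction, Description("Creates a calendar event.")]
    public string CreateEvent(
        [Description("Attendee name")] string attendee,
        [Description("Start time, ISO 8601")] string start,
        [Description("Duration in minutes")] int durationMinutes)
    {
        // A real implementation would call a calendar API.
        return $"Booked {durationMinutes} min with {attendee} at {start}.";
    }
}

The planner reads these signatures as tool schemas; with AutoInvokeKernelFunctions, SK calls the function and feeds the result back to the model.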
Back Semantic Kernel memory with LM-Kit embeddings and the built-in vector store, with no separate database.
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Memory;

// SK memory backed by LM-Kit embeddings and the built-in vector store.
var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
var memory = new SemanticTextMemory(new LMKitMemoryStore(), embeddingService);

await memory.SaveInformationAsync(
    collection: "docs",
    text: "The Q3 launch slipped two weeks because of a shipping delay.",
    id: "q3-launch");

var hits = memory.SearchAsync("docs", "why did Q3 slip?", limit: 3);
await foreach (var hit in hits)
    Console.WriteLine(hit.Metadata.Text);
An application built on Semantic Kernel with a hosted backend switches to local inference for compliance or cost. Plugins, planners, and prompt files all keep working.
Register two chat services. Route by request sensitivity, by quota, by latency target. Same kernel handles both.
Pre-existing SK plugin libraries (calendar, mail, knowledge base) run locally the moment the kernel's chat service is the LM-Kit bridge.
Run end-to-end SK tests with a local model in CI. No quota, no network flakiness, no API key in pipelines. A sketch follows this list.
Deliver SK-based applications into environments without internet access. The bridge plus a packaged model is enough.
Replace cloud memory connectors with the LM-Kit memory store. Existing memory-backed prompts retrieve from local indexes.
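A sketch of that CI idea as an xUnit test, reusing the registration and prompt directory from the first example. The model ID and assertion are illustrative; a real suite would pin deterministic sampling settings.

using Microsoft.SemanticKernel;
using LMKit.Integrations.SemanticKernel;
using Xunit;

public class SupportPromptTests
{
    [Fact]
    public async Task DraftReply_MentionsCustomerName()
    {
        // Local model: no API key, no network dependency in the pipeline.
        var model = LM.LoadFromModelID("qwen3.5:4b");
        var kernel = Kernel.CreateBuilder()
            .AddLMKitChatCompletion(model)
            .Build();
        kernel.ImportPluginFromPromptDirectory(@"prompts/support");

        var answer = await kernel.InvokeAsync("Support", "DraftReply", new()
        {
            ["customer"] = "Ava",
            ["issue"] = "Login fails after password reset."
        });

        // Loose assertion; model output varies run to run.
        Assert.Contains("Ava", answer.ToString());
    }
}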
The other major .NET AI abstraction. Same idea, different surface. Pick the bridge that matches your existing codebase.
Beyond SK plugins, the native Tools API gives finer control over invocation, permissions, and streaming.
The LM-Kit vector store under the SK memory connector. Same primitive other LM-Kit RAG paths use.
For full-document workflows beyond text snippets, the native RAG primitives offer source attribution and adaptive ingestion.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.