On-device .NET agents

Create AI Agents in C#

Prototype and ship production-ready agents with conversation, memory, tool calling, and prebuilt skills. Low latency and full data privacy by default.

Free SDK · On device · No signup

AI Agents for .NET that Think, Plan, and Act

Autonomous and multimodal on device

Build autonomous, multimodal agents that reason over context, orchestrate workflows, and use prebuilt NLP skills to deliver outcomes. Entirely on device. LM-Kit gives your app an agent brain for conversation and reasoning, a library of high-level skills for common industry tasks, and safe hands through function and tool calling. The result is an agent that can understand, decide, and do real work with low latency and full data privacy.

Agent brain

Conversational reasoning with memory, planning, and structured outputs for dependable interactions.

Skills library

Prebuilt capabilities for industry tasks such as data extraction, classification, PII detection, and RAG.

Safe hands

Function and tool calling with policies, human-in-the-loop, and full auditability for controlled execution.

Developer resources

Get straight to work with guides, full API docs, runnable samples, and release notes.

What is an AI Agent in a .NET application?

From perception to action

An AI agent is software that can perceive input, reason about goals, and act through tools. With LM-Kit, agents do more than chat. They think, plan steps, call your functions and LM-Kit skills, verify results, and continue until the task is done. One .NET SDK covers multi-turn dialogue, memory, tool calling, structured outputs, embeddings, retrieval, and speech or vision input. A minimal code sketch of this loop follows the capability list below.

Perceive

Understand user intent from text, images, or audio and gather relevant context from history or retrieval.

Reason

Plan steps, choose tools, and decide how to structure outputs while keeping policies and constraints in mind.

Act

Call your functions and LM-Kit skills, verify results, iterate, and complete tasks with auditability and guardrails.

Multi-turn dialogue · Memory · Tool calling · Structured outputs · Embeddings · Retrieval · Speech input · Vision input
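
The loop maps directly onto code. Here is a minimal sketch using the same MultiTurnConversation API shown in the quick start further down; the model path is a placeholder and exact signatures may vary by SDK version.

using System;
using LMKit.TextGeneration;

// Load a local model; the path is a placeholder for any model file you ship.
var model = new LMKit.Model.LM("models/your-model.gguf");

// One conversation object carries the perceive-reason-act loop: each Submit
// call perceives the new input, reasons over the accumulated history, and
// acts by producing a completion (or calling tools when registered).
var chat = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a concise assistant inside a .NET application."
};

var first = chat.Submit("Summarize what an AI agent does in one sentence.");
Console.WriteLine(first.Completion);

// The second turn relies on memory of the first; no manual history management.
var second = chat.Submit("Give a concrete example of that in an invoice workflow.");
Console.WriteLine(second.Completion);
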
Get Started Today
Free SDK · On device · No signup

Why teams choose LM-Kit for agents

Production advantages in one product

LM-Kit bundles the agent brain, skills, and inference engine in a single SDK. You get on-device privacy and predictable latency, strong guardrails and observability, optimized performance, and ongoing research improvements that steadily raise retrieval and understanding quality.

Unified product.

Conversation, memory, RAG, embeddings, vector search, speech to text, image understanding, generation control, and tool calling are provided in one SDK. No patchwork of libraries.

On-device privacy and speed.

Models run locally. Sensitive data stays inside your environment. Latency is low and predictable.

Autonomy with guardrails.

Human-in-the-loop, per-turn limits, loop prevention, and audit trails keep execution controllable and compliant.

Complete and highly optimized inference.

A single runtime loads, configures, and runs LLMs, VLMs, embeddings, and STT. Optimized kernels, streaming generation, batching, quantization options, and hardware acceleration are included so you get strong performance without extra infrastructure.

Skills plus workflows.

Build agents that orchestrate your APIs and also solve tasks using LM-Kit’s prebuilt capabilities such as data extraction, document classification, and PII detection.

Research-backed improvements.

We continuously refine RAG, memory, and document understanding using a strict methodology. Expect ongoing upgrades to retrieval quality, chunking, reranking, and long-context recall.

Get Started Today
Free SDK · On device · No signup

Multimodal by design

Agents can work with text, images, and audio

Typical flows include the following real-world patterns; a minimal multimodal sketch follows the list.

Parse and classify documents

Identify document types from images or PDFs, and extract structured data from invoices, receipts, forms, and contracts with layout understanding.

Email and attachment processing

Read email content, analyze attached documents, extract key information, and route actions based on content and sender.

Transcribe voice notes and meetings

Convert audio to text, search knowledge bases, draft responses with sources, and generate meeting summaries with action items.

Extract data from scanned forms

Read handwritten or printed forms, understand logical and physical layout, validate fields, and generate structured records for downstream systems.

Visual inspection and analysis

Classify image content, detect defects or anomalies, measure dimensions, and populate structured reports for quality control workflows.

Multi-page document understanding

Process complex documents across multiple pages, maintain context, extract tables and key-value pairs, and assemble complete data records.

Presentation and slide analysis

Extract content from slides, understand visual hierarchy, combine with audio transcripts, and generate comprehensive meeting notes.

Multilingual content processing

Detect language from text or images, translate documents while preserving layout, extract data across scripts, and deliver localized outputs.

Damage assessment from photos

Analyze property or vehicle damage images, estimate repair costs, extract metadata, and generate claim documentation automatically.
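
As a shape for these flows, here is a minimal sketch of routing an image through a vision-capable local model. The Prompt and Attachment types are illustrative stand-ins; check the LM-Kit API reference for the exact types your SDK version exposes for image and audio input.

using System;
using LMKit.TextGeneration;

var model = new LMKit.Model.LM("models/your-vision-model.gguf");
var chat = new MultiTurnConversation(model)
{
    SystemPrompt = "You classify documents and extract key fields as JSON."
};

// Hypothetical: attach a scanned page alongside the text instruction.
var prompt = new Prompt("Classify this document and extract invoice number, total, and supplier.");
prompt.Attachments.Add(new Attachment(@"C:\docs\scan-001.png"));

var result = chat.Submit(prompt);
Console.WriteLine(result.Completion); // e.g. {"type":"invoice","number":"INV-2025-001",...}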

Get Started Today
Free SDK · On device · No signup

Unicode in, understanding out

Agents can work across languages and scripts

Language detection, multilingual embeddings, on-device translation, and a Unicode-first NLP stack across the SDK remove locale friction while keeping data private. A detect-then-translate sketch follows the feature list.

Unicode-first NLP stack

UTF-8 throughout with robust tokenization, normalization, and sentence boundaries across Latin, CJK, and RTL scripts. Prevents mojibake and keeps NER, PII, and extraction accurate in any language.

Multilingual embeddings & RAG

One vector space across languages for search, reranking, and grounding. Query in one language and reason over content in another with consistent relevance.

Language detection & on-device translation

Auto-detects language per turn and routes prompts or models accordingly. Translates on device for predictable latency and full privacy.
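
Here is a sketch of the detect-then-translate flow, entirely on device. LanguageDetection, TextTranslation, and Language are illustrative names for the detection and translation capabilities described above; verify the exact types in the API reference.

using System;

var model = new LMKit.Model.LM("models/your-multilingual-model.gguf");

// Hypothetical detection type: identify the language of the incoming turn.
var detector = new LanguageDetection(model);
var language = detector.Detect("Die Rechnung ist am 30. Juni fällig.");

// Hypothetical translation type: normalize to English so one prompt template
// covers every locale before downstream extraction.
var translator = new TextTranslation(model);
string english = translator.Translate("Die Rechnung ist am 30. Juni fällig.", Language.English);

Console.WriteLine($"{language} -> {english}"); // e.g. German -> "The invoice is due on June 30."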

Get Started Today
Free SDK · On device · No signup

Agents that remember what matters

Smart Memories for .NET agents

Smart Memories keeps conversations coherent with persistent, RAG-backed context that is recalled only when relevant. AgentMemory integrates with MultiTurnConversation, understands memory types, and injects facts only when they are not already in short-term context (the KV cache). Built for on-device privacy, low latency, and full auditability. A minimal wiring sketch follows the feature list.

Typed memory

Store and recall semantic, episodic, and procedural knowledge for facts, events, and how-to steps across sessions.

Context-aware recall

Recall is KV-cache aware. Smart Memories injects only missing context to keep outputs grounded without repetition.

Event hooks and control

Use the MemoryRecall event to inspect text, metadata, and memory type. Add a prefix or cancel injection before it merges.

Smart filtering

Exclude entire collections or specific sections with DataFilter. Apply domain-specific or compliance rules using metadata.

Persist and share

Serialize and deserialize memory snapshots for reuse. Restore AgentMemory from disk to resume long-term context instantly.

On-device and auditable

Run Smart Memories on device for privacy and predictable latency. Works with human-in-the-loop review and policy controls.
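
Here is a minimal wiring sketch. AgentMemory, the MemoryRecall event, and snapshot persistence are described above; the attachment property and member signatures shown are illustrative, so consult the AgentMemory documentation for the exact integration points.

using System;
using LMKit.TextGeneration;

var model = new LMKit.Model.LM("models/your-model.gguf");
var memory = new AgentMemory(); // stores semantic, episodic, and procedural entries

var chat = new MultiTurnConversation(model)
{
    Memory = memory // hypothetical wiring between conversation and memory
};

// Inspect or veto each recall before it merges into the context.
memory.MemoryRecall += (sender, e) =>
{
    Console.WriteLine($"Recalling [{e.MemoryType}]: {e.Text}");
    // e.Cancel = true; // skip injection when a compliance rule applies
};

chat.Submit("Remember that invoices from Acme Corp are paid net-30.");

// Persist a snapshot so the next session resumes long-term context instantly
// (method names are illustrative).
memory.Serialize("memory.snapshot");
var restored = AgentMemory.Deserialize("memory.snapshot");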

Get Started Today
Free SDK · On device · No signup

Agents that act with tools (MCP-ready)

Tool calling and MCP for .NET agents

Define and host tools locally in your application, then let agents call them with JSON schema validation and typed results. Or import external catalogs via MCP to reach remote services when needed. LM-Kit routes calls, correlates results, and feeds them back into the conversation. Everything is validated on device with guardrails and full auditability. A guardrail-focused sketch follows the feature list.

Three ways to add tools

ITool for full control and local hosting, [LMFunction] for quick local binding, or import external catalogs via MCP. One registry, one runtime.

All execution modes

Simple, Multiple, Parallel, and Parallel + Multiple. Choose strict sequencing or safe concurrency when tools are idempotent.

Policy and guardrails

Per-turn ToolCallPolicy: allow, require, forbid, or pick a specific tool. Set MaxCallsPerTurn and optionally enable AllowParallelCalls.

MCP catalogs

Use McpClient to discover tools via tools/list and invoke them via tools/call on external servers. Validate schemas locally and keep control over execution.

Typed contracts

Each tool declares Name, Description, and InputSchema in JSON Schema. Arguments are validated before execution.

Observability built in

Every call is correlated from ToolCall to ToolCallResult with stable IDs, arguments, timings, and clear success or error states.

On-device privacy and speed

Local tools run inside your environment. Deterministic behavior and predictable latency without sending data to the cloud.

Model aware

Choose a model that emits tool calls and confirm with model.HasToolCalls. One runtime across Mistral, LLaMA, Qwen, Granite, GPT-OSS, and more.

Human-in-the-loop

Approve or block actions with BeforeToolInvocation and audit results via AfterToolInvocation. Adjust sampling and memory injection with events.
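
Here is a guardrail-focused sketch. ToolCallPolicy, MaxCallsPerTurn, AllowParallelCalls, and the BeforeToolInvocation/AfterToolInvocation events are named above; where exactly they sit on the agent object is illustrative.

using System;
using LMKit.TextGeneration;

var model = new LMKit.Model.LM("models/your-tool-calling-model.gguf");
var agent = new MultiTurnConversation(model);

// Constrain what the model may do on each turn (property placement is illustrative).
agent.Tools.ToolCallPolicy = ToolCallPolicy.Allow; // or Require, Forbid, or a specific tool
agent.Tools.MaxCallsPerTurn = 3;
agent.Tools.AllowParallelCalls = false; // strict sequencing unless tools are idempotent

// Human-in-the-loop: approve or block each call before it runs.
agent.Tools.BeforeToolInvocation += (sender, e) =>
{
    Console.WriteLine($"Tool requested: {e.ToolName}({e.Arguments})");
    // e.Cancel = true; // block sensitive actions pending review
};

// Audit results after execution, correlated by stable call IDs.
agent.Tools.AfterToolInvocation += (sender, e) =>
{
    var status = e.Succeeded ? "ok" : e.Error;
    Console.WriteLine($"{e.ToolCallId}: {status}");
};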

Get Started Today
Free SDK · On device · No signup

Skills you can plug in as tools

Expose powerful skills in minutes

Use these high-level capabilities on their own or wire them as callable tools inside agent workflows.

Your agent can chain these skills with your own APIs. Think of the flow as: reason about the request, pick the right skill, extract or classify, decide the next step, act again, and return a grounded result.

Get Started Today
Free SDK · On device · No signup

Autonomy and workflows

From intent to outcome with safe, visible automation

Let agents plan steps, chain typed tools, and request human approval when needed. Built-in observability and policy controls keep every run auditable and under control. A typed-tool sketch follows the feature list.

Planning and multi-step execution

Chain tools, ground each step in results, and progress from intent to outcome.

Model-aware function calling

Structured JSON arguments with validation so tools receive the right types and values.

Dynamic registration of tools

Add or remove capabilities at runtime with typed schemas and predictable failure modes.

Human-in-the-loop review

Optionally intercept and approve sensitive actions before execution.

Observability built in

Stable identifiers with captured arguments, results, and timings for audits and debugging.

Policy controls

Apply per-turn limits, prevent loops, and enforce safety rules across executions.
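
Here is a sketch of a typed tool registered at runtime. The Name/Description/InputSchema contract follows the description above; the ITool member signatures, in particular Invoke, are illustrative.

using LMKit.Agents.Tools;

// A typed tool with a JSON Schema contract; arguments arrive pre-validated.
public sealed class CreateTicketTool : ITool
{
    public string Name => "create_ticket";
    public string Description => "Create a support ticket with a title and priority";
    public string InputSchema => """
    {
      "type": "object",
      "properties": {
        "title":    { "type": "string" },
        "priority": { "type": "string", "enum": ["low", "normal", "high"] }
      },
      "required": ["title"]
    }
    """;

    // Illustrative execution signature: receive validated JSON, return a typed result.
    public string Invoke(string jsonArguments)
    {
        // Call your own API here and return a serializable payload.
        return """{ "ticketId": "T-1001", "status": "open" }""";
    }
}

// Add (or later remove) the capability at runtime:
// agent.Tools.Register(new CreateTicketTool());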

Get Started Today
Free SDK · On device · No signup

The inference system inside LM-Kit

LM-Kit includes a complete inference stack so you do not need separate runtimes.

One engine

Run text generation, vision-language inference, embeddings, and speech to text from the same runtime.

Hardware acceleration

Scheduler and execution pool with adaptive caching for reuse of model state, backed by GPU runtimes including CUDA, Vulkan, and Metal for low-latency, high-throughput inference.

Quantization and precision

Choose formats and precision levels that match your hardware profile and cost targets.

Consistent APIs

Unified configuration across models makes deployment and scaling simpler for your team.

This all-in-one approach means fewer moving parts, faster prototypes, and simpler production operations.
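
As a feel for the configuration surface, here is a sketch of one runtime serving two workloads. The LM type and MultiTurnConversation appear in the quick start below; the Embedder name is a hypothetical stand-in for the embedding API, and quantization is chosen via the model file you ship (for example, a Q4_K_M GGUF).

using System;
using LMKit.TextGeneration;

// One local runtime, two workloads.
var chatModel = new LMKit.Model.LM("models/your-model-q4_k_m.gguf");
var chat = new MultiTurnConversation(chatModel);
Console.WriteLine(chat.Submit("Say hello.").Completion);

// Embeddings from the same engine (hypothetical type name).
var embeddingModel = new LMKit.Model.LM("models/your-embedding-model.gguf");
var embedder = new Embedder(embeddingModel);
float[] vector = embedder.GetEmbeddings("invoice processing");
Console.WriteLine($"Embedding dimensions: {vector.Length}");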

Get Started Today
Free SDK · On device · No signup

Memory, RAG, and document understanding

Grounded answers with memory, retrieval, and document understanding

Keep context across turns, ground generation on your data, and turn documents into clean structured records that downstream systems trust. A retrieval-grounding sketch follows below.

AgentMemory

Keeps facts, preferences, and decisions that matter. Agents recall what to reuse, which reduces repetition and improves relevance.

Retrieval-Augmented Generation

Provides factual grounding on your data. We keep improving chunking strategies, reranking, hybrid search, and long-context techniques.

Document understanding

Combines classification, entity detection, and extraction to produce clean structured outputs that downstream systems can trust.
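
Here is a retrieval-grounding sketch. RagEngine, ImportText, and Search are illustrative stand-ins for LM-Kit's retrieval APIs; the pattern itself (index, retrieve top matches, answer only from retrieved context) is the point.

using System;
using LMKit.TextGeneration;

var embeddingModel = new LMKit.Model.LM("models/your-embedding-model.gguf");
var chatModel = new LMKit.Model.LM("models/your-model.gguf");

// Hypothetical retrieval engine: index a snippet into a named collection.
var rag = new RagEngine(embeddingModel);
rag.ImportText("Acme invoices are due net-30 unless the contract says otherwise.", "policies");

// Retrieve the best-matching chunks, then ground the answer on them.
var matches = rag.Search("When are Acme invoices due?", topK: 3); // hypothetical signature
var context = string.Join("\n", matches); // assume matches enumerate as text chunks

var chat = new MultiTurnConversation(chatModel);
var answer = chat.Submit(
    $"Answer using only this context:\n{context}\n\nQuestion: When are Acme invoices due?");
Console.WriteLine(answer.Completion);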

Get Started Today
Free SDK · On device · No signup

Quick start in C#

Below is a minimal pattern that combines conversation, memory, and tool calling. It also shows how to expose LM-Kit skills as callable tools.

using System;
using LMKit;
using LMKit.TextGeneration;
using LMKit.Agents.Tools;

var model = new LMKit.Model.LM("models/your-model.gguf");
var agent = new MultiTurnConversation(model)
{
    SystemPrompt = "You are an operations agent. When possible, return JSON with fields: action, data, and notes."
};

// Register the tools (in C#, top-level statements must precede type
// declarations, so the DocSkills class is declared at the end of the file)
agent.Tools.Register(LMFunctionToolBinder.FromType<DocSkills>());

// Submit a task. The agent will plan, call tools, and return a structured answer.
var user = "Read this invoice at C:\\docs\\invoice.pdf, extract fields, redact PII in the notes, and summarize next steps.";
var result = agent.Submit(user);
Console.WriteLine(result.Completion);

// Wrap LM-Kit high-level capabilities as tools
public static class DocSkills
{
    [LMFunction("classify_document", "Classify a document into predefined categories")]
    public static string Classify(string text)
    {
        // Example: call LM-Kit classification capability
        return "Invoice";
    }

    [LMFunction("extract_invoice", "Extract key fields from an invoice")]
    public static object ExtractInvoice(string filePath)
    {
        // Example: call LM-Kit data extraction capability
        return new { number = "INV-2025-001", total = 1234.56, supplier = "Acme Corp" };
    }

    [LMFunction("redact_pii", "Detect and redact PII from text")]
    public static string RedactPii(string input)
    {
        // Example: call LM-Kit PII extraction and redaction
        return input;
    }
}

Also available
  • SingleTurnConversation for fast Q&A
  • AgentMemory for persistent context
  • Grammar and JSON schema constraints for structured output (see the sketch after this list)
  • Sampling controls such as Mirostat, logit bias, and repetition penalties for quality and style
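
For the structured-output item above, here is a sketch of constraining a completion to a JSON schema. The JsonSchema property is an illustrative stand-in for whichever grammar or schema constraint mechanism your SDK version exposes.

using System;
using LMKit.TextGeneration;

var model = new LMKit.Model.LM("models/your-model.gguf");
var chat = new SingleTurnConversation(model); // fast single-shot Q&A, as listed above

// Hypothetical: bind the completion to a schema so downstream parsing never fails.
chat.JsonSchema = """
{
  "type": "object",
  "properties": {
    "action": { "type": "string" },
    "data":   { "type": "object" },
    "notes":  { "type": "string" }
  },
  "required": ["action"]
}
""";

var result = chat.Submit("Plan the next step for invoice INV-2025-001.");
Console.WriteLine(result.Completion); // expected to conform to the schema above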

Other examples

The Multi-Turn Chat Demo exemplifies how the LM-Kit.NET SDK enables interactive, multi-turn conversations with AI models. This sample demonstrates how to integrate various large language models (LLMs) into a .NET application to create a chatbot that can engage in extended dialogues, maintaining context across multiple exchanges.

The Multi-Turn Chat with Vision demo extends the multi-turn pattern by enabling visual attachments in conversations. It demonstrates integrating both Large Language Models (LLMs) and Small Language Models (SLMs) into .NET applications, offering the flexibility to run on anything from powerful servers to smaller edge devices. The demo maintains text-based conversational context while generating image-driven insights.

This demo shows how to build a multi-turn chatbot in .NET that uses persistent memory via the AgentMemory class to recall context, ensuring more accurate and context-aware responses even on low-cost CPUs. By leveraging a tiny Small Language Model, it stores key facts during conversations to reduce errors and hallucinations, making it ideal for applications like customer support and information services.


The Multi-Turn Chat with Chat History Guidance Demo illustrates how to utilize the LM-Kit.NET SDK to build an interactive chatbot that retains context over multiple exchanges by leveraging chat history. This example showcases the integration of large language models (LLMs) into a .NET application to enable multi-turn conversations, with the chatbot being guided by the entire conversation history.

The Multi-Turn Chat with Coding Assistant Demo illustrates how to utilize the LM-Kit.NET SDK to build an interactive chatbot that assists with programming tasks. This example showcases the integration of large language models (LLMs) into a .NET application to enable multi-turn conversations, offering features like coding support, code analysis, and comment reviews.

The Multi-Turn Chat with Custom Sampling Demo illustrates how to utilize the LM-Kit.NET SDK to build an interactive chatbot featuring customized sampling strategies. This example highlights the integration of large language models (LLMs) into a .NET application to facilitate multi-turn conversations, employing advanced sampling techniques to influence the chatbot’s responses.

The Multi-Turn Chat with Persistent Session demonstration highlights how to employ the LM-Kit.NET SDK to create a chat application that retains context throughout multiple exchanges. This application records chat sessions to a file, allowing users to seamlessly continue their conversations when they reopen the application.

The Function Calling Demo shows how to use the LM-Kit.NET SDK to enable language models to call predefined functions based on user input. It demonstrates integrating function-calling capabilities so the model can perform tasks like retrieving external information and returning structured data. This is ideal for creating AI applications that execute dynamic tasks from natural language prompts.


This console sample demonstrates agentic tool calling (typed ITool + JSON Schema), multi-tool chaining with memory, and ToolChoice safety. It includes currency (ECB/Frankfurter), weather (Open-Meteo), and offline unit-conversion tools, shows progress and stats, supports /reset, /continue, and /regenerate commands, and runs with several local models or a custom model URI.

The Single Turn Chat Demo illustrates how to employ the LM-Kit.NET SDK for single-turn conversational interactions. This example emphasizes the ease and effectiveness of using large language models (LLMs) to handle individual questions or statements, making it ideal for quick and informative exchanges.

Hands-on Samples

Open the official LM-Kit.NET samples repo with ready-to-run C# demos: console apps, ASP.NET Minimal APIs, RAG pipelines, structured data extraction, JSON-constrained generation, Dynamic Sampling, reranking, and evaluation utilities. Clone, build, run locally.

Run a Demo Now

Common use cases

Real workflows where LM-Kit agents deliver value

From back office to front line, these patterns show how autonomy, multimodality, and on-device privacy combine to ship outcomes.

Accounts payable automation

  • Classify documents, extract invoice fields, validate totals
  • Post to ERP and request missing information

Claims intake

  • Ingest emails and PDFs, redact PII, summarize key facts
  • File a case with a structured payload

Customer support copilots

  • Retrieve knowledge with RAG and propose next actions
  • Call remediation tools and log outcomes

Compliance automation

  • Run NER and PII over contracts and supporting documents
  • Generate obligation summaries and produce JSON checklists

Contract analysis

  • Extract terms, obligations, and key dates from agreements
  • Flag non-standard clauses and compliance risks

HR automation

  • Screen resumes, extract skills, rank candidates
  • Onboard new hires with structured task lists

Field service automation

  • Diagnose issues from technician reports and images
  • Route work orders and suggest parts inventory

Content moderation

  • Classify user-generated content and detect policy violations
  • Escalate edge cases with context for human review
Get Started Today
Free SDK · On device · No signup
Ready when you are

Ship on-device .NET agents with LM-Kit

One SDK covers conversation, Smart Memories, RAG, tool calling, speech, and vision. Get predictable latency and full data privacy with a local runtime.

Free SDK · On device · No signup