🧰 Meet LM-Kit Tool Calling for Local Agents

TL;DR: LM-Kit .NET SDK now supports state-of-the-art tool calling for building AI agents in C#. Create on-device agents that discover, invoke, and chain tools with structured JSON schemas, safety policies, and human-in-the-loop controls, all running locally with full privacy. Works with thousands of local models from Mistral, LLaMA, Qwen, Granite, GPT-OSS, and more. Supports all tool calling modes: simple function, multiple function, parallel function, and parallel multiple function. No cloud dependencies, no API costs, complete control over your agent workflows.

What Are Tools in Agentic AI?

Tools are a fundamental building block of agentic AI.

While language models excel at understanding and generating text, tools extend their abilities by letting them interact with the real world: searching the web for current information, executing code for calculations, accessing databases, reading files, or connecting to external services through APIs. Think of tools as the hands and eyes of an AI agent. They transform a conversational system into an agent that can accomplish tasks by bridging the gap between reasoning and action. When an agent needs to check the weather, analyze a spreadsheet, or send an email, it invokes the appropriate tool, receives the result, and incorporates that information into its response. This moves AI beyond pure text generation toward practical, real-world problem solving.

Interested in how agents retain and use context over time? Explore our deep dive on agent memory.


Why Local Agents Have Been Hard

Building AI agents that can actually do things locally has been surprisingly hard. You need:

  • Models that understand when and how to call external functions
  • Privacy without sending data to the cloud
  • A runtime that can parse tool calls, validate arguments, and inject results
  • Model-specific flows because each model has different tool calling formats and interaction patterns, requiring custom logic for interception, result injection, and action ordering
  • Safety controls to prevent infinite loops and runaway costs
  • Clear observability so you know what your agent is doing

Until now, most agentic frameworks forced a choice: powerful cloud-based agents with latency and privacy concerns, or limited local models without proper tool support. Today, that changes.

Why Tool Calling Changes Everything

With LM-Kit's new tool calling capabilities, your local agents can:

  • Ground answers in real data. No more hallucinated weather forecasts or exchange rates. Agents fetch actual API responses and can cite sources.
  • Chain complex workflows. For example: check the weather, convert temperature to the user's preferred units, then suggest activities. All in one conversational turn.
  • Maintain full privacy. Everything runs on-device. Your users' queries, tool arguments, and results never leave their machines.
  • Stay deterministic and safe. Typed schemas, validated inputs, policy controls, and approval hooks prevent agents from going rogue.
  • Scale with your domain. Add business APIs, internal databases, or external MCP catalogs as tools. The model learns to use them from descriptions and schemas alone.
Tabbycat empowered with Tools

What’s new at a glance

  • State-of-the-art tool calling, right in chatbot flows. Models decide when to call tools, pass structured JSON args, and use results to answer users accurately.
  • Dedicated flow support across model families like Mistral, GPT-OSS, Qwen, Granite, LLaMA, and more — all via one runtime.
  • Three ways to add tools: implement ITool directly, annotate methods with [LMFunction], or import MCP catalogs (all detailed below).
  • Unified API that runs local SLMs with per-turn policy, guardrails, and events for human-in-the-loop and observability at every stage.
  • All function calling modes supported. Simple Function, Multiple Function, Parallel Function, and Parallel Multiple Function — choose strict sequencing or safe parallelism.
  • Model-aware tool call flow. Modern SLMs emit structured tool calls. LM-Kit parses calls, routes them to your tools, and feeds results back with correlation and clear result types for a reliable inference path.

How It Works: Getting Started

Here’s a complete working example in under 20 lines:

using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.Agents.Tools;
using System.Text.Json;

// 1) Load a local model from the catalog
var model = LM.LoadFromModelID("gptoss:20b"); // OpenAI GPT-OSS 20B
// Optional: confirm tool-calling capability
if (!model.HasToolCalls) { /* choose a different model or fallback */ }

// 2) Create a multi-turn conversation
using var chat = new MultiTurnConversation(model);

// 3) Register tools (see three options below)
chat.Tools.Register(new WeatherTool());

// 4) Shape the behavior per turn
chat.ToolPolicy.Choice = ToolChoice.Auto;      // let the model decide
chat.ToolPolicy.MaxCallsPerTurn = 3;           // guard against loops

// 5) Ask a question
var reply = chat.Submit("Plan my weekend and check the weather in Toulouse.");
Console.WriteLine(reply.Content);

The model catalog includes GPT-OSS and many other families. LM.LoadFromModelID lets you pull a named card like gptoss:20b. You can also check HasToolCalls before you rely on tools. See https://docs.lm-kit.com/lm-kit-net/guides/getting-started/model-catalog.html for details.

Try it now — GitHub sample

A production-ready console sample demonstrates multi-turn chat with tool calling (currency, weather, unit conversion), per-turn policies, progress feedback, and special commands. Jump to:

Create Multi-Turn Chatbot with Tools in .NET Applications github.com/LM-Kit/lm-kit-net-samples

Three ways to add tools

🧩 1) Implement ITool (Full Control)

Best when you need clear contracts and custom validation.

This snippet demonstrates implementing the ITool interface so an LLM can call your tool directly.

It declares the tool contract (Name, Description, InputSchema), parses JSON args, runs your logic, and returns structured JSON to the model.

public sealed class WeatherTool : ITool
{
    public string Name => "get_weather";
    public string Description => "Get current weather for a city. Returns temperature, conditions, and optional hourly forecast.";

    // JSON Schema defines expected arguments
    public string InputSchema => """
    {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name (e.g., 'Paris' or 'New York')"}
        },
        "required": ["city"]
    }
    """;

    public async Task<string> InvokeAsync(string arguments, CancellationToken ct = default)
    {
        // Parse the model's JSON arguments (guard against a missing or null value)
        var city = JsonDocument.Parse(arguments).RootElement.GetProperty("city").GetString()
                   ?? throw new ArgumentException("Missing required 'city' argument.");

        // Call your weather API (FetchWeatherAsync stands in for your own implementation)
        var weatherData = await FetchWeatherAsync(city);

        // Return structured JSON the model can understand
        var result = new { city, temp_c = weatherData.Temp, conditions = weatherData.Conditions };
        return JsonSerializer.Serialize(result);
    }
}

// Register it
chat.Tools.Register(new WeatherTool());

Why use ITool? Complete control over validation, async execution, error handling, and result formatting.
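
For example, here is a minimal sketch of defensive argument handling inside InvokeAsync, assuming the same WeatherTool contract as above (the error payload shape and the FetchWeatherAsync placeholder are illustrative, not part of the SDK):

public async Task<string> InvokeAsync(string arguments, CancellationToken ct = default)
{
    try
    {
        using var doc = JsonDocument.Parse(arguments);

        // Validate the required argument before doing any work
        if (!doc.RootElement.TryGetProperty("city", out var cityProp) ||
            cityProp.ValueKind != JsonValueKind.String)
        {
            // Structured errors let the model recover instead of guessing
            return """{"error": "Missing or invalid required argument 'city'."}""";
        }

        var city = cityProp.GetString()!;
        var weatherData = await FetchWeatherAsync(city); // placeholder for your own API call
        var payload = new { city, temp_c = weatherData.Temp, conditions = weatherData.Conditions };
        return JsonSerializer.Serialize(payload);
    }
    catch (JsonException)
    {
        return """{"error": "Arguments were not valid JSON."}""";
    }
}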

🏷️ 2) Annotate Methods with [LMFunction] (Quick Binding)

Best for rapid prototyping and simple synchronous tools.

What it does: Add [LMFunction(name, description)] to public instance methods. LM-Kit discovers them and exposes each as an ITool, generating a JSON schema from method parameters.

How it’s wired: Reflect and bind with LMFunctionToolBinder.FromType<MyDomainTools>() (or FromInstance/FromAssembly), then register the resulting tools via chat.Tools.Register(...).

public sealed class MyDomainTools
{
    // _documentIndex and _database stand in for your own services (not shown here)
    [LMFunction("search_docs", "Search internal documentation by keyword. Returns top 5 matches.")]
    public string SearchDocs(string query)
    {
        var results = _documentIndex.Search(query).Take(5);
        return JsonSerializer.Serialize(new { hits = results });
    }

    [LMFunction("get_user_info", "Retrieve user profile and preferences.")]
    public string GetUserInfo(int userId)
    {
        var user = _database.GetUser(userId);
        return JsonSerializer.Serialize(user);
    }
}

// Automatically scan and register all annotated methods
var tools = LMFunctionToolBinder.FromType<MyDomainTools>();
chat.Tools.Register(tools);

Why use [LMFunction]? Less boilerplate. The binder LMFunctionToolBinder generates schemas from parameter types and registers everything in one line.
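
When a tool class takes constructor dependencies, binding from a live instance is the natural fit. A short sketch, assuming a hypothetical constructor that receives your services (FromInstance is the binder entry point mentioned above):

// Bind from a live instance so dependencies arrive through the constructor
var domainTools = new MyDomainTools(documentIndex, database); // hypothetical constructor
var instanceTools = LMFunctionToolBinder.FromInstance(domainTools);
chat.Tools.Register(instanceTools);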

🔌 3) Import MCP Catalogs (External Services)

Best for connecting to third-party tool ecosystems via the Model Context Protocol.

What it does: Uses McpClient to establish a JSON-RPC session with an MCP server, fetch its tool catalog, and adapt those tools so your agent can call them.

How it’s wired: Create new McpClient(uri, httpClient) (optionally set a bearer token), then chat.Tools.Register(mcp, overwrite: false) to import the catalog; LM-Kit manages tools/list, tools/call, retries, and session persistence.

using LMKit.Mcp.Client;

// Connect to an MCP server
var mcp = new McpClient(
    new Uri("https://mcp.example.com/api"),
    new HttpClient()
);

// Import all available tools from the server
int toolCount = chat.Tools.Register(mcp, overwrite: false);
Console.WriteLine($"Imported {toolCount} tools from MCP server");


Why use MCP? Instant access to curated tool catalogs. The server handles tools/list and tools/call over JSON-RPC; LM-Kit validates schemas locally. See McpClient.
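
If the server requires authentication, one straightforward option is to attach a bearer token to the HttpClient you pass in. A sketch using standard .NET APIs (the MCP_TOKEN environment variable is illustrative; McpClient also accepts a bearer token directly, as noted above):

using System.Net.Http.Headers;

var http = new HttpClient();
// Standard HttpClient authorization header; the token source is illustrative
http.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", Environment.GetEnvironmentVariable("MCP_TOKEN"));

var mcp = new McpClient(new Uri("https://mcp.example.com/api"), http);
chat.Tools.Register(mcp, overwrite: false);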

Execution Modes That Match Your Workflow

Choose the right policy for each conversational turn:

Simple Function

One tool, one answer.

chat.ToolPolicy.MaxCallsPerTurn = 1;
chat.ToolPolicy.Choice = ToolChoice.Required; // force at least one call
  • Example: "What is the weather in Tokyo?" -> calls get_weather once -> answers.

Multiple Function

Chain tools sequentially.

chat.ToolPolicy.MaxCallsPerTurn = 5;
chat.ToolPolicy.Choice = ToolChoice.Auto;
  • Example: "Convert 75°F to Celsius, then tell me if I need a jacket."
  • Calls convert_temperature(75, "F", "C") -> gets 23.9°C
  • Calls get_weather("current_location") -> gets conditions
  • Synthesizes answer -> "It is 24°C and sunny. A light jacket should be fine."

Parallel Function

Execute multiple tools concurrently.

chat.ToolPolicy.AllowParallelCalls = true;
chat.ToolPolicy.MaxCallsPerTurn = 10;
  • Example: "Compare weather in Paris, London, and Berlin."
  • Calls get_weather("Paris"), get_weather("London"), get_weather("Berlin") simultaneously
  • Waits for all results -> compares -> answers

Only enable parallel calls if your tools are idempotent and thread-safe.

Parallel Multiple Function

Combine chaining and parallelism; a policy sketch follows the example below.

  • Example: "Check weather in 3 cities, convert all temps to Fahrenheit, and recommend which to visit."
  • Parallel -> fetches weather for 3 cities
  • Parallel -> converts all temperatures
  • Sequential -> recommends based on results
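
A policy sketch for this combined mode, mirroring the settings shown for the other modes:

chat.ToolPolicy.AllowParallelCalls = true;   // tools must be idempotent and thread-safe
chat.ToolPolicy.MaxCallsPerTurn = 10;        // headroom for parallel batches plus the final chain
chat.ToolPolicy.Choice = ToolChoice.Auto;    // the model sequences the phases itself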

See ToolCallPolicy for all options including ToolChoice.Specific and ForcedToolName. Defaults are conservative: parallel off, max calls capped.

Safety, Control, and Observability

Policy controls

Configure safe defaults and per-turn limits. See ToolCallPolicy.

chat.ToolPolicy = new ToolCallPolicy
{
    Choice = ToolChoice.Auto,           // let model decide
    MaxCallsPerTurn = 3,                // prevent runaway loops
    AllowParallelCalls = false,         // safe default: sequential only
    
    // Optional: force a specific tool first
    // Choice = ToolChoice.Specific,
    // ForcedToolName = "authenticate_user"
};

Human in the loop

Review, approve, or block tool execution. Hooks: BeforeToolInvocation, AfterToolInvocation, BeforeTokenSampling, MemoryRecall.

// approve tool calls before execution
chat.BeforeToolInvocation += (sender, e) =>
{
    Console.WriteLine($"🔔 About to call: {e.ToolCall.Name}");
    Console.WriteLine($"   Arguments: {e.ToolCall.ArgumentsJson}");
    
    // block sensitive operations
    if (e.ToolCall.Name == "delete_user" && !UserHasApproved())
    {
        e.Cancel = true;
        Console.WriteLine("   ❌ Blocked by policy");
    }
};

// audit results after execution
chat.AfterToolInvocation += (sender, e) =>
{
    var result = e.ToolCallResult;
    Console.WriteLine($"✅ {result.ToolName} completed");
    Console.WriteLine($"   Status: {result.Type}");
    Console.WriteLine($"   Result: {result.ResultJson}");
    _telemetry.LogToolCall(result); // send to monitoring
};

// override token sampling in real time
chat.BeforeTokenSampling += (sender, e) =>
{
    if (_needsDeterministicOutput)
        e.Sampling.Temperature = 0.1f;
};

// control memory injection
chat.MemoryRecall += (sender, e) =>
{
    Console.WriteLine($"💭 Injecting memory: {e.Text.Substring(0, 50)}...");
    // e.Cancel = true; // optionally cancel
};

Structured data flow

Every call flows through a typed pipeline for reproducibility and clear logs; a correlation sketch follows the list below.

  • Incoming: ToolCall with stable Id and ArgumentsJson.
  • Outgoing: ToolCallResult with ToolCallId, ToolName, ResultJson, and Type (Success or Error).
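
As a sketch, these stable identifiers make it easy to correlate each call with its result, for example to time every invocation (the timing dictionary is illustrative; event and property names are as documented above):

using System.Diagnostics;

var pending = new Dictionary<string, Stopwatch>();

chat.BeforeToolInvocation += (_, e) =>
    pending[e.ToolCall.Id] = Stopwatch.StartNew();

chat.AfterToolInvocation += (_, e) =>
{
    // ToolCallId on the result matches the Id of the originating ToolCall
    if (pending.Remove(e.ToolCallResult.ToolCallId, out var sw))
        Console.WriteLine(
            $"{e.ToolCallResult.ToolName} [{e.ToolCallResult.ToolCallId}] " +
            $"-> {e.ToolCallResult.Type} in {sw.ElapsedMilliseconds} ms");
};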

Try It: Multi-Turn Chat Sample

Create Multi-Turn Chatbot with Tools in .NET Applications

Purpose

Demonstrates LM-Kit.NET’s agentic tool-calling: during a conversation, the model can decide to call one or multiple tools to fetch data or run computations, pass JSON arguments that match each tool’s InputSchema, and use each tool’s JSON result to produce a grounded reply—while preserving full multi-turn context. Tools implement ITool and are managed by a registry; per-turn behavior is shaped via ToolChoice.

Why tools in chatbots?

  • Reliable, source-backed answers (weather, FX, conversions, business APIs).
  • Agentic chaining: call several tools in one turn and combine results.
  • Determinism & safety: typed schemas, clear failure modes, policy control.
  • Extensibility: implement ITool for domain logic; keep code auditable.
  • Efficiency: offload math/lookup to tools; keep the model focused on reasoning.

Target audience

Product & platform teams; DevOps & internal tools; B2B apps; educators & demos.

Problem solved

Actionable answers, deterministic conversions/quotes, multi-turn memory, easy extensibility.

Sample app

  • lets you choose a local model (or a custom URI)
  • registers three tools (currency, weather, unit conversion)
  • runs a multi-turn chat where the model decides when to call tools
  • prints generation stats (tokens, stop reason, speed, context usage)

Key features

  • Tool calling via JSON arguments
  • Full dialogue memory
  • Progress feedback (download/load bars)
  • Special commands: /reset, /continue, /regenerate
  • Multiple tool calls per turn (and across turns)

Built-in tools

Tool name        | Purpose                                                           | Online? | Notes
convert_currency | ECB rates via Frankfurter (latest or historical) + optional trend | Yes     | No API key; business days; rounding & date support
get_weather      | Open-Meteo current weather + optional short hourly forecast      | Yes     | No API key; geocoding + metric/us/si
convert_units    | Offline conversions (length, mass, temperature, speed, etc.)     | No      | Temperature is non-linear; can list supported units

Tools implement ITool: Name, Description, InputSchema (JSON Schema), and InvokeAsync(string json) returning JSON.

Extend with your own tool

chat.Tools.Register(new MyCustomTool()); // implements ITool

Use unique, stable, lowercase snake_case names.

Supported models (pick per hardware)

  • Mistral Nemo 2407 12.2B (~7.7 GB VRAM)
  • Meta Llama 3.1 8B (~6 GB VRAM)
  • Google Gemma 3 4B Medium (~4 GB VRAM)
  • Microsoft Phi-4 Mini 3.82B (~3.3 GB VRAM)
  • Alibaba Qwen-3 8B (~5.6 GB VRAM)
  • Microsoft Phi-4 14.7B (~11 GB VRAM)
  • IBM Granite 4 7B (~6 GB VRAM)
  • OpenAI GPT-OSS 20B (~16 GB VRAM)
  • or provide a custom model URI (GGUF/LMK)

Commands

  • /reset — clear conversation
  • /continue — continue last assistant message
  • /regenerate — new answer for last user input

Example prompts

  • “Convert 125 USD to EUR and show a 7-day trend.”
  • “Weather in Toulouse next 6 hours (metric).”
  • “Convert 65 mph to km/h.” / “List pressure units.”
  • “Now 75 °F to °C, then 2 km to miles.”

Behavior & policies (quick reference)

  • Tool selection policy: by default the sample lets the model decide (ToolChoice.Auto). You can Require / Forbid / Force a specific tool per turn (see the sketch after this list).
  • Multiple tool calls: supports several tool invocations per turn; outputs are injected back into context.
  • Schemas matter: precise InputSchema + concise Description improve argument construction.
  • Networking: currency & weather require internet; unit conversion is offline.
  • Errors: clear exceptions for invalid inputs (units, dates, locations).
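
For instance, a per-turn sketch that forces the currency tool for one question and then hands control back to the model:

// Force the currency tool for this turn only
chat.ToolPolicy.Choice = ToolChoice.Specific;
chat.ToolPolicy.ForcedToolName = "convert_currency";
var quote = chat.Submit("Convert 125 USD to EUR.");

// Hand the decision back to the model for later turns
chat.ToolPolicy.Choice = ToolChoice.Auto;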

Getting started

Prerequisites
.NET Framework 4.6.2 or .NET 6.0

Download

git clone https://github.com/LM-Kit/lm-kit-net-samples.git
cd lm-kit-net-samples/console_net/multi_turn_chat_with_tools

Run

dotnet build
dotnet run

Then pick a model or paste a custom URI. Chat naturally; the assistant will call one or multiple tools as needed. Use /reset, /continue, /regenerate anytime.

Project link

View on GitHub →

Complete Example: All Three Integration Paths

// Load a capable local model
var model = LM.LoadFromModelID("gptoss:20b");
using var chat = new MultiTurnConversation(model);

// 1) ITool implementation
chat.Tools.Register(new WeatherTool());

// 2) LMFunctionAttribute methods
var tools = LMFunctionToolBinder.FromType<MyDomainTools>();
chat.Tools.Register(tools);

// 3) MCP import
var mcp = new McpClient(new Uri("https://mcp.example/api"), new HttpClient());
chat.Tools.Register(mcp);

// Safety and behavior
chat.ToolPolicy = new ToolCallPolicy
{
    Choice = ToolChoice.Auto,
    MaxCallsPerTurn = 3,
    // AllowParallelCalls = true       // enable only if tools are idempotent
};

// Human-in-the-loop
chat.BeforeToolInvocation += (_, e) => { /* approve or cancel */ };
chat.AfterToolInvocation  += (_, e) => { /* log results */ };

// Run
var answer = chat.Submit(
    "Find 3 relevant docs for 'safety policy' and summarize.");
Console.WriteLine(answer.Content);

Why Go Local with LM-Kit?

vs. Cloud Agent Frameworks

  • Zero API costs: No per-token charges. Run unlimited conversations.
  • Complete privacy: User data never leaves the device. GDPR/HIPAA friendly.
  • Sub-100ms latency: Local inference eliminates network roundtrips entirely.
  • Works offline: Agents function without internet connectivity.
  • No rate limits: Scale to millions of requests without throttling.
  • Full control: Own the stack. No vendor lock-in or API deprecations.

vs. Basic Prompt Engineering

  • Type-safe schemas: JSON Schema validation catches bad arguments before execution.
  • Deterministic results: Clear success/error states, not fragile regex parsing.
  • Parallel execution: Run multiple tools concurrently when safe.
  • Full observability: Structured events at every stage, not log archaeology.
  • Testable contracts: Mock tools, inject results, replay conversations (see the stub sketch after this list).
  • Error boundaries: Graceful failures with retry logic and fallbacks.
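
For example, a minimal stub tool for unit tests, sketched on the ITool contract shown earlier (FakeWeatherTool and its canned payload are illustrative):

public sealed class FakeWeatherTool : ITool
{
    public string Name => "get_weather";
    public string Description => "Test stub that returns canned weather data.";
    public string InputSchema => """
    {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
    """;

    // No network, no randomness: every test run sees the same payload
    public Task<string> InvokeAsync(string arguments, CancellationToken ct = default) =>
        Task.FromResult("""{"city": "Paris", "temp_c": 21, "conditions": "sunny"}""");
}

// Swap the stub in wherever the real tool would be registered
chat.Tools.Register(new FakeWeatherTool());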

vs. Manual Function Calling

  • Model decides: Agent autonomously picks tools and arguments—no brittle if/else chains.
  • Auto-chaining: Multiple tool calls per turn, results fed back automatically.
  • 90% less boilerplate: Register tools once, not per-model or per-prompt.
  • Built-in safety: Loop prevention, max-calls limits, approval hooks out of the box.
  • Model-agnostic API: Same code works across Mistral, LLaMA, Qwen, Granite, GPT-OSS.
  • Progressive enhancement: Add tools without refactoring conversation logic.

Performance and Limitations

Performance expectations

  • Tool invocation overhead: ~2–5 ms per call (parsing + validation)
  • Network tools: 50–500 ms depending on API
  • Local tools: <1 ms
  • Model inference remains the primary latency factor.

Requirements

  • Models must support tool calling (check HasToolCalls).
  • Network-dependent tools require internet connectivity.
  • Parallel execution requires thread-safe, idempotent tools.
  • Recommended GPU memory: 6–16 GB VRAM depending on model size.

Known limitations

  • Tool selection quality depends on clear descriptions and schemas.
  • Complex nested objects in arguments may confuse smaller models.
  • Very long tool chains (>10 calls) may exceed context windows.

Ready to Build?

  1. Clone the sample

    git clone https://github.com/LM-Kit/lm-kit-net-samples.git
    cd lm-kit-net-samples/console_net/multi_turn_chat_with_tools
  2. Pick your integration approach

    ITool for full control, [LMFunction] for quick binding, or MCP import for external catalogs.

  3. Add your domain logic

    Replace demo tools with your APIs, databases, or business logic.

  4. Set policies that fit your use case

    • Simple lookups: MaxCallsPerTurn = 1
    • Complex workflows: MaxCallsPerTurn = 10 with approval hooks
  5. Ship agents that actually work

    On-device. Private. Reliable. Observable.

Start building agentic workflows that respect user privacy, run anywhere, and stay under your control.
