Why Local AI · Compare · LM-Kit.NET vs Foundry Local

LM-Kit.NET vs Foundry Local Same Vision, Different Scope

Microsoft Foundry Local (formerly Azure AI Foundry Local) is a local inference runtime built on ONNX Runtime. LM-Kit.NET is a complete AI development platform with its own inference engine, agents, RAG, and document intelligence. Both believe in local AI, but they differ in what they deliver.

Product Positioning

Foundry Local

ONNX-based local inference runtime with model management and OpenAI-compatible API

LM-Kit.NET

Self-contained AI platform with inference, agents, RAG, documents, NLP, and speech

Quick Comparison

60+
Built-in Models
5
GPU Backends
GA
Production Status
1
NuGet Package
Before we compare

Before We Compare: Different Product Categories.

Microsoft Foundry Local and LM-Kit.NET share the same fundamental belief: AI should run locally on your hardware. But they occupy very different positions in the stack. Foundry Local is an inference runtime. LM-Kit.NET is a complete AI development platform. Understanding this distinction is essential for a fair comparison.

Microsoft Foundry Local

Foundry Local (formerly Azure AI Foundry Local) is a free, on-device AI inference runtime built on ONNX Runtime. It downloads, manages, and serves ONNX models locally through an OpenAI-compatible API or native in-process SDK. Currently in public preview.

  • ONNX Runtime with auto hardware detection
  • NPU, CUDA, DirectML, Metal acceleration
  • OpenAI-compatible REST API
  • C#, Python, JavaScript, and Rust SDKs

Think of it this way: Foundry Local is like an engine. It runs models and returns completions. LM-Kit.NET is the entire vehicle: it has the engine (inference), but also the navigation system (RAG), the dashboard instruments (NLP and text analysis), the cargo bay (document processing), the communication system (speech), and the autopilot (agent orchestration). If all you need is an engine, Foundry Local is a solid choice. If you need the whole vehicle, LM-Kit.NET delivers it in one package.

Foundry Local strengths

Where Foundry Local Genuinely Shines

Foundry Local is a well-designed inference runtime backed by Microsoft and the ONNX ecosystem. Here are the areas where it excels.

NPU and Hardware Auto-Detection

First-class support for Qualcomm, Intel, and AMD NPUs alongside CUDA and DirectML. Automatically detects your hardware and downloads the optimal model variant.

Multi-Language SDKs

Available in C#, Python, JavaScript, and Rust. Teams using polyglot stacks can access local inference from their preferred language.

OpenAI-Compatible API

Exposes standard OpenAI endpoints for chat completions, audio transcription, and embeddings. Any OpenAI SDK client can point at the local endpoint with minimal changes.

Microsoft Ecosystem Integration

Part of Windows AI Foundry. Integrates with Semantic Kernel, Microsoft.Extensions.AI, the AI Toolkit for VS Code, and the broader Microsoft Foundry cloud platform.

Free to Use

No cost to install or run. No API keys, no metered billing, no subscription required. The runtime itself is proprietary but free for developers.

Android and Mobile Support

Foundry Local has entered private preview on Android, with a major partner (PhonePe) already integrating on-device AI for their mobile platform.

LM-Kit.NET advantages

Where LM-Kit.NET Goes Further

Foundry Local focuses on one thing: running models locally. LM-Kit.NET starts with local inference and builds an entire AI development platform on top. Here is what that means in practice.

Agent Orchestration

Foundry Local has no agent framework. It supports basic tool calling (one tool per request, limited to Qwen models) but cannot coordinate multi-step workflows. LM-Kit.NET ships a full agent orchestration system.

  • Pipeline, Parallel, Router, Supervisor patterns
  • Rich tool catalog with permission policies
  • ReAct planning with multi-step reasoning
  • Agent memory and MCP protocol support

Complete RAG Pipeline

Foundry Local has no RAG capabilities. Building RAG requires assembling external components (Semantic Kernel, a vector database, an embedding service). LM-Kit.NET ships the full pipeline in one SDK.

  • Hybrid retrieval: vector + BM25 with RRF
  • Built-in vector store and Qdrant connector
  • Semantic, Markdown, HTML, layout chunking
  • Multi-query, HyDE, query contextualization

Document Intelligence

Foundry Local has no document processing. PDF parsing, OCR, and format conversion require external Azure services or third-party libraries. LM-Kit.NET handles all of this natively.

  • PDF text extraction, OCR, table detection
  • PDF/image to Markdown conversion
  • HTML, EML, DOCX processing
  • Document splitting and structured extraction

NLP and Text Analysis

Foundry Local has no dedicated NLP features. Developers must prompt the LLM directly for text analysis tasks. LM-Kit.NET provides purpose-built, high-accuracy NLP APIs.

  • NER with 102 entity types, PII detection
  • Sentiment analysis, emotion detection
  • Custom text and document classification
  • Language detection and translation

Model Ecosystem (60+ Models, GGUF)

Foundry Local is restricted to ONNX format with a small curated catalog. Custom models require conversion through Microsoft Olive. LM-Kit.NET uses the GGUF format, giving access to the broadest model ecosystem available.

  • 60+ curated models (Gemma 3, Qwen 3, Phi-4, Llama, etc.)
  • GGUF: thousands of community quantizations available
  • On-device fine-tuning (LoRA) and quantization
  • No format conversion required

Speech, Vision, and Fine-Tuning

Foundry Local supports Whisper transcription and Phi-3.5 vision, but has no on-device fine-tuning or text-to-speech. LM-Kit.NET covers all three areas within the same SDK.

  • Whisper speech-to-text (tiny to large-v3-turbo)
  • Vision language models (Qwen2-VL, Gemma3-VL)
  • On-device LoRA fine-tuning
  • Model quantization for deployment optimization
Feature comparison

Detailed Comparison.

A thorough, category-by-category comparison. We have marked features honestly, including where Foundry Local has the edge.

FeatureLM-Kit.NETFoundry Local
Core Architecture
Product type Complete AI platform Inference runtime
Inference engine llama.cpp (built-in) ONNX Runtime GenAI
Model format GGUF ONNX only
In-process inference Yes Yes (C# SDK v0.8+)
OpenAI-compatible API No (SK & MEAI bridges) Yes (native)
Production status Generally Available Public Preview
Model Management
Curated model catalog 60+ models ~15 models
Auto model download Yes Yes
Hardware-adaptive variants Manual selection Auto-detection
Custom model support Any GGUF model ONNX via Olive conversion
Model cache management Yes Yes (CLI + SDK)
Hardware Acceleration
CUDA (NVIDIA GPU) CUDA 12/13 CUDA
Vulkan (cross-platform GPU) Yes No
Metal (Apple GPU) Yes Yes
DirectML (AMD/Intel GPU) No Yes
NPU (Qualcomm, Intel, AMD) No Yes (QNN, OpenVINO)
TensorRT (NVIDIA optimized) No Yes
AVX/AVX2 (CPU optimized) Yes Yes
Agent Orchestration
Agent framework Built-in (4 patterns) None
Tool / function calling Rich catalog + custom Basic (1 tool/request, Qwen only)
Multi-agent patterns Pipeline, Parallel, Router, Supervisor None
Agent memory Yes No
MCP protocol Yes No
Tool permission policies Yes (category, risk, approval) No
RAG & Retrieval
Built-in RAG pipeline Yes No
Vector store Built-in + Qdrant None (external required)
Hybrid retrieval (BM25 + vector) Yes with RRF No
Document chunking strategies Semantic, Markdown, HTML, layout No
Embeddings generation Built-in models API exists, limited catalog models
Document & Vision
PDF processing Built-in (pdfium) No
OCR Built-in (tesseract) No
Vision language models Qwen2-VL, Gemma3-VL Phi-3.5-Vision
Document format conversion PDF/HTML/EML/DOCX to Markdown No
NLP & Text Analysis
Named entity recognition 102 entity types No
PII detection Yes No
Sentiment analysis Yes No
Text classification Custom categories No
Language detection / translation Yes No
Speech
Speech-to-text (Whisper) Tiny through large-v3-turbo Whisper-tiny, whisper-medium
Streaming transcription Yes Yes (C# SDK)
Model Customization
On-device fine-tuning LoRA No (requires Olive + cloud)
On-device quantization Yes Pre-quantized only (via Olive)
Grammar-constrained generation JSON schema, GBNF No
Platform & Licensing
Windows x64 x64, ARM
macOS Universal (Apple Silicon) Apple Silicon
Linux x64, ARM64 In development
Android No Private Preview
SDK languages .NET (C#) C#, Python, JS, Rust
License Commercial (free trial) Proprietary (free to use)
Microsoft.Extensions.AI Yes (bridge) Yes (via OpenAI compat)
Semantic Kernel integration Yes (dedicated bridge) Yes (via OpenAI connector)
Decision

Which One Fits Your Project?

Honest guidance based on what each product actually delivers today.

Choose Foundry Local if...

Best for teams that need a lightweight inference runtime with NPU support and multi-language access.

  • You only need chat completions and basic inference
  • You target NPU hardware (Snapdragon, Intel NPU)
  • You need Python, JavaScript, or Rust SDKs
  • You want drop-in OpenAI API compatibility
  • You already use ONNX models in your pipeline
  • You are prototyping and preview status is acceptable

Build production AI in .NET.

Local inference, agents, RAG, document intelligence, speech, vision. One SDK. 100% on-device.

Download free SDK overview