
Discover. Download. Run.

Pick a model by ID. LM.LoadFromModelID("qwen3.5:4b"). The Model Catalog handles the rest: download from the source, cache on disk, validate, load. Browse the catalog programmatically to show users a model picker; filter by capability when only vision or only embedding-capable models qualify; track download progress in your UI. The infrastructure between "I want to use a model" and "the model is running" is a function call.

Load by ID · Auto-download · Capability filtering

By ID

"qwen3.5:4b", "gemma4:26b-a4b", "paddleocr-vl:0.9b". Stable identifiers across versions.

Auto-cached

First load downloads. Subsequent loads hit the cache. Storage location configurable per deployment.

Filterable

Browse by capability (vision, embedding, OCR, function-calling), context length, parameter count, quantization.

Why a curated catalog

Model selection is half the work.

Public model hubs ship thousands of variants in dozens of formats with inconsistent metadata. Knowing which one fits your use case, runs on your hardware, supports your task, and behaves the way the SDK expects is the first hard problem of every local-AI project. The Model Catalog solves it once: every supported model, vetted and tagged with the metadata the SDK needs to load it correctly.

Stable identifiers

Each model has a short, stable ID (qwen3:4b, gemma3:12b). The ID survives revisions; internally it is pinned to a content hash, so identical IDs always load the same bytes.

Capability metadata

Each ModelCard declares its capabilities: chat, instruction-following, vision, embedding, OCR, tool calling, function calling, reasoning. Filter the catalog by exactly what you need.
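A capability filter over the catalog might look like the following sketch. The GetPredefinedModelCards method, the Capabilities flags enum, and the card property names are assumptions for illustration, not confirmed API:

```csharp
using System;
using System.Linq;
using LMKit.Model;

// Enumerate the catalog and keep only vision-capable entries.
// GetPredefinedModelCards(), ModelCapabilities, and the card
// properties below are assumed names, shown for illustration.
foreach (ModelCard card in ModelCard.GetPredefinedModelCards()
             .Where(c => c.Capabilities.HasFlag(ModelCapabilities.Vision)))
{
    Console.WriteLine($"{card.ModelID}: {card.ParameterCount} params, {card.ContextLength} ctx");
}
```

The same query shape works for any other flag (embedding, OCR, function calling), which is how an in-app picker narrows the list to models the app can actually use.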

Hardware-honest

Every entry exposes parameter count, quantization, file size, and context length. Estimate VRAM and disk before downloading, not after.

Progress callbacks

Subscribe to download progress. Render a progress bar, log to telemetry, route through your CDN. The transfer is observable, not opaque.

Configurable storage

Configuration.ModelStorageDirectory sets the cache location. Default OS-aware path; override for shared-storage farms, encrypted volumes, or per-tenant isolation.
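Redirecting the cache is a one-line change before the first load. A minimal sketch, assuming Configuration lives in the LMKit.Global namespace (the namespace is an assumption):

```csharp
using LMKit.Global;
using LMKit.Model;

// Point the model cache at a shared volume before any model loads.
// Every node aimed at the same path pays the download cost once.
Configuration.ModelStorageDirectory = @"\\fileserver\models";

var model = LM.LoadFromModelID("qwen3.5:4b"); // resolves against the shared cache
```

The same one-liner covers the air-gapped pattern: pre-download on a connected machine, copy the directory, set the path on the target.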

Continuous updates

New models land in every release: Qwen, Gemma, GLM, Llama, Phi, Whisper, GLM-OCR, PaddleOCR-VL, embedding models. Bump a NuGet, gain access.

Four patterns

From ID to inference.

Quick load, progress-aware download, programmatic catalog browse, and a shared-cache deployment. Pick a tab.

The shortest path: pass a model ID. The runtime resolves it against the catalog, downloads on first run, caches afterwards. Subsequent runs are instant.

QuickLoad.cs
using System;
using LMKit.Model;
using LMKit.TextGeneration;

// Load by ID. Downloads on first run, caches afterwards.
var model = LM.LoadFromModelID("qwen3.5:4b");
var chat  = new MultiTurnConversation(model);

Console.WriteLine(await chat.SubmitAsync("Two-line bio of Marie Curie."));
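The progress-aware variant of the same load could be sketched as follows. The downloadingProgress callback parameter and its argument's shape are assumptions about the overload, not confirmed API:

```csharp
using System;
using LMKit.Model;

// Same load as QuickLoad.cs, but observing the first-run download.
// The callback name, signature, and Percentage property are
// assumptions for illustration.
var model = LM.LoadFromModelID(
    "qwen3.5:4b",
    downloadingProgress: p =>
    {
        Console.Write($"\rDownloading… {p.Percentage:0.0}%");
        return true; // returning false would cancel the transfer
    });
```

On cached runs the callback simply never fires with partial values; the snippet degrades to the quick-load path.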

Where the catalog ships

Real workflows.

In-app model picker

Render the catalog as a list. Filter to capabilities your app uses. Let users choose; load by ID. The app size stays small; models download on demand.

First-run wizard

Detect available VRAM, recommend a matching model, download it once, run forever. The user never sees a config file.

Tier-aware deployments

Pro tier loads gemma3:27b; Standard loads gemma3:4b; Lite loads gemma3:1b. Same code path, different ID.
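Tier-aware selection reduces to a mapping from plan to catalog ID; a sketch of that single code path:

```csharp
using LMKit.Model;

// One code path, different ID per subscription tier.
static string ModelIdForTier(string tier) => tier switch
{
    "Pro"      => "gemma3:27b",
    "Standard" => "gemma3:4b",
    _          => "gemma3:1b", // Lite and unknown tiers fall back to the smallest
};

var model = LM.LoadFromModelID(ModelIdForTier("Standard"));
```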

Capability-aware routing

Vision request? Pick a vision model. Embedding request? Pick an embedding model. Filter by capability flag, route the call.

Shared model farms

Point every node at a network share or read-only model volume. Pay the download cost once for the whole fleet.

Air-gapped pre-bundling

Pre-download on a connected machine, copy the cache to the air-gapped target, set ModelStorageDirectory, run.

Related capabilities

Catalog plus the rest.

Encrypted model loading

Catalog models load by ID; private models load encrypted. Same SDK, different distribution channel.

Encrypted models

Multi-GPU placement

Pick the right model size for your placement plan. The catalog exposes parameter count and quantization per variant.

Multi-GPU

Quantization

When the catalog does not ship the precision you need, quantize locally and load the result by file path.

Quantization

LoRA integration

Load a base model from the catalog, then layer your adapter on top. Hot-swap personas without rebuilding the base.

LoRA

One ID. One line. Running.
