
Discover. Download. Run.

Pick a model by ID. LM.LoadFromModelID("qwen3.5:4b"). The Model Catalog handles the rest: download from the source, cache on disk, validate, load. Browse the catalog programmatically to show users a model picker; filter by capability when only vision or only embedding-capable models qualify; track download progress in your UI. The infrastructure between "I want to use a model" and "the model is running" is a function call.

Load by ID · Auto-download · Capability filtering

By ID

"qwen3.5:4b", "gemma4:26b-a4b", "paddleocr-vl:0.9b". Stable identifiers across versions.

Auto-cached

First load downloads. Subsequent loads hit the cache. Storage location configurable per deployment.

Filterable

Browse by capability (vision, embedding, OCR, function-calling), context length, parameter count, quantization.

Why a curated catalog

Model selection is half the work.

Public model hubs ship thousands of variants in dozens of formats with inconsistent metadata. Knowing which one fits your use case, runs on your hardware, supports your task, and behaves the way the SDK expects is the first hard problem of every local-AI project. The Model Catalog solves it once: every supported model, vetted and tagged with the metadata the SDK needs to load it correctly.

Stable identifiers

Each model has a short, stable ID (qwen3:4b, gemma3:12b). The ID survives revisions; internally it is pinned to a content hash, so identical IDs always load the same bytes.

Capability metadata

Each ModelCard declares its capabilities: chat, instruction-following, vision, embedding, OCR, tool calling, function calling, reasoning. Filter the catalog by exactly what you need.
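A capability filter over the catalog might look like the following sketch. The GetPredefinedModelCards method, the Capabilities flags enum, and the card property names are assumptions for illustration, not confirmed API:

```csharp
using System;
using System.Linq;
using LMKit.Model;

// Enumerate the catalog and keep only vision-capable entries.
// GetPredefinedModelCards(), ModelCapabilities, and the card
// properties below are assumed names, shown for illustration.
foreach (ModelCard card in ModelCard.GetPredefinedModelCards()
             .Where(c => c.Capabilities.HasFlag(ModelCapabilities.Vision)))
{
    Console.WriteLine($"{card.ModelID}: {card.ParameterCount} params, {card.ContextLength} ctx");
}
```

The same query shape works for any other flag (embedding, OCR, function calling), which is how an in-app picker narrows the list to models the app can actually use.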

Hardware-honest

Every entry exposes parameter count, quantization, file size, and context length. Estimate VRAM and disk before downloading, not after.

Progress callbacks

Subscribe to download progress. Render a progress bar, log to telemetry, route through your CDN. The transfer is observable, not opaque.

Configurable storage

Configuration.ModelStorageDirectory sets the cache location. Default OS-aware path; override for shared-storage farms, encrypted volumes, or per-tenant isolation.
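Redirecting the cache is a one-line change before the first load. A minimal sketch, assuming Configuration lives in the LMKit.Global namespace (the namespace is an assumption):

```csharp
using LMKit.Global;
using LMKit.Model;

// Point the model cache at a shared volume before any model loads.
// Every node aimed at the same path pays the download cost once.
Configuration.ModelStorageDirectory = @"\\fileserver\models";

var model = LM.LoadFromModelID("qwen3.5:4b"); // resolves against the shared cache
```

The same one-liner covers the air-gapped pattern: pre-download on a connected machine, copy the directory, set the path on the target.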

Continuous updates

New models land in every release: Qwen, Gemma, GLM, Llama, Phi, Whisper, GLM-OCR, PaddleOCR-VL, embedding models. Bump a NuGet, gain access.

Four patterns

From ID to inference.

Quick load, progress-aware download, programmatic catalog browse, and a shared-cache deployment. Pick a tab.

The shortest path: pass a model ID. The runtime resolves it against the catalog, downloads on first run, caches afterwards. Subsequent runs are instant.

QuickLoad.cs
using System;
using LMKit.Model;
using LMKit.TextGeneration;

// Load by ID. Downloads on first run, caches afterwards.
var model = LM.LoadFromModelID("qwen3.5:4b");
var chat  = new MultiTurnConversation(model);

Console.WriteLine(await chat.SubmitAsync("Two-line bio of Marie Curie."));
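The progress-aware variant of the same load could be sketched as follows. The downloadingProgress callback parameter and its argument's shape are assumptions about the overload, not confirmed API:

```csharp
using System;
using LMKit.Model;

// Same load as QuickLoad.cs, but observing the first-run download.
// The callback name, signature, and Percentage property are
// assumptions for illustration.
var model = LM.LoadFromModelID(
    "qwen3.5:4b",
    downloadingProgress: p =>
    {
        Console.Write($"\rDownloading… {p.Percentage:0.0}%");
        return true; // returning false would cancel the transfer
    });
```

On cached runs the callback simply never fires with partial values; the snippet degrades to the quick-load path.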

Where the catalog ships

Real workflows.

In-app model picker

Render the catalog as a list. Filter to capabilities your app uses. Let users choose; load by ID. The app size stays small; models download on demand.

First-run wizard

Detect available VRAM, recommend a matching model, download it once, run forever. The user never sees a config file.

Tier-aware deployments

Pro tier loads gemma3:27b; Standard loads gemma3:4b; Lite loads gemma3:1b. Same code path, different ID.
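Tier-aware selection reduces to a mapping from plan to catalog ID; a sketch of that single code path:

```csharp
using LMKit.Model;

// One code path, different ID per subscription tier.
static string ModelIdForTier(string tier) => tier switch
{
    "Pro"      => "gemma3:27b",
    "Standard" => "gemma3:4b",
    _          => "gemma3:1b", // Lite and unknown tiers fall back to the smallest
};

var model = LM.LoadFromModelID(ModelIdForTier("Standard"));
```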

Capability-aware routing

Vision request? Pick a vision model. Embedding request? Pick an embedding model. Filter by capability flag, route the call.

Shared model farms

Point every node at a network share or read-only model volume. Pay the download cost once for the whole fleet.

Air-gapped pre-bundling

Pre-download on a connected machine, copy the cache to the air-gapped target, set ModelStorageDirectory, run.

Related capabilities

Catalog plus the rest.

Encrypted model loading

Catalog models load by ID; private models load encrypted. Same SDK, different distribution channel.

Encrypted models

Multi-GPU placement

Pick the right model size for your placement plan. The catalog exposes parameter count and quantization per variant.

Multi-GPU

Quantization

When the catalog does not ship the precision you need, quantize locally and load the result by file path.

Quantization

LoRA integration

Load a base model from the catalog, then layer your adapter on top. Hot-swap personas without rebuilding the base.

LoRA

One ID. One line. Running.
