By ID
"qwen3.5:4b", "gemma4:26b-a4b", "paddleocr-vl:0.9b". Stable identifiers across versions.
Pick a model by ID. LM.LoadFromModelID("qwen3.5:4b").
The Model Catalog handles the rest: download from the source, cache on disk, validate, load. Browse the catalogue programmatically to show users a model picker; filter by capability when only vision-capable or only embedding-capable models qualify; track download progress in your UI. The infrastructure between "I want to use a model" and "the model is running" is a function call.
"qwen3.5:4b", "gemma4:26b-a4b", "paddleocr-vl:0.9b". Stable identifiers across versions.
First load downloads. Subsequent loads hit the cache. Storage location configurable per deployment.
Browse by capability (vision, embedding, OCR, function-calling), context length, parameter count, quantisation.
Public model hubs ship thousands of variants in dozens of formats with inconsistent metadata. Knowing which one fits your use case, runs on your hardware, supports your task, and behaves the way the SDK expects is the first hard problem of every local-AI project. The Model Catalog solves it once: every supported model, vetted and tagged with the metadata the SDK needs to load it correctly.
Each model has a short, stable ID (qwen3:4b, gemma3:12b). The ID survives revisions; internally it is pinned by content hash, so identical IDs always load the same bytes.
Each ModelCard declares its capabilities: chat, instruction-following, vision, embedding, OCR, tool calling, function calling, reasoning. Filter the catalogue by exactly what you need.
Every entry exposes parameter count, quantisation, file size, and context length. Estimate VRAM and disk before downloading, not after.
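As a minimal sketch of such a pre-flight check (the DriveInfo probe and the 20% working-space margin are assumptions for illustration, not SDK behaviour):

using System.IO;
using System.Linq;
using LMKit.Global;
using LMKit.Model;

// Hypothetical pre-flight: compare the card's published file size against
// free space in the model store before triggering the download.
var card = ModelCard.Catalog.First(c => c.ModelId == "qwen3.5:4b");
var root = Path.GetPathRoot(Configuration.ModelStorageDirectory)!;

// The 1.2 factor leaves headroom for temp files during download and validation (assumed).
if (new DriveInfo(root).AvailableFreeSpace > card.FileSize * 1.2)
{
    var model = LM.LoadFromModelID(card.ModelId);
}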
Subscribe to download progress. Render a progress bar, log to telemetry, route through your CDN. The transfer is observable, not opaque.
Configuration.ModelStorageDirectory sets the cache location. Default OS-aware path; override for shared-storage farms, encrypted volumes, or per-tenant isolation.
New models land in every release: Qwen, Gemma, GLM, Llama, Phi, Whisper, GLM-OCR, PaddleOCR-VL, embedding models. Bump a NuGet, gain access.
Quick load, progress-aware download, programmatic catalog browse, and a shared-cache deployment. Pick a tab.
The shortest path: pass a model ID. The runtime resolves it against the catalog, downloads on first run, caches afterwards. Subsequent runs are instant.
using LMKit.Model;
using LMKit.TextGeneration;

// Load by ID. Downloads on first run, caches afterwards.
var model = LM.LoadFromModelID("qwen3.5:4b");

var chat = new MultiTurnConversation(model);
Console.WriteLine(await chat.SubmitAsync("Two-line bio of Marie Curie."));
First-time downloads can be multi-gigabyte. Wire the OnDownloadProgress callback to a progress bar so the user sees real bytes-received, total-bytes, and percent.
// First-time download with a progress bar in the UI.
var options = new LM.LoadOptions
{
    OnDownloadProgress = (sender, e) =>
    {
        progressBar.Value = (int)(e.Percent * 100);
        progressLabel.Text = $"{e.BytesReceived / 1024 / 1024} MB / {e.TotalBytes / 1024 / 1024} MB";
    }
};
var model = await LM.LoadFromModelIDAsync("gemma4:26b-a4b", options);
ModelCard.Catalog exposes every shipping model as a LINQ-queryable collection. Filter by Capabilities, FileSize, MaxContextLength, sort, present a picker, load the chosen ID.
// Browse the catalogue. Filter to models that support vision and chat.
var visionChat = ModelCard.Catalog
    .Where(c => c.Capabilities.HasFlag(ModelCapabilities.Vision))
    .Where(c => c.Capabilities.HasFlag(ModelCapabilities.Chat))
    .Where(c => c.FileSize <= 12L * 1024 * 1024 * 1024) // 12 GB or less
    .OrderBy(c => c.FileSize);

foreach (var card in visionChat)
{
    Console.WriteLine($"{card.ModelId,-22} {card.FileSize / 1_000_000_000m:F1} GB ctx={card.MaxContextLength}");
}

// Let the user pick. Load by the chosen ID.
var chosen = visionChat.First();
var model = LM.LoadFromModelID(chosen.ModelId);
Multi-machine deployments and multi-tenant servers share or isolate models via Configuration.ModelStorageDirectory. Point all nodes at a network share, or namespace per tenant. One line.
using LMKit.Global;

// Multi-machine deployment: point every node at a shared model store.
Configuration.ModelStorageDirectory = @"\\fileserver\models";

// Per-tenant isolation: separate directory per tenant.
Configuration.ModelStorageDirectory = $@"D:\tenants\{tenantId}\models";
Render the catalogue as a list. Filter to capabilities your app uses. Let users choose; load by ID. The app size stays small; models download on demand.
Detect available VRAM, recommend a matching model, download it once, run forever. The user never sees a config file.
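A sketch of that flow, assuming the VRAM figure comes from your own GPU probe (QueryAvailableVramBytes is a hypothetical helper, and the fit margin is an assumption):

// Recommend the largest chat-capable model that fits the detected VRAM.
long availableVram = QueryAvailableVramBytes(); // hypothetical GPU probe, not an SDK call

var best = ModelCard.Catalog
    .Where(c => c.Capabilities.HasFlag(ModelCapabilities.Chat))
    .Where(c => c.FileSize * 1.1 < availableVram) // rough fit heuristic (assumed)
    .OrderByDescending(c => c.FileSize)
    .First();

var model = LM.LoadFromModelID(best.ModelId); // downloads once, cached afterwards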
Pro tier loads gemma3:27b; Standard loads gemma3:4b; Lite loads gemma3:1b. Same code path, different ID.
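The gate itself can be a switch expression over the plan name (plan is an assumed variable from your licensing layer):

// Same code path for every tier; only the catalogue ID changes.
var modelId = plan switch
{
    "Pro"      => "gemma3:27b",
    "Standard" => "gemma3:4b",
    _          => "gemma3:1b", // Lite and anything unrecognised
};
var model = LM.LoadFromModelID(modelId);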
Vision request? Pick a vision model. Embedding request? Pick an embedding model. Filter by capability flag, route the call.
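One way to express that routing: resolve the first catalogue entry carrying the required flag (a deliberately naive picker; rank by size or pin IDs in real code):

// Route each request kind to a model that declares the matching capability.
LM Resolve(ModelCapabilities needed)
{
    var card = ModelCard.Catalog.First(c => c.Capabilities.HasFlag(needed));
    return LM.LoadFromModelID(card.ModelId); // cached after the first load per ID
}

var visionModel = Resolve(ModelCapabilities.Vision);    // vision request
var embedder    = Resolve(ModelCapabilities.Embedding); // embedding request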
Point every node at a network share or read-only model volume. Pay the download cost once for the whole fleet.
Pre-download on a connected machine, copy the cache to the air-gapped target, set ModelStorageDirectory, run.
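In practice that is two assignments and a file copy (the paths are placeholders):

// On the connected machine: populate the cache.
Configuration.ModelStorageDirectory = @"D:\staging\models";
LM.LoadFromModelID("qwen3.5:4b"); // downloads into the staging directory

// Copy D:\staging\models to the air-gapped target over your approved channel.

// On the air-gapped machine: point at the copied cache and load offline.
Configuration.ModelStorageDirectory = @"D:\models";
var model = LM.LoadFromModelID("qwen3.5:4b"); // resolves from the local cache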
Catalogue models load by URI; private models load encrypted. Same SDK, different distribution channel.
Pick the right model size for your placement plan. Catalogue exposes parameter count and quantisation per variant.
When the catalogue does not ship the precision you need, quantise locally and load the result by file path.
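A sketch of the hand-off, assuming your quantisation tooling has already produced a GGUF and that LM accepts a file path directly (the constructor shape is an assumption; this page only demonstrates ID-based loading):

// The quantisation step happens in your own tooling; the SDK loads the result.
var path = @"D:\models\custom\qwen3.5-4b-q3_k_m.gguf"; // hypothetical output file
var model = new LM(path); // assumed file-path overload; check the API reference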
Load a base model from the catalogue, then layer your adapter on top. Hot-swap personas without rebuilding the base.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
How-to guide: Discover the catalog from code, filter by capabilities. Read the guide →
How-to guide: How LoadFromModelID works; where models are stored; cache eviction. Read the guide →
How-to guide: Plan VRAM and RAM budgets per model and per session. Read the guide →