Every accelerator. One NuGet.

LM-Kit ships precompiled native runtimes for CUDA 12, CUDA 13, Vulkan, Metal, AVX2, and AVX in a single package. The runtime detects what the host machine can use, picks the fastest available path, and loads the matching backend. Pin a specific backend when you need to. Same .NET API on every target: developer laptop, M-series Mac, on-prem server, edge box, regulated workstation.

6 precompiled backends · Auto-detect · One NuGet

CUDA 12 / 13

NVIDIA GPUs from consumer to data-centre. Two CUDA toolchain versions for broad driver compatibility.

Vulkan

Cross-vendor GPU acceleration. AMD, Intel, NVIDIA on Windows and Linux.

Metal

Native GPU acceleration on Apple Silicon; Intel Macs fall back to the CPU SIMD path. Universal binary distribution.

CPU SIMD

AVX2 fast path on modern x86 CPUs, AVX baseline for older hardware. ARM64 NEON on supported builds.

Why backend choice matters

The first deployment question.

The fastest path on a developer laptop is not the fastest path on a customer's regulated workstation, on a fleet of M-series Macs, or on an air-gapped Linux server with no proprietary drivers. A serious local-inference stack needs to handle every one of those without forcing a separate build, a separate NuGet, or a separate application package. LM-Kit ships them all and picks the right one at load time.

Auto-detection

On first load the runtime probes available drivers and capabilities, then walks the priority order CUDA > Vulkan > Metal > CPU SIMD, with override hooks for fleet-wide policy. Single binary, hardware-aware behaviour.

Multiple CUDA paths

CUDA 12 and CUDA 13 ship side-by-side. The runtime selects the path the installed driver supports. No "wrong CUDA version" failures, no forced driver upgrades on customer hardware.

Vendor-agnostic GPU

Vulkan covers AMD, Intel Arc, and NVIDIA on Windows and Linux. Useful for fleets that mix vendors or for environments where proprietary drivers are forbidden.

Apple Silicon native

Metal backend with universal-binary distribution. M-series chips run at native speed; Intel Macs fall back to a CPU path that still uses every available core.

CPU-only viable

AVX2 path on modern x86 servers, AVX baseline for older hardware. Production-quality CPU-only inference for the workloads where GPUs are not an option.

Single distribution

All native runtimes ship inside the NuGet. No per-platform installer matrix, no separate downloads per accelerator. Customers get one package; the right backend wakes up at runtime.

The backend catalog

Six paths, one decision tree.

Each backend has a target. The auto-detect logic walks them in priority order and picks the first one the host can run.

CUDA 13

NVIDIA latest

Modern NVIDIA GPUs (RTX 30xx and newer) on hosts with up-to-date drivers. Highest throughput on supported hardware. First choice when the driver supports it.

CUDA 12

NVIDIA broad

Older driver baseline that covers a wider range of customer environments. Same kernels, broader compatibility. Picked automatically when CUDA 13 is not available.

Vulkan

Cross-vendor GPU

AMD GPUs, Intel Arc, NVIDIA on Linux without CUDA, Windows machines without the NVIDIA driver chain. Fleet-friendly when the hardware is mixed.

Metal

Apple Silicon

M1, M2, M3, M4 Macs. Unified-memory architecture lets the GPU and CPU share weights without copies. Native-speed inference on developer Macs and customer Macs.

AVX2

Modern CPU

Intel Haswell-and-later, AMD Excavator-and-later. Fast SIMD path that runs on practically every server built in the last decade. Default fall-through when no GPU is present.

AVX

Legacy compatibility

Older x86 hosts that lack AVX2. Slower but functional. The deployment safety net for environments where you cannot dictate the hardware.
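The priority walk over the catalog above can be sketched as a small pure function. This is an illustrative model of the documented order only, not LM-Kit's internal implementation; the `Backend` enum and `BackendSelector` names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative model of the documented priority walk; names are assumptions,
// not LM-Kit types.
public enum Backend { Cuda13, Cuda12, Vulkan, Metal, Avx2, Avx }

public static class BackendSelector
{
    // Priority order as documented: CUDA 13 > CUDA 12 > Vulkan > Metal > AVX2 > AVX.
    private static readonly Backend[] Priority =
    {
        Backend.Cuda13, Backend.Cuda12, Backend.Vulkan,
        Backend.Metal, Backend.Avx2, Backend.Avx
    };

    // Returns the first backend the host reports as usable.
    public static Backend Select(ISet<Backend> hostSupports) =>
        Priority.First(hostSupports.Contains);
}
```

On a host that reports only Vulkan and the CPU paths, the walk skips both CUDA entries and lands on Vulkan; on a CPU-only host it falls through to AVX2 or, failing that, AVX.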

Auto and explicit

Pick once, or pick per call.

Auto-detection is the default: the runtime probes the host on first load, picks the fastest available backend, and exposes the selection so you can log it for telemetry and support.

AutoBackend.cs
using System;
using System.Linq;
using LMKit.Model;

// Default behaviour: runtime probes the host and picks the fastest backend.
var model = LM.LoadFromModelID("qwen3.5:4b");

// Inspect what the runtime selected (useful for telemetry and support).
Console.WriteLine($"Backend in use: {model.Runtime.BackendName}");
Console.WriteLine($"Devices: {string.Join(", ", model.Runtime.Devices.Select(d => d.Name))}");
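For the explicit half of "pick once, or pick per call", LM-Kit's exact pinning API is not shown on this page, so the sketch below models only the fleet-policy side: a hypothetical `LMKIT_BACKEND` environment variable parsed into a backend name, which you would then hand to your loader. Every name in it is an assumption for illustration.

```csharp
using System;

// Hypothetical fleet-wide override: read a backend name from an environment
// variable and fall back to auto-detection when it is unset or unrecognised.
// The LMKIT_BACKEND variable and the name list are assumptions for illustration.
public static class BackendPolicy
{
    private static readonly string[] Known =
        { "cuda13", "cuda12", "vulkan", "metal", "avx2", "avx" };

    // Returns a recognised backend name, or "auto" to keep auto-detection.
    public static string Resolve(string envValue)
    {
        var requested = envValue?.Trim().ToLowerInvariant();
        return Array.IndexOf(Known, requested) >= 0 ? requested : "auto";
    }
}

// Usage (hypothetical):
// var backend = BackendPolicy.Resolve(Environment.GetEnvironmentVariable("LMKIT_BACKEND"));
```

Falling back to "auto" on any unrecognised value keeps a mistyped policy from taking a fleet offline.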

Platform matrix

What runs where.

Windows x64

CUDA 13 / 12 / Vulkan / AVX2

Default deployment target for most LM-Kit customers. NVIDIA GPUs accelerate via CUDA; AMD and Intel GPUs via Vulkan; CPU-only paths still production-grade.

Linux x64

CUDA 13 / 12 / Vulkan / AVX2

Server deployments. CUDA paths for NVIDIA-equipped hosts, Vulkan when proprietary drivers are not allowed, CPU SIMD for serverless and constrained environments.

Linux ARM64

CPU NEON

ARM servers, Raspberry Pi 5, Jetson-class boards. Native-speed CPU inference for industrial and edge deployments.

macOS Apple Silicon

Metal

M1, M2, M3, M4. Unified-memory advantage means even small Macs run surprisingly large models comfortably.

macOS Intel

AVX2

Older Mac hardware. CPU SIMD path keeps existing developer machines productive without a hardware upgrade cycle.

.NET targets

.NET Standard 2.0, .NET 8 / 9 / 10

Library targets cover desktop, server, MAUI mobile, and AOT scenarios. Same NuGet, same APIs, same backends across the matrix.

Where backend choice ships

Real deployment shapes.

Mixed fleet

A customer has Windows boxes with NVIDIA, Linux servers with AMD, and developer Macs. One installer, one NuGet, three backends in production. Auto-detect handles each.

Driver-free environments

Regulated environments forbid proprietary drivers. Vulkan path runs on the same NVIDIA hardware without the CUDA toolchain.

CI / CD pipelines

CI runners without GPUs still test the inference path on AVX2. Same code, slower but functional. Production keeps running on GPU.

Customer support

Users on a wide range of hardware. BackendName in support diagnostics tells your team exactly what is running where, no more "what GPU?" guessing.

Mobile and edge

Apple Silicon for iPad and Mac apps via MAUI; ARM64 NEON for industrial Linux boxes; AVX2 for desktop installers; same codebase.

Performance forensics

Pin a backend explicitly to compare throughput across paths. The CPU baseline is a useful regression check against any GPU performance claim.
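A pinned-backend comparison needs little more than a stopwatch around the token loop. The helper below is a generic sketch; wiring `generateTokens` to an actual LM-Kit completion call, one run per pinned backend, is an assumption left to the reader.

```csharp
using System;
using System.Diagnostics;

// Generic tokens-per-second helper for comparing pinned backends. The
// generator delegate stands in for a model's token loop; connecting it to a
// real LM-Kit completion call is assumed, not shown.
public static class Throughput
{
    public static double TokensPerSecond(Func<int> generateTokens)
    {
        var sw = Stopwatch.StartNew();
        int tokens = generateTokens();  // run the full generation once
        sw.Stop();
        return tokens / Math.Max(sw.Elapsed.TotalSeconds, 1e-9);
    }
}
```

Run it once per backend and keep the CPU number as the regression baseline mentioned above.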

Related capabilities

Backends plus the rest.

Multi-GPU & tensor overrides

Backends decide what runs the math; tensor overrides decide where each weight lives. Pair them for fine-grained device placement.

Multi-GPU

Quantization

Smaller artefacts run faster on every backend. Quantise to fit the device, then load on the right path.

Quantization

Edge & offline deployment

Pre-bundle the right backend for the target hardware. The NuGet handles every path; your installer ships them all.

Edge deployment

Sampling controls

Speculative decoding pairs naturally with hardware-aware backend choice: run the draft model on a lightweight path and verify with the full model on the fastest one.

Sampling controls

Every accelerator. One package.

Get Community Edition Download