LM-Kit ships precompiled native runtimes for CUDA 12, CUDA 13, Vulkan, Metal, AVX2, and AVX in a single package. The runtime detects what the host machine can use, picks the fastest available path, and loads the matching backend. Pin a specific backend when you need to. Same .NET API on every target: developer laptop, M-series Mac, on-prem server, edge box, regulated workstation.
CUDA 12 / 13
NVIDIA GPUs from consumer to data-centre. Two CUDA toolchain versions for broad driver compatibility.
Vulkan
Cross-vendor GPU acceleration: AMD, Intel, and NVIDIA on Windows and Linux.
Metal
Native acceleration on Apple Silicon and Intel Macs. Universal-binary distribution.
CPU SIMD
AVX2 fast path on modern x86 CPUs, AVX baseline for older hardware. ARM64 NEON on supported builds.
The fastest path on a developer laptop is not the fastest path on a customer's regulated workstation, on a fleet of M-series Macs, or on an air-gapped Linux server with no proprietary drivers. A serious local-inference stack needs to handle every one of those without forcing a separate build, a separate NuGet, or a separate application package. LM-Kit ships them all and picks the right one at load time.
On first load the runtime probes available drivers and capabilities, then walks the priority order CUDA > Vulkan > Metal > CPU SIMD, with override hooks for fleet-wide policy. Single binary, hardware-aware behaviour.
CUDA 12 and CUDA 13 ship side-by-side. The runtime selects the path the installed driver supports. No "wrong CUDA version" failures, no forced driver upgrades on customer hardware.
Vulkan covers AMD, Intel Arc, and NVIDIA on Windows and Linux. Useful for fleets that mix vendors or for environments where proprietary drivers are forbidden.
Metal backend with universal-binary distribution. M-series chips run at native speed; Intel Macs fall back to a CPU path that still uses every available core.
AVX2 path on modern x86 servers, AVX baseline for older hardware. Production-quality CPU-only inference for the workloads where GPUs are not an option.
All native runtimes ship inside the NuGet. No per-platform installer matrix, no separate downloads per accelerator. Customers get one package; the right backend wakes up at runtime.
Each backend has a target. The auto-detect logic walks them in priority order and picks the first one the host can run; a sketch of that walk follows the list below.
CUDA 13
Modern NVIDIA GPUs (RTX 30xx and newer) on hosts with up-to-date drivers. Highest throughput on supported hardware. First choice when the driver supports it.
CUDA 12
Older driver baseline that covers a wider range of customer environments. Same kernels, broader compatibility. Picked automatically when CUDA 13 is not available.
Vulkan
AMD GPUs, Intel Arc, NVIDIA on Linux without CUDA, Windows machines without the NVIDIA driver chain. Fleet-friendly when the hardware is mixed.
Metal
M1, M2, M3, M4 Macs. Unified-memory architecture lets the GPU and CPU share weights without copies. Native-speed inference on developer Macs and customer Macs.
AVX2
Intel Haswell-and-later, AMD Excavator-and-later. Fast SIMD path that runs on practically every server built in the last decade. Default fall-through when no GPU is present.
AVX
Older x86 hosts that lack AVX2. Slower but functional. The deployment safety net for environments where you cannot dictate the hardware.
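To make the walk concrete, here is a minimal sketch of the selection logic, not LM-Kit's actual implementation (the real auto-detect runs inside the native runtime at first load). It assumes BackendType exposes one member per row above — Vulkan and Avx2 appear in the pinning example below, while Cuda13, Cuda12, Metal, and Avx are guessed names — and that Runtime.GetAvailableDevices() reports an entry per usable backend.

using System;
using System.Linq;

// Sketch of the documented priority walk:
// CUDA 13 > CUDA 12 > Vulkan > Metal > AVX2 > AVX.
// Enum members other than Vulkan and Avx2 are assumptions, as is the idea
// that GetAvailableDevices() surfaces every runnable backend on the host.
static BackendType PickBackend()
{
    var priority = new[]
    {
        BackendType.Cuda13, BackendType.Cuda12, BackendType.Vulkan,
        BackendType.Metal,  BackendType.Avx2,   BackendType.Avx,
    };
    var available = Runtime.GetAvailableDevices()
                           .Select(d => d.BackendType)
                           .ToHashSet();
    // First entry in priority order that the host can actually run.
    return priority.First(available.Contains);
}

Console.WriteLine($"Auto-detect would choose: {PickBackend()}");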
The default. The runtime probes the host on first load, picks the fastest available backend, and exposes what it selected so you can log it for telemetry and support.
using System;
using System.Linq;
using LMKit.Model;

// Default behaviour: the runtime probes the host and picks the fastest backend.
var model = LM.LoadFromModelID("qwen3.5:4b");

// Inspect what the runtime selected (useful for telemetry and support).
Console.WriteLine($"Backend in use: {model.Runtime.BackendName}");
Console.WriteLine($"Devices: {string.Join(", ", model.Runtime.Devices.Select(d => d.Name))}");
Force a specific backend regardless of what the host could run. Useful for fleets that need vendor-agnostic behaviour, for environments without NVIDIA drivers, or for diagnostic comparisons against the CPU path.
using LMKit.Global;
using LMKit.Model;

// Pin the runtime to Vulkan even on a CUDA-capable host. Useful for fleets
// that want vendor-agnostic behaviour or for environments without NVIDIA drivers.
Configuration.PreferredBackend = BackendType.Vulkan;

// Or fall back to CPU explicitly for diagnostic comparisons
// (uncomment instead of the Vulkan pin above; a later assignment wins):
// Configuration.PreferredBackend = BackendType.Avx2;

var model = LM.LoadFromModelID("qwen3.5:4b");
Pre-flight inspection: enumerate available GPUs and their memory before picking a model size. The same call works on every backend, so this code is portable across CUDA, Vulkan, Metal and CPU hosts.
using System;
using System.Linq;
using LMKit.Model;

// Pre-flight: enumerate available GPUs and their memory before loading.
foreach (var dev in Runtime.GetAvailableDevices())
{
    Console.WriteLine($"{dev.BackendType,-10} {dev.Name,-30} " +
                      $"{dev.MemoryBytes / 1_000_000_000m:F1} GB free");
}

// Choose a model that fits the smallest available device.
var minVram = Runtime.GetAvailableDevices().Min(d => d.MemoryBytes);
var modelId = minVram > 12_000_000_000 ? "gemma4:26b-a4b" : "gemma4:e4b";
var model = LM.LoadFromModelID(modelId);
Windows x64
Default deployment target for most LM-Kit customers. NVIDIA GPUs accelerate via CUDA; AMD and Intel GPUs via Vulkan; CPU-only paths still production-grade.
Linux x64
Server deployments. CUDA paths for NVIDIA-equipped hosts, Vulkan when proprietary drivers are not allowed, CPU SIMD for serverless and constrained environments.
Linux ARM64
ARM servers, Raspberry Pi 5, Jetson-class boards. Native-speed CPU inference for industrial and edge deployments.
macOS Apple Silicon
M1, M2, M3, M4. Unified-memory advantage means even small Macs run surprisingly large models comfortably.
macOS Intel
Older Mac hardware. CPU SIMD path keeps existing developer machines productive without a hardware upgrade cycle.
.NET targets
Library targets cover desktop, server, MAUI mobile, and AOT scenarios. Same NuGet, same APIs, same backends across the matrix.
A customer has Windows boxes with NVIDIA, Linux servers with AMD, and developer Macs. One installer, one NuGet, three backends in production. Auto-detect handles each.
Regulated environments forbid proprietary drivers. Vulkan path runs on the same NVIDIA hardware without the CUDA toolchain.
CI runners without GPUs still test the inference path on AVX2. Same code, slower but functional. Production keeps running on GPU.
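A minimal sketch of that CI arrangement, using only the pinning and loading calls shown above; the model ID and the smoke-test shape are illustrative:

using System;
using LMKit.Global;
using LMKit.Model;

// CI sketch: pin the CPU path before any model loads so the same inference
// code runs on GPU-less runners. Only Configuration.PreferredBackend,
// BackendType.Avx2, and LM.LoadFromModelID come from the examples above;
// everything else is scaffolding.
Configuration.PreferredBackend = BackendType.Avx2;

var model = LM.LoadFromModelID("qwen3.5:4b");
Console.WriteLine($"CI smoke test ran on: {model.Runtime.BackendName}");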
Users on a wide range of hardware. BackendName in support diagnostics tells your team exactly what is running where. No more "what GPU?" guessing.
Apple Silicon for iPad and Mac apps via MAUI; ARM64 NEON for industrial Linux boxes; AVX2 for desktop installers; same codebase.
Pin a backend explicitly to compare throughput across paths. The CPU baseline is a useful regression check against any GPU performance claim.
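A minimal sketch of such a comparison, again assuming only the calls shown above. The generation step is left as a placeholder because this page does not show the generation API, and whether one process may switch backends between loads is not documented here; run each configuration in a fresh process if in doubt.

using System;
using System.Diagnostics;
using LMKit.Global;
using LMKit.Model;

// Time the same workload on two pinned paths. Vulkan and Avx2 are the only
// BackendType members shown on this page; add others as your build exposes them.
foreach (var backend in new[] { BackendType.Vulkan, BackendType.Avx2 })
{
    Configuration.PreferredBackend = backend;

    var sw = Stopwatch.StartNew();
    var model = LM.LoadFromModelID("qwen3.5:4b");
    // Plug your real generation call in here so the timing covers tokens,
    // not just the model load.
    sw.Stop();

    Console.WriteLine($"{backend}: {sw.Elapsed.TotalSeconds:F1}s");
}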
Backends decide what runs the math; tensor overrides decide where each weight lives. Pair them for fine-grained device placement.
Smaller artefacts run faster on every backend. Quantise to fit the device, then load on the right path.
Pre-bundle the right backend for the target hardware. The NuGet handles every path; your installer ships them all.
Speculative decoding pairs naturally with hardware-aware backend choice. Draft on a small backend, full on the fast one.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
How-to: pick CUDA / Vulkan / Metal / AVX2 and tune throughput.
Read the guide →
How-to: split inference across machines for higher throughput.
Read the guide →
How-to: first-time setup walkthrough for each backend.
Read the guide →