
One base model. Many specialisations.

LM-Kit treats LoRA adapters as first-class runtime objects. Load several megabytes of fine-tuned weights, attach them to a multi-gigabyte base model, and toggle them in or out by adjusting a scale factor. Ship one base model and dozens of specialised behaviours from the same process, swap personas mid-conversation, or permanently bake adapters in with LoraMerger.

Hot-swap at runtime · Multi-adapter mixing · Permanent merge option

Apply

Register an adapter with LM.ApplyLoraAdapter(path, scale). Activates immediately when scale is above zero.

Blend

Adjust LoraAdapter.Scale live to dial intensity from 0 (off) to 1 (full strength) without reloading.

Merge

Bake adapters permanently into the base weights with LoraMerger.Merge, optionally re-quantising.
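
Taken together, the three steps form one short lifecycle. A minimal sketch, reusing the model ID from the examples below; the adapter path is illustrative:

Lifecycle.cs
using LMKit.Model;
using LMKit.Finetuning;

var model = LM.LoadFromModelID("qwen3.5:4b");

// Apply: registered and active immediately because scale > 0.
model.ApplyLoraAdapter(@"adapters\example.gguf", scale: 1.0f);

// Blend: dial intensity live; no reload required.
model.Adapters[0].Scale = 0.5f;

// Merge: LoraMerger bakes the adapter in permanently (see Merge.cs below).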

Why runtime LoRA

Specialise without shipping a new model.

A LoRA adapter typically weighs a few megabytes against a multi-gigabyte base. That asymmetry unlocks deployment patterns that full fine-tunes simply cannot match: per-tenant personalisation, A/B testing, and persona switching, all from a single resident model.

Tiny payloads

Adapters are orders of magnitude smaller than the base. Distribute new behaviours over the wire without re-pushing gigabytes to every device.

Memory shared

Many adapters share the same resident base weights. RAM and VRAM scale with the number of active adapters, not with the number of installed ones.

Live toggling

Setting Scale to zero deactivates an adapter for subsequent inference. No reload, no checkpoint swap, no service restart.

Stackable

Multiple adapters can be active at once, each with its own scale. Compose a domain adapter with a tone adapter, or blend a base persona with a customer-specific overlay.

Reversible

RemoveLoraAdapter fully detaches an adapter and frees its registration. Useful for tenant logout, eviction, or A/B reset flows.

Bakeable

When an adapter graduates from experimental to canonical, LoraMerger folds it into the base weights and writes a new GGUF, optionally re-quantised.

Hot-swap pattern

Apply, generate, swap, repeat.

Load two adapters, route prompts through each in turn, and you get two distinct voices from a single resident base. The base model never moves.

HotSwap.cs
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.Finetuning;

// Single resident base model.
var model = LM.LoadFromModelID("qwen3.5:4b");

// Register two specialised adapters. Both live in memory but are gated by Scale.
model.ApplyLoraAdapter(@"adapters\support-tone.gguf", scale: 0);
model.ApplyLoraAdapter(@"adapters\legal-domain.gguf", scale: 0);

LoraAdapter support = model.Adapters[0];
LoraAdapter legal   = model.Adapters[1];

var chat = new SingleTurnConversation(model);

// Activate the support persona for the next exchange.
support.Scale = 1.0f;
legal.Scale   = 0.0f;
Console.WriteLine(chat.Submit("Customer cannot reset their password."));

// Switch personas without reloading the base model.
support.Scale = 0.0f;
legal.Scale   = 1.0f;
Console.WriteLine(chat.Submit("Summarise the indemnification clause."));

// Detach an adapter entirely when a tenant signs out.
model.RemoveLoraAdapter(support);

Multi-adapter blending

Compose specialisations.

Scales are not booleans. Two adapters can be partially active at the same time, letting you compose a domain expert with a tone overlay, or interpolate between fine-tunes for A/B experiments.

Blend.cs
using LMKit.Model;
using LMKit.Finetuning;

var model = LM.LoadFromModelID("qwen3.5:4b");

// A medical-domain adapter and a friendly bedside-manner adapter.
model.ApplyLoraAdapter(@"adapters\medical-domain.gguf", scale: 0.8f);
model.ApplyLoraAdapter(@"adapters\bedside-tone.gguf",   scale: 0.4f);

// Inspect what's currently registered.
foreach (LoraAdapter a in model.Adapters)
{
    Console.WriteLine($"{a.Identifier}  path={a.Path}  scale={a.Scale}");
}

// Tune the blend live based on user preference, A/B test bucket, etc.
model.Adapters[1].Scale = 0.7f; // more bedside warmth
model.Adapters[0].Scale = 1.0f; // full medical expertise

Permanent merge

Bake adapters into the base weights.

Once an adapter has earned its place, fold it permanently into the model with LoraMerger. The output is a new self-contained GGUF, optionally re-quantised in the same pass.

Merge.cs
using LMKit.Finetuning;

var merger = new LoraMerger(@"models\qwen3-4b-base.gguf")
{
    ThreadCount        = Environment.ProcessorCount,
    EnableQuantization = true      // re-quantise the merged weights
};

// Stack adapters with their final scale factors.
merger.AddLoraAdapter(@"adapters\medical-domain.gguf", scale: 1.0f);
merger.AddLoraAdapter(@"adapters\bedside-tone.gguf",   scale: 0.5f);

// Produce a self-contained GGUF that no longer needs the adapters.
merger.Merge(@"models\qwen3-4b-medical-merged.gguf");

Applications

Where adapters change the calculus.

Multi-tenant SaaS

One base model in RAM, one adapter per tenant on disk. Load on first request, cache while the tenant is active, evict on idle; a cache sketch follows these cards. Memory grows with concurrency, not customer count.

Per-task specialisation

A "summarise", a "rewrite", and a "classify" adapter trained on the same base. Toggle the relevant one before each task; release it after.

Persona switching

Customer support, internal helpdesk, and marketing copilots can share weights and differ only in adapter and prompt. Update one persona without touching the others.

A/B experiments

Run two adapter versions side by side in the same process. Bucket users by adjusting Scale, as sketched after these cards; promote the winner with LoraMerger.

Edge personalisation

Ship the base once at install time and stream small adapters per user profile or device locale. Bandwidth and storage stay reasonable on consumer devices.

Compliance modes

Layer a redaction or policy adapter on top of a generic base for regulated workloads. Activate it when the request is in scope, deactivate when it is not.
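
The multi-tenant pattern above reduces to a small cache keyed by tenant ID. A sketch under stated assumptions: one adapter file per tenant at a hypothetical path, new registrations appended to the end of Adapters, and concurrency control elided:

TenantAdapters.cs
using System.Collections.Generic;
using LMKit.Model;
using LMKit.Finetuning;

public sealed class TenantAdapterCache
{
    private readonly LM _model;
    private readonly Dictionary<string, LoraAdapter> _active = new();

    public TenantAdapterCache(LM model) => _model = model;

    // Load on first request, reuse while the tenant stays active.
    public LoraAdapter Acquire(string tenantId)
    {
        if (!_active.TryGetValue(tenantId, out LoraAdapter adapter))
        {
            // Hypothetical per-tenant layout: one .gguf per tenant.
            _model.ApplyLoraAdapter($@"adapters\{tenantId}.gguf", scale: 1.0f);

            // Assumption: the newest registration is last in Adapters.
            adapter = _model.Adapters[_model.Adapters.Count - 1];
            _active[tenantId] = adapter;
        }
        return adapter;
    }

    // Evict on idle or logout: detach and free the registration.
    public void Evict(string tenantId)
    {
        if (_active.Remove(tenantId, out LoraAdapter adapter))
            _model.RemoveLoraAdapter(adapter);
    }
}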
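
Similarly, A/B bucketing by Scale can be a single stable-hash gate. A sketch; the 50/50 split and the hashing choice are assumptions:

AbBucket.cs
using System.Security.Cryptography;
using System.Text;
using LMKit.Model;
using LMKit.Finetuning;

static class AbBucketing
{
    // Gate two adapter versions with complementary scales. Hashing the
    // user ID keeps each user in the same bucket across sessions and
    // processes (string.GetHashCode is not stable between runs).
    public static void Assign(string userId, LoraAdapter variantA, LoraAdapter variantB)
    {
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(userId));
        bool bucketB = (digest[0] & 1) == 1; // hypothetical 50/50 split

        variantA.Scale = bucketB ? 0.0f : 1.0f;
        variantB.Scale = bucketB ? 1.0f : 0.0f;
    }
}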

Developer resources

API reference.

LM.ApplyLoraAdapter

Registers an adapter on a loaded model. Overloads accept either a path + scale or a LoraAdapterSource. The adapter joins the Adapters collection and activates when its scale is above zero.

View documentation

LM.RemoveLoraAdapter

Detaches a previously registered adapter and releases its native handle. Returns true if the adapter was found and removed.

View documentation

LM.Adapters

Read-only collection of currently registered LoraAdapter instances. Iterate to inspect or to update individual Scale values live.

View documentation

LoraAdapter

Represents a registered adapter. Exposes Identifier, Path, and a mutable Scale (clamped to zero or higher). Setting Scale = 0 disables the adapter for subsequent inference.

View documentation

LoraAdapterSource

Lightweight descriptor used to register an adapter. Pairs a file path with an initial scale factor. Reusable across ApplyLoraAdapter and LoraMerger.AddLoraAdapter; a sketch follows these entries.

View documentation

LoraMerger

Permanent merge pipeline. Add one or more adapters via AddLoraAdapter, set EnableQuantization and ThreadCount as needed, then call Merge(modelPath) to produce a new GGUF.

View documentation
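
Inferring from the LoraAdapterSource entry above, descriptor-based registration might look like the following; the constructor shape shown is an assumption rather than confirmed API:

Source.cs
using LMKit.Model;
using LMKit.Finetuning;

var model = LM.LoadFromModelID("qwen3.5:4b");

// Assumed constructor: a file path paired with an initial scale factor.
var source = new LoraAdapterSource(@"adapters\support-tone.gguf", scale: 1.0f);

// The same descriptor feeds runtime registration...
model.ApplyLoraAdapter(source);

// ...and permanent merging.
var merger = new LoraMerger(@"models\qwen3-4b-base.gguf");
merger.AddLoraAdapter(source);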

Related capabilities

Pair with training and quantisation.

LLM fine-tuning

Produce the LoRA adapters this page consumes. LoraFinetuning handles dataset prep, training loop, and checkpointing.

Fine-tuning page

Model quantization

After merging, re-quantise the resulting GGUF to land on the right size-versus-quality trade-off for your deployment target.

Quantization page

Demos & docs

Build it. Read it. Try it.

Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.

One base. Every persona.

Get Community Edition · Download