
One base model. Many specialisations.

LM-Kit treats LoRA adapters as first-class runtime objects. Load several megabytes of fine-tuned weights, attach them to a multi-gigabyte base model, and toggle them in or out by adjusting a scale factor. Ship one base model and dozens of specialised behaviours from the same process, swap personas mid-conversation, or permanently bake adapters in with LoraMerger.

Hot-swap at runtime · Multi-adapter mixing · Permanent merge option

Apply

Register an adapter with LM.ApplyLoraAdapter(path, scale). Activates immediately when scale is above zero.

Blend

Adjust LoraAdapter.Scale live to dial intensity from 0 (off) to 1 (full strength) without reloading.

Merge

Bake adapters permanently into the base weights with LoraMerger.Merge, optionally re-quantising.
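
Taken together, the three steps form one short lifecycle. A minimal sketch, reusing the model ID from the examples below; the adapter path is illustrative:

Lifecycle.cs
using LMKit.Model;
using LMKit.Finetuning;

var model = LM.LoadFromModelID("qwen3.5:4b");

// Apply: registered and active immediately because scale > 0.
model.ApplyLoraAdapter(@"adapters\example.gguf", scale: 1.0f);

// Blend: dial intensity live; no reload required.
model.Adapters[0].Scale = 0.5f;

// Merge: LoraMerger bakes the adapter in permanently (see Merge.cs below).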

Why runtime LoRA

Specialise without shipping a new model.

A LoRA adapter typically weighs a few megabytes against a multi-gigabyte base. That asymmetry unlocks deployment patterns that full fine-tunes simply cannot match: per-tenant personalisation, A/B testing, and persona switching, all from a single resident model.

Tiny payloads

Adapters are orders of magnitude smaller than the base. Distribute new behaviours over the wire without re-pushing gigabytes to every device.

Memory shared

Many adapters share the same resident base weights. RAM and VRAM scale with the number of active adapters, not with the number of installed ones.

Live toggling

Setting Scale to zero deactivates an adapter for subsequent inference. No reload, no checkpoint swap, no service restart.

Stackable

Multiple adapters can be active at once, each with its own scale. Compose a domain adapter with a tone adapter, or blend a base persona with a customer-specific overlay.

Reversible

RemoveLoraAdapter fully detaches an adapter and frees its registration. Useful for tenant logout, eviction, or A/B reset flows.

Bakeable

When an adapter graduates from experimental to canonical, LoraMerger folds it into the base weights and writes a new GGUF, optionally re-quantised.

Hot-swap pattern

Apply, generate, swap, repeat.

Load two adapters, route prompts through each in turn, and you get two distinct voices from a single resident base. The base model never moves.

HotSwap.cs
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.Finetuning;

// Single resident base model.
var model = LM.LoadFromModelID("qwen3.5:4b");

// Register two specialised adapters. Both live in memory but are gated by Scale.
model.ApplyLoraAdapter(@"adapters\support-tone.gguf", scale: 0);
model.ApplyLoraAdapter(@"adapters\legal-domain.gguf", scale: 0);

LoraAdapter support = model.Adapters[0];
LoraAdapter legal   = model.Adapters[1];

var chat = new SingleTurnConversation(model);

// Activate the support persona for the next exchange.
support.Scale = 1.0f;
legal.Scale   = 0.0f;
Console.WriteLine(chat.Submit("Customer cannot reset their password."));

// Switch personas without reloading the base model.
support.Scale = 0.0f;
legal.Scale   = 1.0f;
Console.WriteLine(chat.Submit("Summarise the indemnification clause."));

// Detach an adapter entirely when a tenant signs out.
model.RemoveLoraAdapter(support);

Multi-adapter blending

Compose specialisations.

Scales are not booleans. Two adapters can be partially active at the same time, letting you compose a domain expert with a tone overlay, or interpolate between fine-tunes for A/B experiments.

Blend.cs
using LMKit.Model;
using LMKit.Finetuning;

var model = LM.LoadFromModelID("qwen3.5:4b");

// A medical-domain adapter and a friendly bedside-manner adapter.
model.ApplyLoraAdapter(@"adapters\medical-domain.gguf", scale: 0.8f);
model.ApplyLoraAdapter(@"adapters\bedside-tone.gguf",   scale: 0.4f);

// Inspect what's currently registered.
foreach (LoraAdapter a in model.Adapters)
{
    Console.WriteLine($"{a.Identifier}  path={a.Path}  scale={a.Scale}");
}

// Tune the blend live based on user preference, A/B test bucket, etc.
model.Adapters[1].Scale = 0.7f; // more bedside warmth
model.Adapters[0].Scale = 1.0f; // full medical expertise

Permanent merge

Bake adapters into the base weights.

Once an adapter has earned its place, fold it permanently into the model with LoraMerger. The output is a new self-contained GGUF, optionally re-quantised in the same pass.

Merge.cs
using LMKit.Finetuning;

var merger = new LoraMerger(@"models\qwen3-4b-base.gguf")
{
    ThreadCount        = Environment.ProcessorCount,
    EnableQuantization = true      // re-quantise the merged weights
};

// Stack adapters with their final scale factors.
merger.AddLoraAdapter(@"adapters\medical-domain.gguf", scale: 1.0f);
merger.AddLoraAdapter(@"adapters\bedside-tone.gguf",   scale: 0.5f);

// Produce a self-contained GGUF that no longer needs the adapters.
merger.Merge(@"models\qwen3-4b-medical-merged.gguf");

Applications

Where adapters change the calculus.

Multi-tenant SaaS

One base model in RAM, one adapter per tenant on disk. Load on first request, cache while the tenant is active, evict on idle; a cache sketch follows these cards. Memory grows with concurrency, not customer count.

Per-task specialisation

A "summarise", a "rewrite", and a "classify" adapter trained on the same base. Toggle the relevant one before each task; release it after.

Persona switching

Customer support, internal helpdesk, and marketing copilots can share weights and differ only in adapter and prompt. Update one persona without touching the others.

A/B experiments

Run two adapter versions side by side in the same process. Bucket users by adjusting Scale, as sketched after these cards; promote the winner with LoraMerger.

Edge personalisation

Ship the base once at install time and stream small adapters per user profile or device locale. Bandwidth and storage stay reasonable on consumer devices.

Compliance modes

Layer a redaction or policy adapter on top of a generic base for regulated workloads. Activate it when the request is in scope, deactivate when it is not.
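
The multi-tenant pattern above reduces to a small cache keyed by tenant ID. A sketch under stated assumptions: one adapter file per tenant at a hypothetical path, new registrations appended to the end of Adapters, and concurrency control elided:

TenantAdapters.cs
using System.Collections.Generic;
using LMKit.Model;
using LMKit.Finetuning;

public sealed class TenantAdapterCache
{
    private readonly LM _model;
    private readonly Dictionary<string, LoraAdapter> _active = new();

    public TenantAdapterCache(LM model) => _model = model;

    // Load on first request, reuse while the tenant stays active.
    public LoraAdapter Acquire(string tenantId)
    {
        if (!_active.TryGetValue(tenantId, out LoraAdapter adapter))
        {
            // Hypothetical per-tenant layout: one .gguf per tenant.
            _model.ApplyLoraAdapter($@"adapters\{tenantId}.gguf", scale: 1.0f);

            // Assumption: the newest registration is last in Adapters.
            adapter = _model.Adapters[_model.Adapters.Count - 1];
            _active[tenantId] = adapter;
        }
        return adapter;
    }

    // Evict on idle or logout: detach and free the registration.
    public void Evict(string tenantId)
    {
        if (_active.Remove(tenantId, out LoraAdapter adapter))
            _model.RemoveLoraAdapter(adapter);
    }
}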
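
Similarly, A/B bucketing by Scale can be a single stable-hash gate. A sketch; the 50/50 split and the hashing choice are assumptions:

AbBucket.cs
using System.Security.Cryptography;
using System.Text;
using LMKit.Model;
using LMKit.Finetuning;

static class AbBucketing
{
    // Gate two adapter versions with complementary scales. Hashing the
    // user ID keeps each user in the same bucket across sessions and
    // processes (string.GetHashCode is not stable between runs).
    public static void Assign(string userId, LoraAdapter variantA, LoraAdapter variantB)
    {
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(userId));
        bool bucketB = (digest[0] & 1) == 1; // hypothetical 50/50 split

        variantA.Scale = bucketB ? 0.0f : 1.0f;
        variantB.Scale = bucketB ? 1.0f : 0.0f;
    }
}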

Developer resources

API reference.

LM.ApplyLoraAdapter

Registers an adapter on a loaded model. Overloads accept either a path + scale or a LoraAdapterSource. The adapter joins the Adapters collection and activates when its scale is above zero.

View documentation

LM.RemoveLoraAdapter

Detaches a previously registered adapter and releases its native handle. Returns true if the adapter was found and removed.

View documentation

LM.Adapters

Read-only collection of currently registered LoraAdapter instances. Iterate to inspect or to update individual Scale values live.

View documentation

LoraAdapter

Represents a registered adapter. Exposes Identifier, Path, and a mutable Scale (clamped to zero or higher). Setting Scale = 0 disables the adapter for subsequent inference.

View documentation

LoraAdapterSource

Lightweight descriptor used to register an adapter. Pairs a file path with an initial scale factor. Reusable across ApplyLoraAdapter and LoraMerger.AddLoraAdapter; a sketch follows these entries.

View documentation

LoraMerger

Permanent merge pipeline. Add one or more adapters via AddLoraAdapter, set EnableQuantization and ThreadCount as needed, then call Merge(modelPath) to produce a new GGUF.

View documentation
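
Inferring from the LoraAdapterSource entry above, descriptor-based registration might look like the following; the constructor shape shown is an assumption rather than confirmed API:

Source.cs
using LMKit.Model;
using LMKit.Finetuning;

var model = LM.LoadFromModelID("qwen3.5:4b");

// Assumed constructor: a file path paired with an initial scale factor.
var source = new LoraAdapterSource(@"adapters\support-tone.gguf", scale: 1.0f);

// The same descriptor feeds runtime registration...
model.ApplyLoraAdapter(source);

// ...and permanent merging.
var merger = new LoraMerger(@"models\qwen3-4b-base.gguf");
merger.AddLoraAdapter(source);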

Related capabilities

Pair with training and quantisation.

LLM fine-tuning

Produce the LoRA adapters this page consumes. LoraFinetuning handles dataset prep, training loop, and checkpointing.

Fine-tuning page

Model quantization

After merging, re-quantise the resulting GGUF to land on the right size-versus-quality trade-off for your deployment target.

Quantization page

Demos & docs

Build it. Read it. Try it.

Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.

One base. Every persona.

Get Community Edition · Download