LM-Kit treats LoRA adapters as first-class runtime objects. Load several megabytes of fine-tuned weights, attach them to a multi-gigabyte base model, and toggle them in or out by adjusting a scale factor. Ship one base model and dozens of specialised behaviours from the same process, swap personas mid-conversation, or permanently bake adapters in with LoraMerger.
Register an adapter with LM.ApplyLoraAdapter(path, scale). Activates immediately when scale is above zero.
Adjust LoraAdapter.Scale live to dial intensity from 0 (off) to 1 (full strength) without reloading.
Bake adapters permanently into the base weights with LoraMerger.Merge, optionally re-quantising.
A LoRA adapter typically weighs a few megabytes against a multi-gigabyte base. That asymmetry unlocks deployment patterns that full fine-tunes simply cannot match: per-tenant personalisation, A/B testing, and persona switching all from a single resident model.
Adapters are orders of magnitude smaller than the base. Distribute new behaviours over the wire without re-pushing gigabytes to every device.
Many adapters share the same resident base weights. RAM and VRAM scale with the number of active adapters, not with the number of installed ones.
Setting Scale to zero deactivates an adapter for subsequent inference. No reload, no checkpoint swap, no service restart.
Multiple adapters can be active at once, each with its own scale. Compose a domain adapter with a tone adapter, or blend a base persona with a customer-specific overlay.
RemoveLoraAdapter fully detaches an adapter and frees its registration. Useful for tenant logout, eviction, or A/B reset flows.
When an adapter graduates from experimental to canonical, LoraMerger folds it into the base weights and writes a new GGUF, optionally re-quantised.
Load two adapters, run the same prompt through each, and you get two distinct voices from a single resident base. The base model never moves.
```csharp
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.Finetuning;

// Single resident base model.
var model = LM.LoadFromModelID("qwen3.5:4b");

// Register two specialised adapters. Both live in memory but are gated by Scale.
model.ApplyLoraAdapter(@"adapters\support-tone.gguf", scale: 0);
model.ApplyLoraAdapter(@"adapters\legal-domain.gguf", scale: 0);

LoraAdapter support = model.Adapters[0];
LoraAdapter legal = model.Adapters[1];

var chat = new SingleTurnConversation(model);

// Activate the support persona for the next exchange.
support.Scale = 1.0f;
legal.Scale = 0.0f;
Console.WriteLine(chat.Submit("Customer cannot reset their password."));

// Switch personas without reloading the base model.
support.Scale = 0.0f;
legal.Scale = 1.0f;
Console.WriteLine(chat.Submit("Summarise the indemnification clause."));

// Detach an adapter entirely when a tenant signs out.
model.RemoveLoraAdapter(support);
```
Scales are not booleans. Two adapters can be partially active at the same time, letting you compose a domain expert with a tone overlay, or interpolate between fine-tunes for A/B experiments.
```csharp
using LMKit.Model;
using LMKit.Finetuning;

var model = LM.LoadFromModelID("qwen3.5:4b");

// A medical-domain adapter and a friendly bedside-manner adapter.
model.ApplyLoraAdapter(@"adapters\medical-domain.gguf", scale: 0.8f);
model.ApplyLoraAdapter(@"adapters\bedside-tone.gguf", scale: 0.4f);

// Inspect what's currently registered.
foreach (LoraAdapter a in model.Adapters)
{
    Console.WriteLine($"{a.Identifier} path={a.Path} scale={a.Scale}");
}

// Tune the blend live based on user preference, A/B test bucket, etc.
model.Adapters[1].Scale = 0.7f; // more bedside warmth
model.Adapters[0].Scale = 1.0f; // full medical expertise
```
Once an adapter has earned its place, fold it permanently into the model with LoraMerger. The output is a new self-contained GGUF, optionally re-quantised in the same pass.
```csharp
using LMKit.Finetuning;

var merger = new LoraMerger(@"models\qwen3-4b-base.gguf")
{
    ThreadCount = Environment.ProcessorCount,
    EnableQuantization = true // re-quantise the merged weights
};

// Stack adapters with their final scale factors.
merger.AddLoraAdapter(@"adapters\medical-domain.gguf", scale: 1.0f);
merger.AddLoraAdapter(@"adapters\bedside-tone.gguf", scale: 0.5f);

// Produce a self-contained GGUF that no longer needs the adapters.
merger.Merge(@"models\qwen3-4b-medical-merged.gguf");
```
One base model in RAM, one adapter per tenant on disk. Load on first request, cache while the tenant is active, evict on idle. Memory grows with concurrency, not customer count.
A "summarise", a "rewrite", and a "classify" adapter trained on the same base. Toggle the relevant one before each task; release it after.
Customer support, internal helpdesk, and marketing copilots can share weights and differ only in adapter and prompt. Update one persona without touching the others.
Run two adapter versions side by side in the same process. Bucket users by adjusting Scale; promote the winner with LoraMerger.
Ship the base once at install time and stream small adapters per user profile or device locale. Bandwidth and storage stay reasonable on consumer devices.
Layer a redaction or policy adapter on top of a generic base for regulated workloads. Activate it when the request is in scope, deactivate when it is not.
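The per-tenant pattern above can be sketched as a small adapter cache around the calls documented on this page. This is an illustrative outline, not LM-Kit API: the TenantAdapterCache class, its dictionary, and the eviction policy are hypothetical, and the per-tenant adapter path is assumed; only ApplyLoraAdapter, RemoveLoraAdapter, Adapters, and Scale come from the reference below.

```csharp
using System;
using System.Collections.Generic;
using LMKit.Model;

// Hypothetical helper: one resident base model, one cached adapter per active tenant.
// Only ApplyLoraAdapter / RemoveLoraAdapter / Adapters / Scale are LM-Kit calls.
class TenantAdapterCache
{
    private readonly LM _model;
    private readonly Dictionary<string, LoraAdapter> _active = new();

    public TenantAdapterCache(LM model) => _model = model;

    // Load the tenant's adapter on first request, then reuse the registration.
    public LoraAdapter Activate(string tenantId)
    {
        if (!_active.TryGetValue(tenantId, out var adapter))
        {
            // Assumed layout: one adapter file per tenant on disk.
            _model.ApplyLoraAdapter($@"adapters\{tenantId}.gguf", scale: 0);
            adapter = _model.Adapters[^1]; // most recently registered
            _active[tenantId] = adapter;
        }

        // Mute every other tenant's adapter for this request.
        foreach (var a in _active.Values) a.Scale = 0.0f;
        adapter.Scale = 1.0f;
        return adapter;
    }

    // Evict on logout or idle timeout: memory tracks concurrency, not customer count.
    public void Evict(string tenantId)
    {
        if (_active.Remove(tenantId, out var adapter))
            _model.RemoveLoraAdapter(adapter);
    }
}
```

The design choice mirrors the list above: registrations are cheap relative to the base model, so keeping a handful of concurrent tenants resident while evicting idle ones bounds memory without ever reloading the base.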
LM.ApplyLoraAdapter: Registers an adapter on a loaded model. Overloads accept either a path + scale or a LoraAdapterSource. The adapter joins the Adapters collection and activates when its scale is above zero.
LM.RemoveLoraAdapter: Detaches a previously registered adapter and releases its native handle. Returns true if the adapter was found and removed.
LM.Adapters: Read-only collection of currently registered LoraAdapter instances. Iterate to inspect or to update individual Scale values live.
LoraAdapter: Represents a registered adapter. Exposes Identifier, Path, and a mutable Scale (clamped to zero or higher). Setting Scale = 0 disables the adapter for subsequent inference.
LoraAdapterSource: Lightweight descriptor used to register an adapter. Pairs a file path with an initial scale factor. Reusable across ApplyLoraAdapter and LoraMerger.AddLoraAdapter.
LoraMerger: Permanent merge pipeline. Add one or more adapters via AddLoraAdapter, set EnableQuantization and ThreadCount as needed, then call Merge(modelPath) to produce a new GGUF.
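Since LoraAdapterSource is described as reusable across ApplyLoraAdapter and LoraMerger.AddLoraAdapter, a single descriptor can drive both the live and the permanent path. The exact overload shapes here are an assumption inferred from the reference entries above, not confirmed signatures.

```csharp
using LMKit.Model;
using LMKit.Finetuning;

// One descriptor pairing a file path with an initial scale factor.
// Assumed constructor shape, inferred from the LoraAdapterSource entry above.
var source = new LoraAdapterSource(@"adapters\medical-domain.gguf", scale: 1.0f);

// Live path: register the adapter on a loaded model.
var model = LM.LoadFromModelID("qwen3.5:4b");
model.ApplyLoraAdapter(source);

// Permanent path: feed the same descriptor to the merge pipeline.
var merger = new LoraMerger(@"models\qwen3-4b-base.gguf");
merger.AddLoraAdapter(source);
merger.Merge(@"models\qwen3-4b-medical-merged.gguf");
```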
Produce the LoRA adapters this page consumes. LoraFinetuning handles dataset prep, training loop, and checkpointing.
After merging, re-quantise the resulting GGUF to land on the right precision-versus-quality trade-off for your deployment target.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.