LLM fine-tuning

Specialise any model on your data.

The LoraFinetuning engine trains task-specific LoRA adapters against a frozen base model, with per-tensor rank control, AdamW with cosine decay, gradient accumulation, automatic early stopping, checkpoint save/resume, and ShareGPT dataset export. The result is a complete fine-tuning loop that runs on your hardware, with no cloud uploads of your training data.

LoRA adapters · Per-tensor ranks · Checkpointing

LoraFinetuning

Trainer with iteration loop, progress events, checkpointing.

LoraTrainingParameters

25+ knobs: ranks, AdamW, cosine decay, gradient clipping, RoPE.

TrainingDataset

Build datasets from chat history, plain text, or ShareGPT.

ShareGptExporter

Export production conversations as a ShareGPT-format dataset.

Why LoRA?

A small adapter beats a big retrain.

Full fine-tuning of a 7B model means updating 14+ GB of weights, a multi-GPU setup, and days of training. LoRA (Low-Rank Adaptation) instead trains a pair of small rank-r matrices injected alongside each frozen weight matrix. The base model stays untouched. The resulting adapter is typically 10 to 100 MB and can be hot-swapped at inference time, mixed with other adapters, or merged back into the base permanently with LoraMerger.
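
In the notation of the original LoRA paper, each frozen weight matrix $W \in \mathbb{R}^{d \times k}$ is augmented with a trainable low-rank product (the SDK's LoraRank and LoraAlpha presumably correspond to $r$ and $\alpha$ here):

$W' = W + \tfrac{\alpha}{r}\,B A, \quad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)$

Only $A$ and $B$ receive gradients, which is why the adapter stays small and the base weights never change.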

Tiny adapters

A typical adapter is 10 to 100 MB. Versionable, distributable, swappable. The base model never moves.

Reasonable hardware

Train rank-16 LoRAs on consumer GPUs (16 to 24 GB VRAM) for 4B-7B models. No multi-node setup required.

Per-task specialisation

One base model, many adapters. Sentiment, code, legal, medical: load the right adapter per request.

Privacy by design

Training data never leaves the machine. Your customer interactions, internal docs, and proprietary logs stay where they belong.

Training loop

A full training run in about 40 lines.

FineTune.cs
using LMKit.Model;
using LMKit.Finetuning;

// 1. Load the base model (it stays frozen during training).
var model = new LM("path/to/base-model.gguf");

// 2. Configure training hyperparameters.
var parameters = new LoraTrainingParameters
{
    LoraRank             = 16,
    LoraAlpha            = 32,
    AdamAlpha            = 1e-4f,
    AdamBeta1            = 0.9f,
    AdamBeta2            = 0.999f,
    AdamDecay            = 0.01f,
    AdamGradientClipping = 1.0f,
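    // Accumulate gradients over 4 micro-batches before each optimizer step,
    // raising the effective batch size without extra VRAM.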
    GradientAccumulation = 4,
    CosineDecaySteps     = 2000,
    CosineDecayMin       = 0.1f,
    MaxNoImprovement     = 100,

    // Per-tensor rank control: spend more capacity where it matters.
    RankWQ = 16, RankWK = 16, RankWV = 16, RankWO = 8
};

// 3. Wire up the trainer with progress events.
var trainer = new LoraFinetuning(model, parameters)
{
    Iterations          = 2000,
    BatchSize           = 8,
    ContextSize         = 2048,
    UseGradientCheckpointing = true,
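    // Periodic checkpoint file, so an interrupted run can be resumed (checkpoint save/resume).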
    TrainingCheckpoint  = "checkpoints/run-2026-q1.bin"
};

trainer.FinetuningProgress += (s, e) =>
{
    Console.WriteLine($"Iter {e.Iteration}/{e.MaxIterations}  loss={e.Loss:F4}  lr={e.LearningRate:E2}");
};

// 4. Load training data from a ChatHistory, text file, or both.
int samples = trainer.LoadTrainingDataFromText("corpus/customer-support.jsonl");
Console.WriteLine($"{samples} training samples loaded");

// 5. Train. The output is a single .lora file.
trainer.Finetune2Lora("adapters/customer-support.lora");
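
With BatchSize = 8 and GradientAccumulation = 4, each optimizer step covers an effective batch of 32 samples. MaxNoImprovement = 100 ends the run early once the loss stops improving, and the resulting .lora file can be hot-swapped at inference time or merged into the base model with LoraMerger.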

Dataset tools

Build training data from production.

Most real fine-tuning failures stem from poor data, not poor hyperparameters. The SDK ships first-class tools for building, filtering, exporting, and versioning training datasets directly from running applications. A sketch of a typical workflow follows the list below.

LoadTrainingDataFromChatHistory

Convert a ChatHistory from a live MultiTurnConversation into training samples in one call. Capture the best customer interactions and feed them back as supervision.

LoadTrainingDataFromText

Load JSONL or plain-text corpora. Multiple overloads cover ShareGPT, Alpaca, and custom formats.

FilterSamplesBySize

Drop samples outside (minSize, maxSize) token bounds in a single pass. Common cleanup step before training.

ShareGptExporter

Export collected samples as a standard ShareGPT-format JSON file. Share with team members or version-control alongside your code.

SampleAvgLength / SampleMinLength / SampleMaxLength

Inspect the loaded dataset's distribution before training. Catch outliers that would skew your loss curve.

Sample manipulation

GetSample(int), RemoveSample(int), ClearTrainingData(), SaveTrainingData: full control over the loaded corpus.
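
Putting those tools together, the sketch below shows one plausible dataset-building pass before a run. The method and property names come from the list above; the exact signatures (the argument order of FilterSamplesBySize, the path overload of SaveTrainingData, the namespace of ChatHistory) are assumptions to verify against the API reference, and ShareGptExporter can then turn the collected samples into a ShareGPT JSON file for versioning.

BuildDataset.cs
using System;
using LMKit.Model;
using LMKit.Finetuning;
using LMKit.TextGeneration.Chat;   // assumed location of ChatHistory

static void BuildDataset(LM model, ChatHistory capturedConversation)
{
    var trainer = new LoraFinetuning(model, new LoraTrainingParameters());

    // Pull supervision straight from production, plus an offline JSONL corpus.
    trainer.LoadTrainingDataFromChatHistory(capturedConversation);
    trainer.LoadTrainingDataFromText("corpus/customer-support.jsonl");

    // Drop outliers that would not fit the training context window.
    trainer.FilterSamplesBySize(32, 2048);   // (minSize, maxSize) token bounds; order assumed

    // Inspect the distribution before committing GPU hours.
    Console.WriteLine($"avg={trainer.SampleAvgLength}  min={trainer.SampleMinLength}  max={trainer.SampleMaxLength}");

    // Persist the curated corpus so the run is reproducible and shareable.
    trainer.SaveTrainingData("datasets/customer-support-v3.dat");   // path overload assumed
}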

Hyperparameters

More than two dozen knobs, sane defaults.

Every important training parameter is exposed for advanced users; defaults are calibrated for common 4B to 13B fine-tunes.

Optimizer (AdamW)

AdamAlpha, AdamBeta1, AdamBeta2, AdamDecay, AdamDecayMinNDim, AdamGradientClipping.

Cosine schedule

CosineDecayMin, CosineDecayRestart, CosineDecaySteps. Anneal the learning rate cleanly with optional warm restarts; the standard formula is sketched after this list.

LoRA structure

LoraRank, LoraAlpha, plus per-tensor ranks RankWQ, RankWK, RankWV, RankWO, attention-norm and feed-forward ranks.

Gradient handling

GradientAccumulation for larger effective batches; UseGradientCheckpointing for memory savings on big models.

Early stopping

MaxNoImprovement: stop training after N iterations without loss improvement. No more babysitting runs.

RoPE control

RopeFreqBase, RopeFreqScale: tune positional encoding for long-context fine-tuning experiments.
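
For orientation, cosine decay anneals the learning rate along a half-cosine from the initial rate down to a floor, in the standard formulation below. How CosineDecayMin maps onto the floor (absolute rate versus fraction of AdamAlpha) and how CosineDecayRestart schedules warm restarts are assumptions to confirm in the LoraTrainingParameters reference.

$\eta(t) = \eta_{\min} + \tfrac{1}{2}\,(\eta_{\max} - \eta_{\min})\,\bigl(1 + \cos(\pi\, t / T)\bigr)$

where $\eta_{\max}$ is the initial learning rate (AdamAlpha), $T$ is CosineDecaySteps, and $t$ is the current step; with warm restarts, $t$ periodically resets to zero.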

Applications

Where local fine-tuning pays off.

Domain language

Adapt a base model to your domain's vocabulary: legal, medical, financial, scientific. Improve named-entity accuracy and instruction following without prompt engineering.

House style

Train an adapter that emits writing in your brand voice, with your editorial conventions, terminology, and tone.

Customer-facing chat

Fine-tune on your support transcripts for grounded, on-brand replies. Hot-swap a freshly trained adapter daily without redeploying the base model.

Tool-call accuracy

Train against a corpus of (intent, function-call) pairs so the model rarely hallucinates tool names or arguments. Pair with grammar-constrained decoding for ironclad output.

Compliance training

Train on your team's redaction policies, privacy rules, and disclosure templates. Run entirely on premises so the policies themselves never leak.

Multi-tenant SaaS

Train a per-customer adapter on their data. Load adapters dynamically per request so each tenant gets a model that knows their patterns.

Developer Resources

API reference.

LoraFinetuning

Main trainer. Configure iteration count, batch size, context window, checkpointing. Subscribe to FinetuningProgress for live metrics.

View documentation

LoraTrainingParameters

25+ tunable hyperparameters. AdamW, cosine decay, gradient clipping, per-tensor ranks, RoPE controls.

View documentation

TrainingDataset / TrainingSample

Container types for the loaded corpus. Inspect, filter, save, reload during a training run.

View documentation

ShareGptExporter

Convert collected ChatTrainingSamples into a ShareGPT-format JSON file for sharing or archival.

View documentation

After training, see LoRA Integration for runtime hot-swap and Model Quantization to compress for deployment.

Train where the data lives.

No cloud uploads. No GPU rental. Your data, your model, your hardware.

Download the Community Edition