
Ship the model. Keep the secrets.

Fine-tuned models are intellectual property. So are the proprietary datasets that produced them. Yet most local-LLM stacks ask you to distribute weights as plaintext on disk, copyable with a single file operation. LM.LoadEncrypted ships a stream-based decryption path: encrypted tensor bytes are decrypted on the fly into the inference engine, never written back to disk, never held as a complete plaintext buffer in memory. The encrypted file is what your customer sees; the model is only assembled inside the running process.

AES-256-CTR streaming · Password-derived keys · Zero plaintext on disk

No plaintext on disk

Decryption is per-tensor and streamed straight into the engine. No temporary file, no decrypted artefact in the model cache.

Standards-based

AES-256 in CTR mode for seekable random access; PBKDF2-SHA256 for password-derived keys; salted, versioned headers.

Drop-in

Encrypt once. Replace LM.Load with LM.LoadEncrypted. Everything else in the pipeline keeps working.
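The standards named above can be illustrated with plain .NET crypto primitives. A minimal sketch, assuming nothing about LM-Kit's internals: the iteration count, salt size, and password below are placeholder values, not the library's actual parameters.

```csharp
using System;
using System.Security.Cryptography;

// Illustration only: PBKDF2-SHA256 key derivation and the CTR property
// that makes seekable, per-tensor random access possible. Requires .NET 6+.
byte[] salt = RandomNumberGenerator.GetBytes(16);   // placeholder salt size

// Derive a 256-bit AES key from a password (PBKDF2-SHA256).
byte[] key = Rfc2898DeriveBytes.Pbkdf2(
    password:      "release-password",              // placeholder
    salt:          salt,
    iterations:    600_000,                         // placeholder count
    hashAlgorithm: HashAlgorithmName.SHA256,
    outputLength:  32);

// CTR mode encrypts block i with E(key, nonce || counter = i), so the
// keystream for any byte offset is computable without decrypting what
// came before it — which is why a reader can jump straight to a tensor:
long tensorOffset = 4_096_000;          // byte offset of a tensor in the file
long counterBlock = tensorOffset / 16;  // AES block index to start from
Console.WriteLine($"Start decrypting at counter block {counterBlock}");
```

This seekability is what distinguishes CTR from chained modes like CBC, where decrypting a tensor in the middle of the file would require processing everything before it.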

Why this matters

Weights leak. Money leaks faster.

If a fine-tune cost you six figures of compute and proprietary data, a plaintext file on a customer machine is a strategic risk. A copy job walks the model out the door. A snapshot of a desktop image leaks it. A stolen laptop publishes it. Encrypted model loading turns that threat surface into an authentication problem you actually control.

Fine-tune protection

A proprietary fine-tune embeds your training data and your domain expertise. Encryption keeps both inside the process boundary.

Regulatory alignment

Industries that classify model weights as protected data (healthcare, finance, defence) get a turn-key encryption story without integrating a separate key vault.

Subscription enforcement

Combine encryption with license verification to gate access to specific models per tier. Keys distributed by your licensing service, not bundled.
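A sketch of what that gating could look like. `LicensingClient`, `customerToken`, and the license object's members are hypothetical stand-ins for your own licensing service, and the `LM.LoadEncrypted` parameter shape is an assumption mirroring `EncryptedGguf.Encrypt`; check the API reference for the real overloads.

```csharp
using System;
using LMKit.Model;

// Hypothetical licensing flow: verify the subscription, then receive the
// model key per-session. The key is never bundled with the installer.
var license = await LicensingClient.VerifyAsync(customerToken);
if (!license.IsActive)
    throw new UnauthorizedAccessException(
        "Subscription lapsed: no model key issued.");

// Tier determines which key (and therefore which model) is unlocked.
string modelPassword = license.GetModelKey(tier: license.Tier);

var model = LM.LoadEncrypted(
    @"C:\ProgramData\MyApp\models\pro-tier.lmk-enc",
    modelPassword);
```

Cancellation then works exactly as the per-tenant SaaS scenario below describes: the licensing service stops issuing the key, and the `.lmk-enc` file on disk becomes inert.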

Edge deployments

Devices on customer premises, kiosks, and mobile installs all carry the encrypted artefact. The plaintext model exists only at runtime.

No performance tax

Streaming decryption runs at memory-bandwidth speed. Load-time impact is in single-digit percentages on commodity hardware.

Forward compatible

Versioned header format. New encryption schemes can be added without breaking existing artefacts. The reader negotiates the right path automatically.

Encrypt and load

Three phases, one workflow.

Encrypt once at build time, load straight from the encrypted file at runtime, and distribute keys through a licensing or secrets service in production.

Build-time step. EncryptedGguf.Encrypt takes a plaintext GGUF and writes an .lmk-enc file that ships in your installer. The plaintext stays on the build server.

EncryptOnce.cs
using LMKit.Model;

// Build-time: encrypt the GGUF artefact you ship.
EncryptedGguf.Encrypt(
    sourcePath:    @"D:\models\support-tone-finetune.gguf",
    encryptedPath: @"D:\dist\support-tone-finetune.lmk-enc",
    password:      SecretsVault.GetReleasePassword());

// The .lmk-enc file is what your installer ships. The plaintext .gguf
// stays on the build server.
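Runtime step, as a sketch. The page names LM.LoadEncrypted as the drop-in replacement for LM.Load; the parameter shape below (path plus password) is an assumption mirroring `EncryptedGguf.Encrypt`, so verify it against the API reference.

```csharp
using LMKit.Model;

// Runtime: load directly from the encrypted artefact the installer shipped.
// Signature assumed to mirror EncryptedGguf.Encrypt — see the API reference.
var model = LM.LoadEncrypted(
    @"C:\Program Files\MyApp\models\support-tone-finetune.lmk-enc",
    SecretsVault.GetReleasePassword());   // same password used at build time

// From here the model behaves like any LM.Load result: tensors are
// decrypted per-read into the engine, and no plaintext file is created.
```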
Where encryption ships

Real scenarios.

ISV redistribution

Independent software vendors distributing fine-tuned models inside their .NET applications. Customers run the app; nobody walks off with the weights.

Per-tenant SaaS

One encrypted model per customer, gated by their license server. Cancellation revokes the key; the file becomes inert.

Tiered model access

Pro tier ships a stronger model, Standard tier ships a smaller one. Both encrypted; tier determines which key the licensing service hands out.

Air-gapped deployments

Defence and regulated environments where weights themselves are classified. Encryption lets the artefact travel through unclassified channels.

Forensic-ready archives

Long-term storage of trained models without plaintext exposure. Decrypt only when reproducing a result; archive integrity preserved.

Mobile and embedded

Devices ship the encrypted blob; the OS keychain holds the key. Lost device, useless artefact.

Versus the alternatives

Most stacks punt this entirely.

Plaintext on disk

The default. Easy to ship, easy to leak. A copy command exfiltrates the model. A backup of the customer machine snapshots it. Nothing protects the weights once they land.

Decrypt-then-load

Decrypt to a temp file, load that, delete after. Fragile: the temp file is plaintext for the duration. Crash dumps, swap files, and monitoring agents all see it.

LM.LoadEncrypted

Streaming decryption directly into the inference engine. No intermediate plaintext. Standard cipher, password-derived key, per-tensor reads. Drop-in replacement for the standard load path.

Related capabilities

Encryption plus the rest.

Model catalog & loading

The full loading story: download from the catalog, cache on disk, swap in encrypted variants for restricted deployments.

Model catalog

Quantization

Quantise first, encrypt second. Shipping smaller artefacts and protecting them are independent levers.

Quantization

LoRA integration

Encrypt the base, encrypt the adapters, hot-swap at runtime. Protect both the foundation and the differentiating tune.

LoRA

Edge & offline deployment

Encrypted models on customer premises, mobile devices, kiosks. The right pair for any deployment that leaves the data centre.

Edge deployment

Demos & docs

Build it. Read it. Try it.

Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.

Distribute the model. Protect the IP.

Get Community Edition · Download