No plaintext on disk
Decryption is per-tensor and streamed straight into the engine. No temporary file, no decrypted artefact in the model cache.
Fine-tuned models are intellectual property. So are the proprietary datasets that produced them. Yet most local-LLM stacks ask you to distribute weights as plaintext on disk, copyable in a single clipboard operation. LM.LoadEncrypted ships a stream-based decryption path: tensor bytes are decrypted on the fly into the inference engine, never written back to disk, never aggregated as a plaintext whole in memory. The encrypted file is what your customer sees; the model is only assembled inside the running process.
AES-256 in CTR mode for seekable random access; PBKDF2-SHA256 for password-derived keys; salted, versioned headers.
Encrypt once. Replace LM.Load with LM.LoadEncrypted. Everything else in the pipeline keeps working.
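To make the "PBKDF2-SHA256 for password-derived keys; salted" point concrete, here is a minimal sketch in Python of how password-based key derivation works in general. The function name, iteration count, and salt size are illustrative assumptions, not the actual LM-Kit parameters or API.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 256-bit key from a password with PBKDF2-SHA256.

    A random per-artefact salt (stored in the file header) means the same
    password yields a different key for every encrypted model, so one
    leaked key never unlocks another artefact.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32)

salt = os.urandom(16)                    # random, shipped in the header
key = derive_key("release-password", salt)
assert len(key) == 32                    # 256 bits, sized for AES-256
```

The derivation is deterministic for a given password and salt, which is what lets the runtime recompute the key from the shipped header plus the key your licensing service hands out.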
If a fine-tune cost you six figures of compute and proprietary data, a plaintext file on a customer machine is a strategic risk. A copy job walks the model out the door. A snapshot of a desktop image leaks it. A stolen laptop publishes it. Encrypted model loading turns that threat surface into an authentication problem you actually control.
A proprietary fine-tune embeds your training data and your domain expertise. Encryption keeps both inside the process boundary.
Industries that classify model weights as protected data (healthcare, finance, defence) get a turn-key encryption story without integrating a separate key vault.
Combine encryption with license verification to gate access to specific models per tier. Keys distributed by your licensing service, not bundled.
Devices on customer premises, kiosks, mobile installs all carry the encrypted artefact. The plaintext model exists only at runtime.
Streaming decryption runs at memory-bandwidth speed. Load-time impact is in single-digit percentages on commodity hardware.
Versioned header format. New encryption schemes can be added without breaking existing artefacts. The reader negotiates the right path automatically.
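A versioned header lets the reader dispatch on a version field before touching any ciphertext. The following Python sketch shows the general pattern; the magic bytes, field layout, and sizes are hypothetical and are not the actual .lmk-enc format.

```python
import struct

MAGIC = b"LMKE"  # hypothetical magic bytes, not the real .lmk-enc layout

def parse_header(blob: bytes) -> dict:
    """Parse a minimal versioned header: magic | version | salt | iterations.

    The reader branches on the version field, so a new encryption scheme
    can be introduced under a new version without invalidating artefacts
    that have already shipped.
    """
    magic, version = struct.unpack_from("<4sH", blob, 0)
    if magic != MAGIC:
        raise ValueError("not an encrypted model artefact")
    if version == 1:
        salt, iterations = struct.unpack_from("<16sI", blob, 6)
        return {"version": 1, "salt": salt, "iterations": iterations}
    raise ValueError(f"unsupported header version {version}")

header = struct.pack("<4sH16sI", MAGIC, 1, b"\x00" * 16, 600_000)
parsed = parse_header(header)
assert parsed["iterations"] == 600_000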
Encrypt once at build time, load straight from the encrypted file at runtime, and distribute keys through a licensing or secrets service in production.
Build-time step. EncryptedGguf.Encrypt takes a plaintext
GGUF and writes an .lmk-enc file that ships in your
installer. The plaintext stays on the build server.
using LMKit.Model;

// Build-time: encrypt the GGUF artefact you ship.
EncryptedGguf.Encrypt(
    sourcePath: @"D:\models\support-tone-finetune.gguf",
    encryptedPath: @"D:\dist\support-tone-finetune.lmk-enc",
    password: SecretsVault.GetReleasePassword());

// The .lmk-enc file is what your installer ships. The plaintext .gguf
// stays on the build server.
Runtime step. LM.LoadEncrypted stream-decrypts the
weights as they load. No plaintext is ever written to disk. From
there, the model is just an LM: chat, embed, anything.
// Runtime: load straight from the encrypted file. No intermediate plaintext.
var model = LM.LoadEncrypted(
    path: @"C:\Program Files\YourApp\models\support-tone-finetune.lmk-enc",
    password: keyService.ResolveModelKey("support-tone"));

// From here, the model behaves exactly like any other LM.
var chat = new MultiTurnConversation(model);
var reply = await chat.SubmitAsync("How do I escalate this ticket?");
Production pattern. The key never lives on disk or in the binary; it comes from a licensing service, a key vault, or an SSO-mediated session. Per-tenant entitlement gates which model loads at all.
// Production pattern: keys come from a licensing or secrets service,
// never from disk, never embedded in the binary.
async Task<LM> LoadModelForTenantAsync(string tenantId, string modelId)
{
    var entitlement = await _licenseClient.CheckEntitlementAsync(tenantId, modelId);
    if (!entitlement.IsValid)
        throw new UnauthorizedAccessException();

    var password = await _keyVault.GetModelKeyAsync(modelId, entitlement.SessionToken);
    return LM.LoadEncrypted(_modelStorage.GetPath(modelId), password);
}
Independent software vendors distributing fine-tuned models inside their .NET applications. Customers run the app; nobody walks off with the weights.
One encrypted model per customer, gated by their license server. Cancellation revokes the key; the file becomes inert.
Pro tier ships a stronger model, Standard tier ships a smaller one. Both encrypted; tier determines which key the licensing service hands out.
Defence and regulated environments where weights themselves are classified. Encryption lets the artefact travel through unclassified channels.
Long-term storage of trained models without plaintext exposure. Decrypt only when reproducing a result; archive integrity preserved.
Devices ship the encrypted blob; the OS keychain holds the key. Lost device, useless artefact.
Plaintext on disk is the default. Easy to ship, easy to leak. A copy command exfiltrates the model. A backup of the customer machine snapshots it. Nothing protects the weights once they land.
Decrypt to a temp file, load that, delete after. Fragile: the temp file is plaintext for the duration. Crash dumps, swap, monitoring agents all see it.
Streaming decryption directly into the inference engine. No intermediate plaintext. Standard cipher, password-derived key, per-tensor reads. Drop-in replacement for the standard load path.
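The property that makes per-tensor streaming possible is counter mode itself: keystream block i depends only on (key, nonce, i), never on neighbouring ciphertext, so decryption can begin at any tensor's byte offset. The sketch below demonstrates that seekability using SHA-256 as a stand-in block function, stdlib only; it is a conceptual illustration, not AES-256-CTR and not the LM-Kit implementation.

```python
import hashlib

BLOCK = 16  # bytes per keystream block (matches the AES block size)

def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Toy counter-mode block function: SHA-256 in place of the AES block
    # cipher. The property that matters is identical either way: block i
    # depends only on (key, nonce, i), never on other ciphertext.
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "little")).digest()

def xor_at(key: bytes, nonce: bytes, data: bytes, offset: int) -> bytes:
    """Encrypt/decrypt `data` as if it sat at byte `offset` in the stream."""
    out = bytearray()
    for i, b in enumerate(data):
        pos = offset + i
        block = keystream_block(key, nonce, pos // BLOCK)
        out.append(b ^ block[pos % BLOCK])
    return bytes(out)

key, nonce = b"k" * 32, b"n" * 16
plain = b"tensor-0 bytes..tensor-1 bytes.."
cipher = xor_at(key, nonce, plain, 0)

# Seekable: decrypt only the second tensor (bytes 16..31) without ever
# touching the first -- exactly what per-tensor streaming relies on.
assert xor_at(key, nonce, cipher[16:], 16) == plain[16:]
```

Because each tensor decrypts independently, the loader can hand bytes to the inference engine as they arrive instead of materialising the whole plaintext first.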
The full loading story: download from the catalogue, cache on disk, swap in encrypted variants for restricted deployments.
Quantise first, encrypt second. Shipping smaller artefacts and protecting them are independent levers.
Encrypt the base, encrypt the adapters, hot-swap at runtime. Protect both the foundation and the differentiating tune.
Encrypted models on customer premises, mobile devices, kiosks. The right pair for any deployment that leaves the data centre.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.