Dynamic Sampling

The symbolic layer under every LM-Kit call.

Dynamic Sampling is the adaptive inference engine LM-Kit.NET wraps around every open-weight model. It steers token selection in real time using structural awareness, contextual signals, and grammar-aligned validation, so a generic pretrained model behaves like a fine-tuned one on extraction, classification, function calling, and structured generation, without any retraining.

  • Constrained output that always parses, even on small models
  • Adaptive guidance that cuts hallucinations on tight schemas
  • Up to ~10x faster end-to-end on structured workloads
  • Always on, no fine-tuning, no prompt acrobatics

Layer

Symbolic inference layer

Sits between the model and your output. Steers tokens, never trains them.

Pillar A

Constrained output

Dynamic grammar enforces structure at sampling time.

Pillar B

Adaptive guidance

Real-time signals shape each token decision.

~10x
Faster end-to-end on structured tasks
~2x
Faster than classical grammar sampling
0
Fine-tuning required

What it is

A second brain that thinks in structure.

Most inference engines treat a model as a token machine: sample the next probable token, append, repeat. Dynamic Sampling sits one level above that loop. It carries a live model of what is being generated, why, and which alternatives exist, then votes on every token with that context in hand. The model still does the heavy lifting; Dynamic Sampling decides what to do with each step.

The result is a generic open-weight model that holds the line on tight schemas, emits clean JSON the first time, avoids the classic hallucination patterns, and arrives at the answer in a fraction of the steps a vanilla pipeline would take. One toggle (Configuration.EnableDynamicSampling) turns the layer on and off; it ships on by default.

Two pillars

Constrained output, adaptive guidance.

The layer rests on two cooperating systems. The first guarantees shape, the second guarantees substance.

Inside the layer

What it actually does at runtime.

Eight behaviours, all stacked transparently underneath your call. You never write a line of code to opt in.

01

Structural awareness

The layer always knows what it is sampling. Inside a JSON string, inside a numeric run, inside an array, inside a value of a known format. Decisions are made with that knowledge, not against it.
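
A minimal sketch of the idea in plain C# (hypothetical code, not the LM-Kit.NET API): a stack of open scopes is enough to know, at every step, what the sampler is inside.

StructuralAwarenessSketch.cs
using System;
using System.Collections.Generic;

// Hypothetical illustration: a stack of open JSON scopes, updated per
// character, so every sampling decision knows exactly what it is inside.
var tracker = new StructuralTracker();
foreach (var c in "{\"email\": \"bob@")
    tracker.Feed(c);

// We are inside a string value, so a '}' here would be literal text,
// not a structural close brace.
Console.WriteLine(tracker.Current); // String

enum Scope { Object, Array, String }

class StructuralTracker
{
    private readonly Stack<Scope> _scopes = new();
    private bool _escaped;

    public Scope? Current => _scopes.Count > 0 ? _scopes.Peek() : null;

    public void Feed(char c)
    {
        if (Current == Scope.String)
        {
            // Inside a string, only an unescaped quote changes state.
            if (_escaped) { _escaped = false; return; }
            if (c == '\\') { _escaped = true; return; }
            if (c == '"') _scopes.Pop();
            return;
        }
        switch (c)
        {
            case '{': _scopes.Push(Scope.Object); break;
            case '[': _scopes.Push(Scope.Array); break;
            case '"': _scopes.Push(Scope.String); break;
            case '}' or ']' when _scopes.Count > 0: _scopes.Pop(); break;
        }
    }
}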

02

Speculative grammar

The most probable token is sampled first and validated against the active grammar. If it holds, generation moves on with no extra cost. If not, the slower exhaustive path takes over.
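
Sketched in isolation (hypothetical code, not the LM-Kit.NET API; the grammar check and candidate list are stand-ins), the two paths look like this:

SpeculativeGrammarSketch.cs
using System;
using System.Linq;

// Hypothetical stand-in for the active grammar.
Func<string, bool> grammarAccepts = token => token != "<illegal>";

// Candidates sorted by model probability, highest first.
var candidates = new[] { "<illegal>", "\"", "}" };

string PickToken(string[] sorted)
{
    // Fast path: the top token usually passes, and costs nothing extra.
    if (grammarAccepts(sorted[0]))
        return sorted[0];

    // Slow path: exhaustive scan, paid only when the fast path fails.
    return sorted.First(grammarAccepts);
}

Console.WriteLine(PickToken(candidates)); // prints: "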

03

Contextual perplexity assessment

Token-level uncertainty is measured live and folded back into the decision. Low-entropy decisions go fast and confident; high-entropy decisions trigger broader validation.
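
A toy version of the gate (hypothetical code, not the LM-Kit.NET API; the 0.5-bit threshold is an invented example value):

PerplexityGateSketch.cs
using System;
using System.Linq;

// Shannon entropy over the candidate distribution, in bits.
double Entropy(double[] probs) =>
    -probs.Where(p => p > 0).Sum(p => p * Math.Log2(p));

// One confident distribution, one uncertain one.
var confident = new[] { 0.97, 0.02, 0.01 };
var uncertain = new[] { 0.40, 0.35, 0.25 };

// Low entropy: trust the top token. High entropy: validate more widely.
int CandidatesToValidate(double[] probs) =>
    Entropy(probs) < 0.5 ? 1 : probs.Length;

Console.WriteLine(CandidatesToValidate(confident)); // 1
Console.WriteLine(CandidatesToValidate(uncertain)); // 3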

04

Auxiliary content as extended context

A lightweight semantic store carries codes, identifiers, formats, and reference patterns alongside the model. The layer uses it to disambiguate without expanding the prompt.
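
One way to picture the store (hypothetical code, not the LM-Kit.NET API): a side table of reference formats the sampler can consult without touching the prompt.

AuxiliaryStoreSketch.cs
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical reference patterns, keyed by field name.
var store = new Dictionary<string, Regex>
{
    ["currency"] = new Regex("^[A-Z]{3}$"), // ISO 4217 style codes
    ["country"]  = new Regex("^[A-Z]{2}$"), // ISO 3166-1 alpha-2
};

// A candidate value is plausible if the field is unknown, or it matches.
bool Plausible(string field, string candidate) =>
    !store.TryGetValue(field, out var pattern) || pattern.IsMatch(candidate);

Console.WriteLine(Plausible("currency", "EUR"));  // True
Console.WriteLine(Plausible("currency", "Euro")); // False: steer away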

05

Model-preference adaptation

Different models prefer different JSON dialects: spaces around colons, trailing newlines, escaped slashes. The layer reads those preferences and adjusts so the model never fights the grammar.
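
Conceptually (hypothetical code, not the LM-Kit.NET API; the scores are invented): when several spellings are grammar-equivalent, the model's own probabilities pick the dialect.

PreferenceAdaptSketch.cs
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-model scores for grammar-equivalent key separators.
var separatorScores = new Dictionary<string, double>
{
    [": "]  = 0.62, // space after the colon
    [":"]   = 0.31, // no space
    [" : "] = 0.07, // space on both sides
};

// The grammar accepts all of them, so take the model's favourite.
var chosen = separatorScores.OrderByDescending(kv => kv.Value).First().Key;
Console.WriteLine($"'{chosen}'"); // ': ' (this model prefers a space after colons)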

06

Targeted repetition control

Repetition is blocked where it is structurally wrong (duplicate array items, runaway character runs) and welcomed where it is correct (large numeric identifiers, padding). No blanket penalties.
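
In sketch form (hypothetical code, not the LM-Kit.NET API): the penalty is a function of structural scope, not a blanket rule.

RepetitionControlSketch.cs
using System;
using System.Collections.Generic;

// Hypothetical set of items already emitted into the current array.
var emittedItems = new HashSet<string> { "\"red\"" };

double RepetitionFactor(Scope scope, string candidate) =>
    scope == Scope.ArrayItem && emittedItems.Contains(candidate)
        ? 0.0  // a duplicate array item is structurally wrong: block it
        : 1.0; // digits in 111222333 repeat legitimately: leave them alone

Console.WriteLine(RepetitionFactor(Scope.ArrayItem, "\"red\"")); // 0
Console.WriteLine(RepetitionFactor(Scope.NumericRun, "1"));      // 1

enum Scope { ArrayItem, NumericRun }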

07

Graceful fallbacks

When a candidate token would break the contract, the layer substitutes a structurally safe alternative or replays the decision with a wider candidate list. Inference never restarts from scratch.
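
A minimal sketch of the repair step (hypothetical code, not the LM-Kit.NET API): a broken candidate is fixed at the current step, and nothing upstream is thrown away.

GracefulFallbackSketch.cs
using System;
using System.Linq;

// Hypothetical stand-in: which tokens the contract allows right here.
Func<string, bool> legalHere = t => t is "," or "]";

string Resolve(string chosen, string[] widerCandidates)
{
    if (legalHere(chosen))
        return chosen; // nothing to repair

    // Replay the decision over a wider list instead of restarting.
    return widerCandidates.First(legalHere);
}

Console.WriteLine(Resolve("}", new[] { "}", ",", "]" })); // "," (repaired in place)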

08

Model-agnostic by design

No architecture coupling, no fine-tuning, no per-model adapter. Drop in a new open-weight release; the same layer keeps working from day one.

Where it fires

Every structured call in LM-Kit.NET.

You don't import Dynamic Sampling. You use the API; the layer engages itself.

Extraction

Structured field extraction

Names, dates, identifiers, currency, custom schemas. Dynamic Sampling is what makes a 4B model emit a clean object on the first pass.

Functions

Tool and function calling

Tool arguments parse every time. The layer holds the call shape and rejects malformed sequences before they reach your dispatcher.

Classification

Text and image classification

Categorisation, sentiment, intent. The layer locks the answer space to the enum you defined, so the model can only pick from it.
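
The locking idea, reduced to a sketch (hypothetical code, not the LM-Kit.NET API): mask any continuation that exits every allowed label.

EnumLockSketch.cs
using System;
using System.Linq;

// Hypothetical label set; classification becomes constrained sampling.
var labels = new[] { "positive", "negative", "neutral" };
var generated = "ne"; // partial answer so far

// Only continuations that keep us inside some label survive the mask.
bool Allowed(string next) =>
    labels.Any(l => l.StartsWith(generated + next, StringComparison.Ordinal));

Console.WriteLine(Allowed("g")); // True  (still inside "negative")
Console.WriteLine(Allowed("u")); // True  (still inside "neutral")
Console.WriteLine(Allowed("x")); // False (no label continues "nex")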

RAG

Retrieval-augmented generation

When citations, source IDs, and metadata need to round-trip through generation, structural awareness keeps them clean.

Generation

Schema-constrained generation

JSON schemas, custom shapes, nested objects, recursive types. Hand the structure, get an object back.

Agents

Agents and orchestration

Multi-step reasoning, planning artefacts, intermediate states. The layer keeps each artefact parseable so the next step has something to grab.

Why it matters

A vanilla pipeline vs. Dynamic Sampling.

Same model. Same prompt. Different layer underneath.

Vanilla

Token machine

  • One encode and decode per token
  • Stopping point is unknown in advance
  • Schema compliance is best effort
  • Mid-stream errors propagate to the end
  • Brittle to prompt and model swaps
  • Latency drifts as KV-cache grows

Dynamic Sampling

Symbolic layer

  • Speculative fast path validates the top token at no extra cost
  • The grammar knows the stopping point in advance
  • Schema compliance is enforced at sampling time
  • Mid-stream conflicts are repaired in place; inference never restarts
  • Model-agnostic: swap models, keep the layer
  • Up to ~10x faster end-to-end on structured tasks

Code

One flag, everywhere underneath.

Dynamic Sampling is on by default. The only public surface is a global toggle for benchmarking or A/B testing. Everything else happens automatically inside extraction, classification, RAG, function calling, and structured generation.

  • Configuration.EnableDynamicSampling defaults to true
  • Flip it to false only when benchmarking against a vanilla pipeline
  • No per-call wiring: every structured API picks it up
  • Composes with the per-turn sampling controls (temperature, top-k, top-p)
  • Composes with the grammar layer for fully custom shapes

DynamicSamplingToggle.cs
using LMKit.Global;
using LMKit.Model;
using LMKit.Extraction;

// Default: Dynamic Sampling is on. You don't need this line in production.
Configuration.EnableDynamicSampling = true;

var model = LM.LoadFromModelID("qwen3.5:4b");

// A typical extraction call. Under the hood, Dynamic Sampling:
//   - builds a per-task grammar
//   - tracks the partial JSON as it grows
//   - validates each token against the schema
//   - blends model preference with grammar shape
//   - falls back to safe alternatives on conflict
var extractor = new TextExtraction(model);
extractor.AddElement("date_of_birth", PredefinedStringFormat.Date);
extractor.AddElement("email",        PredefinedStringFormat.Email);

var result = await extractor.ParseAsync(
    "Bob was born on November 5th, 1981, his email is bob@bobby.com.");

Console.WriteLine(result.Content);
// { "date_of_birth": "1981-11-05", "email": "bob@bobby.com" }
LM-Kit.NET pillars

Seven pillars, one foundation.

The seven pillars of LM-Kit.NET, plus the local runtime they share.

The foundation

Every capability above runs on this runtime.

Foundation

Local Inference

The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU (AVX2), CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.

Explore the foundation

Build on a layer that actually steers.

Dynamic Sampling is the reason a small local model can keep up with much larger cloud ones on real structured workloads. Ship it under your next agent, extractor, or RAG pipeline.

Download free · Read the deep dive