Symbolic inference layer
Sits between the model and your output. Steers tokens, never trains them.
Dynamic Sampling is the adaptive inference engine LM-Kit.NET wraps around every open-weight model. It steers token selection in real time using structural awareness, contextual signals, and grammar-aligned validation, so a generic pretrained model behaves like a fine-tuned one on extraction, classification, function calling, and structured generation, without any retraining.
Dynamic grammar enforces structure at sampling time.
Real-time signals shape each token decision.
Most inference engines treat a model as a token machine: sample the next probable token, append, repeat. Dynamic Sampling sits one level above that loop. It carries a live model of what is being generated, why, and which alternatives exist, then votes on every token with that context in hand. The model still does the heavy lifting; Dynamic Sampling decides what to do with each step.
The result is a generic open-weight model that holds the line on tight schemas, emits clean JSON the first time, avoids the classic hallucination patterns, and arrives at the answer in a fraction of the steps a vanilla pipeline would take. One toggle (Configuration.EnableDynamicSampling) turns the layer on and off; it ships on by default.
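Using only the identifiers shown on this page, an A/B benchmark might be sketched as follows (illustrative; the surrounding benchmark passes are yours to supply):

```csharp
using LMKit.Global;

// On by default. Flip it off only to benchmark against a vanilla pipeline.
Configuration.EnableDynamicSampling = false;
// ... run the vanilla pass ...

Configuration.EnableDynamicSampling = true;
// ... run the Dynamic Sampling pass and compare ...
```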
The layer rests on two cooperating systems. The first guarantees shape, the second guarantees substance.
Pillar A: Dynamic grammar
LM-Kit produces a fresh grammar for every task, derived from the schema, the target output, and the model in use. A novel hybrid approach mixes greedy sampling for the constant parts of the structure with speculative validation for the variable parts. The two paths combined run roughly twice as fast as classical grammar-based sampling, and the output always parses.
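As a toy illustration of the hybrid idea (not LM-Kit's implementation; every name below is invented): constant parts of the structure can be copied verbatim with zero model calls, while variable slots take the speculative path of proposing a token first and validating it afterwards.

```csharp
using System;
using System.Text;

static class HybridSketch
{
    // A "grammar" split into constant spans (copied greedily, no model calls)
    // and variable slots (the model proposes, a validator checks).
    static readonly (string Constant, Func<char, bool>? Slot)[] Template =
    {
        ("{\"age\":", c => char.IsDigit(c)),   // numeric value slot
        (",\"ok\":true}", null)                // closing constants, no slot
    };

    static string Generate(Func<char> proposeToken)
    {
        var sb = new StringBuilder();
        foreach (var (constant, slot) in Template)
        {
            sb.Append(constant);               // greedy path: structure is known
            if (slot == null) continue;
            char c = proposeToken();           // speculative path: try model first
            if (!slot(c)) c = '0';             // fallback: substitute a safe char
            sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // A "model" that proposes a letter where the schema requires a digit.
        Console.WriteLine(Generate(() => 'x')); // prints {"age":0,"ok":true}
    }
}
```

The output parses regardless of what the model proposes, which is the point: structure is guaranteed by construction, and the model is only consulted where the grammar leaves room.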
Pillar B: Contextual signals
On top of grammar, Dynamic Sampling reads contextual signals at every step. A persistent completion state tracks what is being generated, which fields were already emitted, which character runs would be invalid, which value the surrounding context suggests. Sampling decisions blend those signals with the model's distribution, never against it.
Eight behaviours, all stacked transparently underneath your call. You never write a line of code to opt in.
01
The layer always knows what it is sampling. Inside a JSON string, inside a numeric run, inside an array, inside a value of a known format. Decisions are made with that knowledge, not against it.
02
The most probable token is sampled first and validated against the active grammar. If it holds, generation moves on with no extra cost. If not, the slower exhaustive path takes over.
03
Token-level uncertainty is measured live and folded back into the decision. Low-entropy decisions go fast and confident; high-entropy decisions trigger broader validation.
04
A lightweight semantic store carries codes, identifiers, formats, and reference patterns alongside the model. The layer uses it to disambiguate without expanding the prompt.
05
Different models prefer different JSON dialects: spaces around colons, trailing newlines, escaped slashes. The layer reads those preferences and adjusts so the model never fights the grammar.
06
Repetition is blocked where it is structurally wrong (duplicate array items, runaway character runs) and welcomed where it is correct (large numeric identifiers, padding). No blanket penalties.
07
When a candidate token would break the contract, the layer substitutes a structurally safe alternative or replays the decision with a wider candidate list. Inference never restarts from scratch.
08
No architecture coupling, no fine-tuning, no per-model adapter. Drop in a new open-weight release; the same layer keeps working from day one.
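Behaviours 02 and 03 can be pictured as an entropy gate over the candidate list. The sketch below is illustrative only (the threshold and the gating rule are invented for the example, not taken from LM-Kit):

```csharp
using System;
using System.Linq;

static class EntropySketch
{
    // Shannon entropy of a probability distribution, in bits.
    static double Entropy(double[] p) =>
        -p.Where(x => x > 0).Sum(x => x * Math.Log2(x));

    static void Main()
    {
        double[] confident = { 0.97, 0.01, 0.01, 0.01 }; // model is sure
        double[] uncertain = { 0.30, 0.28, 0.22, 0.20 }; // model is torn

        // Hypothetical gate: below the threshold, validate only the top token
        // (the fast path); above it, check a wider candidate list against the
        // grammar before committing.
        const double threshold = 1.0;
        foreach (var dist in new[] { confident, uncertain })
        {
            int candidates = Entropy(dist) < threshold ? 1 : dist.Length;
            Console.WriteLine($"entropy={Entropy(dist):F2} -> validate {candidates} candidate(s)");
        }
    }
}
```

Low-entropy steps stay as cheap as greedy sampling; only the genuinely uncertain steps pay for broader validation.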
You don't import Dynamic Sampling. You use the API; the layer engages itself.
Extraction
Names, dates, identifiers, currency, custom schemas. Dynamic Sampling is what makes a 4B model emit a clean object on the first pass.
Functions
Tool arguments parse every time. The layer holds the call shape and rejects malformed sequences before they reach your dispatcher.
Classification
Categorisation, sentiment, intent. The layer locks the answer space to the enum you defined, so the model can only pick from it.
RAG
When citations, source IDs, and metadata need to round-trip through generation, structural awareness keeps them clean.
Generation
JSON schemas, custom shapes, nested objects, recursive types. Hand the structure, get an object back.
Agents
Multi-step reasoning, planning artefacts, intermediate states. The layer keeps each artefact parseable so the next step has something to grab.
Same model. Same prompt. Different layer underneath.
Dynamic Sampling is on by default. The only public surface is a global toggle for benchmarking or A/B testing. Everything else happens automatically inside extraction, classification, RAG, function calling, and structured generation.
Configuration.EnableDynamicSampling defaults to true; set it to false only when benchmarking against a vanilla pipeline.

```csharp
using LMKit.Global;
using LMKit.Model;
using LMKit.Extraction;

// Default: Dynamic Sampling is on. You don't need this line in production.
Configuration.EnableDynamicSampling = true;

var model = LM.LoadFromModelID("qwen3.5:4b");

// A typical extraction call. Under the hood, Dynamic Sampling:
// - builds a per-task grammar
// - tracks the partial JSON as it grows
// - validates each token against the schema
// - blends model preference with grammar shape
// - falls back to safe alternatives on conflict
var extractor = new TextExtraction(model);
extractor.AddElement("date_of_birth", PredefinedStringFormat.Date);
extractor.AddElement("email", PredefinedStringFormat.Email);

var result = await extractor.ParseAsync(
    "Bob was born on November 5th, 1981, his email is bob@bobby.com.");

Console.WriteLine(result.Content);
// { "date_of_birth": "1981-11-05", "email": "bob@bobby.com" }
```
Dynamic Sampling is the floor, not the ceiling. It composes cleanly with every other inference lever.
Pair
Per-turn temperature, top-k, top-p, repetition control. Use them when you want explicit creative or deterministic behaviour on top of the adaptive baseline.
Open Sampling Controls

Pair
The layer works the same across every model in the catalog, from sub-1B to 30B-class MoE. Pick by VRAM budget, not by sampling compatibility.
Open Model Catalog

Pair
CPU, AVX2, CUDA, Vulkan, Metal. Dynamic Sampling rides on top of every backend with the same behaviour and the same speed-up profile.
Open Backends

Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Why we built it, what it changes, what to expect from a model that runs on top of it.
Open →

API reference
Definitions of the components and behaviours that make up the layer.
Open the reference →

API reference
The single switch that controls the layer at runtime. Default is true.
Open the reference →

How-to guide
How Dynamic Sampling composes with explicit sampling controls per turn.
Read the guide →

The seven pillars of LM-Kit.NET, plus the local runtime they share.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU, AVX2, CUDA 12/13, Vulkan or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.
Dynamic Sampling is the reason a small local model can keep up with much larger cloud ones on real structured workloads. Ship it under your next agent, extractor, or RAG pipeline.