Symbolic inference layer
Sits between the model and your output. Steers tokens, never trains them.
Dynamic Sampling is the adaptive inference engine LM-Kit.NET wraps around every open-weight model. It steers token selection in real time using structural awareness, contextual signals, and grammar-aligned validation, so a generic pretrained model behaves like a fine-tuned one on extraction, classification, function calling, and structured generation, without any retraining.
Dynamic grammar enforces structure at sampling time.
Real-time signals shape each token decision.
Most inference engines treat a model as a token machine: sample the next probable token, append, repeat. Dynamic Sampling sits one level above that loop. It carries a live model of what is being generated, why, and which alternatives exist, then votes on every token with that context in hand. The model still does the heavy lifting; Dynamic Sampling decides what to do with each step.
The result is a generic open-weight model that holds the line on tight schemas, emits clean JSON the first time, avoids the classic hallucination patterns, and arrives at the answer in a fraction of the steps a vanilla pipeline would take. One toggle (Configuration.EnableDynamicSampling) turns the layer on and off; it ships on by default.
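Using only the identifiers shown on this page, an A/B benchmark might be sketched as follows (illustrative; the surrounding benchmark passes are yours to supply):

```csharp
using LMKit.Global;

// On by default. Flip it off only to benchmark against a vanilla pipeline.
Configuration.EnableDynamicSampling = false;
// ... run the vanilla pass ...

Configuration.EnableDynamicSampling = true;
// ... run the Dynamic Sampling pass and compare ...
```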
The layer rests on two cooperating systems. The first guarantees shape, the second guarantees substance.
Pillar A: Dynamic grammar
LM-Kit produces a fresh grammar for every task, derived from the schema, the target output, and the model in use. A novel hybrid approach mixes greedy sampling for the constant parts of the structure with speculative validation for the variable parts. The two paths combined run roughly twice as fast as classical grammar-based sampling, and the output always parses.
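As a toy illustration of the hybrid idea (not LM-Kit's implementation; every name below is invented): constant parts of the structure can be copied verbatim with zero model calls, while variable slots take the speculative path of proposing a token first and validating it afterwards.

```csharp
using System;
using System.Text;

static class HybridSketch
{
    // A "grammar" split into constant spans (copied greedily, no model calls)
    // and variable slots (the model proposes, a validator checks).
    static readonly (string Constant, Func<char, bool>? Slot)[] Template =
    {
        ("{\"age\":", c => char.IsDigit(c)),   // numeric value slot
        (",\"ok\":true}", null)                // closing constants, no slot
    };

    static string Generate(Func<char> proposeToken)
    {
        var sb = new StringBuilder();
        foreach (var (constant, slot) in Template)
        {
            sb.Append(constant);               // greedy path: structure is known
            if (slot == null) continue;
            char c = proposeToken();           // speculative path: try model first
            if (!slot(c)) c = '0';             // fallback: substitute a safe char
            sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // A "model" that proposes a letter where the schema requires a digit.
        Console.WriteLine(Generate(() => 'x')); // prints {"age":0,"ok":true}
    }
}
```

The output parses regardless of what the model proposes, which is the point: structure is guaranteed by construction, and the model is only consulted where the grammar leaves room.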
Pillar B: Contextual signals
On top of grammar, Dynamic Sampling reads contextual signals at every step. A persistent completion state tracks what is being generated, which fields were already emitted, which character runs would be invalid, which value the surrounding context suggests. Sampling decisions blend those signals with the model's distribution, never against it.
Eight behaviours, all stacked transparently underneath your call. You never write a line of code to opt in.
01
The layer always knows what it is sampling. Inside a JSON string, inside a numeric run, inside an array, inside a value of a known format. Decisions are made with that knowledge, not against it.
02
The most probable token is sampled first and validated against the active grammar. If it holds, generation moves on with no extra cost. If not, the slower exhaustive path takes over.
03
Token-level uncertainty is measured live and folded back into the decision. Low-entropy decisions go fast and confident; high-entropy decisions trigger broader validation.
04
A lightweight semantic store carries codes, identifiers, formats, and reference patterns alongside the model. The layer uses it to disambiguate without expanding the prompt.
05
Different models prefer different JSON dialects: spaces around colons, trailing newlines, escaped slashes. The layer reads those preferences and adjusts so the model never fights the grammar.
06
Repetition is blocked where it is structurally wrong (duplicate array items, runaway character runs) and welcomed where it is correct (large numeric identifiers, padding). No blanket penalties.
07
When a candidate token would break the contract, the layer substitutes a structurally safe alternative or replays the decision with a wider candidate list. Inference never restarts from scratch.
08
No architecture coupling, no fine-tuning, no per-model adapter. Drop in a new open-weight release; the same layer keeps working from day one.
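Behaviours 02 and 03 can be pictured as an entropy gate over the candidate list. The sketch below is illustrative only (the threshold and the gating rule are invented for the example, not taken from LM-Kit):

```csharp
using System;
using System.Linq;

static class EntropySketch
{
    // Shannon entropy of a probability distribution, in bits.
    static double Entropy(double[] p) =>
        -p.Where(x => x > 0).Sum(x => x * Math.Log2(x));

    static void Main()
    {
        double[] confident = { 0.97, 0.01, 0.01, 0.01 }; // model is sure
        double[] uncertain = { 0.30, 0.28, 0.22, 0.20 }; // model is torn

        // Hypothetical gate: below the threshold, validate only the top token
        // (the fast path); above it, check a wider candidate list against the
        // grammar before committing.
        const double threshold = 1.0;
        foreach (var dist in new[] { confident, uncertain })
        {
            int candidates = Entropy(dist) < threshold ? 1 : dist.Length;
            Console.WriteLine($"entropy={Entropy(dist):F2} -> validate {candidates} candidate(s)");
        }
    }
}
```

Low-entropy steps stay as cheap as greedy sampling; only the genuinely uncertain steps pay for broader validation.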
You don't import Dynamic Sampling. You use the API; the layer engages itself.
Extraction
Names, dates, identifiers, currency, custom schemas. Dynamic Sampling is what makes a 4B model emit a clean object on the first pass.
Functions
Tool arguments parse every time. The layer holds the call shape and rejects malformed sequences before they reach your dispatcher.
Classification
Categorisation, sentiment, intent. The layer locks the answer space to the enum you defined, so the model can only pick from it.
RAG
When citations, source IDs, and metadata need to round-trip through generation, structural awareness keeps them clean.
Generation
JSON schemas, custom shapes, nested objects, recursive types. Hand the structure, get an object back.
Agents
Multi-step reasoning, planning artefacts, intermediate states. The layer keeps each artefact parseable so the next step has something to grab.
Same model. Same prompt. Different layer underneath.
Dynamic Sampling is on by default. The only public surface is a global toggle for benchmarking or A/B testing. Everything else happens automatically inside extraction, classification, RAG, function calling, and structured generation.
Configuration.EnableDynamicSampling defaults to true; set it to false only when benchmarking against a vanilla pipeline.

```csharp
using LMKit.Global;
using LMKit.Model;
using LMKit.Extraction;

// Default: Dynamic Sampling is on. You don't need this line in production.
Configuration.EnableDynamicSampling = true;

var model = LM.LoadFromModelID("qwen3.5:4b");

// A typical extraction call. Under the hood, Dynamic Sampling:
// - builds a per-task grammar
// - tracks the partial JSON as it grows
// - validates each token against the schema
// - blends model preference with grammar shape
// - falls back to safe alternatives on conflict
var extractor = new TextExtraction(model);
extractor.AddElement("date_of_birth", PredefinedStringFormat.Date);
extractor.AddElement("email", PredefinedStringFormat.Email);

var result = await extractor.ParseAsync(
    "Bob was born on November 5th, 1981, his email is bob@bobby.com.");

Console.WriteLine(result.Content);
// { "date_of_birth": "1981-11-05", "email": "bob@bobby.com" }
```
Dynamic Sampling is the floor, not the ceiling. It composes cleanly with every other inference lever.
Pair
Per-turn temperature, top-k, top-p, repetition control. Use them when you want explicit creative or deterministic behaviour on top of the adaptive baseline.
Open Sampling Controls

Pair
The layer works the same across every model in the catalog, from sub-1B to 30B-class MoE. Pick by VRAM budget, not by sampling compatibility.
Open Model Catalog

Pair
CPU, AVX2, CUDA, Vulkan, Metal. Dynamic Sampling rides on top of every backend with the same behaviour and the same speed-up profile.
Open Backends

Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Why we built it, what it changes, what to expect from a model that runs on top of it.
Open →

API reference
Definitions of the components and behaviours that make up the layer.
Open the reference →

API reference
The single switch that controls the layer at runtime. Default is true.
Open the reference →

How-to guide
How Dynamic Sampling composes with explicit sampling controls per turn.
Read the guide →

The seven pillars of LM-Kit.NET, plus the local runtime they share.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU, AVX2, CUDA 12/13, Vulkan or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.
Dynamic Sampling is the reason a small local model can keep up with much larger cloud ones on real structured workloads. Ship it under your next agent, extractor, or RAG pipeline.