⚡Introducing Dynamic Sampling in LM-Kit.NET: Up to 75% Error Reduction and 2x Faster Processing for LLMs

October 2, 2024

Introduction

We are thrilled to announce a significant enhancement in our latest release of LM-Kit.NET: the introduction of a new parameter that allows you to activate or deactivate our Dynamic Sampling technology. This isn’t just a simple toggle—it’s a gateway to making Large Language Models (LLMs) faster and smarter, enabling them to better serve various industry use cases.

Dynamic Sampling is deeply embedded at the core of LM-Kit.NET, utilized extensively across both internal mechanisms and public API layers. Our engineering team is committed to iterative improvements, following continuous innovation cycles to refine this technology. We will frequently share updates and insights about the benefits and outcomes that Dynamic Sampling offers.

Looking ahead, in 2025, we plan to deliver high-level capabilities that enable the building and orchestration of autonomous agents using graph-based structures. Dynamic Sampling will be a pivotal component in tackling the challenges associated with these advancements. Stay tuned for more exciting developments!

What Is Dynamic Sampling?

Dynamic Sampling is an advanced inference strategy designed to optimize text generation in LLMs. Traditional sampling methods, like selecting the most probable token or considering top candidates in a linear fashion, often fall short in producing coherent and contextually appropriate outputs. Dynamic Sampling addresses these limitations by intelligently evaluating multiple token options during inference, considering a multitude of factors to select the best fit.

Key Components of Dynamic Sampling

Holistic Evaluation: Considers context, predefined constraint-guided instructions (e.g., JSON schemas), output objectives, perplexity, current completion state, and model vocabulary.
Adaptive Style Following: Aligns with the model’s stylistic tendencies to maintain lower perplexity and reduce incoherence.
Advanced NLP Techniques: Applies syntax parsing, semantic analysis, and other NLP methods to enhance the current completion state.
Real-Time Strategy Blending: Dynamically blends different sampling methods based on real-time conditions and requirements.

How Does It Work?

Dynamic Sampling operates by performing a multi-factor analysis at each step of the token generation process:

Context Awareness: It examines the surrounding text and predefined contexts to select tokens that fit naturally within the narrative.
Constraint Adherence: It incorporates predefined instructions, such as adhering to a specific JSON schema, ensuring outputs meet exacting requirements.
Perplexity Optimization: It monitors perplexity—a measure of how well the model predicts the next token—and adjusts tolerance levels to maintain coherence.
Adaptive Adjustments: It makes real-time adjustments based on the current completion state, model vocabulary, and stylistic considerations to ensure logical and fluent outputs.

By integrating these components, Dynamic Sampling transcends simple probability-based methods, offering a more nuanced and effective approach to text generation that results in up to 75% fewer errors and processing speeds up to 2x faster.

Benefits of Dynamic Sampling

Dynamic Sampling brings several significant benefits to the table:

Elimination of Prompt Engineering and Fine-Tuning: By intelligently managing token selection, Dynamic Sampling reduces or eliminates the need for extensive prompt engineering and model fine-tuning. This allows you to achieve high-quality outputs without spending excessive time crafting prompts or retraining models.
Simplified Development: Developers can focus on building applications rather than dealing with complex prompt structures or model adjustments. Dynamic Sampling streamlines the development process, making it easier to integrate LLMs into your projects.
Model-Agnostic Inference: Dynamic Sampling works effectively across different models and architectures without requiring model-specific adaptations. This flexibility enables you to switch between models or use multiple models in your applications without compatibility issues.
Resource Saving: By optimizing the inference process, Dynamic Sampling reduces computational overhead, leading to faster processing times and lower resource consumption. This is particularly beneficial when deploying models on devices with limited computational capabilities, such as older smartphones or computers without GPUs.

Benchmarking Dynamic Sampling

To evaluate the impact of Dynamic Sampling, we conducted extensive benchmarking focused on function calling (a.k.a. Tool calling)—a high-level capability that significantly benefits from this technology.

Function Calling and Agent Orchestration

Function calling within LM-Kit.NET involves the orchestration of two agents: a classifier and a feature extractor. These agents work in tandem to interpret the input prompt and generate the appropriate function call with the correct parameters.

Classifier: Determines which function should be called based on the input prompt from among multiple available functions.
Feature Extractor: Extracts the necessary parameters and their values required to execute the chosen function, ensuring they align with predefined schemas or constraints.

The successful execution of function calling depends on both agents performing accurately. The classifier must select the correct function, and the feature extractor must provide the correct parameters.

Testing Protocol

Scope: We tested around 300 prompts, each expected to invoke a single function from over 60 possible functions spread across 9 plugins (e.g., book store, finance, geometry, math, number operations, string manipulations, UI controller, weather plugin).
Test Cases: The test cases included both manually crafted scenarios, reflecting real-world use cases with tricky samples, and synthetically generated prompts using OpenAI.
Complexity: The prompts ranged from medium to high complexity, with some requiring parameter values to be inferred from semantic meanings.
Accuracy Validation: Our validation criteria were stringent. A success was recorded only if both agents—the classifier and the feature extractor—provided correct results. Any partial failure, where either agent did not perform accurately, was considered a complete failure for the process. This strict evaluation ensures that the function call not only invokes the correct function but also includes the correct parameters.

Sample Prompts

Here are some examples of the prompts used:

“What periodic payment is required to reach $100,000 in 20 years at 7% annual interest, compounded annually?”
“Find the net present value of an investment costing -$200,000 with cash flows of $70,000 per year for 3 years at a discount rate of 6%.”
“Solve quadratic equation x^2 + 1 = 0”
“Compute square root of 6,479,488,794,256”
“Give me the temperature of Blagnac on 5 Nov 1981”
“Send an email to ‘[email protected]‘ and mark it as important”

Feel free to reach out to our team if you are interested in accessing the complete testing suite.

Results

The following table summarizes the benchmarking results across various models:

Model Name	Accuracy without DS (%)	Accuracy with DS (%)	Avg Processing Time without DS (ms)	Avg Processing Time with DS (ms)	Speed Factor	Percentage of Reduced Errors (%)
Mistral-0.1-OpenOrca-7B-Q4_K_M.gguf	96.6	97.0	396.9	273.0	x1.45	11.76
Mistral-0.3-7B-Instruct-Q4_K_M.gguf	98.6	99.0	395.2	263.4	x1.50	28.57
Mistral-Nemo-2407-12.2B-Instruct-Q4_K_M.gguf	95.3	97.6	670.6	433.7	x1.55	48.94
Llama-3.1-8B-Instruct-Q4_K_M.gguf	97.6	97.6	485.1	287.3	x1.69	0.00
Qwen-2.5-7B-Instruct-Q4_K_M.gguf	97.3	99.3	528.5	332.8	x1.59	74.07
Gemma-2-2B-Q4_K_M.gguf	83.4	85.1	554.6	325.8	x1.70	10.24
Gemma-2-9B-Q4_K_M.gguf	98.0	98.3	737.2	500.0	x1.47	15.00
Phi-3.5-mini-Instruct-Q4_K_M.gguf	97.6	97.6	288.8	210.3	x1.37	0.00
Phi-3-medium-4k-Instruct-Q4_K_M.gguf	93.2	99.3	667.8	467.5	x1.43	89.71
Llama-3.2-1B-Instruct-Q4_K_M.gguf	80.4	81.4	348.0	186.1	x1.87	5.10
Average	93.80	95.22	507.27	327.99	x1.56	28.34

Takeaways

Up to 75% Error Reduction: Some models experienced error reductions of up to 89.71%, with an average reduction of 28.34% across all models.
Up to 2x Faster Processing: Processing speeds improved significantly, with some models achieving up to 1.87x faster processing times. The average speed factor was x1.56.
Accuracy Improvement: On average, there was a 1.42% increase in accuracy with Dynamic Sampling activated. Some models saw improvements up to 6.1%.
Consistent Gains Across Models: The benefits of Dynamic Sampling were observed across all tested models, regardless of size or architecture.

Future Outlook

Our commitment to innovation extends beyond the current release. Dynamic Sampling is a significant milestone that sets the stage for even more advanced capabilities. We are diligently working to make LLMs accessible and efficient on a wide array of devices—including older smartphones and computers without GPUs. By optimizing performance and resource utilization, we aim to democratize AI technology, ensuring that anyone can leverage advanced language models regardless of their hardware limitations.

In addition, we’re developing high-level functionalities that will enable the construction and orchestration of autonomous agents using graph-based structures. These advancements will unlock new possibilities across various industries and applications, addressing complex challenges with sophisticated solutions.

Conclusion

The introduction of the parameter to activate or deactivate Dynamic Sampling empowers developers with greater control over their text generation processes. Whether you’re aiming for higher accuracy, processing speeds up to 2x faster, or up to 75% fewer errors, Dynamic Sampling offers tangible benefits that enhance the performance of your applications.

As we explore new frontiers with LM-Kit.NET, we invite you to experiment with Dynamic Sampling in your projects and share your experiences. Your insights are invaluable and help shape the future direction of our technology.

Stay tuned for our upcoming blog articles, where we will delve deeper into function calling, autonomous agent orchestration, and other exciting features powered by Dynamic Sampling.