Introduction
Over the past year, we've been hard at work expanding LM-Kit to empower developers with cutting-edge AI capabilities. We are now excited to announce that LM-Kit has officially gone multimodal, thanks to the addition of support for Vision Language Models (VLMs). This milestone paves the way for delivering a multi-agent orchestration system, which will be one of our main focuses in the coming year.
Our journey in 2024 revolved around building a state-of-the-art inference system, fully native to the .NET ecosystem, that can run thousands of local language models (for example, Phi-4, Llama 3.3, Mistral, Gemma 2, and Qwen 2.5) on any device. This system provides robust features such as:
- Dynamic Sampling: Leverage our proprietary method for on-the-fly sampling adaptation, described in detail on our blog.
- Advanced Function Calling: Easily integrate your own functions and workflows using our innovative function calling approach, showcased here.
In addition, LM-Kit comes with a growing library of prebuilt AI agents, providing out-of-the-box solutions for various use cases, and we're continuously expanding this list to cover even more domains.
By becoming multimodal, LM-Kit now seamlessly integrates visual and textual inputs, opening up a much wider range of real-world applications. Looking ahead to 2025, our priority is to harness these capabilities toward an orchestration layer that manages multiple agents simultaneously, further expanding the scope of what you can achieve with LM-Kit.
Major Milestone: Vision Support in LM-Kit
New .lmk Model Architecture
We've introduced a new model format (.lmk) to streamline how models are packaged and deployed with LM-Kit. This format is essentially a zip file that contains the following (see the sketch below for a quick way to inspect a pack):
- Metadata
- Base model tensors
- Tensors for additional modalities, such as vision
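Because a .lmk pack is an ordinary zip archive, you can peek inside one with nothing more than the .NET base class library. The sketch below lists the entries of a pack; the file name and the exact entry layout are illustrative assumptions rather than a specification of the format.

```csharp
using System;
using System.IO.Compression;

class LmkInspector
{
    static void Main()
    {
        // Illustrative path: point it at any .lmk pack downloaded from the
        // LM-Kit Hugging Face repository.
        const string packPath = "vision-model.lmk";

        // A .lmk pack is a zip archive, so the standard ZipFile API can enumerate
        // its contents: metadata plus the base-model and modality-specific tensors.
        using ZipArchive archive = ZipFile.OpenRead(packPath);

        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            Console.WriteLine($"{entry.FullName,-48} {entry.Length / (1024.0 * 1024.0),8:F1} MB");
        }
    }
}
```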
No Changes Required
One of the main benefits of the .lmk format is that no modifications are needed in your existing LM-Kit applications. The .lmk files are fully compatible with the current LM-Kit API. As soon as you load an .lmk file that includes a vision model, your AI agents instantly gain image interpretation and reasoning capabilities.
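To make the "no changes required" point concrete, here is a minimal sketch of loading a vision-enabled pack and asking a question about an image. The class names (LM, MultiTurnConversation, Attachment) and the Submit overload follow the patterns used in the LM-Kit.NET samples, but treat the exact signatures, the model file name, and the image path as assumptions to verify against the current API reference.

```csharp
using System;
using LMKit.Model;            // assumed namespace for the model loader
using LMKit.TextGeneration;   // assumed namespace for MultiTurnConversation

class VisionQuickStart
{
    static void Main()
    {
        // Loading a vision-enabled .lmk pack uses the same constructor as a
        // text-only model; the file name is an illustrative assumption.
        var model = new LM("models/vision-chat.lmk");

        // The existing conversational API is unchanged; image understanding is
        // available simply because the loaded pack bundles the vision tensors.
        var chat = new MultiTurnConversation(model);

        // The Attachment type and this Submit overload are assumptions based on
        // the Multi-Turn Chat with Vision demo; check the SDK reference for the
        // exact signatures.
        var reply = chat.Submit("What is shown in this picture?",
                                new LMKit.Data.Attachment("invoice.png"));

        Console.WriteLine(reply.Completion);
    }
}
```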
Single-File Deployment
By consolidating all necessary components (metadata, base model, and any specialized modules such as vision) into a single .lmk file, shipping and version control become much simpler. You just need to deploy one file, rather than juggling multiple files and configurations.
Future-Ready
While this announcement focuses on vision, the .lmk format is designed to handle everything in LM-Kit, including additional models, audio, and any other future capabilities. It's a universal container for all of LM-Kit's current and upcoming functionality.
Prebuilt Vision-Enabled Agents and Expansion Plans
For this initial release, we've added vision support to our prebuilt conversational AI agents. Over the next few weeks, we'll extend the same capability to other types of agents in LM-Kit, further paving the way for multimodal AI agent orchestration, where text and images (and soon audio) are processed together for richer, more context-aware interactions.
Where to Get the Models
If you're eager to experiment with the new .lmk files, you can find them in our Hugging Face repository. We've tested various configurations, and the results have gone far beyond our expectations.
See Vision in Action: Real-World Examples of LM-Kit's Vision Support
Doesn't this look like magic? This demo runs entirely on my laptop, performing inference with a free, Apache 2.0-licensed model.
Try It Yourself
For those ready to dive in, check out our tutorial here:
"The Multi-Turn Chat with Vision demo extends the capabilities of the LM-Kit.NET SDK by adding support for visual attachments in a multi-turn conversational flow. This sample shows how to integrate both Large Language Models (LLMs) and Small Language Models (SLMs) into a .NET application. By supporting models of various sizes, you can run the chatbot on devices ranging from powerful servers to smaller edge devices. The demo produces image-driven insights in addition to maintaining text-based conversational context across multiple exchanges."
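To give a rough idea of how the demo's flow might look in code, the loop below keeps a single conversation alive across turns and lets the user optionally attach an image to each prompt. It reuses the assumed LM, MultiTurnConversation, and Attachment names from the earlier sketch; the GitHub sample linked below is the authoritative implementation.

```csharp
using System;
using LMKit.Model;            // assumed namespaces, as in the previous sketch
using LMKit.TextGeneration;

class MultiTurnVisionChat
{
    static void Main()
    {
        // The pack name is an illustrative assumption; a smaller SLM pack works the
        // same way, which is what makes the demo portable down to edge devices.
        var model = new LM("models/vision-chat.lmk");
        var chat = new MultiTurnConversation(model);

        while (true)
        {
            Console.Write("You: ");
            string? text = Console.ReadLine();
            if (string.IsNullOrWhiteSpace(text)) break;

            Console.Write("Image path (blank for none): ");
            string? imagePath = Console.ReadLine();

            // The conversation object carries the text context across turns; an
            // attachment adds image-driven context to the current exchange.
            var reply = string.IsNullOrWhiteSpace(imagePath)
                ? chat.Submit(text)
                : chat.Submit(text, new LMKit.Data.Attachment(imagePath));

            Console.WriteLine($"Assistant: {reply.Completion}\n");
        }
    }
}
```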
If youâre looking for a more hands-on approach, you can explore the source code directly on GitHub:
LM-Kit: The Four-in-One AI Agent Runtime
LM-Kit isn't just an inference tool; it's a unique suite for creating and managing intelligent agents. Each of its four components tackles a key part of the AI journey:
- Multimodal Language Model Inference: Effortlessly run LLMs, SLMs, and VLMs on-device, processing text and images with free, open-source models.
- Prebuilt Agent Customization: Leverage existing AI agents, then tailor them to your specific needs for quick and efficient deployment.
- Agent Creation from Scratch: Build specialized AI agents entirely on your own terms, with flexible design and capabilities.
- Multi-Agent Orchestration: Coordinate entire fleets of AI agents, forming sophisticated collaborative solutions.
From stand-alone inference to orchestrating complex agent networks, LM-Kit empowers you to craft cutting-edge AI applications securely, efficiently, and without leaving your device.
Our Vision for Generative AI in 2025
From Language Models to Agentic Solutions
Historically, we built software by coding a rigid set of predefined rules. As AI evolves from standalone language models to agentic solutions, our focus has shifted toward orchestrating autonomous AI "agents" that function more like small teams of humans. Instead of hardcoding every procedure, we now define objectives, instructions, roles, permissions, and organizational structures.
Because these agents learn from accumulated knowledge and experiences, much as people do, we can no longer rely on purely deterministic methods. This change demands new software architectures designed to handle flexibility, autonomy, and decision-making at scale, marking a transition from traditional rule-based development to a more dynamic, agent-based approach.
LM-Kit's mission is to help developers embrace this new paradigm by providing a comprehensive AI Agent Runtime framework, enabling them to build, deploy, and manage these autonomous AI agents more effectively.
2024 in Review
In 2024, LM-Kit's research and development focused on the question: "What can we really do with LLMs?" To address this, we developed a local inference system that supports text and vision modalities, utilizing Large Language Models (LLMs), Small Language Models (SLMs), and Vision Language Models (VLMs). Additionally, we created out-of-the-box AI agents tailored to various industry-specific use cases, showcasing the practical applications and versatility of these models.
Looking Forward to 2025
Moving into 2025, we will:
- Continue to enhance our inference system to ensure maximum performance and flexibility.
- Invest heavily in AI Agent Orchestration features, enriching our cognitive framework for more collaborative and autonomous agent interactions.
- Introduce audio/voice support, further expanding our multimodal approach and offering even more ways for agents to communicate.
Get Started Today
We're thrilled to see what our developer community will build with these new vision features in LM-Kit. Whether you're creating a simple chatbot or orchestrating a multi-agent system, our goal is to make AI development as open, flexible, and powerful as possible.
Ready to start? Download the latest version of LM-Kit, explore our Hugging Face repository for .lmk models, and check out our tutorials and sample code to integrate vision support into your apps right away.
Thank you for being part of our journey. We can't wait to see what you build with LM-Kit's new multimodal AI capabilities, and we're excited to bring you even more advanced features in 2025!