Introduction
Over the past year, we've been hard at work expanding LM-Kit to empower developers with cutting-edge AI capabilities. We are now excited to announce that LM-Kit has officially gone multimodal, thanks to the addition of support for Vision Language Models (VLMs). This milestone paves the way for delivering a multi-agent orchestration system, which will be one of our main focuses in the coming year. Multimodality builds on the capabilities LM-Kit already provides, including:
- Dynamic Sampling: Leverage our proprietary method for on-the-fly sampling adaptation, described in detail on our blog.
- Advanced Function Calling: Easily integrate your own functions and workflows using our innovative function calling approach, showcased here.
In addition, LM-Kit comes with a growing library of prebuilt AI agents, providing out-of-the-box solutions for various use cases, and we're continuously expanding this list to cover even more domains.
By becoming multimodal, LM-Kit now seamlessly integrates visual and textual inputs, opening up endless possibilities for real-world applications. Looking ahead to 2025, our priority is to channel these capabilities into an orchestration layer that manages multiple agents simultaneously, further expanding the scope of what you can achieve with LM-Kit.
Major Milestone: Vision Support in LM-Kit
New .lmk Model Architecture
We've introduced a new model format (.lmk) to streamline how models are packaged and deployed with LM-Kit. This format is essentially a zip archive (see the inspection sketch after the list) that contains:
- Metadata
- Base model tensors
- Tensors for additional modalities (such as vision)
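Because a .lmk package is a plain zip archive, you can list its contents with standard .NET APIs before loading it. The snippet below is a minimal sketch: the file path is hypothetical, and the exact entry names depend on the particular model package rather than on anything guaranteed by LM-Kit.

```csharp
using System;
using System.IO.Compression;

// Hypothetical path; any .lmk package works here.
const string packagePath = "models/vision-model.lmk";

using ZipArchive archive = ZipFile.OpenRead(packagePath);
foreach (ZipArchiveEntry entry in archive.Entries)
{
    // Expect metadata plus tensor payloads for the base model and any
    // additional modalities (e.g. vision); exact entry names vary per package.
    Console.WriteLine($"{entry.FullName}  ({entry.Length:N0} bytes)");
}
```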
No Changes Required
One of the main benefits of this .lmk format is that no modifications are needed in your existing LM-Kit applications. The .lmk files are fully compatible with the current LM-Kit API. As soon as you load an .lmk file that includes a vision model, your AI agents gain instant image interpretation and reasoning capabilities.
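As a minimal sketch of what "no changes required" means in practice, the snippet below loads a vision-enabled .lmk package through the same model-loading and conversation classes used for text-only models. The file name is hypothetical, and the LM and MultiTurnConversation usage reflects the LM-Kit.NET API as commonly documented; verify exact constructors and return types against the official SDK reference.

```csharp
using System;
using LMKit.Model;
using LMKit.TextGeneration;

// Load a vision-enabled .lmk package exactly as you would load any other model.
// The file name is hypothetical; check the SDK docs for exact signatures.
var model = new LM("models/vision-model.lmk");
var chat = new MultiTurnConversation(model);

// Nothing else changes: the same conversational API is used, and image
// understanding becomes available because the package bundles a vision module.
var reply = chat.Submit("Introduce yourself and list the kinds of input you can handle.");
Console.WriteLine(reply);
```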
Single-File Deployment
Consolidating all necessary components (metadata, the base model, and any specialized modules such as vision) into a single .lmk file makes shipping and version control much simpler: you deploy one file rather than juggling multiple files and configurations.
Future-Ready
While this announcement focuses on vision, the .lmk format is designed to handle everything in LM-Kit, including additional models, audio, and any other future capabilities. It's a universal container for all of LM-Kit's current and upcoming functionality.
Prebuilt Vision-Enabled Agents and Expansion Plans
For this initial release, we've added vision support to our prebuilt conversational AI agents. Over the next few weeks, we'll extend the same capability to other types of agents in LM-Kit, further paving the way for multimodal AI agent orchestration, where text and images (and soon audio) are processed together for richer, more context-aware interactions.
Where to Get the Models
If you're eager to experiment with the new .lmk files, you can find them in our Hugging Face repository. We've tested various configurations, and the results have gone far beyond our expectations.
See Vision in Action: Real-World Examples of LM-Kit's Vision Support
Doesn't this look like magic? This demo is running entirely on my laptop, performing inference with a free, Apache 2.0-licensed model.
Try It Yourself
For those ready to dive in, check out our tutorial here:
Multi-Turn Chat with Vision Demo
"The Multi-Turn Chat with Vision demo extends the capabilities of the LM-Kit.NET SDK by adding support for visual attachments in a multi-turn conversational flow. This sample shows how to integrate both Large Language Models (LLMs) and Small Language Models (SLMs) into a .NET application. By supporting models of various sizes, you can run the chatbot on devices ranging from powerful servers to smaller edge devices. The demo produces image-driven insights in addition to maintaining text-based conversational context across multiple exchanges."
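To give a feel for the flow the demo describes, here is a hedged sketch of a two-turn exchange that mixes an image with a follow-up text question. The Attachment type and the Submit overload accepting it are assumptions inferred from the demo description, not verbatim SDK calls; the GitHub sample linked below shows the actual code.

```csharp
using System;
using LMKit.Data;              // assumed namespace for Attachment
using LMKit.Model;
using LMKit.TextGeneration;

var model = new LM("models/vision-model.lmk");   // hypothetical .lmk package
var chat = new MultiTurnConversation(model);

// Turn 1: attach an image and ask about it (Attachment and this Submit
// overload are hypothetical; see the GitHub sample for the real calls).
var photo = new Attachment("images/receipt.png");
var first = chat.Submit("What is the total amount shown in this receipt?", photo);
Console.WriteLine(first);

// Turn 2: plain-text follow-up; the conversation context, including the
// earlier image, is carried across turns.
var second = chat.Submit("Which store issued it?");
Console.WriteLine(second);
```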
If you're looking for a more hands-on approach, you can explore the source code directly on GitHub:
LM-Kit.NET Samples: Multi-Turn Chat with Vision