MultiTurnConversation
History-aware chat with image attachments.
Attach one or more images to a conversation, ask follow-up questions, stream the response, and call tools. It is the same MultiTurnConversation primitive that drives every text-only chat in LM-Kit.NET, and the model holds visual context across turns.
Attachments from file, byte array, stream, or base64.
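A minimal sketch of the four attachment sources. Only `Attachment.FromFile` appears in the sample on this page; the byte-array, stream, and base64 factory names below are assumptions for illustration — check the API reference for the exact surface.

```csharp
using System;
using System.IO;
using LMKit.TextGeneration;

// From a file path (shown in the sample on this page).
var fromFile = Attachment.FromFile("photo.jpg");

// From a byte array (hypothetical factory name).
byte[] bytes = File.ReadAllBytes("photo.jpg");
var fromBytes = Attachment.FromBytes(bytes);

// From a stream (hypothetical factory name).
using var stream = File.OpenRead("photo.jpg");
var fromStream = Attachment.FromStream(stream);

// From a base64 string (hypothetical factory name).
var fromBase64 = Attachment.FromBase64(Convert.ToBase64String(bytes));
```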
Tokens stream in real time for responsive UI.
01
Attach several images to a single message. The VLM reasons across them: "compare these two", "which one is closer to spec".
02
The model remembers images from earlier turns. Ask "and what about the second photo I sent?" three messages later.
03
Stream tokens as the model generates. Build responsive chat UIs without batching the whole answer.
04
VLMs that support function calling (like GLM-V Flash) can call your tools mid-conversation with image context.
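A sketch combining the multi-image and tool-calling features above. It assumes `SubmitAsync` accepts several attachments in one call and that tools are registered through a `Tools` collection — both the overload and the registration API are hypothetical here; the model ID is illustrative.

```csharp
using LMKit.Model;
using LMKit.TextGeneration;

var vlm = LM.LoadFromModelID("glm-v-flash"); // illustrative model ID
var chat = new MultiTurnConversation(vlm);

// Hypothetical tool registration: a function the VLM may call
// mid-conversation while holding image context.
chat.Tools.Add(new Tool(
    name: "lookup_spec",
    description: "Returns the spec sheet for a part number.",
    handler: partNumber => SpecDatabase.Get(partNumber)));

// Several images in a single message (assumes a params overload);
// the model reasons across both.
var answer = await chat.SubmitAsync(
    "Which of these two parts is closer to spec?",
    Attachment.FromFile("part-a.jpg"),
    Attachment.FromFile("part-b.jpg"));
```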
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.Graphics;

var vlm = LM.LoadFromModelID("qwen3-vl:8b");

var chat = new MultiTurnConversation(vlm)
{
    SystemMessage = "You are a careful visual inspector."
};

// Turn 1: send an image.
var first = await chat.SubmitAsync(
    "Describe this part and flag any defects.",
    Attachment.FromFile("photo-1.jpg"));

// Turn 2: send a second image, reference the first.
var second = await chat.SubmitAsync(
    "This one is from the same batch. Compare to the first.",
    Attachment.FromFile("photo-2.jpg"));

// Turn 3: stream the verdict.
await foreach (var token in chat.StreamAsync(
    "Verdict: which one ships, which one goes back?"))
{
    Console.Write(token.Text);
}
User drops a screenshot, the assistant explains what to do. Multi-turn back-and-forth without ever leaving the device.
Technicians snap photos of equipment, the on-device assistant reasons across them, drafts a report. Works in low-connectivity sites.
Paste a screenshot of a UI bug; the assistant walks the user through the fix while preserving conversation history.
Patient uploads photos of symptoms; on-device VLM drafts triage notes. PHI never crosses the network boundary.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Console demo: drop one or more images into a multi-turn chat and stream the response.
Open on GitHub →
Sample
Annotated walkthrough of the multi-turn-chat-with-vision sample on the docs site.
Read on docs →
How-to guide
How-to: load a VLM, attach an image, stream the answer, constrain output to a schema.
Read the guide →
API reference
API reference for the conversation primitive that drives every chat in LM-Kit.NET, including VLM chats.
Open the reference →
The seven pillars of LM-Kit.NET, plus the local runtime they share. The highlighted card is where you are now.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
AI Agents
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
Document Intelligence
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
Vision & Multimodal
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
RAG & Knowledge
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
Text Analysis
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
Speech & Audio
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
Text Generation
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR, and classifiers, accelerated on CPU (AVX2), CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.