JSON array
Grammar-constrained output emits a typed list of tags.
Multi-label tagging from any image: people, objects, scenes, attributes, custom categories. Output a clean array of tags with confidence scores, ready to drop into your asset catalog or moderation pipeline.
Token logprobs translate to per-label confidence.
Free-form tags or a fixed taxonomy via grammar enumeration.
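The logprob-to-confidence mapping can be sketched in plain C#: one natural reading of a tag's confidence is the exponential of the mean logprob of its tokens, i.e. the geometric mean of the per-token probabilities. The numbers below are illustrative stand-ins, not real engine output:

```csharp
using System;
using System.Linq;

// Illustrative per-token logprobs for one emitted tag (not real engine output).
double[] tokenLogProbs = { -0.05, -0.22 };

// Geometric-mean token probability: exp of the average logprob.
// exp((-0.05 + -0.22) / 2) = exp(-0.135) ≈ 0.874
double confidence = Math.Exp(tokenLogProbs.Average());

Console.WriteLine($"confidence = {confidence:F3}");
```

A tag whose tokens were all sampled at high probability lands near 1.0; a hesitant, low-probability tag drops toward 0, which gives you a natural threshold for filtering.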
01
Unlike classification, which returns a single label, labeling returns a set: three to thirty tags per image, with no per-call cost.
02
Grammar enforces a JSON shape: an array of strings, plus optional confidence floats. The output parses cleanly, no try-catch needed.
03
Restrict the output to your label list (e.g. an asset-management taxonomy with 500 categories) via grammar enumeration.
04
Generate tags in any language the VLM supports. Useful for international content pipelines.
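Item 03 in practice: a fixed taxonomy is expressed directly in the JSON schema handed to Grammar.FromJsonSchema by replacing the free-form string items with an enum. A minimal sketch with a three-label stand-in taxonomy (swap in your real label list):

```json
{
  "type": "object",
  "properties": {
    "tags": {
      "type": "array",
      "items": { "enum": ["outdoor", "indoor", "portrait"] },
      "maxItems": 12
    }
  },
  "required": ["tags"]
}
```

Because the grammar is enforced at sampling time, the model cannot emit a tag outside the enum, so a 500-category asset-management taxonomy stays closed with no post-hoc validation step.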
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.Graphics;
using System.Text.Json;

var vlm = LM.LoadFromModelID("qwen3-vl:8b");
var chat = new SingleTurnConversation(vlm);

// 1. JSON schema: an array of tag strings, max 12 entries.
var grammar = Grammar.FromJsonSchema("""
{
  "type": "object",
  "properties": {
    "tags": {
      "type": "array",
      "items": { "type": "string" },
      "maxItems": 12
    }
  },
  "required": ["tags"]
}
""");

// 2. Ask the VLM to tag the image.
var json = await chat.SubmitAsync(
    "Tag this image. Return up to 12 short, single-word tags.",
    Attachment.FromFile("asset.jpg"),
    grammar);

// 3. Parse and use.
var doc = JsonDocument.Parse(json);
var tags = doc.RootElement.GetProperty("tags");
foreach (var tag in tags.EnumerateArray())
    Console.WriteLine(tag.GetString());
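To get the headline shape, tags paired with confidence scores, one option is a schema whose items are objects with a tag string and a confidence number; parsing then looks like the sketch below. The JSON literal stands in for a model response, and the field names are assumptions for illustration, not a fixed LM-Kit contract:

```csharp
using System;
using System.Text.Json;

// Stand-in for a model response constrained to { "tag", "confidence" } objects.
var json = """
{ "tags": [
    { "tag": "beach",  "confidence": 0.97 },
    { "tag": "sunset", "confidence": 0.88 }
] }
""";

using var doc = JsonDocument.Parse(json);
foreach (var item in doc.RootElement.GetProperty("tags").EnumerateArray())
{
    var tag = item.GetProperty("tag").GetString();
    var confidence = item.GetProperty("confidence").GetDouble();
    Console.WriteLine($"{tag}: {confidence:F2}");
}
```

Since the grammar guarantees the shape, the property lookups above cannot miss; the only failure mode left is a business rule, such as dropping tags below a confidence threshold.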
Auto-tag millions of stock images, marketing photos, archive scans. Re-tag with a new taxonomy in a single pass.
Tag product photos with attributes (color, material, style, season). Power faceted search without a manual taxonomy team.
Flag user uploads with multiple safety labels (violence, NSFW, IP). Run on the upload server; never send to a cloud.
Generate tags for images that feed your search engine. Plain text search becomes image-aware without extra infrastructure.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Console demo adaptable to multi-label tagging via grammar enumeration.
Open on GitHub →

How-to guide: how to constrain a VLM to a JSON array of tags with confidence scores.
Read the guide →

How-to guide: patterns adaptable to multi-label image tagging.
Read the guide →

API reference: Grammar API for JSON-schema and BNF-constrained outputs.
Open the reference →

The seven pillars of LM-Kit.NET, plus the local runtime they share. The highlighted card is where you are now.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU, AVX2, CUDA 12/13, Vulkan or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.