JSON array
Grammar-constrained output emits a typed list of tags.
Multi-label tagging from any image: people, objects, scenes, attributes, custom categories. Output a clean array of tags with confidence scores, ready to drop into your asset catalog or moderation pipeline.
Token logprobs translate to per-label confidence.
Free-form tags or a fixed taxonomy via grammar enumeration.
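The logprob-to-confidence mapping can be sketched in plain C#: one natural reading of a tag's confidence is the exponential of the mean logprob of its tokens, i.e. the geometric mean of the per-token probabilities. The numbers below are illustrative stand-ins, not real engine output:

```csharp
using System;
using System.Linq;

// Illustrative per-token logprobs for one emitted tag (not real engine output).
double[] tokenLogProbs = { -0.05, -0.22 };

// Geometric-mean token probability: exp of the average logprob.
// exp((-0.05 + -0.22) / 2) = exp(-0.135) ≈ 0.874
double confidence = Math.Exp(tokenLogProbs.Average());

Console.WriteLine($"confidence = {confidence:F3}");
```

A tag whose tokens were all sampled at high probability lands near 1.0; a hesitant, low-probability tag drops toward 0, which gives you a natural threshold for filtering.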
01
Unlike classification, which returns a single label, labeling returns a set: three to thirty tags per image, with no per-call cost.
02
Grammar enforces a JSON shape: an array of strings, plus optional confidence floats. The output parses cleanly, no try-catch needed.
03
Restrict the output to your label list (e.g. an asset-management taxonomy with 500 categories) via grammar enumeration.
04
Generate tags in any language the VLM supports. Useful for international content pipelines.
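Item 03 in practice: a fixed taxonomy is expressed directly in the JSON schema handed to Grammar.FromJsonSchema by replacing the free-form string items with an enum. A minimal sketch with a three-label stand-in taxonomy (swap in your real label list):

```json
{
  "type": "object",
  "properties": {
    "tags": {
      "type": "array",
      "items": { "enum": ["outdoor", "indoor", "portrait"] },
      "maxItems": 12
    }
  },
  "required": ["tags"]
}
```

Because the grammar is enforced at sampling time, the model cannot emit a tag outside the enum, so a 500-category asset-management taxonomy stays closed with no post-hoc validation step.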
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.Graphics;
using System.Text.Json;

var vlm = LM.LoadFromModelID("qwen3-vl:8b");
var chat = new SingleTurnConversation(vlm);

// 1. JSON schema: an array of tag strings, max 12 entries.
var grammar = Grammar.FromJsonSchema("""
{
  "type": "object",
  "properties": {
    "tags": {
      "type": "array",
      "items": { "type": "string" },
      "maxItems": 12
    }
  },
  "required": ["tags"]
}
""");

// 2. Ask the VLM to tag the image.
var json = await chat.SubmitAsync(
    "Tag this image. Return up to 12 short, single-word tags.",
    Attachment.FromFile("asset.jpg"),
    grammar);

// 3. Parse and use.
var doc = JsonDocument.Parse(json);
var tags = doc.RootElement.GetProperty("tags");
foreach (var tag in tags.EnumerateArray())
    Console.WriteLine(tag.GetString());
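To get the headline shape, tags paired with confidence scores, one option is a schema whose items are objects with a tag string and a confidence number; parsing then looks like the sketch below. The JSON literal stands in for a model response, and the field names are assumptions for illustration, not a fixed LM-Kit contract:

```csharp
using System;
using System.Text.Json;

// Stand-in for a model response constrained to { "tag", "confidence" } objects.
var json = """
{ "tags": [
    { "tag": "beach",  "confidence": 0.97 },
    { "tag": "sunset", "confidence": 0.88 }
] }
""";

using var doc = JsonDocument.Parse(json);
foreach (var item in doc.RootElement.GetProperty("tags").EnumerateArray())
{
    var tag = item.GetProperty("tag").GetString();
    var confidence = item.GetProperty("confidence").GetDouble();
    Console.WriteLine($"{tag}: {confidence:F2}");
}
```

Since the grammar guarantees the shape, the property lookups above cannot miss; the only failure mode left is a business rule, such as dropping tags below a confidence threshold.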
Auto-tag millions of stock images, marketing photos, archive scans. Re-tag with a new taxonomy in a single pass.
Tag product photos with attributes (color, material, style, season). Power faceted search without a manual taxonomy team.
Flag user uploads with multiple safety labels (violence, NSFW, IP). Run on the upload server; never send to a cloud.
Generate tags for images that feed your search engine. Plain text search becomes image-aware without extra infrastructure.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Console demo adaptable to multi-label tagging via grammar enumeration.
Open on GitHub →

How-to guide: how to constrain a VLM to a JSON array of tags with confidence scores.
Read the guide →

How-to guide: patterns adaptable to multi-label image tagging.
Read the guide →

API reference: Grammar API for JSON-schema and BNF-constrained outputs.
Open the reference →

The seven pillars of LM-Kit.NET, plus the local runtime they share. The highlighted card is where you are now.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU, AVX2, CUDA 12/13, Vulkan or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.