Tag images with multiple labels.

Tag any image with multiple labels at once: people, objects, scenes, attributes, or custom categories. The output is a clean array of tags with confidence scores, ready to drop into your asset catalog or moderation pipeline.

Multi-label · Custom taxonomies · Confidence scores

Output

JSON array

Grammar-constrained output emits a typed list of tags.

Confidence per tag

Token logprobs translate to per-label confidence.
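
One common way to turn token logprobs into a per-label score is to sum the logprobs of the tokens that spell a tag and exponentiate the sum, which gives the joint probability of that token sequence. The sketch below assumes natural-log values and uses made-up numbers; `TagConfidence` is a hypothetical helper, not an LM-Kit.NET API, and the source does not say which formula the library applies.

```csharp
using System;
using System.Linq;

// Hypothetical helper: joint probability of the tokens that spell one tag.
// Assumes natural-log token probabilities.
static double TagConfidence(double[] tokenLogprobs) =>
    Math.Exp(tokenLogprobs.Sum());

// Illustrative values only: a tag that was emitted as two tokens.
double[] sunsetLogprobs = { -0.05, -0.10 };
double confidence = TagConfidence(sunsetLogprobs);

Console.WriteLine($"sunset: {confidence:F3}");
```

Here the score is exp(-0.15), roughly 0.86. Longer tags accumulate more negative logprobs, so a pipeline that compares tags of different token lengths may prefer a length-normalized variant such as the mean logprob.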

Open or closed sets

Free-form tags or a fixed taxonomy via grammar enumeration.

What you get

A tag-friendly structured response.

01

Multi-label by default

Unlike single-label classification, labeling returns a set of tags: three to thirty per image, with no per-call cost.

02

Schema-constrained

Grammar enforces a JSON shape: an array of strings, plus optional confidence floats. Because the output always matches that shape, it parses without defensive try-catch blocks.
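
As a rough illustration of consuming tags that carry optional confidence floats, the sketch below reads a response with `System.Text.Json` and keeps tags at or above a floor. The response shape (objects with `label` and `confidence` fields), the sample values, and the 0.5 floor are all assumptions made for the example; the source does not specify them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Assumed response shape (not confirmed by the source): each tag is an
// object with a label and an optional confidence.
string json = """
    { "tags": [
        { "label": "beach",  "confidence": 0.94 },
        { "label": "sunset", "confidence": 0.81 },
        { "label": "dog",    "confidence": 0.32 },
        { "label": "people" }
    ] }
    """;

const double floor = 0.5; // illustrative threshold
var kept = new List<(string Label, double? Confidence)>();

using var doc = JsonDocument.Parse(json);
foreach (var tag in doc.RootElement.GetProperty("tags").EnumerateArray())
{
    string label = tag.GetProperty("label").GetString() ?? "";
    double? confidence = tag.TryGetProperty("confidence", out var c) ? c.GetDouble() : null;

    // Keep tags that have no score, or a score at or above the floor.
    if (confidence is null || confidence >= floor)
        kept.Add((label, confidence));
}

foreach (var (label, confidence) in kept)
    Console.WriteLine(confidence is null ? label : $"{label} ({confidence:F2})");
```

With these sample values, "dog" is dropped and the other three tags are kept. Whether unscored tags should pass the filter is a policy choice; a moderation pipeline might prefer to reject them.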

03

Custom taxonomies

Restrict the output to your label list (e.g. an asset-management taxonomy with 500 categories) via grammar enumeration.
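
One plausible way to express a closed label set is a JSON Schema `enum` on the array items, generated from your label list. The sketch below builds such a schema with `System.Text.Json.Nodes`; the four-label taxonomy is hypothetical, and it is an assumption that the grammar layer honors `enum` the same way it honors the rest of the schema.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical taxonomy; in practice this would come from your catalog.
string[] taxonomy = { "portrait", "landscape", "product", "document" };

// Copy the labels into a JSON array for the "enum" keyword.
var allowed = new JsonArray();
foreach (var label in taxonomy)
    allowed.Add(label);

// Same shape as a free-form tag schema, with items limited to the label list.
var schema = new JsonObject
{
    ["type"] = "object",
    ["properties"] = new JsonObject
    {
        ["tags"] = new JsonObject
        {
            ["type"] = "array",
            ["items"] = new JsonObject { ["type"] = "string", ["enum"] = allowed },
            ["maxItems"] = 12
        }
    },
    ["required"] = new JsonArray("tags")
};

string schemaJson = schema.ToJsonString(new JsonSerializerOptions { WriteIndented = true });
Console.WriteLine(schemaJson);
```

Generating the schema from data rather than hand-writing it keeps a 500-category taxonomy in one place, so re-tagging with a new label list only means swapping the array.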

04

Multilingual tags

Generate tags in any language the VLM supports. Useful for international content pipelines.

How it works

Constrain the output to your taxonomy.

ImageLabeling.cs
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.Graphics;
using System.Text.Json;

var vlm = LM.LoadFromModelID("qwen3-vl:8b");
var chat = new SingleTurnConversation(vlm);

// 1. JSON schema: an object whose "tags" array holds at most 12 strings.
var grammar = Grammar.FromJsonSchema("""
    { "type": "object",
      "properties": {
        "tags": {
          "type": "array",
          "items": { "type": "string" },
          "maxItems": 12
        }
      },
      "required": ["tags"]
    }
    """);

// 2. Ask the VLM to tag the image.
var json = await chat.SubmitAsync(
    "Tag this image. Return up to 12 short, single-word tags.",
    Attachment.FromFile("asset.jpg"),
    grammar);

// 3. Parse and use.
using var doc = JsonDocument.Parse(json);   // JsonDocument is IDisposable
var tags = doc.RootElement.GetProperty("tags");
foreach (var tag in tags.EnumerateArray())
    Console.WriteLine(tag.GetString());
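
Before the tags printed by the loop above reach a catalog, it can help to normalize them, since a model may return the same tag with different casing or stray whitespace. The following is a minimal sketch; `NormalizeTags` is a hypothetical helper and not part of LM-Kit.NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: trim, lowercase, and drop empty or duplicate tags,
// preserving first-seen order.
static List<string> NormalizeTags(IEnumerable<string> rawTags)
{
    var seen = new HashSet<string>(StringComparer.Ordinal);
    var result = new List<string>();
    foreach (var raw in rawTags)
    {
        var tag = raw.Trim().ToLowerInvariant();
        if (tag.Length > 0 && seen.Add(tag))
            result.Add(tag);
    }
    return result;
}

var cleaned = NormalizeTags(new[] { "Beach", " beach ", "Sunset", "", "dog" });
Console.WriteLine(string.Join(", ", cleaned)); // prints: beach, sunset, dog
```
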
Use cases

Where image labeling belongs.

Digital asset management

Auto-tag millions of stock images, marketing photos, and archive scans. Re-tag with a new taxonomy in a single pass.

E-commerce catalogs

Tag product photos with attributes (color, material, style, season). Power faceted search without a manual taxonomy team.

Content moderation

Flag user uploads with multiple safety labels (violence, NSFW, IP). Run on the upload server; images are never sent to a cloud service.

Search indexing

Generate tags for images that feed your search engine. Plain-text search becomes image-aware without extra infrastructure.
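
To make the search-indexing idea concrete, the sketch below builds a small in-memory inverted index from tags to image names. The file names and tags are invented for the example, and a production search engine would take the place of the dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data: image file names mapped to the tags produced for each.
var imageTags = new Dictionary<string, string[]>
{
    ["beach.jpg"]  = new[] { "beach", "sunset", "people" },
    ["office.jpg"] = new[] { "desk", "laptop", "people" },
};

// Inverted index: tag -> images carrying that tag (case-insensitive lookup).
var index = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
foreach (var (image, tags) in imageTags)
{
    foreach (var tag in tags)
    {
        if (!index.TryGetValue(tag, out var images))
            index[tag] = images = new List<string>();
        images.Add(image);
    }
}

// A text query for "people" now resolves to every image tagged with it.
Console.WriteLine(string.Join(", ", index["people"]));
```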

The foundation

Every capability above runs on this runtime.

Foundation

Local Inference

The runtime all seven pillars sit on. The LM-Kit.NET NuGet package ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR, and classifiers, accelerated on CPU, AVX2, CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.

Multi-label, your taxonomy.