Solutions · Vision · Image Classification

Classify images into your own categories.

Pick a label from a set you define. Zero-shot via VLM prompting, grammar-constrained to your exact label list. No uploads, no training runs, no per-call billing. Move to LoRA fine-tuning when your dataset grows.

Custom categories · Zero-shot or fine-tuned · Deterministic output
Engine

VLM + Grammar

Vision-language model output constrained to a finite label set.

Engine

LoRA fine-tune

Train an adapter once, hot-swap at runtime.

Engine

Confidence scores

Token logprobs surface as per-label confidence.
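The logprob-to-confidence step above is standard softmax math. A minimal sketch, assuming you can read back a log-probability per candidate label from the engine (the helper name and the sample logprob values are illustrative, not an LM-Kit.NET API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: normalize per-label log-probabilities into
// confidence scores that sum to 1, via softmax.
static Dictionary<string, double> ToConfidences(Dictionary<string, double> logprobs)
{
    // Subtract the max logprob before exponentiating, for numerical stability.
    double max = logprobs.Values.Max();
    var exp = logprobs.ToDictionary(kv => kv.Key, kv => Math.Exp(kv.Value - max));
    double sum = exp.Values.Sum();
    return exp.ToDictionary(kv => kv.Key, kv => kv.Value / sum);
}

// Illustrative logprob values; in practice these come from the engine.
var logprobs = new Dictionary<string, double>
{
    ["defect"] = -0.2,
    ["clean"] = -2.1,
    ["borderline"] = -3.5,
};

foreach (var (label, confidence) in ToConfidences(logprobs))
    Console.WriteLine($"{label}: {confidence:P1}");
```

Because the grammar limits generation to the label set, the softmax runs over exactly your taxonomy, so the scores are directly comparable across calls.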

How it works

Define labels, constrain the output.

A grammar constrains the VLM to emit one of your labels and nothing else. The model picks the best match for the input image, and the output is guaranteed to be an exact member of your label set: no fuzzy matching, no post-processing, parseable every time.

ImageClassification.cs
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.Graphics;

var vlm = LM.LoadFromModelID("qwen3-vl:8b");
var chat = new SingleTurnConversation(vlm);

// 1. Define the allowed label set as a BNF grammar.
var labels = new[] { "defect", "clean", "borderline" };
var grammar = Grammar.FromAllowedValues(labels);

// 2. Submit the image and let the model pick a label.
var result = await chat.SubmitAsync(
    "Classify this part photo as one of: defect, clean, borderline.",
    Attachment.FromFile("part-photo.jpg"),
    grammar);

Console.WriteLine(result);  // "defect", "clean", or "borderline"

When to use what

Zero-shot vs fine-tune.

Zero-shot

Prompt-only, no training

Give the VLM the label set in the prompt + a grammar. Works well when categories are visually distinct and well-named. Quickest path from idea to running.

Few-shot

Prompt with examples

Include 2-4 reference images per label in the prompt context. Pushes accuracy without training infrastructure. Great for nuanced categories.

LoRA

Fine-tuned adapter

Train a LoRA on labeled examples; hot-swap at runtime. Best accuracy when you have a domain-specific taxonomy and a few hundred images per class.
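The few-shot option above amounts to packing the label set and a handful of labeled reference examples into the prompt before the image to classify. A minimal sketch of that assembly step, where the helper name, prompt wording, and file names are illustrative assumptions rather than an LM-Kit.NET API (reference images would be attached alongside the prompt the same way as in the main example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: build a few-shot classification prompt from the
// label set and a list of (file name, label) reference examples.
static string BuildFewShotPrompt(
    IReadOnlyList<string> labels,
    IReadOnlyList<(string FileName, string Label)> examples)
{
    var sb = new StringBuilder();
    sb.AppendLine($"Classify the final image as one of: {string.Join(", ", labels)}.");
    sb.AppendLine("Reference examples:");
    foreach (var (file, label) in examples)
        sb.AppendLine($"- {file}: {label}");
    return sb.ToString();
}

var prompt = BuildFewShotPrompt(
    new[] { "defect", "clean", "borderline" },
    new[] { ("ref-scratch.jpg", "defect"), ("ref-ok.jpg", "clean") });

Console.WriteLine(prompt);
```

The same grammar constraint still applies on the output side, so few-shot context improves accuracy without loosening the guarantee that the answer is one of your labels.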

Use cases

Where image classification belongs.

QA & defect classification

Manufacturing line photos, weld inspection, surface condition. Air-gapped factory floors stay air-gapped.

Document type routing

"Is this an invoice, a contract, a receipt, or an ID?" Route mailroom scans to the right downstream pipeline.

Content moderation

Flag user-uploaded content by category. Keep moderation policy on your own server, not a third-party endpoint.

Medical triage

Pre-classify medical imagery for a queue. PHI never leaves the hospital network.

LM-Kit.NET pillars

Seven pillars, one foundation.

The seven pillars of LM-Kit.NET, plus the local runtime they share. The highlighted card is where you are now.

The foundation

Every capability above runs on this runtime.

Foundation

Local Inference

The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU (AVX2), CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.

Explore the foundation

Deterministic labels, your taxonomy.

Start in 5 minutes

Back to Vision hub