VLM + Grammar
Vision-language model output constrained to a finite label set.
Pick a label from a set you define. Zero-shot via VLM prompting, grammar-constrained to your exact label list. No data upload, no training run, no per-call billing. Move to LoRA fine-tuning when your dataset grows.
Train an adapter once, hot-swap at runtime.
Token logprobs surface as per-label confidence.
A grammar constrains the VLM to emit one of your labels (and nothing else). The model picks the best match for the input image; the output is parseable, deterministic, and exactly one label from your set, every time.
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.Graphics;

var vlm = LM.LoadFromModelID("qwen3-vl:8b");
var chat = new SingleTurnConversation(vlm);

// 1. Define the allowed label set as a BNF grammar.
var labels = new[] { "defect", "clean", "borderline" };
var grammar = Grammar.FromAllowedValues(labels);

// 2. Submit the image and let the model pick a label.
var result = await chat.SubmitAsync(
    "Classify this part photo as one of: defect, clean, borderline.",
    Attachment.FromFile("part-photo.jpg"),
    grammar);

Console.WriteLine(result); // "defect", "clean", or "borderline"
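Because decoding never leaves the label set, the token logprob behind the emitted label (the per-label confidence mentioned above) doubles as a score you can threshold, for example to route low-confidence images to a human reviewer.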
Zero-shot
Give the VLM the label set in the prompt + a grammar. Works well when categories are visually distinct and well-named. The quickest path from idea to a running classifier.
Few-shot
Include 2-4 reference images per label in the prompt context (see the sketch after this list). Pushes accuracy without training infrastructure. Great for nuanced categories.
LoRA
Train a LoRA on labeled examples; hot-swap at runtime. Best accuracy when you have a domain-specific taxonomy and a few hundred images per class.
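A minimal sketch of the few-shot path, reusing the chat and grammar setup from the snippet above. The multi-attachment call and the reference file names are illustrative assumptions, not confirmed LM-Kit.NET API; check the SubmitAsync and Attachment overloads in the API reference.

// Hypothetical few-shot sketch; reuses vlm and chat from the snippet above.
// ASSUMPTION: SubmitAsync accepts several attachments in one call.
var labels = new[] { "defect", "clean", "borderline" };
var grammar = Grammar.FromAllowedValues(labels);

var attachments = new[]
{
    Attachment.FromFile("ref-defect.jpg"),   // known defect example
    Attachment.FromFile("ref-clean.jpg"),    // known clean example
    Attachment.FromFile("part-photo.jpg"),   // the image to classify
};

var result = await chat.SubmitAsync(
    "Image 1 shows a defect; image 2 is clean. " +
    "Classify image 3 as one of: defect, clean, borderline.",
    attachments,
    grammar);

Console.WriteLine(result); // still constrained to the label set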
Manufacturing line photos, weld inspection, surface condition. Air-gapped factory floors stay air-gapped.
"Is this an invoice, a contract, a receipt, or an ID?" Route mailroom scans to the right downstream pipeline.
Flag user-uploaded content by category. Keep moderation policy on your own server, not a third-party endpoint.
Pre-classify medical imagery for a queue. PHI never leaves the hospital network.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Console demo: classify text or images into custom categories.
Open on GitHub →

Demo
Classify scanned documents with a VLM and grammar-constrained labels.
Open on GitHub →

How-to guide
Define a taxonomy, constrain the model output, ship deterministic labels.
Read the guide →

API reference
Grammar API for grammar-constrained generation.
Open the reference →

The seven pillars of LM-Kit.NET, plus the local runtime they share. The highlighted card is where you are now.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet package ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR, and classifiers, accelerated on CPU (AVX2), CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.