Mailrooms, support inboxes, contract repositories, and shared drives fill up with mixed document types. Categorization in LM-Kit classifies any PDF or image into your categories, zero-shot, with a confidence score on every prediction. Use the 30+ predefined categories or define your own, and classify a single document or run a parallel batch over an entire directory tree.
Invoices, receipts, passports, bank statements, contracts, IDs, certificates, transcripts, payslips, and more.
Define your own categories with one-line descriptions. No training, no labeled data needed.
Parallel classification across entire directory trees with real-time metrics.
Traditional document classification means collecting thousands of labeled examples per category, training a model, monitoring drift, retraining when categories evolve. Zero-shot classification on vision-language models gives you the same accuracy without the data pipeline. Add a new category by adding a string. Remove one by removing it. The model adapts.
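A custom taxonomy, for instance, can live in plain configuration and be registered at startup. A minimal sketch, reusing the Categorization and AddCategory calls from the examples below; the vlm variable, category names, and descriptions here are illustrative:

// Taxonomy as plain strings: extend or trim it by editing the dictionary, no retraining involved.
var taxonomy = new Dictionary<string, string>
{
    ["purchase_order"] = "Order form listing items, quantities, and agreed prices",
    ["delivery_note"]  = "Shipping document accompanying delivered goods",
    ["credit_note"]    = "Document issued to correct or refund a previous invoice"
};

var classifier = new Categorization(vlm);
foreach (var entry in taxonomy)
    classifier.AddCategory(entry.Key, entry.Value);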
Categories are described in natural language. Vision-language models do the matching without supervised training.
Every classification carries a confidence score. Route low-confidence items to a human review queue.
Works on any document the rest of the SDK accepts: PDFs (digital or scanned), photos, screenshots.
Run classification concurrently across an entire directory tree. Throughput scales with cores or GPU.
Use any vision-language model: Qwen3-VL, Gemma 4, MiniCPM-V, GLM-V 4.6 Flash. Smaller models for edge, larger for accuracy.
Classify, then extract. Different categories drive different extraction schemas. The classify-and-extract pipeline ships as a guide.
Classify one PDF into a built-in or user-defined category, with a confidence score per prediction.
using LMKit.Classification;
using LMKit.Model;

var vlm = VisionLanguageModel.LoadFromModelID("glm-4.6v-flash");
var classifier = new Categorization(vlm);

// Use the predefined catalogue (30+ document types).
classifier.UseDefaultCategories();

// Or define your own.
classifier.AddCategory("medical_record", "Patient chart, lab report, prescription");
classifier.AddCategory("discharge_summary", "Hospital discharge document");

CategorizationResult r = await classifier.ClassifyAsync(@"C:\inbox\scan_4521.pdf");
Console.WriteLine($"{r.Category} ({r.Confidence:P1})"); // medical_record (94.2%)
Walk an entire mailroom drop in parallel, sort each file into a per-category bin, and queue low-confidence cases for review.
// Walk an entire mailroom drop, classify in parallel, sort into bins.
var files = Directory.EnumerateFiles(@"C:\inbox", "*.*", SearchOption.AllDirectories);
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

await Parallel.ForEachAsync(files, options, async (file, ct) =>
{
    var r = await classifier.ClassifyAsync(file, ct);

    // Sort each file into a per-category bin.
    var dest = Path.Combine(@"C:\sorted", r.Category, Path.GetFileName(file));
    Directory.CreateDirectory(Path.GetDirectoryName(dest)!);
    File.Move(file, dest);

    // Queue low-confidence cases for human review (reviewQueue: a thread-safe queue, e.g. ConcurrentQueue<string>).
    if (r.Confidence < 0.75)
        reviewQueue.Enqueue(dest); // enqueue the moved path, not the original location
});
Many real-world scans bundle multiple documents in one PDF. Split first, then classify each segment.
After classification, run extraction with the right schema per category. Invoice fields differ from passport fields; a routing sketch follows these notes.
Pure-image PDFs need OCR before any text-based classification. Vision-grounded classification works directly on the image.
Build an agent that watches a folder and reacts to new arrivals: classify, extract, route, archive. A folder-watcher sketch follows below.
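As noted in the classify-then-extract item above, the routing itself is a dictionary lookup. A minimal sketch, reusing the classifier, file, and reviewQueue variables from the batch example; the field names and the ExtractFieldsAsync helper are illustrative placeholders, not part of the LM-Kit API:

// Map each category to the fields your extraction step should request.
// Field names are illustrative; substitute your own schema per category.
var schemas = new Dictionary<string, string[]>
{
    ["invoice"]  = new[] { "invoice_number", "issue_date", "total_amount", "vendor_name" },
    ["passport"] = new[] { "full_name", "passport_number", "nationality", "expiry_date" },
    ["contract"] = new[] { "parties", "effective_date", "termination_date", "governing_law" }
};

CategorizationResult result = await classifier.ClassifyAsync(file);

if (schemas.TryGetValue(result.Category, out var fields))
{
    // Hand the category-specific field list to your extraction step
    // (structured extraction is covered in the how-to guide linked below).
    await ExtractFieldsAsync(file, fields); // placeholder for your extraction pipeline
}
else
{
    reviewQueue.Enqueue(file); // no schema for this category: route to manual handling
}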
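And the folder-watching agent is a FileSystemWatcher plus the same calls shown above. A minimal sketch, assuming the classifier and reviewQueue from the earlier examples and the same C:\inbox / C:\sorted layout:

// Minimal ingestion agent: classify each new arrival, sort it, flag low confidence.
var watcher = new FileSystemWatcher(@"C:\inbox") { IncludeSubdirectories = true };

watcher.Created += async (_, e) =>
{
    await Task.Delay(TimeSpan.FromSeconds(2)); // let the producer finish writing before reading

    var r = await classifier.ClassifyAsync(e.FullPath);
    var dest = Path.Combine(@"C:\sorted", r.Category, Path.GetFileName(e.FullPath));
    Directory.CreateDirectory(Path.GetDirectoryName(dest)!);
    File.Move(e.FullPath, dest);

    if (r.Confidence < 0.75)
        reviewQueue.Enqueue(dest); // low confidence: flag the sorted copy for review
};

watcher.EnableRaisingEvents = true; // keep the watcher (and host process) running to ingest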
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Console demo: classify documents with a VLM and grammar-constrained labels. Open on GitHub →
Console demo: high-throughput classification across many documents. Open on GitHub →
How-to guide: define a taxonomy, constrain output, ship deterministic labels. Read the guide →
How-to guide: combine classification + structured extraction in one pass. Read the guide →