Mailrooms, support inboxes, contract repositories, and shared drives fill up with mixed document types. Categorization in LM-Kit classifies any PDF or image into your categories, zero-shot, with a confidence score on every prediction. Use the 30+ predefined categories or define your own, and classify a single document or run a parallel batch over an entire directory tree.
Invoices, receipts, passports, bank statements, contracts, IDs, certificates, transcripts, payslips, and more.
Define your own categories with one-line descriptions. No training, no labeled data needed.
Parallel classification across entire directory trees with real-time metrics.
Traditional document classification means collecting thousands of labeled examples per category, training a model, monitoring drift, retraining when categories evolve. Zero-shot classification on vision-language models gives you the same accuracy without the data pipeline. Add a new category by adding a string. Remove one by removing it. The model adapts.
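A custom taxonomy, for instance, can live in plain configuration and be registered at startup. A minimal sketch, reusing the Categorization and AddCategory calls from the examples below; the vlm variable, category names, and descriptions here are illustrative:

// Taxonomy as plain strings: extend or trim it by editing the dictionary, no retraining involved.
var taxonomy = new Dictionary<string, string>
{
    ["purchase_order"] = "Order form listing items, quantities, and agreed prices",
    ["delivery_note"]  = "Shipping document accompanying delivered goods",
    ["credit_note"]    = "Document issued to correct or refund a previous invoice"
};

var classifier = new Categorization(vlm);
foreach (var entry in taxonomy)
    classifier.AddCategory(entry.Key, entry.Value);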
Categories are described in natural language. Vision-language models do the matching without supervised training.
Every classification carries a confidence score. Route low-confidence items to a human review queue.
Works on any document the rest of the SDK accepts: PDFs (digital or scanned), photos, screenshots.
Run classification concurrently across an entire directory tree. Throughput scales with cores or GPU.
Use any vision-language model: Qwen3-VL, Gemma 4, MiniCPM-V, GLM-V 4.6 Flash. Smaller models for edge, larger for accuracy.
Classify, then extract. Different categories drive different extraction schemas. The classify-and-extract pipeline ships as a guide.
Classify one PDF into a built-in or user-defined category, with a confidence score per prediction.
using LMKit.Classification;
using LMKit.Model;

var vlm = VisionLanguageModel.LoadFromModelID("glm-4.6v-flash");
var classifier = new Categorization(vlm);

// Use the predefined catalogue (30+ document types).
classifier.UseDefaultCategories();

// Or define your own.
classifier.AddCategory("medical_record", "Patient chart, lab report, prescription");
classifier.AddCategory("discharge_summary", "Hospital discharge document");

CategorizationResult r = await classifier.ClassifyAsync(@"C:\inbox\scan_4521.pdf");
Console.WriteLine($"{r.Category} ({r.Confidence:P1})"); // medical_record (94.2%)
Walk an entire mailroom drop in parallel, sort each file into a per-category bin, and queue low-confidence cases for review.
// Walk an entire mailroom drop, classify in parallel, sort into bins.
var files = Directory.EnumerateFiles(@"C:\inbox", "*.*", SearchOption.AllDirectories);
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

await Parallel.ForEachAsync(files, options, async (file, ct) =>
{
    var r = await classifier.ClassifyAsync(file, ct);

    // Sort each file into a per-category bin.
    var dest = Path.Combine(@"C:\sorted", r.Category, Path.GetFileName(file));
    Directory.CreateDirectory(Path.GetDirectoryName(dest)!);
    File.Move(file, dest);

    // Queue low-confidence cases for human review (reviewQueue: a thread-safe queue, e.g. ConcurrentQueue<string>).
    if (r.Confidence < 0.75)
        reviewQueue.Enqueue(dest); // enqueue the moved path, not the original location
});
Many real-world scans bundle multiple documents in one PDF. Split first, then classify each segment.
After classification, run extraction with the right schema per category. Invoice fields differ from passport fields; a routing sketch follows these notes.
Pure-image PDFs need OCR before any text-based classification. Vision-grounded classification works directly on the image.
Build an agent that watches a folder and reacts to new arrivals: classify, extract, route, archive. A folder-watcher sketch follows below.
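As noted in the classify-then-extract item above, the routing itself is a dictionary lookup. A minimal sketch, reusing the classifier, file, and reviewQueue variables from the batch example; the field names and the ExtractFieldsAsync helper are illustrative placeholders, not part of the LM-Kit API:

// Map each category to the fields your extraction step should request.
// Field names are illustrative; substitute your own schema per category.
var schemas = new Dictionary<string, string[]>
{
    ["invoice"]  = new[] { "invoice_number", "issue_date", "total_amount", "vendor_name" },
    ["passport"] = new[] { "full_name", "passport_number", "nationality", "expiry_date" },
    ["contract"] = new[] { "parties", "effective_date", "termination_date", "governing_law" }
};

CategorizationResult result = await classifier.ClassifyAsync(file);

if (schemas.TryGetValue(result.Category, out var fields))
{
    // Hand the category-specific field list to your extraction step
    // (structured extraction is covered in the how-to guide linked below).
    await ExtractFieldsAsync(file, fields); // placeholder for your extraction pipeline
}
else
{
    reviewQueue.Enqueue(file); // no schema for this category: route to manual handling
}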
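And the folder-watching agent is a FileSystemWatcher plus the same calls shown above. A minimal sketch, assuming the classifier and reviewQueue from the earlier examples and the same C:\inbox / C:\sorted layout:

// Minimal ingestion agent: classify each new arrival, sort it, flag low confidence.
var watcher = new FileSystemWatcher(@"C:\inbox") { IncludeSubdirectories = true };

watcher.Created += async (_, e) =>
{
    await Task.Delay(TimeSpan.FromSeconds(2)); // let the producer finish writing before reading

    var r = await classifier.ClassifyAsync(e.FullPath);
    var dest = Path.Combine(@"C:\sorted", r.Category, Path.GetFileName(e.FullPath));
    Directory.CreateDirectory(Path.GetDirectoryName(dest)!);
    File.Move(e.FullPath, dest);

    if (r.Confidence < 0.75)
        reviewQueue.Enqueue(dest); // low confidence: flag the sorted copy for review
};

watcher.EnableRaisingEvents = true; // keep the watcher (and host process) running to ingest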
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Console demo: classify documents with a VLM and grammar-constrained labels. Open on GitHub →
Console demo: high-throughput classification across many documents. Open on GitHub →
How-to guide: define a taxonomy, constrain output, ship deterministic labels. Read the guide →
How-to guide: combine classification + structured extraction in one pass. Read the guide →