
Industry-grade OCR. On your machine.

LM-Kit ships two complementary OCR engines. LMKitOcr is a native, CPU-efficient engine that excels at document OCR, runs extremely fast even on a single core, and scales linearly with optimised multithreading. VlmOcr is a vision-language engine that runs PaddleOCR-VL, GLM-OCR, LightOnOCR and other VLMs for layout-aware extraction, including tables, formulas, charts, seals and bounding boxes. Both run 100% locally, and both reach state-of-the-art accuracy on public document-OCR benchmarks.

34+ languages · SOTA benchmarks · 0 cloud calls

LMKitOcr

Native, CPU-efficient. Excels at document OCR. Extremely fast on a single core; scales linearly with optimised multithreading. 34+ languages. Auto-detects language and orientation, and deskews pages.

VlmOcr

Vision-language model OCR. Backends: PaddleOCR-VL, GLM-OCR, LightOnOCR, Qwen3-VL, MiniCPM-V, Gemma 4. Tables, formulas, charts, coordinates, seals.

Hybrid pipeline

Run fast classical OCR on every page and fall back to the VLM only when the layout demands it. Best accuracy at the lowest CPU cost.
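The hybrid idea above can be sketched as a small per-page router. This is a minimal sketch under stated assumptions, not LM-Kit's implementation: `PageOcrResult`, `HybridRouter`, the 0.85 confidence threshold and the table-region count are hypothetical names and heuristics invented for illustration, and both engines are passed in as plain delegates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result shape for the fast pass; NOT an LM-Kit type.
public sealed record PageOcrResult(string Text, double MeanConfidence, int TableLikeRegions);

public static class HybridRouter
{
    // Assumed heuristic: escalate when the fast pass is unsure
    // or reports table-like structure on the page.
    public static bool NeedsVlm(PageOcrResult fast, double minConfidence = 0.85)
        => fast.MeanConfidence < minConfidence || fast.TableLikeRegions > 0;

    // Runs the fast engine on every page and the VLM engine only on escalated pages.
    public static List<string> Run(
        IReadOnlyList<byte[]> pages,
        Func<byte[], PageOcrResult> fastOcr,   // stand-in for the native engine
        Func<byte[], string> vlmOcr)           // stand-in for the VLM engine
    {
        var output = new List<string>(pages.Count);
        foreach (var page in pages)
        {
            PageOcrResult fast = fastOcr(page);
            output.Add(NeedsVlm(fast) ? vlmOcr(page) : fast.Text);
        }
        return output;
    }
}
```

In a real pipeline the two delegates would wrap whatever calls your LMKitOcr and VlmOcr instances expose, and the escalation rule would be tuned against your own documents.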

SOTA benchmark accuracy on public document-OCR datasets
34+ languages supported by the native engine
7 VLM intents (text, Markdown, tables, formulas, charts, coordinates, seals)
0 cloud calls, per-page billing, or quota limits

Why local OCR matters

Sending invoices to a cloud is not free.

Cloud OCR services bill per page and ship your scans to a third party. For regulated workloads (HIPAA, GDPR, financial services, defence) that is a non-starter. For high-volume workloads it is a budget line. LM-Kit OCR runs entirely on the box, with no per-page cost, no data leaving the perimeter, no rate limits, and no quota outages.

No per-page cost

A million pages costs the same as one: zero per-page billing. Hardware amortises across years of throughput.

No data egress

Documents never leave your machine, which makes compliance with HIPAA, GDPR, ISO 27001 and SOC 2 architecturally simpler.

CPU-efficient by default

LMKitOcr runs on commodity CPUs. Concurrency is auto-gated to physical core count. No GPU required for classical OCR.
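To illustrate what gating concurrency to the core count means, here is a caller-side sketch for a batch of pages. LMKitOcr is described as doing this gating itself, so the code only demonstrates the idea. `BoundedOcrBatch` and the `ocrPage` delegate are hypothetical, and the physical core count is passed in by the caller because `Environment.ProcessorCount` reports logical processors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedOcrBatch
{
    // Processes all paths while keeping at most `physicalCores` OCR calls in flight.
    public static async Task<string[]> RunAsync(
        IReadOnlyList<string> paths,
        Func<string, Task<string>> ocrPage,   // hypothetical per-page OCR call
        int physicalCores)                    // supplied by the caller (assumption)
    {
        int limit = Math.Max(1, physicalCores);
        using var gate = new SemaphoreSlim(limit);

        async Task<string> Guarded(string path)
        {
            await gate.WaitAsync();           // wait for a free slot
            try { return await ocrPage(path); }
            finally { gate.Release(); }       // free the slot even on failure
        }

        // Task.WhenAll returns results in the same order as the input tasks.
        return await Task.WhenAll(paths.Select(p => Guarded(p)));
    }
}
```

A semaphore is used rather than a fixed thread pool so the same gate works whether the per-page call is CPU-bound or awaits I/O.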

VLM when layout demands it

Switch to VlmOcr for complex layouts: multi-column scientific papers, financial tables, hand-annotated forms, scanned receipts.

Production observability

Events for every phase: LanguageDetected, OrientationDetected, OcrStarting, OcrCompleted. Wire into OpenTelemetry.

Continuously improved

New models land regularly: PaddleOCR-VL 1.5, GLM-OCR, LightOnOCR-2, Qwen3-VL with 32-language OCR. Bump a NuGet, inherit the upgrade.

Two engines, one API

Pick the right engine per workload.

Both engines derive from the same abstract OcrEngine base class. Swap them by changing one line, or use both side by side.
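The one-line swap follows from ordinary polymorphism over a shared base class. The sketch below shows the pattern only. `DemoOcrEngine`, `DemoNativeOcr`, `DemoVlmOcr` and `ScanService` are hypothetical stand-ins, and their members are not taken from LM-Kit's actual OcrEngine API.

```csharp
using System;

// Hypothetical stand-in for a shared abstract base; NOT LM-Kit's OcrEngine.
public abstract class DemoOcrEngine
{
    public abstract string Name { get; }
    public abstract string Recognize(byte[] image);
}

// Stand-in for a fast, plain-text engine.
public sealed class DemoNativeOcr : DemoOcrEngine
{
    public override string Name => "native";
    public override string Recognize(byte[] image) => $"plain text ({image.Length} bytes)";
}

// Stand-in for a layout-aware engine.
public sealed class DemoVlmOcr : DemoOcrEngine
{
    public override string Name => "vlm";
    public override string Recognize(byte[] image) => $"# markdown ({image.Length} bytes)";
}

public static class ScanService
{
    // The only place that knows which concrete engine is in use.
    public static DemoOcrEngine CreateEngine(bool preserveLayout)
    {
        if (preserveLayout) return new DemoVlmOcr();
        return new DemoNativeOcr();
    }

    // Depends on the base type only, so it is unchanged by a swap.
    public static string Process(DemoOcrEngine engine, byte[] image)
        => $"[{engine.Name}] {engine.Recognize(image)}";
}
```

Because `Process` takes the base type, both engines can also run side by side in the same workflow by creating one instance of each.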

Seven VlmOcr intents

Right output, right shape.

VlmOcrIntent tells the engine what kind of content to optimise for. The same image yields different output structure per intent.

PlainText

Plain text

Reading-order text without any structure. Best for clean prose pages.

Markdown

Layout-aware Markdown

Headings, lists, paragraphs, tables, code blocks preserved. LLM-ready straight out of the engine.

TableRecognition

Tables

Cell-accurate table extraction with row/column structure. Markdown or HTML table output.

FormulaRecognition

Formulas

LaTeX output for mathematical expressions. Useful for scientific papers and textbooks.

ChartRecognition

Charts

Extract the underlying data points from bar / line / pie charts. Output as structured rows.

OcrWithCoordinates

Text spotting

Per-region text plus bounding boxes (x, y, width, height). Drives redaction, search-highlight, ROI workflows.

SealRecognition

Seals & stamps

Detect company seals, official stamps, signatures. Common in Asian markets, contract verification, legal docs.
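As an example of how the bounding boxes from OcrWithCoordinates can drive redaction, the sketch below turns matching text regions into padded rectangles clipped to the page. `Box`, `TextRegion` and `Redactor` are hypothetical shapes, not LM-Kit result types, and the code assumes pixel coordinates with the origin at the top-left corner.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical shapes mirroring (x, y, width, height) output; NOT LM-Kit types.
public sealed record Box(int X, int Y, int Width, int Height);
public sealed record TextRegion(string Text, Box Bounds);

public static class Redactor
{
    // Returns one padded, page-clipped rectangle per region whose text matches `sensitive`.
    public static List<Box> FindRedactions(
        IEnumerable<TextRegion> regions,
        Regex sensitive,
        int pageWidth,
        int pageHeight,
        int padding = 2)
    {
        var boxes = new List<Box>();
        foreach (var region in regions)
        {
            if (!sensitive.IsMatch(region.Text)) continue;

            // Grow the box by the padding, then clip it to the page bounds.
            int left = Math.Max(0, region.Bounds.X - padding);
            int top = Math.Max(0, region.Bounds.Y - padding);
            int right = Math.Min(pageWidth, region.Bounds.X + region.Bounds.Width + padding);
            int bottom = Math.Min(pageHeight, region.Bounds.Y + region.Bounds.Height + padding);

            boxes.Add(new Box(left, top, right - left, bottom - top));
        }
        return boxes;
    }
}
```

The returned rectangles can then be handed to whatever drawing or PDF routine blacks out the regions; the same matching step also serves search highlighting.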

Five-line OCR

Image to text, any backend.

Three code paths, three trade-offs: a native CPU engine for speed, a vision-language engine for layout preservation, and structured intents for tables and per-region coordinates. The example below uses the native engine.

LMKitOcr is the native engine. It is fast on a single CPU core, auto-downloads language packs on first use, and returns reading-order plain text plus per-region bounding boxes. Use it for invoice scans, IDs and forms: anywhere accuracy and speed matter more than layout fidelity.

LMKitOcr.cs
using System;
using LMKit.Extraction.Ocr;

// Native CPU-efficient OCR. Language packs auto-download on first use.
var ocr = new LMKitOcr();

// Auto-detect language and orientation. Pass a file, stream, or byte[].
OcrResult result = await ocr.RunAsync(new OcrParameters(@"C:\scans\invoice.png")
{
    DetectLanguage    = true,
    DetectOrientation = true,
});

Console.WriteLine(result.Text);                 // reading-order plain text
Console.WriteLine(result.PageElement.Bounds);   // per-region bounding boxes

Built for every deployment shape

A real local OCR engine.

Most OCR offerings ship as either a hosted cloud service or a desktop-locked product. Neither shape fits a .NET server, a regulated cleanroom, an offline kiosk, or a commodity laptop running a thousand pages overnight. LM-Kit OCR fits all of those, from the same NuGet, with the same API.

Hosted cloud OCR

Per-page billing that scales with volume. Documents leave your perimeter. Latency depends on region. Rate limits and quota outages are part of the SLA. A non-starter for HIPAA, GDPR, financial services, defence.

Desktop-only OCR

Strong accuracy, but tied to a single machine. Per-seat licensing. No native .NET embedding path. Cannot ship inside a server-side workflow or an embedded application.

LM-Kit OCR

Native .NET, embeddable, on-device. SOTA benchmark accuracy. No per-page cost, no data egress, no rate limits. Two engines for the speed-vs-layout trade-off. Continuously improved with new specialised models.

OCR models in the catalog

Specialised vision OCR models.

Every model below is loadable from the LM-Kit Model Catalog. Sizes range from 0.9B parameters (small enough to run on a phone) to multi-billion-parameter models (server-class). All are quantised to GGUF, and all run via VlmOcr.

paddleocr-vl:0.9b

PaddleOCR-VL 1.5

Ultra-compact 0.9B. 94.5% on OmniDocBench. All seven intents supported. Default recommendation for laptop-class deployments.

glm-ocr

GLM-OCR 0.9B

Document parsing, structured info extraction. Strong on multi-language layouts, charts and tables.

lightonocr-2:1b

LightOnOCR-2

RLVR-trained 1B. End-to-end document conversion. Tables, receipts, forms, multi-column, math notation.

lightonocr-2-bbox:1b

LightOnOCR-2 BBox

OCR plus bounding box detection in a single pass. Eleven languages with embedded image localisation.

glm-4.6v-flash

GLM-4.6V Flash

Lightweight VLM. Strong OCR in 32 languages. Document understanding, screenshots, charts, native function calling.

qwen3-vl, gemma 4, minicpm-v

General-purpose multimodal

Use any vision-language model for OCR via VlmOcr. Useful when an agent already has a VLM loaded for chat or reasoning.

Where local OCR ships

Real workloads, real volumes.

Mailroom automation

Scan, OCR, classify, route. On-device throughput keeps the mailroom running even when the network does not.

Invoice / receipt processing

Extract vendor, line items, totals, tax. Pair OCR with structured extraction for end-to-end automation.

Healthcare records

Patient charts, lab reports, scanned forms. HIPAA-compliant by construction since data never leaves the box.

Legal & contracts

Old scanned contracts, hand-annotated agreements, seals and stamps. SealRecognition is purpose-built for this.

Scientific publishing

Multi-column papers, formulas as LaTeX, charts as data. Combine the FormulaRecognition and ChartRecognition intents.

Air-gapped & defence

Cleanroom deployments where cloud OCR is forbidden by policy. Same NuGet, no architecture change.

Related capabilities

OCR plus the rest of the stack.

Image processing

The preprocessing pipeline that makes OCR accurate. Deskew, smart binarize, despeckle, auto-crop. One method call each.

Document to Markdown

Universal converter that picks the right OCR backend per page. Output is LLM-ready Markdown.

Structured data extraction

Pull typed fields (vendor, totals, dates) from scanned invoices. OCR is the front-end, extraction is the back-end.

PDF toolkit

Generate searchable PDFs from scans, extract pages, redact regions. OCR is the engine; the toolkit is the surrounding workflow.

Document Intelligence chat

Ask questions about scanned PDFs. Vision-grounded RAG runs OCR transparently when the document needs it.

Industry-grade OCR. Zero per-page billing.
