LM-Kit ships two complementary OCR engines: a native
CPU-efficient engine (LMKitOcr) that excels at document
OCR, runs extremely fast even on a single core, and scales linearly
with optimised multithreading; and a vision-language engine
(VlmOcr) running PaddleOCR-VL, GLM-OCR, LightOnOCR and
other VLMs for layout-aware extraction including tables, formulas,
charts, seals and bounding boxes. Both run 100% locally. Both
achieve state-of-the-art benchmark accuracy against
public document-OCR datasets.
LMKitOcr: Native, CPU-efficient. Excels at document OCR. Extremely fast on a single core; scales linearly with optimised multithreading. 34+ languages. Auto-detects language and orientation; deskews automatically.
VlmOcr: Vision-language model OCR. Backends: PaddleOCR-VL, GLM-OCR, LightOnOCR, Qwen3-VL, MiniCPM-V, Gemma 4. Tables, formulas, charts, coordinates, seals.
Pick fast classical OCR per page and fall back to the VLM only when layout demands it: the best accuracy at the lowest CPU cost.
Cloud OCR services bill per page and ship your scans to a third party. For regulated workloads (HIPAA, GDPR, financial services, defence) that is a non-starter. For high-volume workloads it is a budget line. LM-Kit OCR runs entirely on the box, with no per-page cost, no data leaving the perimeter, no rate limits, and no quota outages.
A million pages costs the same as one: zero per-page billing. Hardware amortises across years of throughput.
Documents never leave your machine. HIPAA, GDPR, ISO 27001, SOC 2 compliance becomes architecturally simple.
LMKitOcr runs on commodity CPUs. Concurrency is auto-gated to physical core count. No GPU required for classical OCR.
Switch to VlmOcr for complex layouts: multi-column scientific papers, financial tables, hand-annotated forms, scanned receipts.
Events for every phase: LanguageDetected, OrientationDetected, OcrStarting, OcrCompleted. Wire into OpenTelemetry.
New models land regularly: PaddleOCR-VL 1.5, GLM-OCR, LightOnOCR-2, Qwen3-VL 32 langs. Bump a NuGet, inherit the upgrade.
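The phase events above can be wired straight into logging or tracing. A minimal sketch, assuming each event follows the standard .NET event pattern; the concrete EventArgs types are not shown on this page, so the handler bodies below are illustrative:

```csharp
using System;
using LMKit.Extraction.Ocr;

var ocr = new LMKitOcr();

// Handler signatures are assumptions; check the API reference for the
// actual payload each event carries before relying on it.
ocr.LanguageDetected    += (sender, e) => Console.WriteLine("language detected");
ocr.OrientationDetected += (sender, e) => Console.WriteLine("orientation detected");
ocr.OcrStarting         += (sender, e) => Console.WriteLine("ocr starting");
ocr.OcrCompleted        += (sender, e) => Console.WriteLine("ocr completed");
```

The same handlers can emit OpenTelemetry spans instead of console lines; the event names map naturally onto span boundaries.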
Both engines implement the same OcrEngine abstract base.
Swap them by changing one line, or use both side-by-side.
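A minimal sketch of that swap, assuming RunAsync is exposed on the shared OcrEngine base as the page's own samples suggest; the file path and model ID are placeholders:

```csharp
using System;
using LMKit.Extraction.Ocr;
using LMKit.Model;

// Toggle to switch engines; the call site below never changes.
bool useVlm = false;

OcrEngine ocr = useVlm
    ? new VlmOcr(VisionLanguageModel.LoadFromModelID("paddleocr-vl:0.9b"))
    : new LMKitOcr();

// Identical call for either engine.
OcrResult result = await ocr.RunAsync(new OcrParameters(@"C:\scans\page-001.png"));
Console.WriteLine(result.Text);
```

Because both engines sit behind one abstraction, a pipeline can also hold one instance of each and route pages between them.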
LMKitOcr
Excels at document OCR. Extremely fast on a single core, with optimised multithreading that scales linearly with available cores. Best for clean scans, business documents, mass-archival workloads. Auto-downloads language packs on first use. CPU-only, sub-second per page on modern desktops.
Supports automatic language detection, orientation detection, deskewing, smart binarization, despeckle. 34 ISO 639-2/T language codes. Continuously improved against public document-OCR benchmarks.
VlmOcr
Best-in-class on complex layouts: multi-column papers, financial tables, charts, formulas, hand-written annotations, seals. Choose any vision model: PaddleOCR-VL (ultra-compact, 94.5% OmniDocBench), GLM-OCR, LightOnOCR-2, or general VLMs.
Seven intents: PlainText, Markdown, TableRecognition, FormulaRecognition, ChartRecognition, OcrWithCoordinates, SealRecognition.
VlmOcrIntent tells the engine what kind of content to optimise
for. The same image yields different output structure per intent.
PlainText
Reading-order text without any structure. Best for clean prose pages.
Markdown
Headings, lists, paragraphs, tables, code blocks preserved. LLM-ready straight out of the engine.
TableRecognition
Cell-accurate table extraction with row/column structure. Markdown or HTML table output.
FormulaRecognition
LaTeX output for mathematical expressions. Useful for scientific papers and textbooks.
ChartRecognition
Extract the underlying data points from bar / line / pie charts. Output as structured rows.
OcrWithCoordinates
Per-region text plus bounding boxes (x, y, width, height). Drives redaction, search-highlight, ROI workflows.
SealRecognition
Detect company seals, official stamps, signatures. Common in Asian markets, contract verification, legal docs.
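Switching between the intents above is a one-property change. A hedged sketch using the FormulaRecognition intent, following the same pattern as the page's other samples; the model ID and input path are placeholders:

```csharp
using System;
using LMKit.Extraction.Ocr;
using LMKit.Model;

var vlm = VisionLanguageModel.LoadFromModelID("paddleocr-vl:0.9b");
var ocr = new VlmOcr(vlm);

// LaTeX output for the mathematical expressions on the page.
OcrResult formulas = await ocr.RunAsync(new OcrParameters(@"C:\papers\textbook-page.png")
{
    Intent = VlmOcrIntent.FormulaRecognition,
});

Console.WriteLine(formulas.Text); // LaTeX markup for each recognised formula
```

Replacing the intent with ChartRecognition or SealRecognition on the same parameters object yields the corresponding structured output instead.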
Three code paths, three trade-offs: a native CPU engine for speed, a vision-language engine for layout preservation, and structured intents for tables and per-region coordinates. Pick a tab.
LMKitOcr is the native engine: fast on a single CPU core,
language packs auto-download on first use, returns reading-order plain
text plus per-region bounding boxes. Use it for invoice scans, IDs,
forms, anywhere accuracy and speed beat layout fidelity.
using LMKit.Extraction.Ocr;

// Native CPU-efficient OCR. Language packs auto-download on first use.
var ocr = new LMKitOcr();

// Auto-detect language and orientation. Pass a file, stream, or byte[].
OcrResult result = await ocr.RunAsync(new OcrParameters(@"C:\scans\invoice.png")
{
    DetectLanguage = true,
    DetectOrientation = true,
});

Console.WriteLine(result.Text);               // reading-order plain text
Console.WriteLine(result.PageElement.Bounds); // per-region bounding boxes
VlmOcr drives a vision-language model with structured intents.
The Markdown intent preserves headings, lists, tables, and code blocks
end-to-end. Use it for scientific papers, mixed-format contracts, and
anything destined for an LLM context window.
using LMKit.Extraction.Ocr;
using LMKit.Model;

// Vision-language OCR with PaddleOCR-VL (94.5% OmniDocBench).
var vlm = VisionLanguageModel.LoadFromModelID("paddleocr-vl:0.9b");
var ocr = new VlmOcr(vlm);

// Layout-aware Markdown. Tables, headings, lists, code blocks all preserved.
OcrResult result = await ocr.RunAsync(new OcrParameters(@"C:\papers\paper.pdf")
{
    Intent = VlmOcrIntent.Markdown,
    ImageDetail = ImageDetail.High,
});

Console.WriteLine(result.Text); // LLM-ready Markdown
Same engine, specialised intents. TableRecognition emits
structured cells with row and column spans; OcrWithCoordinates
returns every region with bounding boxes for downstream redaction or
highlighting. Swap the intent, keep the rest of the code identical.
// Pull every table out of a multi-page financial report.
var tables = await ocr.RunAsync(new OcrParameters(report)
{
    Intent = VlmOcrIntent.TableRecognition,
});

// Locate every redactable region in a contract for downstream blackouts.
var regions = await ocr.RunAsync(new OcrParameters(contract)
{
    Intent = VlmOcrIntent.OcrWithCoordinates,
});

foreach (var region in regions.PageElement.Children)
{
    Console.WriteLine($"{region.Text} @ {region.Bounds}");
}
Most OCR offerings ship as either a hosted cloud service or a desktop-locked product. Neither shape fits a .NET server, a regulated cleanroom, an offline kiosk, or a commodity laptop running a thousand pages overnight. LM-Kit OCR fits all of those, from the same NuGet, with the same API.
Cloud OCR services: per-page billing that scales with volume. Documents leave your perimeter. Latency depends on region. Rate limits and quota outages are part of the SLA. A non-starter for HIPAA, GDPR, financial services, defence.
Desktop OCR products: strong accuracy, but tied to a single machine. Per-seat licensing. No native .NET embedding path. Cannot ship inside a server-side workflow or an embedded application.
LM-Kit OCR: native .NET, embeddable, on-device. SOTA benchmark accuracy. No per-page cost, no data egress, no rate limits. Two engines for the speed-vs-layout trade-off. Continuously improved with new specialised models.
Every model below is loadable from the LM-Kit Model Catalog. Sizes range from 0.9B (small enough to run on a phone) to multi-billion (server-class). All are quantised to GGUF, and all run via VlmOcr.
paddleocr-vl:0.9b
Ultra-compact 0.9B. 94.5% on OmniDocBench. All seven intents supported. Default recommendation for laptop-class deployments.
glm-ocr
Document parsing, structured info extraction. Strong on multi-language layouts, charts and tables.
lightonocr-2:1b
RLVR-trained 1B. End-to-end document conversion. Tables, receipts, forms, multi-column, math notation.
lightonocr-2-bbox:1b
OCR plus bounding box detection in a single pass. Eleven languages with embedded image localisation.
glm-4.6v-flash
Lightweight VLM. Strong OCR in 32 languages. Document understanding, screenshots, charts, native function calling.
qwen3-vl, gemma 4, minicpm-v
Use any vision-language model for OCR via VlmOcr. Useful when an agent already has a VLM loaded for chat or reasoning.
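When an agent already holds a general-purpose VLM for chat or reasoning, the same instance can serve as the OCR backend, so no second model sits in memory. A sketch under that assumption; the catalog ID string and input path are illustrative:

```csharp
using System;
using LMKit.Extraction.Ocr;
using LMKit.Model;

// A general VLM already loaded elsewhere for chat or reasoning...
var vlm = VisionLanguageModel.LoadFromModelID("qwen3-vl");

// ...doubles as the OCR backend through VlmOcr.
var ocr = new VlmOcr(vlm);

OcrResult result = await ocr.RunAsync(new OcrParameters(@"C:\scans\receipt.jpg")
{
    Intent = VlmOcrIntent.PlainText,
});
Console.WriteLine(result.Text);
```

Dedicated OCR models such as paddleocr-vl:0.9b will usually beat a general VLM on pure text accuracy; sharing the model trades some of that for a smaller footprint.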
Scan, OCR, classify, route. On-device throughput keeps the mailroom running even when the network does not.
Extract vendor, line items, totals, tax. Pair OCR with structured extraction for end-to-end automation.
Patient charts, lab reports, scanned forms. HIPAA-compliant by construction since data never leaves the box.
Old scanned contracts, hand-annotated agreements, seals and stamps. SealRecognition is purpose-built for this.
Multi-column papers, formulas as LaTeX, charts as data. FormulaRecognition + ChartRecognition.
Cleanroom deployments where cloud OCR is forbidden by policy. Same NuGet, no architecture change.
The preprocessing pipeline that makes OCR accurate. Deskew, smart binarize, despeckle, auto-crop. One method call each.
Universal converter that picks the right OCR backend per page. Output is LLM-ready Markdown.
Pull typed fields (vendor, totals, dates) from scanned invoices. OCR is the front-end, extraction is the back-end.
Generate searchable PDFs from scans, extract pages, redact regions. OCR is the engine; the toolkit is the surrounding workflow.
Ask questions about scanned PDFs. Vision-grounded RAG runs OCR transparently when the document needs it.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Demo: console demo extracting text, Markdown, tables, and formulas from images. Open on GitHub →
Demo: console demo for bounding-box accurate OCR, for redaction and highlighting. Open on GitHub →
How-to guide: end-to-end guide for scanned-document OCR pipelines. Read the guide →
How-to guide: pick a model, pick an intent, get structured output. Read the guide →