VlmOcr
VLM-driven OCR engine with structured intents.
Modern OCR is no longer a single text-out engine. VLM-OCR reads layouts, tables, formulas, charts, and seals as structured intents, and emits Markdown, JSON, or plain text. It runs locally and reaches SOTA benchmark scores on public document-OCR datasets.
LMKitOcr: First-party engine, fast on a single core.
Choose the right OCR model for your hardware.
Traditional OCR returns a flat string. VLM-OCR understands what it sees: paragraphs are paragraphs, tables are tables, charts are charts, signatures are signatures. The output is structured from the start.
01 · Plain text extraction with reading order preserved across columns and pages.
02 · Headings, lists, emphasis, code blocks. Drop straight into a Markdown pipeline.
03 · Structured table extraction with cells, headers, and spans. Output as JSON or CSV.
04 · LaTeX or MathML for inline and display math. Recover equations from scanned scientific papers.
05 · Bar, line, pie, and scatter charts, with axis labels and values. Extract data points from chart images.
06 · Bounding boxes per token, line, and paragraph. Anchor downstream redaction or highlighting.
07 · Detect and extract official stamps, seals, and signatures with bounding boxes. Useful for compliance workflows where legal artefacts must be flagged separately from body text.
```csharp
using LMKit.Model;
using LMKit.Extraction.Ocr;
using LMKit.Graphics;

var ocrModel = LM.LoadFromModelID("paddleocr-vl:0.9b");
var ocr = new VlmOcr(ocrModel);

// Markdown intent: paragraphs, lists, headings.
var md = await ocr.ExtractAsync(
    Attachment.FromFile("page.png"),
    intent: VlmOcrIntent.Markdown);

// Tables intent: structured cells.
var tables = await ocr.ExtractAsync(
    Attachment.FromFile("financials.png"),
    intent: VlmOcrIntent.Tables);

// Formulas intent: LaTeX output.
var formulas = await ocr.ExtractAsync(
    Attachment.FromFile("paper.png"),
    intent: VlmOcrIntent.Formulas);
```
Convert scanned PDFs to clean Markdown for ingestion into RAG, knowledge bases, or LLM context windows.
Extract equations, charts, and tables from published papers with structure intact. Reproduce LaTeX from PDFs.
Flag seals, signatures, and official stamps as separate artefacts. Drive automated compliance review.
Read mixed-format paper mail (letters, invoices, contracts) and output structured Markdown for downstream pipelines.
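For the Markdown-to-RAG use case above, one common next step is to split the extracted Markdown into one chunk per heading before embedding it. The sketch below is a minimal illustration under stated assumptions: `MarkdownChunker` and `SplitByHeading` are hypothetical names, not part of LM-Kit.NET, and the input is assumed to be a plain Markdown string, however you obtain it from the extraction result. It uses only the .NET base class library and does not account for `#` lines inside fenced code blocks.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not an LM-Kit.NET API): splits Markdown into
// one chunk per ATX heading so each chunk can be embedded separately.
public static class MarkdownChunker
{
    public static List<string> SplitByHeading(string markdown)
    {
        var chunks = new List<string>();
        var current = new StringBuilder();

        foreach (var line in markdown.Replace("\r\n", "\n").Split('\n'))
        {
            // A new heading closes the chunk collected so far.
            if (IsAtxHeading(line) && current.Length > 0)
            {
                Flush(chunks, current);
            }
            current.Append(line).Append('\n');
        }

        Flush(chunks, current);
        return chunks;
    }

    // ATX heading: one to six '#' characters followed by a space.
    private static bool IsAtxHeading(string line)
    {
        int hashes = 0;
        while (hashes < line.Length && line[hashes] == '#')
        {
            hashes++;
        }
        return hashes >= 1 && hashes <= 6
            && hashes < line.Length && line[hashes] == ' ';
    }

    // Adds the trimmed buffer as a chunk (skipping empty ones) and resets it.
    private static void Flush(List<string> chunks, StringBuilder current)
    {
        var text = current.ToString().Trim();
        if (text.Length > 0)
        {
            chunks.Add(text);
        }
        current.Clear();
    }
}
```

Calling `MarkdownChunker.SplitByHeading(text)` returns the chunks in document order, each starting with its heading line; any text before the first heading becomes its own chunk. Splitting on headings keeps each section's context together, which tends to suit retrieval better than fixed-size windows, though the right chunking strategy depends on the documents.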
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Console demo: extract text, Markdown, tables, and formulas from images.
Console demo: get bounding boxes for downstream redaction or highlighting.
How-to guide: pick a model, pick an intent, get clean structured text.
How-to guide: recover row/column structure from photographed or scanned tables.
The seven pillars of LM-Kit.NET, plus the local runtime they share.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet package ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR, and classifiers, accelerated on CPU, AVX2, CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.