Preprocess
Deskew, smart-binarize, despeckle, auto-crop, blank-detect, denoise. The OCR / VLM pipeline gold standard.
OCR accuracy lives or dies on input quality. A skewed scan, a noisy
fax, a low-contrast photograph, a multi-page TIFF: each needs the
right preprocessing before the OCR or VLM ever sees it. LM-Kit
bundles a complete image pipeline in ImageBuffer:
deskew, smart binarization, despeckle, auto-crop, blank detection,
format conversion, multi-page TIFF, plus a full Canvas
drawing API and image-similarity search via embeddings. All native,
all CPU-efficient, all in one NuGet.
Resize, rotate, flip, format-convert, draw shapes via Canvas, Pen, Brush.
Vision-language understanding, OCR, image embeddings for similarity search.
A 3-degree skew can drop OCR accuracy from 99% to 70%. A noisy fax produces hallucinated characters. A scan with uniform borders wastes half the OCR window. A blank cover page costs you an inference call. Each step in the pipeline below removes one of those failure modes, and the savings compound: higher accuracy at lower cost. LM-Kit ships them as one-line method calls.
ImageBuffer wraps native handles. Zero copies between operations. Pixel-format conversions happen in-place.
GRAY8, RGB24, RGBA32, BINARY1. Convert with ConvertGRAY8(), ConvertRGB24(), etc. Pick the right format for the target operation.
PNG, JPEG, WebP, BMP, TIFF, TGA, PNM. Multi-page TIFF supported via SaveAsMultipageTiff() and SelectPage().
Bilinear and Lanczos resampling for high-quality resize. Box-filter resize for thumbnails. Per-axis control over output dimensions.
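To show what the bilinear path computes, here is a minimal standalone sketch of bilinear resampling over a grayscale array. The class and method names are illustrative, not part of the LM-Kit API; ImageBuffer's own resize runs natively with bilinear and Lanczos kernels.

```csharp
using System;

static class ResizeSketch
{
    // Illustrative only: bilinear resampling of a grayscale byte[h,w].
    public static byte[,] Bilinear(byte[,] src, int outH, int outW)
    {
        int h = src.GetLength(0), w = src.GetLength(1);
        var dst = new byte[outH, outW];
        for (int y = 0; y < outH; y++)
            for (int x = 0; x < outW; x++)
            {
                // Map the output pixel back into source coordinates.
                double sy = (y + 0.5) * h / outH - 0.5;
                double sx = (x + 0.5) * w / outW - 0.5;
                int y0 = (int)Math.Floor(sy), x0 = (int)Math.Floor(sx);
                double fy = sy - y0, fx = sx - x0;
                int y1 = Math.Clamp(y0 + 1, 0, h - 1), x1 = Math.Clamp(x0 + 1, 0, w - 1);
                y0 = Math.Clamp(y0, 0, h - 1); x0 = Math.Clamp(x0, 0, w - 1);

                // Weighted blend of the four surrounding source pixels.
                double v = src[y0, x0] * (1 - fy) * (1 - fx)
                         + src[y0, x1] * (1 - fy) * fx
                         + src[y1, x0] * fy * (1 - fx)
                         + src[y1, x1] * fy * fx;
                dst[y, x] = (byte)Math.Round(v);
            }
        return dst;
    }
}
```

The four blend weights always sum to 1, so flat regions stay flat and edges interpolate smoothly instead of aliasing the way nearest-neighbour does.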
Canvas, Pen, Brush. Lines, rectangles, ellipses, polygons, quadrilaterals. Drives annotation overlays for OCR coordinates and redaction.
Same buffer feeds OCR, VLMs, and image embeddings. No format dance between the preprocessing layer and the inference layer.
The pipeline below is what production OCR systems run before they hand an image to the model. Each step is a single method call.
Deskew
Deskew() measures page rotation up to ±15° and returns the corrected image plus the detected angle. Standard for scanned forms and faxes.
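To make the idea concrete, here is a minimal projection-profile sketch of skew *estimation*: for each candidate angle, ink pixels are projected onto rotated rows, and the angle whose profile is "peakiest" wins, because text lines aligned with the rows pile into few bins. This is a common textbook approach, not necessarily LM-Kit's internal algorithm, and the names are illustrative; Deskew() additionally returns the corrected image.

```csharp
using System;
using System.Linq;

static class DeskewSketch
{
    // Illustrative only: estimate skew (degrees) of a bitonal page,
    // stored as bool[h,w] with true = ink.
    public static double EstimateSkewDegrees(bool[,] ink, double maxDeg = 15, double stepDeg = 0.5)
    {
        int h = ink.GetLength(0), w = ink.GetLength(1);
        double bestAngle = 0, bestScore = -1;
        for (double deg = -maxDeg; deg <= maxDeg; deg += stepDeg)
        {
            double rad = deg * Math.PI / 180.0;
            double s = Math.Sin(rad), c = Math.Cos(rad);
            var profile = new long[2 * (h + w)]; // generous, offset-safe range
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)
                    if (ink[y, x])
                        profile[(int)Math.Round(y * c - x * s) + w + h]++;
            // Sum of squared bin counts: maximal when lines align with rows.
            double score = profile.Sum(v => (double)v * v);
            if (score > bestScore) { bestScore = score; bestAngle = deg; }
        }
        return bestAngle;
    }
}
```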
SmartBinarize
SmartBinarize() uses block detection, edge recovery, and inverse-text recovery to separate text from background even on dark or low-contrast pages.
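SmartBinarize's exact pipeline is internal to LM-Kit, but a Sauvola-style local threshold illustrates why adaptive binarization survives uneven lighting where one global threshold fails: each pixel is compared against the statistics of its own neighbourhood. The sketch below (illustrative names, naive windowing) is that idea only.

```csharp
using System;

static class AdaptiveSketch
{
    // Illustrative only: Sauvola-style local thresholding of a grayscale
    // byte[h,w]; returns true where a pixel is classified as ink.
    public static bool[,] Binarize(byte[,] g, int radius = 7, double k = 0.2)
    {
        int h = g.GetLength(0), w = g.GetLength(1);
        var ink = new bool[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                // Mean and standard deviation of the local window.
                double sum = 0, sumSq = 0; int n = 0;
                for (int wy = Math.Max(0, y - radius); wy <= Math.Min(h - 1, y + radius); wy++)
                    for (int wx = Math.Max(0, x - radius); wx <= Math.Min(w - 1, x + radius); wx++)
                    { double v = g[wy, wx]; sum += v; sumSq += v * v; n++; }
                double mean = sum / n;
                double std = Math.Sqrt(Math.Max(0, sumSq / n - mean * mean));

                // Sauvola threshold: drops below the mean in flat regions,
                // rises toward it where local contrast is high.
                double t = mean * (1 + k * (std / 128.0 - 1));
                ink[y, x] = g[y, x] < t;
            }
        return ink;
    }
}
```

A production implementation would use integral images to make the window statistics O(1) per pixel; the nested window loops here are kept for readability.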
OtsuBinarize
OtsuBinarize() picks an optimal binary threshold without configuration. Fastest path for clean scans.
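Otsu's method itself is simple enough to sketch: pick the threshold that maximizes between-class variance over the grayscale histogram. The class below is illustrative, not LM-Kit's API; OtsuBinarize() runs the equivalent computation plus the pixel pass natively.

```csharp
using System;

static class OtsuSketch
{
    // Illustrative only: classic Otsu threshold over a 256-bin histogram.
    // Pixels <= the returned value would map to black, the rest to white.
    public static int Threshold(long[] hist)
    {
        long total = 0, sumAll = 0;
        for (int i = 0; i < 256; i++) { total += hist[i]; sumAll += (long)i * hist[i]; }

        long wB = 0, sumB = 0;
        double best = -1; int bestT = 0;
        for (int t = 0; t < 256; t++)
        {
            wB += hist[t];                      // background pixel count
            if (wB == 0) continue;
            long wF = total - wB;               // foreground pixel count
            if (wF == 0) break;
            sumB += (long)t * hist[t];
            double mB = (double)sumB / wB;      // background mean
            double mF = (double)(sumAll - sumB) / wF; // foreground mean
            double between = (double)wB * wF * (mB - mF) * (mB - mF);
            if (between > best) { best = between; bestT = t; }
        }
        return bestT;
    }
}
```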
DespeckleBitonal
DespeckleBitonal() strips salt-and-pepper noise from binary images. Critical for older scans and poor-quality fax input.
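The simplest form of bitonal despeckling erases ink pixels with no set 8-neighbours. The sketch below shows only that single-pixel case with illustrative names; real despecklers, DespeckleBitonal() included, also remove multi-pixel specks, typically by connected-component size.

```csharp
using System;

static class DespeckleSketch
{
    // Illustrative only: remove isolated 1-pixels from a bitonal image
    // stored as bool[h,w], true = ink.
    public static bool[,] RemoveIsolated(bool[,] img)
    {
        int h = img.GetLength(0), w = img.GetLength(1);
        var cleaned = (bool[,])img.Clone();
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                if (!img[y, x]) continue;
                bool hasNeighbour = false;
                // Scan the 8-neighbourhood for any other ink pixel.
                for (int dy = -1; dy <= 1 && !hasNeighbour; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                    {
                        if (dx == 0 && dy == 0) continue;
                        int ny = y + dy, nx = x + dx;
                        if (ny >= 0 && ny < h && nx >= 0 && nx < w && img[ny, nx])
                        { hasNeighbour = true; break; }
                    }
                if (!hasNeighbour) cleaned[y, x] = false; // lone speck: erase
            }
        return cleaned;
    }
}
```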
AutoCrop
AutoCrop(margin, tolerance) detects background colour from corners, removes uniform borders, leaves you with the actual content.
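The underlying idea can be sketched in a few lines: treat a corner pixel as the background shade and trim rows and columns that stay within a tolerance of it. The names below are illustrative and the sketch samples only the top-left corner, where AutoCrop samples all four and also re-applies a margin.

```csharp
using System;

static class AutoCropSketch
{
    // Illustrative only: content bounding box of a grayscale byte[h,w],
    // trimming border rows/columns within `tolerance` of the corner shade.
    public static (int Top, int Left, int Bottom, int Right) ContentBox(byte[,] img, int tolerance)
    {
        int h = img.GetLength(0), w = img.GetLength(1);
        byte bg = img[0, 0]; // background sample
        bool IsContent(int y, int x) => Math.Abs(img[y, x] - bg) > tolerance;
        bool RowBlank(int y) { for (int x = 0; x < w; x++) if (IsContent(y, x)) return false; return true; }
        bool ColBlank(int x) { for (int y = 0; y < h; y++) if (IsContent(y, x)) return false; return true; }

        int top = 0, bottom = h - 1, left = 0, right = w - 1;
        while (top < bottom && RowBlank(top)) top++;
        while (bottom > top && RowBlank(bottom)) bottom--;
        while (left < right && ColBlank(left)) left++;
        while (right > left && ColBlank(right)) right--;
        return (top, left, bottom, right);
    }
}
```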
IsBlank / IsBlackAndWhite
Detect blank cover sheets, spacers, and already-bitonal images so the pipeline avoids redundant OCR calls.
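A blank-page test reduces to an ink-ratio heuristic: count pixels darker than an ink threshold and compare against a tiny budget. The sketch below shows that bare idea with illustrative names and untuned defaults; IsBlank() encapsulates a production-tuned version.

```csharp
using System;

static class BlankSketch
{
    // Illustrative heuristic only: a grayscale page "looks blank" when at
    // most `maxInkRatio` of its pixels fall below `inkThreshold`.
    public static bool LooksBlank(byte[,] gray, byte inkThreshold = 128, double maxInkRatio = 0.001)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        long inkPixels = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (gray[y, x] < inkThreshold) inkPixels++;
        return inkPixels <= maxInkRatio * h * w; // small speck budget
    }
}
```

The ratio budget (rather than a hard zero) is what lets a dusty but content-free cover sheet still register as blank.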
Load, deskew, binarize, denoise, and auto-crop a single scan before handing it to the OCR engine.
```csharp
using LMKit.Media.Image;

// Load any common format. Auto-detected.
using var img = ImageBuffer.Load(@"C:\scans\fax_3104.tif");

// Skip the pipeline entirely if the page is blank.
if (img.IsBlank()) return;

// Convert to grayscale once; downstream ops are faster.
img.ConvertGRAY8();

// Correct skew up to 15 degrees.
DeskewResult ds = img.Deskew();
Console.WriteLine($"corrected {ds.Angle:F2} deg");

// Adaptive binarize. Recovers dark text and edges automatically.
img.SmartBinarize();

// Strip salt-and-pepper noise from the binary image.
img.DespeckleBitonal();

// Remove uniform white borders so the OCR window is content-only.
img.AutoCrop(margin: 8, tolerance: 12);

// Hand the cleaned buffer to the OCR engine.
var ocr = new LMKitOcr();
var result = await ocr.RunAsync(new OcrParameters(img));
```
Iterate over every page inside a multi-page TIFF, run the cleanup pipeline per page, and OCR independently.
```csharp
// Iterate every page in a multi-page TIFF and OCR each one.
using var tiff = ImageBuffer.Load(@"C:\archives\dossier.tif");

for (int page = 0; page < tiff.PageCount; page++)
{
    using var single = tiff.ExtractPage(page);
    single.SmartBinarize();
    single.DespeckleBitonal();

    var r = await ocr.RunAsync(new OcrParameters(single));
    log.Info($"page {page}: {r.Text.Length} chars");
}
```
Canvas turns any ImageBuffer into a drawing
surface. Annotate detected OCR regions, render redaction overlays,
build per-page composite outputs without leaving managed code.
```csharp
using LMKit.Graphics;
using LMKit.Media.Image;

// Run VLM OCR with coordinate output.
var regions = await vlmOcr.RunAsync(new OcrParameters(img)
{
    Intent = VlmOcrIntent.OcrWithCoordinates
});

// Draw bounding boxes onto a copy of the original image.
using var annotated = img.Clone();
var canvas = new Canvas(annotated);
var pen = new Pen(Color32.Magenta, thickness: 2);

foreach (var region in regions.PageElement.Children)
{
    canvas.DrawRectangle(region.Bounds, pen);
}

annotated.SavePng(@"C:\out\annotated.png");
```
Once an image is loaded, the rest of LM-Kit is one method away. Send it to a vision-language model for natural-language understanding, embed it for similarity search, classify it, or run OCR.
Send an image into a multi-turn chat with a vision-language model and follow up with refining questions.
```csharp
using LMKit.Model;
using LMKit.TextGeneration;

// Vision-language understanding. Multi-turn aware.
var vlm = VisionLanguageModel.LoadFromModelID("glm-4.6v-flash");
var chat = new MultiTurnConversation(vlm);

chat.AddImage(@"C:\screenshots\dashboard.png");
Console.WriteLine(await chat.SubmitAsync("What does this dashboard report?"));
Console.WriteLine(await chat.SubmitAsync("Which metric is trending down?"));
```
Embed a folder of product photos into a vector store and look up the visually nearest matches for a query image.
```csharp
using LMKit.Embeddings;
using LMKit.Retrieval;

// Build a vector store of image embeddings.
var embedder = new Embedder(LM.LoadFromModelID("nomic-embed-vision"));
var store = new VectorStore(embedder.EmbeddingSize);

foreach (var path in Directory.EnumerateFiles(@"C:\product-photos", "*.jpg"))
{
    using var img = ImageBuffer.Load(path);
    var vec = await embedder.GetEmbeddingsAsync(img);
    store.AddVector(vec, metadata: path);
}

// Find visually similar items to a query image.
using var query = ImageBuffer.Load(@"C:\query.jpg");
var queryVec = await embedder.GetEmbeddingsAsync(query);
var nearest = store.FindNearest(queryVec, topK: 10);

foreach (var hit in nearest)
    Console.WriteLine($"{hit.Score:F3} {hit.Metadata}");
```
Three libraries, three serialisation boundaries, three sets of dependencies. Buffer copies between layers cost both performance and code clarity.
Powerful, but most .NET wrappers (Emgu, OpenCvSharp) are heavy native deps. Smart binarization for OCR is bring-your-own.
Native unmanaged buffer. Same object handles preprocessing, OCR, VLMs, and embeddings. SmartBinarize, Deskew, DespeckleBitonal, AutoCrop are first-class. One NuGet, one dependency.
The downstream consumer. SmartBinarize and Deskew are what make OCR accuracy production-grade.
Convert images to PDF or to searchable PDF. Render PDF pages as images for vision input.
The image-embedding side of the multimodal embedder. Same vector store, text or image queries.
Classify images and scanned documents into 30+ predefined categories or your own.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Deskew, crop, resize, denoise. Built-in tools wrap each step. Read the guide →
Caption, describe, classify, VQA via vision-language models. Read the guide →
Console demo: extract text, Markdown, tables, formulas from images. Open on GitHub →