The pipeline behind every accurate OCR.

OCR accuracy lives or dies on input quality. A skewed scan, a noisy fax, a low-contrast photograph, a multi-page TIFF: each needs the right preprocessing before the OCR engine or VLM ever sees it. LM-Kit bundles a complete image pipeline in ImageBuffer: deskew, smart binarization, despeckle, auto-crop, blank detection, format conversion, and multi-page TIFF handling, plus a full Canvas drawing API and image-similarity search via embeddings. All native, all CPU-efficient, all in one NuGet package.

Native unmanaged buffer · 4 pixel formats · 7 file formats

Preprocess

Deskew, smart-binarize, despeckle, auto-crop, blank-detect, denoise. The OCR / VLM pipeline gold standard.

Manipulate

Resize, rotate, flip, format-convert, draw shapes via Canvas, Pen, Brush.

Understand

Vision-language understanding, OCR, image embeddings for similarity search.

Why preprocessing matters

Garbage in, garbage out.

A 3-degree skew can drop OCR accuracy from 99% to 70%. A noisy fax produces hallucinated characters. A scan with wide uniform borders can waste much of the OCR window. A blank cover page costs you an inference call. Each step in the pipeline below fixes one of these problems, and together they add up to higher accuracy at lower cost. LM-Kit ships them as one-line method calls.

Native, unmanaged

ImageBuffer wraps native handles. Zero copies between operations. Pixel-format conversions happen in-place.

Pixel formats

GRAY8, RGB24, RGBA32, BINARY1. Convert with ConvertGRAY8(), ConvertRGB24(), etc. Pick the right format for the target operation.

File formats

PNG, JPEG, WebP, BMP, TIFF, TGA, PNM. Multi-page TIFF supported via SaveAsMultipageTiff() and SelectPage().

Resampling

Bilinear and Lanczos resampling for high-quality resize. Box-filter resize for thumbnails. Per-axis control over output dimensions.

Drawing API

Canvas, Pen, Brush. Lines, rectangles, ellipses, polygons, quadrilaterals. Drives annotation overlays for OCR coordinates and redaction.

Vision integration

Same buffer feeds OCR, VLMs, and image embeddings. No format dance between the preprocessing layer and the inference layer.

Preprocessing pipeline

Six steps, one accurate OCR result.

The pipeline below is what production OCR systems run before they hand an image to the model. Each step is a single method call.

Deskew

Correct page skew

Deskew() detects page rotation of up to ±15° and returns the corrected image plus the detected angle. Standard for scanned forms and faxes.

SmartBinarize

Adaptive binarization

SmartBinarize() uses block detection, edge recovery, and inverse-text recovery to separate text from background even on dark or low-contrast pages.

OtsuBinarize

Automatic threshold

OtsuBinarize() picks an optimal binary threshold without configuration. Fastest path for clean scans.
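
For readers who want to see what an automatic threshold involves, here is a minimal sketch of the classic Otsu method in plain C#. It is not LM-Kit's implementation: the OtsuSketch class, its ComputeThreshold method, and the flat 8-bit grayscale array are assumptions made for illustration only. The idea is to build a 256-bin histogram and pick the threshold that maximizes the variance between the dark and light classes.

```csharp
using System;

// Hypothetical helper for illustration; not part of LM-Kit.
public static class OtsuSketch
{
    // Returns the threshold t (0..255) that maximizes between-class variance
    // when pixels <= t form one class and pixels > t form the other.
    // Returns 0 if every pixel has the same value.
    public static int ComputeThreshold(byte[] gray)
    {
        if (gray == null || gray.Length == 0)
            throw new ArgumentException("Image must contain at least one pixel.", nameof(gray));

        // Histogram of intensity values.
        var histogram = new long[256];
        foreach (byte p in gray) histogram[p]++;

        long total = gray.Length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (double)i * histogram[i];

        long weightLow = 0;       // pixel count in the low (dark) class
        double sumLow = 0;        // intensity sum of the low class
        double bestVariance = -1;
        int bestThreshold = 0;

        for (int t = 0; t < 256; t++)
        {
            weightLow += histogram[t];
            if (weightLow == 0) continue;      // no pixels in the low class yet

            long weightHigh = total - weightLow;
            if (weightHigh == 0) break;        // everything is in the low class

            sumLow += (double)t * histogram[t];
            double meanLow = sumLow / weightLow;
            double meanHigh = (sumAll - sumLow) / weightHigh;
            double diff = meanLow - meanHigh;

            // Between-class variance (up to a constant factor).
            double variance = (double)weightLow * weightHigh * diff * diff;
            if (variance > bestVariance)
            {
                bestVariance = variance;
                bestThreshold = t;
            }
        }

        return bestThreshold;
    }
}
```

Because the method only looks at the global histogram, it works best on evenly lit scans with two clear intensity peaks, which is why an adaptive approach is the better choice for uneven or low-contrast pages.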

DespeckleBitonal

Remove noise

DespeckleBitonal() strips salt-and-pepper noise from binary images. Critical for older scans and poor-quality fax input.
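
As a rough picture of what despeckling a binary image means, the sketch below clears ink pixels that have too few ink neighbours. It is a simplified stand-in, not LM-Kit's algorithm: the DespeckleSketch class, the bool[,] representation (true = ink), and the minNeighbors parameter are all assumptions for illustration.

```csharp
using System;

// Hypothetical helper for illustration; not part of LM-Kit.
public static class DespeckleSketch
{
    // Clears ink pixels with fewer than minNeighbors ink pixels among their
    // 8 surrounding neighbours. ink[y, x] == true means the pixel is ink.
    public static bool[,] RemoveIsolatedPixels(bool[,] ink, int minNeighbors = 1)
    {
        if (ink == null) throw new ArgumentNullException(nameof(ink));

        int height = ink.GetLength(0);
        int width = ink.GetLength(1);

        // Write into a copy so every decision is based on the original image.
        var result = (bool[,])ink.Clone();

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                if (!ink[y, x]) continue;

                int neighbours = 0;
                for (int dy = -1; dy <= 1; dy++)
                {
                    for (int dx = -1; dx <= 1; dx++)
                    {
                        if (dy == 0 && dx == 0) continue;
                        int ny = y + dy, nx = x + dx;
                        if (ny < 0 || ny >= height || nx < 0 || nx >= width) continue;
                        if (ink[ny, nx]) neighbours++;
                    }
                }

                if (neighbours < minNeighbors) result[y, x] = false;
            }
        }

        return result;
    }
}
```

With the default minNeighbors of 1, only fully isolated dots are removed. Raising it removes larger specks but starts to erode thin strokes and punctuation, so it is a trade-off worth testing on real scans.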

AutoCrop

Trim uniform borders

AutoCrop(margin, tolerance) detects background colour from corners, removes uniform borders, leaves you with the actual content.
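
To make the margin and tolerance parameters concrete, here is a minimal sketch of corner-based border detection in plain C#. It is an illustration under stated assumptions rather than LM-Kit's implementation: the AutoCropSketch class and FindContentBounds method are hypothetical, the image is a byte[,] of grayscale values, the background is taken as the average of the four corner pixels, tolerance is treated as the maximum intensity difference still counted as background, and margin as the padding in pixels kept around the content.

```csharp
using System;

// Hypothetical helper for illustration; not part of LM-Kit.
public static class AutoCropSketch
{
    // Returns the rectangle enclosing all non-background pixels, padded by
    // margin and clamped to the image. gray[y, x] holds 8-bit intensities.
    public static (int X, int Y, int Width, int Height) FindContentBounds(
        byte[,] gray, int margin, int tolerance)
    {
        if (gray == null) throw new ArgumentNullException(nameof(gray));

        int height = gray.GetLength(0);
        int width = gray.GetLength(1);
        if (height == 0 || width == 0)
            throw new ArgumentException("Image must contain at least one pixel.", nameof(gray));

        // Estimate the background from the four corners.
        int background = (gray[0, 0] + gray[0, width - 1]
                        + gray[height - 1, 0] + gray[height - 1, width - 1]) / 4;

        int minX = width, minY = height, maxX = -1, maxY = -1;
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                if (Math.Abs(gray[y, x] - background) <= tolerance) continue;
                if (x < minX) minX = x;
                if (x > maxX) maxX = x;
                if (y < minY) minY = y;
                if (y > maxY) maxY = y;
            }
        }

        // Nothing but background: keep the full image.
        if (maxX < 0) return (0, 0, width, height);

        int left = Math.Max(0, minX - margin);
        int top = Math.Max(0, minY - margin);
        int right = Math.Min(width - 1, maxX + margin);
        int bottom = Math.Min(height - 1, maxY + margin);

        return (left, top, right - left + 1, bottom - top + 1);
    }
}
```

A higher tolerance ignores faint scanner shading near the edges, at the risk of trimming light content such as pencil marks or watermarks.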

IsBlank / IsBlackAndWhite

Skip empty pages

Detect blank cover sheets, spacers, and already-bitonal images so the pipeline avoids redundant OCR calls.
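
One simple way to think about blank detection is as an ink-ratio check: count the dark pixels and call the page blank if they make up a negligible share of the image. The sketch below shows that heuristic in plain C#. The BlankPageSketch class, the darkThreshold and maxInkRatio parameters, and their default values are assumptions for illustration; LM-Kit's own criteria may differ.

```csharp
using System;

// Hypothetical helper for illustration; not part of LM-Kit.
public static class BlankPageSketch
{
    // Returns true when the share of pixels darker than darkThreshold
    // does not exceed maxInkRatio. gray holds 8-bit intensities.
    public static bool IsBlank(byte[] gray, byte darkThreshold = 128, double maxInkRatio = 0.001)
    {
        if (gray == null || gray.Length == 0)
            throw new ArgumentException("Image must contain at least one pixel.", nameof(gray));

        long dark = 0;
        foreach (byte p in gray)
        {
            if (p < darkThreshold) dark++;
        }

        // A few stray dark pixels (dust, scanner noise) still count as blank.
        return dark <= maxInkRatio * gray.Length;
    }
}
```

Running a check like this before any other step keeps empty separator sheets out of the OCR queue entirely, which is where the cost saving comes from.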

A real pipeline

From raw scan to OCR-ready.

Load, deskew, binarize, denoise, and auto-crop a single scan before handing it to the OCR engine.

PrepForOcr.cs
using LMKit.Media.Image;

// Load any common format. Auto-detected.
using var img = ImageBuffer.Load(@"C:\scans\fax_3104.tif");

// Skip the pipeline entirely if the page is blank.
if (img.IsBlank()) return;

// Convert to grayscale once; downstream ops are faster.
img.ConvertGRAY8();

// Correct skew up to 15 degrees.
DeskewResult ds = img.Deskew();
Console.WriteLine($"corrected {ds.Angle:F2} deg");

// Adaptive binarize. Recovers dark text and edges automatically.
img.SmartBinarize();

// Strip salt-and-pepper noise from the binary image.
img.DespeckleBitonal();

// Remove uniform borders (background is detected from the corners) so the OCR window is content-only.
img.AutoCrop(margin: 8, tolerance: 12);

// Hand the cleaned buffer to the OCR engine.
var ocr    = new LMKitOcr();
var result = await ocr.RunAsync(new OcrParameters(img));

Manipulation & drawing

Resize, annotate, compose.

Canvas turns any ImageBuffer into a drawing surface. Annotate detected OCR regions, render redaction overlays, build per-page composite outputs without leaving managed code.

AnnotateOcrRegions.cs
using LMKit.Graphics;
using LMKit.Media.Image;

// Assumes `img` is an already-loaded ImageBuffer and `vlmOcr` is an initialized VLM OCR engine.
// Run VLM OCR with coordinate output.
var regions = await vlmOcr.RunAsync(new OcrParameters(img)
{
    Intent = VlmOcrIntent.OcrWithCoordinates
});

// Draw bounding boxes onto a copy of the original image.
using var annotated = img.Clone();
var canvas = new Canvas(annotated);
var pen    = new Pen(Color32.Magenta, thickness: 2);

foreach (var region in regions.PageElement.Children)
{
    canvas.DrawRectangle(region.Bounds, pen);
}

annotated.SavePng(@"C:\out\annotated.png");

Vision understanding & search

From bytes to meaning.

Once an image is loaded, the rest of LM-Kit is one method away. Send it to a vision-language model for natural-language understanding, embed it for similarity search, classify it, or run OCR.

Send an image into a multi-turn chat with a vision-language model and follow up with refining questions.

DescribeImage.cs
using LMKit.Model;
using LMKit.TextGeneration;

// Vision-language understanding. Multi-turn aware.
var vlm  = VisionLanguageModel.LoadFromModelID("glm-4.6v-flash");
var chat = new MultiTurnConversation(vlm);

chat.AddImage(@"C:\screenshots\dashboard.png");
Console.WriteLine(await chat.SubmitAsync("What does this dashboard report?"));
Console.WriteLine(await chat.SubmitAsync("Which metric is trending down?"));

Versus the alternatives

No more three-library juggling.

SkiaSharp / ImageSharp + OCR + embedder

Three libraries, three serialisation boundaries, three sets of dependencies. Buffer copies between layers cost both performance and code clarity.

OpenCV via wrappers

Powerful, but most .NET wrappers (Emgu CV, OpenCvSharp) carry heavy native dependencies. Smart binarization for OCR is bring-your-own.

LM-Kit ImageBuffer

Native unmanaged buffer. The same object handles preprocessing, OCR, VLMs, and embeddings. SmartBinarize, Deskew, DespeckleBitonal, and AutoCrop are first-class. One NuGet package, one dependency.

Related capabilities

Image processing plus the rest of Document Intelligence.

OCR

The downstream consumer. SmartBinarize and Deskew are what make OCR accuracy production-grade.

PDF toolkit

Convert images to PDF or to searchable PDF. Render PDF pages as images for vision input.

Multimodal embeddings

The image-embedding side of the multimodal embedder. Same vector store, text or image queries.

Document classification

Classify images and scanned documents into 30+ predefined categories or your own.

Better images. Better AI.
