U2-Net
Salient-object segmentation. Strong on general subjects.
Strip backgrounds with U2-Net or ModNet. Deskew scanned pages, crop to bounding boxes, resize for downstream models, measure skew angles. Each preprocessing step is a class and a built-in agent tool, runnable from code or from a function-calling agent.
Portrait matting. Hair-level edges, fast.
Built-in agent tools wrap each step.
U2-Net
Identifies and isolates the foreground subject. Good general-purpose model for products, objects, animals, and complex scenes.
ModNet
Specialised for people. Hair-level edge accuracy, fast enough for real-time webcam pipelines and video conferencing.
ONNX
Both engines run via the ONNX backend with CUDA, DirectML, or CPU. Same accelerator stack as the rest of LM-Kit.NET.
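Both the classes and the agent tools sit over these same engines, so swapping U2-Net for ModNet is an engine-selection change, not a pipeline rewrite. The sketch below shows the rough shape of a background-removal call only: `BackgroundRemover`, `SegmentationModel`, and `RemoveBackground` are illustrative placeholder names, not the verified LM-Kit.NET API (only `ImageBuffer` is a type named on this page) — consult the API reference for the exact signatures.

```
// Sketch only: BackgroundRemover, SegmentationModel and RemoveBackground
// are hypothetical names standing in for the real API surface.
// The point is the call shape: load, pick an engine, strip, save.
var image  = ImageBuffer.Load("product.jpg");
var engine = new BackgroundRemover(SegmentationModel.U2Net);  // SegmentationModel.ModNet for portraits

var cutout = engine.RemoveBackground(image);  // subject kept, background made transparent
cutout.Save("product-cutout.png");
```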
These preprocessing operations ship as both .NET classes (call from code) and built-in agent tools (call from a function-calling agent). Pair them with OCR, VLMs, or any vision pipeline.
image_deskew
Detect and correct rotation in scanned pages. Critical for downstream OCR accuracy on phone-scanned documents.
image_measure_skew
Compute the skew angle without rotating. Useful when you want to flag oblique scans without modifying them.
image_crop & image_resize_box
Region-of-interest extraction by pixel coordinates or by detected region. Feed only the relevant patch to a VLM.
image_resize
Aspect-aware resizing with quality interpolation. Standardise inputs before feeding downstream models.
image_info
Inspect resolution, color space, EXIF, MIME. Use it as the first step in an agent-driven pipeline.
ocr_recognize
Run OCR as part of a preprocessing pipeline. Routes to the configured OCR engine (LMKit OCR, PaddleOCR-VL, GLM-OCR).
Remove backgrounds at upload time. Consistent catalog look without manual editing.
Real-time portrait matting on the user's machine. No cloud relay, no privacy compromise.
Deskew phone-scanned pages, crop to the page boundary, resize to the OCR engine's preferred resolution. Single agent loop chains the tools.
Background-strip a photo, redact a region, blur a face. Every operation runs on the device that holds the original.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Deskew, crop, resize, denoise. Built-in tools wrap each step.
Read the guide →
How-to guide
Catalog of agent-callable tools, including image preprocessing.
Read the guide →
API reference
Drawing primitives, canvas, brush, pen, and the ImageBuffer type.
Open the reference →
The seven pillars of LM-Kit.NET, plus the local runtime they share. The highlighted card is where you are now.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR and classifiers, accelerated on CPU (AVX2), CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.