
Every PDF operation. One SDK.

Most .NET teams pull in three or four libraries to handle PDFs: one to parse, one to render, one to write, and one to OCR. LM-Kit ships a complete PDF toolkit in a single NuGet package: PdfDocument to read, write, and render; PdfSearchableMaker to produce OCR-stamped searchable PDFs; SearchHighlightEngine to locate text visually; and 15+ built-in agent tools for headless automation.



`PdfDocument`

Parse, render, manipulate. File, stream, or byte-array input.

`PdfSearchableMaker`

Stamp an invisible OCR text layer onto scanned PDFs. Searchable, copyable, indexable.

`SearchHighlightEngine`

Locate text and produce a marked-up PDF with visible highlights. Drives find-in-document UIs.

What's in the toolkit

Fifteen capabilities, one library.

Every operation below is exposed both as a high-level .NET API and as a built-in agent tool (pdf_*) so an agent can perform it autonomously.

Read

Parse and inspect

Open PDFs from file, stream, or byte[]. Inspect metadata (title, author, permissions). Iterate pages.

Render

PDF to image

Render any page at any DPI. Drives thumbnails, page previews, and vision-model inputs.
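Because rendering is specified in DPI, the output bitmap size follows directly from the page size. The sketch below assumes the page size is known in PDF points (1/72 inch, the default PDF user-space unit); `PixelSize` is a hypothetical plain-.NET helper, not part of the LM-Kit API.

```csharp
using System;

// Hypothetical helper: pixel size of a page rendered at `dpi`, given its
// size in PDF points (1 point = 1/72 inch, the default PDF user-space unit).
static (int Width, int Height) PixelSize(double widthPt, double heightPt, int dpi)
{
    if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
    int w = (int)Math.Round(widthPt / 72.0 * dpi);
    int h = (int)Math.Round(heightPt / 72.0 * dpi);
    return (w, h);
}

// US Letter is 612 x 792 pt (8.5 x 11 in).
var thumb = PixelSize(612, 792, 72);
var preview = PixelSize(612, 792, 150);
Console.WriteLine($"72 DPI:  {thumb.Width} x {thumb.Height}");
Console.WriteLine($"150 DPI: {preview.Width} x {preview.Height}");
```

At 72 DPI one point maps to one pixel. Doubling the DPI quadruples the pixel count, which is worth budgeting for when pages feed a vision model.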

Merge

Combine PDFs

pdf_merge tool plus direct API. Concatenate any number of PDFs into one. Preserves bookmarks and metadata.

Split

Split into pages

pdf_split for fixed-page splits. Pair with DocumentSplitter for semantic boundary detection (multi-document scans).
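A fixed-page split boils down to computing page ranges. A minimal sketch in plain .NET, assuming 1-based inclusive ranges; `FixedRanges` is a hypothetical helper and does not reflect how pdf_split is parameterized.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: 1-based, inclusive page ranges for a fixed-size split.
// A 10-page file split every 3 pages yields 1-3, 4-6, 7-9, 10-10.
static List<(int First, int Last)> FixedRanges(int pageCount, int pagesPerPart)
{
    if (pageCount < 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
    if (pagesPerPart < 1) throw new ArgumentOutOfRangeException(nameof(pagesPerPart));

    var ranges = new List<(int First, int Last)>();
    for (int first = 1; first <= pageCount; first += pagesPerPart)
    {
        // The final part may be shorter than pagesPerPart.
        int last = Math.Min(first + pagesPerPart - 1, pageCount);
        ranges.Add((first, last));
    }
    return ranges;
}

foreach (var part in FixedRanges(10, 3))
    Console.WriteLine($"pages {part.First}-{part.Last}");
```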

Full-text search

pdf_search finds matches by phrase or regex. Returns page numbers and per-match positions.
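Assuming page text has already been extracted (for example with pdf_extract), phrase-or-regex search with per-match positions can be sketched in plain .NET. `FindMatches` and the sample page strings are hypothetical and do not describe the pdf_search implementation; offsets here are character positions in the extracted text, and mapping them to on-page coordinates for highlighting is a separate step.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: search already-extracted page text and report
// (1-based page number, character offset, match length) for every hit.
static List<(int Page, int Offset, int Length)> FindMatches(
    IReadOnlyList<string> pageTexts, string pattern, bool isRegex)
{
    // A plain phrase is escaped so characters like '.' or '(' match literally.
    var regex = new Regex(isRegex ? pattern : Regex.Escape(pattern),
                          RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    var hits = new List<(int Page, int Offset, int Length)>();
    for (int i = 0; i < pageTexts.Count; i++)
    {
        foreach (Match m in regex.Matches(pageTexts[i]))
            hits.Add((i + 1, m.Index, m.Length));
    }
    return hits;
}

// Hypothetical extracted text, one string per page.
string[] pages =
{
    "Section 9. Indemnification. Each party shall indemnify...",
    "No relevant terms on this page.",
    "Indemnification survives termination.",
};

foreach (var hit in FindMatches(pages, @"indemnif\w*", isRegex: true))
    Console.WriteLine($"page {hit.Page}, offset {hit.Offset}, length {hit.Length}");
```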

Highlight

Search-highlight

SearchHighlightEngine returns a marked-up PDF with visible highlights at every match. Drives in-app find-and-show UIs.

Searchable

Make scans searchable

PdfSearchableMaker runs OCR and embeds an invisible text layer. The output looks identical to the original and is indexable and copyable.

Unlock

Encrypted PDFs

pdf_unlock opens password-protected PDFs given the password. Useful for legitimate access to protected archives.

Pages

Page operations

Rotate and delete pages, flatten annotations, set orientation. Inspect page count and per-page metadata via pdf_pages.

Extract

Text and images

pdf_extract pulls text and embedded images out of any PDF. Pair with EmbeddedImageOcr for OCR over images embedded in text-layer PDFs.

Metadata

Inspect properties

pdf_metadata reads title, author, subject, keywords, creation date, encryption status, page count.

Build

Generate from images

ImageToPdf wraps one or many images into a PDF. Pair with ImageToSearchablePdf to add OCR text in one pass.

PDF/A-1B archival

Setting PdfGenerationOptions.Version to PdfA1b emits ISO 19005-1 (PDF/A-1B) archival PDFs. PDF/A-2B (ISO 19005-2) and PDF/A-3B (ISO 19005-3) are also supported, with full XMP metadata and an OCR text layer written in the same pass.

TIFF

Multipage TIFF in

ImageToSearchablePdf.ConvertAsync ingests multipage TIFFs straight from scanners and fax servers and emits one searchable PDF/A. Per-page OCR runs in parallel.
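The bounded per-page parallelism that an option like MaxDegreeOfParallelism controls can be sketched with .NET 6's Parallel.ForEachAsync. This illustrates the general pattern only and is not LM-Kit's internal implementation; `OcrPageAsync` is a hypothetical stub standing in for a real OCR call.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a per-page OCR call; a real engine would
// recognize the page image here.
static async Task<string> OcrPageAsync(int pageIndex, CancellationToken ct)
{
    await Task.Delay(10, ct);   // simulate work
    return $"text of page {pageIndex + 1}";
}

// Run per-page work with at most `maxDegree` pages in flight. Each result goes
// into its own slot, so page order is preserved whatever the completion order.
static async Task<string[]> OcrAllPagesAsync(int pageCount, int maxDegree)
{
    var results = new string[pageCount];
    var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

    await Parallel.ForEachAsync(Enumerable.Range(0, pageCount), options, async (i, ct) =>
    {
        results[i] = await OcrPageAsync(i, ct);
    });

    return results;
}

string[] texts = await OcrAllPagesAsync(pageCount: 8, maxDegree: 4);
Console.WriteLine(string.Join(Environment.NewLine, texts));
```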

Redact

Permanent redaction

PdfRedactor deletes content rather than covering it: text glyphs, image pixels, vector graphics, and annotations under a search term, page area, or /Redact mark are removed and cannot be recovered.

Real PDF code

Four working pipelines.

Convert a folder of scanned PDFs into searchable PDFs with an invisible text layer produced by OCR.

SearchableFromScans.cs

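using System.IO;
using LMKit.Document.Pdf;
using LMKit.Extraction.Ocr;

// Turn a folder of scanned PDFs into searchable PDFs.
var ocr = new LMKitOcr();   // CPU-only, fast
var maker = new PdfSearchableMaker(ocr);

const string inputDir = @"C:\scans";
const string outputDir = @"C:\out";
Directory.CreateDirectory(outputDir);   // no-op if it already exists

foreach (var path in Directory.EnumerateFiles(inputDir, "*.pdf"))
{
    // Write each result under the same file name in the output folder.
    await maker.MakeSearchableAsync(path, Path.Combine(outputDir, Path.GetFileName(path)));
}

// Output PDFs look identical and are now full-text indexable.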

Convert a multipage TIFF straight from a scanner or fax server into an ISO 19005-1 PDF/A-1B archive with a searchable OCR text layer in one call.

TiffToPdfA.cs

using LMKit.Document.Conversion;
using LMKit.Document.Pdf;
using LMKit.Extraction.Ocr;

// On-device OCR + PDF/A-1B archival in one call.
var ocr = new LMKitOcr();

var options = new PdfGenerationOptions
{
    Version = PdfGenerationOptions.PdfVersion.PdfA1b,
    MaxDegreeOfParallelism = 4,
    EnableOrientationDetection = true,
};

await ImageToSearchablePdf.ConvertAsync(
    @"C:\fax\inbox\case-2026-0142.tif",
    ocr,
    @"C:\archive\case-2026-0142.pdf",
    options);

// Result: ISO 19005-1 (PDF/A-1B) compliant, OCR-searchable, audit-ready.

Run a search over an existing PDF and emit a new copy with every match visually highlighted for reviewers.

SearchAndHighlight.cs

using LMKit.Document.Pdf;
using LMKit.TextAnalysis;

var engine = new SearchHighlightEngine(@"C:\contracts\msa.pdf");

// Find every mention and produce a highlighted PDF for the reviewer.
SearchHighlightResult r = await engine.HighlightAsync(
    query: "indemnification",
    output: @"C:\out\msa-highlighted.pdf");

Console.WriteLine($"Found {r.Matches.Count} matches across {r.Pages.Count} pages");

Merge several PDFs into a single bundle, then OCR the embedded images on every page of a report in place.

MergeAndExtract.cs

using LMKit.Document.Pdf;
using LMKit.Extraction.Ocr;

// Merge a stack of one-pagers into a single book.
var book = PdfDocument.Merge(
    @"C:\out\book.pdf",
    @"C:\pages\01.pdf",
    @"C:\pages\02.pdf",
    @"C:\pages\03.pdf");

// Extract every embedded image, OCR the ones that need it.
using var doc = new PdfDocument(@"C:\reports\annual.pdf");
var ocr = new EmbeddedImageOcr(new LMKitOcr());

foreach (var page in doc.Pages)
{
    await ocr.RunAsync(page);    // updates page text in-place
}

Headless automation

Every operation is also an agent tool.

The same toolkit is registered as built-in agent tools so an LLM can drive it. Available tools include pdf_extract, pdf_merge, pdf_split, pdf_search, pdf_search_highlight, pdf_to_image, pdf_unlock, pdf_metadata, pdf_pages, image_to_pdf, eml_to_pdf, plus the conversion family (markdown_to_pdf, markdown_to_docx, markdown_to_html) and OCR (ocr_recognize). Register them on any agent and let it run document workflows end-to-end.

Related capabilities

PDF toolkit plus the rest.

PDF/A conversion

PdfAConverter turns existing PDFs into archival PDF/A-1b, 2b, or 3b files: fonts are embedded, colours calibrated, and prohibited constructs removed.


PDF redaction

PdfRedactor permanently removes text, images, vectors, and annotations under a mark. Content is deleted, not covered, and cannot be recovered.


OCR

Searchable-PDF generation runs on top of LMKitOcr or VlmOcr. Pick the engine that matches your accuracy and speed needs.


Document conversion

Markdown to PDF, HTML to Markdown, image to PDF, and the full conversion catalogue.


Document splitting

When a single PDF holds multiple logical documents, semantic splitting separates them by content boundary.


Built-in tools

All pdf_* tools registered out of the box. Compose with ToolPermissionPolicy for safe agent execution.
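To make the idea of a permission policy over tool names concrete, here is a plain-.NET illustration of allow/deny rules with a trailing wildcard. It is hypothetical and is not the ToolPermissionPolicy API; the rule semantics (deny wins over allow, `*` matches any suffix) are assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only (not the ToolPermissionPolicy API):
// deny rules win over allow rules; a trailing '*' matches any suffix.
static bool IsAllowed(string tool, IEnumerable<string> allow, IEnumerable<string> deny)
{
    static bool Matches(string pattern, string name) =>
        pattern.EndsWith("*")
            ? name.StartsWith(pattern.Substring(0, pattern.Length - 1), StringComparison.Ordinal)
            : string.Equals(pattern, name, StringComparison.Ordinal);

    if (deny.Any(p => Matches(p, tool))) return false;
    return allow.Any(p => Matches(p, tool));
}

string[] allowRules = { "pdf_*", "ocr_recognize" };
string[] denyRules = { "pdf_unlock" };   // e.g. keep password handling out of agent reach

foreach (var candidate in new[] { "pdf_search", "pdf_unlock", "ocr_recognize", "eml_to_pdf" })
    Console.WriteLine($"{candidate}: {(IsAllowed(candidate, allowRules, denyRules) ? "allow" : "deny")}");
```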


Demos & docs

Build it. Read it. Try it.

Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.

Demo


PDF Metadata Inspector

PDF Metadata Inspector walkthrough

PDF Merge Batch

PDF Merge Batch walkthrough

PDF Splitter by Page Range

PDF Splitter by Page Range walkthrough

PDF Pages to Image Thumbnails

PDF Pages to Image Thumbnails walkthrough

PDF to Multi-page TIFF Archive

PDF to Multi-page TIFF Archive walkthrough

PDF Page Rotator

PDF Page Rotator walkthrough

PDF Text Search with Highlights

PDF Text Search with Highlights walkthrough

Searchable PDF from Scans (PDF -> PDF/OCR)

Searchable PDF from Scans (PDF -> PDF/OCR) walkthrough

Multipage TIFF to PDF/A-1B Archive (TIFF -> PDF/OCR)

Multipage TIFF to PDF/A-1B Archive walkthrough

Encrypted PDF Workflows

Encrypted PDF Workflows walkthrough

How-to guides

Render PDF pages to images

Build a multi-format document ingestion pipeline

API reference

LMKit.Document.Pdf

`PdfDocument`

`PdfSearchableMaker`

`SearchHighlightEngine`