Every PDF operation. One SDK.

Most .NET teams pull in three or four libraries to handle PDFs: one to parse, one to render, one to write, one to OCR. LM-Kit ships a complete PDF toolkit in a single NuGet package: PdfDocument for read / write / render, PdfSearchableMaker for OCR-stamped searchable PDFs, SearchHighlightEngine for visual locate, and 15+ built-in agent tools for headless automation.

Read & write · OCR-searchable · Search & highlight

PdfDocument

Parse, render, manipulate. File, stream, or byte-array input.

PdfSearchableMaker

Stamp an invisible OCR text layer onto scanned PDFs. Searchable, copyable, indexable.

SearchHighlightEngine

Locate text and produce a marked-up PDF with visible highlights. Drives find-in-document UIs.

What's in the toolkit

Fourteen capabilities, one library.

Every operation below is exposed both as a high-level .NET API and as a built-in agent tool (pdf_*) so an agent can perform it autonomously.

Read

Parse and inspect

Open PDFs from file, stream, or byte[]. Inspect metadata (title, author, permissions). Iterate pages.

Render

PDF to image

Render any page at any DPI. Drives thumbnails, page previews, and vision-model inputs.
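
Choosing a DPI is mostly a question of output size. PDF page dimensions are expressed in points, and the default PDF user-space unit is 1/72 inch, so pixel dimensions scale linearly with DPI. The sketch below is plain .NET arithmetic, not LM-Kit API: `RenderMath` and `PixelSize` are hypothetical names, and a real renderer may round differently by a pixel.

```csharp
using System;

public static class RenderMath
{
    // Default PDF user space: 72 units (points) per inch.
    private const double PointsPerInch = 72.0;

    // Pixel dimensions of a page (given in points) rasterized at the
    // requested DPI, rounded to the nearest whole pixel.
    // Hypothetical helper for illustration only.
    public static (int Width, int Height) PixelSize(double widthPt, double heightPt, int dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        int width = (int)Math.Round(widthPt * dpi / PointsPerInch);
        int height = (int)Math.Round(heightPt * dpi / PointsPerInch);
        return (width, height);
    }
}
```

For a US Letter page (612 × 792 pt), `RenderMath.PixelSize(612, 792, 72)` gives 612 × 792 pixels, which is typically plenty for thumbnails, while 150 DPI gives 1275 × 1650 and 300 DPI gives 2550 × 3300.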

Merge

Combine PDFs

pdf_merge tool plus direct API. Concatenate any number of PDFs into one. Preserves bookmarks and metadata.

Split

Split into pages

pdf_split for fixed-page splits. Pair with DocumentSplitter for semantic boundary detection (multi-document scans).
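
A fixed-page split comes down to cutting the page count into equal-sized ranges, with a shorter final part when the count does not divide evenly. The sketch below shows only that planning step in plain .NET. `SplitPlanner` and `FixedRanges` are hypothetical names, and the 1-based inclusive range convention is an assumption, not something the toolkit is documented here as using.

```csharp
using System;
using System.Collections.Generic;

public static class SplitPlanner
{
    // Returns 1-based, inclusive page ranges of at most pagesPerPart pages each.
    // Hypothetical helper for illustration only.
    public static List<(int First, int Last)> FixedRanges(int pageCount, int pagesPerPart)
    {
        if (pageCount < 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (pagesPerPart <= 0) throw new ArgumentOutOfRangeException(nameof(pagesPerPart));

        var ranges = new List<(int First, int Last)>();
        for (int first = 1; first <= pageCount; first += pagesPerPart)
        {
            // The final part may be shorter than pagesPerPart.
            int last = Math.Min(first + pagesPerPart - 1, pageCount);
            ranges.Add((first, last));
        }
        return ranges;
    }
}
```

A 10-page document split 4 pages at a time yields the ranges 1-4, 5-8, and 9-10. Fixed ranges like these ignore content, which is why semantic boundary detection is the better fit for multi-document scans.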

Search

Full-text search

pdf_search finds matches by phrase or regex. Returns page numbers and per-match positions.
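
To make the phrase-versus-regex distinction concrete, here is a minimal sketch that searches page text that has already been extracted. It uses only `System.Text.RegularExpressions` and does not call LM-Kit. `TextMatch` and `PageSearch` are hypothetical names, case-insensitive matching is an assumption, and the positions here are character offsets within each page's text rather than whatever position format the tool itself reports.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One hit: 1-based page number, character offset in that page's text, matched text.
public readonly record struct TextMatch(int PageNumber, int Offset, string Value);

public static class PageSearch
{
    // Searches every page for a literal phrase or a regex pattern.
    // Hypothetical helper for illustration only.
    public static List<TextMatch> Find(IReadOnlyList<string> pageTexts, string query, bool isRegex)
    {
        if (pageTexts is null) throw new ArgumentNullException(nameof(pageTexts));
        if (string.IsNullOrEmpty(query)) throw new ArgumentException("Query is empty.", nameof(query));

        // A phrase is escaped so characters such as '.' or '(' match literally.
        string pattern = isRegex ? query : Regex.Escape(query);
        var regex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

        var hits = new List<TextMatch>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            foreach (Match m in regex.Matches(pageTexts[i]))
                hits.Add(new TextMatch(i + 1, m.Index, m.Value));
        }
        return hits;
    }
}
```

Given the pages `"Invoice 2024-001 total due"`, `"No match here"`, and `"invoice 2024-017"`, the regex `invoice \d{4}-\d{3}` matches on pages 1 and 3, while the phrase `total due` matches once on page 1 at offset 17.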

Highlight

Search-highlight

SearchHighlightEngine returns a marked-up PDF with visible highlights at every match. Drives in-app find-and-show UIs.

Searchable

Make scans searchable

PdfSearchableMaker runs OCR and embeds an invisible text layer. The result looks identical to the original scan and is indexable and copyable.

Unlock

Encrypted PDFs

pdf_unlock opens password-protected PDFs given the password. Useful for legitimate access to protected archives.

Pages

Page operations

Rotate, delete, flatten annotations, set orientation. Inspect page count and per-page metadata via pdf_pages.

Extract

Text and images

pdf_extract pulls text and embedded images out of any PDF. Pair with EmbeddedImageOcr for OCR over images embedded in text-layer PDFs.

Metadata

Inspect properties

pdf_metadata reads title, author, subject, keywords, creation date, encryption status, page count.

Build

Generate from images

ImageToPdf wraps one or many images into a PDF. Pair with ImageToSearchablePdf to add OCR text in one pass.

Archive

PDF/A-1B archival

Setting PdfGenerationOptions.Version = PdfA1b emits ISO 19005-1 archival PDFs. PDF/A-1B, PDF/A-2B, and PDF/A-3B are all supported, each with full XMP metadata and an OCR text layer written in the same pass.

TIFF

Multipage TIFF in

ImageToSearchablePdf.ConvertAsync ingests multipage TIFFs straight from scanners and fax servers and emits one searchable PDF/A. Per-page OCR runs in parallel.
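
Parallel per-page OCR follows a familiar fan-out / fan-in shape: start one task per page, then collect the results in page order. The sketch below shows that general pattern with standard .NET tasks. It is not LM-Kit's implementation, `ParallelPages` and `RecognizeAllAsync` are hypothetical names, and `recognizePage` is a stand-in delegate for whatever per-page OCR call is in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelPages
{
    // Runs recognizePage for every page on the thread pool and returns the
    // results in the same order as the input pages.
    // Hypothetical helper for illustration only.
    public static async Task<string[]> RecognizeAllAsync(
        IReadOnlyList<byte[]> pages, Func<byte[], string> recognizePage)
    {
        Task<string>[] tasks = pages
            .Select(page => Task.Run(() => recognizePage(page)))
            .ToArray();

        // Task.WhenAll preserves task order, so result[i] belongs to pages[i]
        // even when later pages finish first.
        return await Task.WhenAll(tasks);
    }
}
```

Because results come back in input order, page text can be written to the output document sequentially even though recognition finished out of order.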

Real PDF code

Four working pipelines.

Headless automation

Every operation is also an agent tool.

The same toolkit is registered as built-in agent tools so an LLM can drive it. Available tools include pdf_extract, pdf_merge, pdf_split, pdf_search, pdf_search_highlight, pdf_to_image, pdf_unlock, pdf_metadata, pdf_pages, image_to_pdf, eml_to_pdf, plus the conversion family (markdown_to_pdf, markdown_to_docx, markdown_to_html) and OCR (ocr_recognize). Register them on any agent and let it run document workflows end-to-end.

Related capabilities

PDF toolkit plus the rest.

OCR

Searchable-PDF generation runs on top of LMKitOcr or VlmOcr. Pick the engine that matches your accuracy and speed needs.

OCR page

Document conversion

Markdown to PDF, HTML to Markdown, image to PDF, and the full conversion catalogue.

Conversion page

Document splitting

When a single PDF holds multiple logical documents, semantic splitting separates them by content boundary.

Splitting page

Built-in tools

All pdf_* tools are registered out of the box. Compose with ToolPermissionPolicy for safe agent execution.

Tools page

Demos & docs

Build it. Read it. Try it.

Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.

Demo

PDF Metadata Inspector

Title, author, permissions, page sizes, encryption status via PdfInfo.

Sample

PDF Metadata Inspector walkthrough

Step-by-step doc page: prerequisites, setup, code path, expected output.

Demo

PDF Merge Batch

Combine an ordered list of PDFs into a single output with PdfMerger.MergeFiles.

Sample

PDF Merge Batch walkthrough

Step-by-step doc page: prerequisites, setup, code path, expected output.

Demo

PDF Splitter by Page Range

Slice one PDF into N smaller PDFs along caller-defined ranges with PdfSplitter.SplitToFiles.

Sample

PDF Splitter by Page Range walkthrough

Step-by-step doc page: prerequisites, setup, code path, expected output.

Demo

PDF Pages to Image Thumbnails

Render each page to PNG / JPEG / WebP / BMP / TIFF / TGA / PNM via PdfRenderer.RenderPagesToFolder.

Sample

PDF Pages to Image Thumbnails walkthrough

Step-by-step doc page: prerequisites, setup, code path, expected output.

Demo

PDF to Multi-page TIFF Archive

Pack every selected page into one multi-page TIFF via PdfRenderer.RenderPagesToMultipageTiff.

Sample

PDF to Multi-page TIFF Archive walkthrough

Step-by-step doc page: prerequisites, setup, code path, expected output.

Demo

PDF Page Rotator

Auto-fix sideways scans or apply a uniform rotation with PdfEditor.ApplyToFile + PageEdit.

Sample

PDF Page Rotator walkthrough

Step-by-step doc page: prerequisites, setup, code path, expected output.

Demo

PDF Text Search with Highlights

Layout-aware keyword search returning page index, snippet, and bounding-box rectangles.

Sample

PDF Text Search with Highlights walkthrough

Step-by-step doc page: prerequisites, setup, code path, expected output.

Demo

Searchable PDF from Scans (PDF -> PDF/OCR)

Take an image-only scanned PDF and produce a searchable PDF via PdfSearchableMaker.ConvertToFile.

Sample

Searchable PDF from Scans (PDF -> PDF/OCR) walkthrough

Step-by-step doc page: prerequisites, setup, code path, expected output.

Demo

Multipage TIFF to PDF/A-1B Archive (TIFF -> PDF/OCR)

Convert scanned multipage TIFFs into searchable PDF/A-1B (ISO 19005-1) archives via ImageToSearchablePdf.ConvertAsync + PdfGenerationOptions.Version = PdfA1b.

Sample

Multipage TIFF to PDF/A-1B Archive walkthrough

Step-by-step doc page: prerequisites, setup, code path, expected output.

Demo

Encrypted PDF Workflows

Inspect, render, search, split, and edit password-protected PDFs end to end with the password flowing through every PDF class.

Sample

Encrypted PDF Workflows walkthrough

Step-by-step doc page: prerequisites, setup, code path, expected output.

How-to guide

Render PDF pages to images

PdfToImage vs PdfRenderer, full format matrix, async + cancellation + progress, encrypted PDFs.

How-to guide

Build a multi-format document ingestion pipeline

PDF + DOCX + HTML + EML through one pipeline.

API reference

LMKit.Document.Pdf

API reference for the PDF toolkit namespace.


One library. Every PDF need.

Get Community Edition · Download