Every PDF operation. One SDK.

Most .NET teams pull in three or four libraries to handle PDFs: one to parse, one to render, one to write, one to OCR. LM-Kit ships a complete PDF toolkit in a single NuGet package: PdfDocument for reading, writing, and rendering; PdfSearchableMaker for OCR-stamped searchable PDFs; SearchHighlightEngine for locating and highlighting matches; and 15+ built-in agent tools for headless automation.

Read & write · OCR-searchable · Search & highlight

PdfDocument

Parse, render, manipulate. File, stream, or byte-array input.

PdfSearchableMaker

Stamp an invisible OCR text layer onto scanned PDFs. Searchable, copyable, indexable.

SearchHighlightEngine

Locate text and produce a marked-up PDF with visible highlights. Drives find-in-document UIs.

What's in the toolkit

Twelve capabilities, one library.

Every operation below is exposed both as a high-level .NET API and as a built-in agent tool (pdf_*) so an agent can perform it autonomously.

Read

Parse and inspect

Open PDFs from file, stream, or byte[]. Inspect metadata (title, author, permissions). Iterate pages.

Render

PDF to image

Render any page at any DPI. Drives thumbnails, page previews, and vision-model inputs.
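
As a rough guide to picking a DPI: PDF page sizes are expressed in points (1/72 inch by default), so the pixel size of a rendered page follows from simple arithmetic. The sketch below is plain C# and does not call LM-Kit; `RenderMath` and `PixelSize` are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper, not part of LM-Kit: estimates the raster size of a rendered page.
public static class RenderMath
{
    // PDF user space defaults to 72 points per inch.
    private const double PointsPerInch = 72.0;

    // Converts a page size in points to a pixel size at the requested DPI,
    // rounded to the nearest whole pixel.
    public static (int Width, int Height) PixelSize(double widthPt, double heightPt, int dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        int width = (int)Math.Round(widthPt / PointsPerInch * dpi);
        int height = (int)Math.Round(heightPt / PointsPerInch * dpi);
        return (width, height);
    }
}
```

For example, a US Letter page (612 × 792 points) at 150 DPI comes out at 1275 × 1650 pixels, which is a reasonable starting point for previews; thumbnails can go much lower.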

Merge

Combine PDFs

pdf_merge tool plus direct API. Concatenate any number of PDFs into one. Preserves bookmarks and metadata.

Split

Split into pages

pdf_split for fixed-page splits. Pair with DocumentSplitter for semantic boundary detection (multi-document scans).
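
To make "fixed-page splits" concrete, here is a minimal sketch of the page-range arithmetic behind such a split. It is plain C#, independent of LM-Kit; `SplitPlan` and `FixedRanges` are hypothetical names, and how pdf_split itself accepts ranges is not specified here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of LM-Kit: plans a fixed-size split.
public static class SplitPlan
{
    // Returns 1-based, inclusive page ranges that cover pageCount pages
    // in chunks of pagesPerPart; the last chunk may be shorter.
    public static List<(int First, int Last)> FixedRanges(int pageCount, int pagesPerPart)
    {
        if (pageCount < 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (pagesPerPart <= 0) throw new ArgumentOutOfRangeException(nameof(pagesPerPart));

        var ranges = new List<(int First, int Last)>();
        for (int first = 1; first <= pageCount; first += pagesPerPart)
        {
            int last = Math.Min(first + pagesPerPart - 1, pageCount);
            ranges.Add((first, last));
        }
        return ranges;
    }
}
```

A 10-page file split every 4 pages yields the ranges 1-4, 5-8, and 9-10. Semantic splitting differs in that the boundaries come from the content rather than from a fixed count.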

Search

Full-text search

pdf_search finds matches by phrase or regex. Returns page numbers and per-match positions.
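
The idea of "page numbers and per-match positions" can be sketched over text that has already been extracted page by page. This is an illustration in plain C#, not the pdf_search implementation: `SearchHit` and `PageSearch` are hypothetical names, and positions here are character offsets within the page text, which may differ from the position format the tool returns.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical result shape: 1-based page number plus character offset and length.
public sealed record SearchHit(int Page, int Index, int Length, string Text);

// Hypothetical helper, not part of LM-Kit.
public static class PageSearch
{
    // pages[i] holds the already-extracted text of page i + 1.
    public static List<SearchHit> Find(IReadOnlyList<string> pages, string pattern, bool isRegex)
    {
        // A plain phrase is escaped so regex metacharacters match literally.
        var regex = new Regex(isRegex ? pattern : Regex.Escape(pattern), RegexOptions.IgnoreCase);

        var hits = new List<SearchHit>();
        for (int i = 0; i < pages.Count; i++)
        {
            foreach (Match m in regex.Matches(pages[i]))
            {
                hits.Add(new SearchHit(i + 1, m.Index, m.Length, m.Value));
            }
        }
        return hits;
    }
}
```

Calling `PageSearch.Find(pages, @"invoice \d+", isRegex: true)` returns one hit per match, each tagged with the page it came from, which is enough to drive a result list or jump-to-page behaviour.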

Highlight

Search-highlight

SearchHighlightEngine returns a marked-up PDF with visible highlights at every match. Drives in-app find-and-show UIs.

Searchable

Make scans searchable

PdfSearchableMaker runs OCR and embeds an invisible text layer. The result looks identical to the original scan but is indexable and copyable.

Unlock

Encrypted PDFs

pdf_unlock opens password-protected PDFs given the password. Useful for legitimate access to protected archives.

Pages

Page operations

Rotate, delete, flatten annotations, set orientation. Inspect page count and per-page metadata via pdf_pages.

Extract

Text and images

pdf_extract pulls text and embedded images out of any PDF. Pair with EmbeddedImageOcr for OCR over images embedded in text-layer PDFs.

Metadata

Inspect properties

pdf_metadata reads title, author, subject, keywords, creation date, encryption status, page count.

Build

Generate from images

ImageToPdf wraps one or many images into a PDF. Pair with ImageToSearchablePdf to add OCR text in one pass.

Real PDF code

Three working pipelines.

Headless automation

Every operation is also an agent tool.

The same toolkit is registered as built-in agent tools so an LLM can drive it. Available tools include pdf_extract, pdf_merge, pdf_split, pdf_search, pdf_search_highlight, pdf_to_image, pdf_unlock, pdf_metadata, pdf_pages, image_to_pdf, eml_to_pdf, plus the conversion family (markdown_to_pdf, markdown_to_docx, markdown_to_html) and OCR (ocr_recognize). Register them on any agent and let it run document workflows end-to-end.

Related capabilities

PDF toolkit plus the rest.

OCR

Searchable-PDF generation runs on top of LMKitOcr or VlmOcr. Pick the engine to match accuracy / speed needs.

OCR page

Document conversion

Markdown to PDF, HTML to Markdown, image to PDF, and the full conversion catalogue.

Conversion page

Document splitting

When a single PDF holds multiple logical documents, semantic splitting separates them by content boundary.

Splitting page

Built-in tools

All pdf_* tools are registered out of the box. Compose with ToolPermissionPolicy for safe agent execution.

Tools page

One library. Every PDF need.
