
Inboxes are document corpora.

Email archives are among the largest document corpora any organisation owns: every contract reply, every customer thread, every invoice attachment. LM-Kit parses EML, MBOX, and ICS files natively, extracts headers, bodies, attachments, and calendar events, and feeds them into the same RAG, summarisation, classification, and extraction pipelines as any other document.

EML · MBOX · ICS · Attachments preserved · RAG-ready

EmlDocument

Single email parser. Headers, plain / HTML body, attachments, embedded images.

MboxDocument

Mailbox archive iterator. Stream through years of correspondence without loading it all.

IcsParser

Calendar events. Extract attendees, recurrence, locations, attachments.
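
To make "stream without loading it all" concrete, here is a minimal sketch of lazy mbox iteration using only the .NET base class library. It is not LM-Kit's implementation: `ReadMboxMessages` is a hypothetical helper, and it assumes the common mbox convention in which every message begins with a separator line starting with `From `. It does not undo the `>From ` escaping that some mbox variants apply to body lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not an LM-Kit API): lazily yields one raw message
// at a time, so only the current message is held in memory.
IEnumerable<string> ReadMboxMessages(TextReader reader)
{
    var current = new StringBuilder();
    bool inMessage = false;

    while (reader.ReadLine() is { } line)
    {
        // Assumption: a line starting with "From " separates messages.
        if (line.StartsWith("From ", StringComparison.Ordinal))
        {
            if (inMessage) yield return current.ToString();
            current.Clear();
            inMessage = true;
            continue; // the separator line itself is not part of the message
        }

        if (!inMessage) continue; // ignore anything before the first separator
        current.Append(line).Append('\n');
    }

    if (inMessage) yield return current.ToString();
}

// Usage: an in-memory sample stands in for a real archive file.
var sample =
    "From a@example.com Mon Jan  1 00:00:00 2024\nSubject: one\n\nhello\n" +
    "From b@example.com Tue Jan  2 00:00:00 2024\nSubject: two\n\nworld\n";

var messages = new List<string>(ReadMboxMessages(new StringReader(sample)));
Console.WriteLine(messages.Count); // prints 2
```

For a real archive, passing a `StreamReader` opened on the file keeps memory use proportional to the largest single message rather than to the whole mailbox.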

What you can do with email

Six high-value workflows.

RAG over inboxes

Index a customer's full thread history into a vector store. Support agents query it and get grounded answers with source attribution per email.

Compliance archive

Convert MBOX archives to PDF for legal hold, eDiscovery, or regulatory submission. EmlToPdf renders each email as a page.

Auto-triage

Classify incoming email by intent (refund, technical issue, sales lead). Route via SupervisorOrchestrator to the right specialist agent.

Attachment extraction

Pull every PDF, image, or DOCX out of an inbox. Send each through the rest of the document pipeline (OCR, classification, extraction).

Thread summarisation

Long conversation threads collapsed to bullet points. Surface decisions, blockers, deadlines.

Calendar mining

Parse ICS attachments. Extract recurring meetings, attendees, agenda items. Useful for productivity and scheduling agents.
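
For a sense of what calendar mining involves at the format level, the sketch below reads VEVENT properties from iCalendar text using only the .NET base class library. It is an illustration, not LM-Kit's IcsParser: `UnfoldLines` and `ReadEvents` are hypothetical helpers. It applies the RFC 5545 unfolding rule (a line that starts with a space or tab continues the previous line) and deliberately ignores harder cases such as colons inside quoted parameter values and nested components like VALARM.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: joins folded content lines back together.
IEnumerable<string> UnfoldLines(TextReader reader)
{
    string? pending = null;
    while (reader.ReadLine() is { } line)
    {
        bool isContinuation = line.Length > 0 && (line[0] == ' ' || line[0] == '\t');
        if (pending != null && isContinuation)
        {
            pending += line.Substring(1); // drop the single folding character
            continue;
        }
        if (pending != null) yield return pending;
        pending = line;
    }
    if (pending != null) yield return pending;
}

// Hypothetical helper: one dictionary per VEVENT, property name -> values.
// Values are lists because properties such as ATTENDEE can repeat.
List<Dictionary<string, List<string>>> ReadEvents(TextReader reader)
{
    var result = new List<Dictionary<string, List<string>>>();
    Dictionary<string, List<string>>? current = null;

    foreach (var line in UnfoldLines(reader))
    {
        if (line == "BEGIN:VEVENT") { current = new(); continue; }
        if (line == "END:VEVENT")
        {
            if (current != null) result.Add(current);
            current = null;
            continue;
        }
        if (current == null) continue; // outside any event

        int colon = line.IndexOf(':'); // simplification: first colon ends the name part
        if (colon <= 0) continue;
        string head = line.Substring(0, colon);
        int semi = head.IndexOf(';');  // parameters follow the name after ';'
        string name = (semi >= 0 ? head.Substring(0, semi) : head).ToUpperInvariant();

        if (!current.TryGetValue(name, out var values))
            current[name] = values = new List<string>();
        values.Add(line.Substring(colon + 1));
    }
    return result;
}

// Usage: a recurring meeting with two attendees, one on a folded line.
var ics =
    "BEGIN:VCALENDAR\r\nBEGIN:VEVENT\r\nSUMMARY:Weekly sync\r\n" +
    "RRULE:FREQ=WEEKLY;BYDAY=MO\r\nLOCATION:Room 4\r\n" +
    "ATTENDEE;CN=Ana:mailto:ana@example.com\r\n" +
    "ATTENDEE;CN=Ben:mailto:ben@exam\r\n ple.com\r\n" +
    "END:VEVENT\r\nEND:VCALENDAR\r\n";

var parsed = ReadEvents(new StringReader(ics));
Console.WriteLine(string.Join(", ", parsed[0]["ATTENDEE"]));
```

Keeping every property as a list of raw strings leaves interpretation (expanding the RRULE, resolving attendee names) to a later step, which is where a scheduling agent would normally take over.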

Real email pipelines

Parse, convert, index, classify.

Walk an MBOX archive, convert each message to Markdown, and ingest it into a DocumentRag index with metadata.

RagOverInbox.cs
using LMKit.Document.Eml;
using LMKit.Retrieval;

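// Index every email in a mailbox archive into RAG.
// `model` and `embedder` are assumed to be loaded earlier
// (a chat model and an embedding model, respectively).
var mbox = new MboxDocument(@"C:\archives\support@example.com.mbox");
var rag  = new DocumentRag(model, embedder);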

foreach (EmlDocument eml in mbox.Emails)
{
    var markdown = new EmlToMarkdown().Convert(eml);
    await rag.ImportDocumentAsync(markdown,
        metadata: new()
        {
            Name        = eml.Subject,
            CustomData  = { ["from"] = eml.From, ["date"] = eml.Date.ToString("o") }
        });
}

// Now query across the entire inbox.
var answer = await rag.QueryPartitionsAsync(
    "What did this customer report about the December outage?");
Related capabilities

Email plus the rest.

Document RAG

The retrieval pipeline that powers RAG over inboxes. Source attribution per email.

Document RAG page

Document classification

Auto-triage incoming email by intent. Pluggable categories.

Classification page

Document conversion

EML to PDF, EML to Markdown, MBOX to Markdown. Format catalogue for archive workflows.

Conversion page

Email triage agent demo

End-to-end agent that classifies, drafts, and routes incoming email via SupervisorOrchestrator.

Multi-agent workflows page

Demos & docs

Build it. Read it. Try it.

Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
