Multimodal PII extraction, 100% on-device.

Automatically detect and extract personally identifiable information from text and images without sending data to external servers. Built-in support for names, emails, SSNs, addresses, and unlimited custom labels. Precise character offsets enable automated redaction and masking for GDPR, CCPA, and HIPAA compliance workflows.

15+ built-in types · Multimodal · 100% on-device

Standards

  • GDPR: EU personal-data regulation. Local detection means no data leaves the box.
  • CCPA: California Consumer Privacy Act. On-device PII redaction by default.
  • HIPAA: US healthcare PHI protection. Air-gapped processing supported.
  • PCI DSS: Payment card data security. Detect and mask card numbers locally.

Problem

Why PII detection matters.

Organizations face growing pressure to identify and protect sensitive personal data across documents, emails, forms, and databases while maintaining operational efficiency.

Risk

Data breach risks

Undetected PII in documents, logs, and communications creates exposure to data breaches. Industry reports put the average cost of a breach above $4M, with healthcare and financial services among the most expensive sectors.

Compliance

Regulatory complexity

GDPR, CCPA, HIPAA, and PCI DSS each define different PII categories and impose different requirements. Manual compliance across multiple frameworks is error-prone and costly.

Privacy

Cloud privacy concerns

Sending sensitive data to external APIs for processing creates additional privacy risks, compliance headaches, and potential data sovereignty issues for regulated industries.

LM-Kit PII Extraction: privacy-first detection

Process sensitive documents entirely on-device with zero data transmission. Detect 15+ PII types with custom labels, get precise character offsets for automated redaction, and maintain full compliance without compromising privacy or performance.

Class detail

Intelligent PII detection engine.

The PiiExtraction class leverages advanced language models to identify and extract personally identifiable information from any content. Unlike regex-based solutions, LM-Kit understands context and semantics, accurately detecting PII even in varied formats, misspellings, and multi-language content.

  • 15+ built-in PII entity types including Name, Email, Phone, SSN, Address
  • Unlimited custom labels for organization-specific identifiers
  • Precise character offsets for every detected entity
  • Confidence scores to prioritize high-risk findings
  • Guidance property for domain-specific extraction rules
PiiBasic.cs
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("qwen3.5:4b");
var pii = new PiiExtraction(model);

// Extract all PII from text
// The email below is an example.com placeholder address
string content = "Contact Sarah Johnson at " +
    "sarah.johnson@example.com or (555) 234-5678. " +
    "SSN: 123-45-6789";

var entities = pii.Extract(content);

foreach (var entity in entities)
{
    Console.WriteLine($"[{entity.Type}] {entity.Value}");
}
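
The character offsets described above are what make automated redaction practical. The following is a minimal sketch of that step, not the library's own redaction API: the `Finding` record and the `Redactor.Mask` helper are hypothetical, and the sketch assumes each detected entity can be reduced to a start offset, a length, and a label.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical findings; in a real pipeline these would be derived from the
// extraction results. The (start, length, label) shape is an assumption.
string sample = "Contact Sarah Johnson, SSN: 123-45-6789";
var detected = new List<Finding>
{
    new(8, 13, "Name"),
    new(28, 11, "SSN"),
};

Console.WriteLine(Redactor.Mask(sample, detected));
// Contact [Name], SSN: [SSN]

// A detected span expressed as a character offset and a length (assumed shape).
public record Finding(int Start, int Length, string Label);

public static class Redactor
{
    // Replaces each span with a bracketed label. Spans are processed right to
    // left so that earlier offsets stay valid while the string changes length.
    public static string Mask(string text, IEnumerable<Finding> findings)
    {
        var sb = new StringBuilder(text);
        int limit = text.Length; // nothing at or beyond this index may be touched

        foreach (var f in findings.OrderByDescending(x => x.Start))
        {
            // Skip spans that are out of range or overlap one already masked.
            if (f.Start < 0 || f.Length <= 0 || f.Start + f.Length > limit)
                continue;

            sb.Remove(f.Start, f.Length).Insert(f.Start, $"[{f.Label}]");
            limit = f.Start;
        }
        return sb.ToString();
    }
}
```

Working from the end of the string backwards avoids recomputing offsets after each replacement, which is the main pitfall when masked text differs in length from the original.
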
Multimodal

Extract PII from images & documents.

Process scanned documents, screenshots, photos of IDs, and any image content with integrated OCR and vision model support. The PreferredInferenceModality property lets you choose between text-only, image-only, or full multimodal processing based on your use case.

  • Vision model support for direct image understanding
  • Optional OcrEngine for traditional OCR on raster content
  • Process scanned forms, IDs, receipts, and medical records
  • Handle screenshots and photos with embedded PII
  • Configurable modality: text, image, or multimodal
PiiFromImage.cs
using LMKit.Model;
using LMKit.TextAnalysis;
using LMKit.Data;

// Load a vision-capable model
var model = LM.LoadFromModelID("qwen3.5:4b");
var pii = new PiiExtraction(model);

// Configure for multimodal processing
pii.PreferredInferenceModality = InferenceModality.Multimodal;

// Extract PII from scanned document
var scan = new Attachment("patient_form.png");
var entities = await pii.ExtractAsync(scan);
Built-in types

Built-in PII entity types.

Comprehensive detection for the most common personally identifiable information categories, with full support for custom labels.

  • Name: Full names, first/last names
  • Email: Email addresses
  • Phone: Phone numbers, all formats
  • SSN: Social Security Numbers
  • Address: Physical addresses
  • DateOfBirth: Birth dates
  • URL: Personal URLs, profiles
  • IP Address: IPv4 and IPv6 addresses

Define custom PII labels

Add organization-specific identifiers like patient IDs, account numbers, employee IDs, passport numbers, and more.

Customization

Custom PII entity definitions.

Configure exactly which PII types to detect using PiiEntityDefinitions. Combine built-in types with custom labels tailored to your industry and compliance requirements. Use the Guidance property to provide domain-specific extraction instructions.

  • Mix built-in types with unlimited custom labels
  • PiiEntityDefinition class for full control over detection
  • Guidance property for domain-specific rules
  • HandleOther option for catching unexpected PII types
  • Enable/disable specific types per workflow
PiiCustom.cs
using LMKit.Model;
using LMKit.TextAnalysis;
using static LMKit.TextAnalysis.PiiExtraction;

var model = LM.LoadFromModelID("qwen3.5:4b");

// Create with custom entity definitions
var definitions = new List<PiiEntityDefinition>
{
    // Built-in types
    new(PiiEntityType.Name),
    new(PiiEntityType.Email),
    new(PiiEntityType.SSN),
    // Custom labels for healthcare
    new("PatientID", "Medical record numbers")
};

// The list is then supplied to the extractor, either at construction or through
// the PiiEntityDefinitions member described in the API reference (call not shown here).
Scale

Batch processing for scale.

Process thousands of documents efficiently with async APIs and parallel execution. The batch PII extraction demo shows how to scan entire directories of files, generating comprehensive reports of detected PII across your document corpus.

  • Async/await pattern for non-blocking execution
  • Process entire directories with parallel scanning
  • CancellationToken support for graceful termination
  • MaxContextLength control for memory optimization
  • Generate audit reports with file-level findings
PiiBatch.cs
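using System.Collections.Concurrent;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("qwen3.5:4b");
var pii = new PiiExtraction(model);

// Configure for batch processing
pii.MaxContextLength = 4096;

var files = Directory.GetFiles("./documents", "*.txt");

// Parallel workers write concurrently, so use a thread-safe dictionary
var results = new ConcurrentDictionary<string, List<PiiEntity>>();

// Note: sharing one PiiExtraction instance across workers assumes it is safe
// for concurrent use; if it is not, create one instance per worker.
await Parallel.ForEachAsync(files, async (file, ct) =>
{
    var content = await File.ReadAllTextAsync(file, ct);
    var entities = await pii.ExtractAsync(content, ct);
    results[file] = entities.ToList();
});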
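
The audit report with file-level findings mentioned above could be assembled from the per-file results along these lines. This is a sketch under stated assumptions: the `AuditReport` helper and the CSV layout are hypothetical, and each finding is reduced to its entity type name as a plain string rather than an SDK type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-file results standing in for the output of a batch run.
var scanned = new Dictionary<string, List<string>>
{
    ["a.txt"] = new() { "Name", "Email", "Email" },
    ["b.txt"] = new() { "SSN" },
    ["c.txt"] = new(),
};

foreach (var row in AuditReport.ToCsv(scanned))
    Console.WriteLine(row);

public static class AuditReport
{
    // Emits one row per (file, entity type) pair with the number of findings.
    // Files with no findings get a single "none" row so they stay visible.
    // File names are written as-is; names containing commas would need quoting.
    public static IEnumerable<string> ToCsv(
        IReadOnlyDictionary<string, List<string>> results)
    {
        yield return "file,type,count";

        foreach (var entry in results.OrderBy(r => r.Key, StringComparer.Ordinal))
        {
            if (entry.Value.Count == 0)
            {
                yield return $"{entry.Key},none,0";
                continue;
            }

            var byType = entry.Value
                .GroupBy(t => t)
                .OrderBy(grp => grp.Key, StringComparer.Ordinal);

            foreach (var typeGroup in byType)
                yield return $"{entry.Key},{typeGroup.Key},{typeGroup.Count()}";
        }
    }
}
```

Sorting files and types with an ordinal comparer keeps the report stable between runs, which makes successive audits easy to diff.
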
Compliance

Built for privacy regulations.

LM-Kit PII Extraction helps organizations meet data protection requirements across major regulatory frameworks.

GDPR

EU General Data Protection Regulation

Detect personal data as defined by GDPR Article 4: names, email addresses, location data, online identifiers, and any information relating to an identified or identifiable person.

  • Data discovery for Subject Access Requests
  • Right to Erasure compliance
  • On-device processing for data sovereignty

CCPA

California Consumer Privacy Act

Identify personal information under CCPA's broad definition: information that identifies, relates to, or could reasonably be linked with a particular consumer or household.

  • Consumer data disclosure requests
  • Right to deletion workflows
  • Audit trail generation

HIPAA

Health Insurance Portability and Accountability Act

Detect Protected Health Information (PHI) including patient names, medical record numbers, health plan IDs, and any individually identifiable health information.

  • PHI detection in medical records
  • Custom labels for MRN, insurance IDs
  • De-identification support
Privacy

100% on-device processing.

Your sensitive data never leaves your infrastructure. Process PII detection entirely locally with zero external API calls.

01

Zero data transmission

No cloud APIs. No external services. Sensitive content stays on your device.

02

Data sovereignty

Meet jurisdictional requirements by keeping data within geographic boundaries.

03

Sub-50ms latency

No network round trips. Instant detection for real-time redaction workflows.

04

Audit ready

Demonstrate privacy compliance. No third-party data sharing to document.

Applications

Real-world use cases.

Organizations across industries deploy LM-Kit PII Extraction to automate compliance, protect customer data, and reduce manual review overhead.

Redaction

Document redaction

Automatically mask PII in contracts, reports, and correspondence before sharing externally. Character offsets enable precise redaction without manual review.

Email

Email scanning

Scan outbound emails and attachments for accidental PII disclosure. Block or flag messages containing sensitive data before they leave your network.
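
One way to turn findings into a block-or-flag decision is a small policy function. The sketch below is illustrative only: the `Verdict` enum, the `Finding` record, the list of high-risk types, and the 0.8 threshold are all assumptions, and per-entity confidence values may not match what the SDK exposes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical findings for one outbound message: entity type plus a 0..1 score.
var detected = new List<Finding>
{
    new("Email", 0.97),
    new("SSN", 0.91),
};

Console.WriteLine(OutboundPolicy.Decide(detected)); // Block

public enum Verdict { Allow, Flag, Block }

// Assumed shape of a finding, used only for this illustration.
public record Finding(string Type, double Confidence);

public static class OutboundPolicy
{
    // Entity types treated as severe enough to stop a message (illustrative choice).
    private static readonly HashSet<string> HighRisk = new() { "SSN", "CreditCard" };

    // Block on a confident high-risk finding, flag on any other finding, else allow.
    public static Verdict Decide(IEnumerable<Finding> findings, double blockThreshold = 0.8)
    {
        var verdict = Verdict.Allow;
        foreach (var f in findings)
        {
            if (HighRisk.Contains(f.Type) && f.Confidence >= blockThreshold)
                return Verdict.Block;

            verdict = Verdict.Flag;
        }
        return verdict;
    }
}
```
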

DLP

Data loss prevention

Integrate with DLP systems to detect PII in file uploads, cloud storage, and database exports. Prevent sensitive data from leaving secure environments.

Training

Training data sanitization

Remove PII from datasets before model training. Ensure AI systems are not trained on personally identifiable information that could be extracted later.

Support

Customer support logs

Scan support tickets and chat transcripts for PII before archival or analytics. Protect customer privacy while maintaining useful service records.

SAR

Subject Access Requests

Locate all PII related to a specific individual across your document corpus. Accelerate GDPR Subject Access Request fulfillment from weeks to hours.
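
Locating one person's data across a corpus amounts to an inverted index from detected values to the files that contain them. The sketch below assumes an earlier scan has already produced (file, value) pairs; `SubjectIndex` and its normalization rule (trim plus lower-casing) are hypothetical and far simpler than real identity matching.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical (file, detected value) pairs collected from an earlier scan.
var sampleHits = new (string File, string Value)[]
{
    ("contracts/a.txt", "Sarah Johnson"),
    ("tickets/b.txt", "sarah johnson "),
    ("tickets/b.txt", "other@example.com"),
};

var lookup = SubjectIndex.Build(sampleHits);
Console.WriteLine(string.Join(", ", SubjectIndex.FilesFor(lookup, "Sarah Johnson")));
// contracts/a.txt, tickets/b.txt

public static class SubjectIndex
{
    // Trim and lower-case so trivially different spellings share one key.
    private static string Normalize(string value) => value.Trim().ToLowerInvariant();

    // Maps each normalized value to the sorted set of files where it was found.
    public static Dictionary<string, SortedSet<string>> Build(
        IEnumerable<(string File, string Value)> hits)
    {
        var index = new Dictionary<string, SortedSet<string>>();
        foreach (var hit in hits)
        {
            var key = Normalize(hit.Value);
            if (!index.TryGetValue(key, out var files))
            {
                files = new SortedSet<string>(StringComparer.Ordinal);
                index[key] = files;
            }
            files.Add(hit.File);
        }
        return index;
    }

    // Returns the files associated with a subject, or an empty collection.
    public static IReadOnlyCollection<string> FilesFor(
        Dictionary<string, SortedSet<string>> index, string subject)
    {
        if (index.TryGetValue(Normalize(subject), out var files))
            return files;
        return Array.Empty<string>();
    }
}
```
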

Developer Resources

API reference.

Complete documentation for the PiiExtraction class and related types.

  • PiiExtraction: Main class for extracting PII from text and images. Initialize it with a model and optional entity definitions.
  • Extract: Synchronously extracts PII entities from text content. Returns a list of detected entities with their positions.
  • ExtractAsync: Asynchronously extracts PII with cancellation support. Accepts text strings or image attachments.
  • PiiEntityDefinitions: Configures which PII types to detect. Combine built-in types with custom labels.
  • Guidance: Provides domain-specific instructions for extraction to improve accuracy on specialized content.
  • Confidence: Gets the confidence score of the last extraction operation, useful for prioritizing findings.
  • PreferredInferenceModality: Controls the processing mode: text-only, image-only, or multimodal.
  • OcrEngine: Optional OCR engine for traditional text extraction from raster images before PII detection.
  • MaxContextLength: Controls the token limit for processing. Use it to optimize memory usage for large documents.

Protect sensitive data today.

Multimodal PII detection. Custom labels. 100% on-device. Start building privacy-compliant applications with LM-Kit.NET.
