Privacy & Compliance

Multimodal PII Extraction: 100% On-Device

Automatically detect and extract personally identifiable information from text and images without sending data to external servers. Built-in support for names, emails, SSNs, addresses, and unlimited custom labels. Precise character offsets enable automated redaction and masking for GDPR, CCPA, and HIPAA compliance workflows.

Text + Images, Custom Labels, Batch Processing
Real-Time PII Detection
5 entities detected in 47ms
Contact Sarah Johnson at [email protected] or (555) 234-5678. SSN: 123-45-6789. Address: 742 Maple Street, Boston MA 02101
Name [8:21]
Email [25:42]
Phone [46:60]
SSN [67:78]
Address [89:122]
GDPR Ready
CCPA Compliant
HIPAA Support
15+ Built-in PII Types
100% On-Device
<50ms Avg. Latency

Why PII Detection Matters

Organizations face growing pressure to identify and protect sensitive personal data across documents, emails, forms, and databases while maintaining operational efficiency.

Data Breach Risks

Undetected PII in documents, logs, and communications increases exposure when a data breach occurs. Average breach costs exceed $4.5M, with the healthcare and financial sectors typically bearing the highest costs.

Regulatory Complexity

GDPR, CCPA, HIPAA, and PCI DSS each define different PII categories and impose different requirements. Manual compliance across multiple frameworks is error-prone and costly.

Cloud Privacy Concerns

Sending sensitive data to external APIs for processing creates additional privacy risks, compliance headaches, and potential data sovereignty issues for regulated industries.

LM-Kit PII Extraction: Privacy-First Detection

Process sensitive documents entirely on-device with zero data transmission. Detect 15+ PII types with custom labels, get precise character offsets for automated redaction, and maintain full compliance without compromising privacy or performance.

Core Extraction

Intelligent PII Detection Engine

The PiiExtraction class uses language models to identify and extract personally identifiable information from text and image content. Unlike regex-based solutions, LM-Kit takes context and semantics into account, so it can detect PII despite varied formats, misspellings, and multilingual content.

  • 15+ built-in PII entity types including Name, Email, Phone, SSN, Address
  • Unlimited custom labels for organization-specific identifiers
  • Precise character offsets for every detected entity
  • Confidence scores to prioritize high-risk findings
  • Guidance property for domain-specific extraction rules
PiiExtraction.cs
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("phi-3.5-mini");
var pii = new PiiExtraction(model);

// Extract all PII from text
string content = "Contact Sarah Johnson at " +
                "[email protected] or (555) 234-5678. " +
                "SSN: 123-45-6789";

var entities = pii.Extract(content);

foreach (var entity in entities)
{
    Console.WriteLine(
        $"[{entity.Type}] {entity.Value}");
    Console.WriteLine(
        $"  Position: {entity.Start}-{entity.End}");
}

// Output:
// [Name] Sarah Johnson
//   Position: 8-21
// [Email] [email protected]
//   Position: 25-42
// [Phone] (555) 234-5678
//   Position: 46-60
// [SSN] 123-45-6789
//   Position: 67-78
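
The offsets in the sample output above are what make automated redaction possible. The following is a minimal sketch of that step, not part of the LM-Kit API: the `Redact` helper and the `(Type, Start, End)` tuple shape are hypothetical, and it assumes `End` is exclusive (as the sample output suggests) and that spans do not overlap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: replaces each span with a "[Type]" placeholder.
// Assumes End is exclusive and that spans do not overlap.
static string Redact(string text, IEnumerable<(string Type, int Start, int End)> spans)
{
    var sb = new StringBuilder(text);
    // Work from the last span to the first so that earlier offsets
    // remain valid after each replacement changes the string length.
    foreach (var s in spans.OrderByDescending(x => x.Start))
    {
        if (s.Start < 0 || s.End > text.Length || s.Start >= s.End)
            continue; // skip malformed spans instead of throwing
        sb.Remove(s.Start, s.End - s.Start);
        sb.Insert(s.Start, $"[{s.Type}]");
    }
    return sb.ToString();
}

// Illustrative input; the offsets were counted by hand for this string.
string sample = "Contact Sarah Johnson at the front desk. SSN: 123-45-6789";
var sampleSpans = new List<(string Type, int Start, int End)>
{
    ("Name", 8, 21),
    ("SSN", 46, 57),
};

string redacted = Redact(sample, sampleSpans);
Console.WriteLine(redacted);
// Contact [Name] at the front desk. SSN: [SSN]
```

Replacing from the end of the string backwards avoids having to recompute offsets after each substitution, which is the usual pitfall when placeholders differ in length from the text they replace.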
Multimodal

Extract PII from Images & Documents

Process scanned documents, screenshots, photos of IDs, and other image content with integrated OCR and vision model support. The PreferredInferenceModality property lets you choose among text-only, image-only, and full multimodal processing, depending on your use case.

  • Vision model support for direct image understanding
  • Optional OcrEngine for traditional OCR on raster content
  • Process scanned forms, IDs, receipts, and medical records
  • Handle screenshots and photos with embedded PII
  • Configurable modality: text, image, or multimodal
MultimodalPii.cs
using LMKit.Model;
using LMKit.TextAnalysis;
using LMKit.Data;

// Load a vision-capable model
var model = LM.LoadFromModelID("gemma-3-4b");
var pii = new PiiExtraction(model);

// Configure for multimodal processing
pii.PreferredInferenceModality = 
    InferenceModality.Multimodal;

// Extract PII from scanned document
var scan = new Attachment("patient_form.png");
var entities = await pii.ExtractAsync(scan);

foreach (var entity in entities)
{
    Console.WriteLine(
        $"[{entity.Type}] {entity.Value}");
}

// Process ID card photo
var idCard = new Attachment("drivers_license.jpg");
var idPii = await pii.ExtractAsync(idCard);

// Output: Name, Address, DOB, License Number

Built-in PII Entity Types

Comprehensive detection for the most common personally identifiable information categories, with full support for custom labels.

  • Name: Full names, first/last names
  • Email: Email addresses
  • Phone: Phone numbers, all formats
  • SSN: Social Security Numbers
  • Address: Physical addresses
  • DateOfBirth: Birth dates
  • URL: Personal URLs, profiles
  • IP Address: IPv4 and IPv6 addresses
  • Custom: Define your own labels

Define Custom PII Labels

Add organization-specific identifiers like patient IDs, account numbers, employee IDs, passport numbers, and more.

PatientID, AccountNumber, EmployeeID, PassportNumber, LicenseNumber, PolicyNumber, MRN
Customization

Custom PII Entity Definitions

Configure exactly which PII types to detect using PiiEntityDefinitions. Combine built-in types with custom labels tailored to your industry and compliance requirements. Use the Guidance property to provide domain-specific extraction instructions.

  • Mix built-in types with unlimited custom labels
  • PiiEntityDefinition class for full control over detection
  • Guidance property for domain-specific rules
  • HandleOther option for catching unexpected PII types
  • Enable/disable specific types per workflow
CustomPiiLabels.cs
using LMKit.Model;
using LMKit.TextAnalysis;
using static LMKit.TextAnalysis.PiiExtraction;

var model = LM.LoadFromModelID("phi-3.5-mini");

// Create with custom entity definitions
var definitions = new List<PiiEntityDefinition>
{
    // Built-in types
    new(PiiEntityType.Name),
    new(PiiEntityType.Email),
    new(PiiEntityType.SSN),
    
    // Custom labels for healthcare
    new("PatientID", "Patient identifiers"),
    new("InsuranceID", "Insurance policy numbers"),
    new("MRN", "Medical record numbers")
};

var pii = new PiiExtraction(model, definitions);

// Add domain-specific guidance
pii.Guidance = "Focus on healthcare identifiers. " +
    "MRN format is 'MRN-' followed by 8 digits.";

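// Sample input for illustration; in practice, load your own record text
string medicalRecord =
    "Patient Sarah Johnson, MRN-12345678.";

var entities = pii.Extract(medicalRecord);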

// Results include both built-in and custom types
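
When a custom label has a known format, such as the "MRN-" plus 8 digits rule stated in the Guidance string above, a cheap post-check can flag extracted values that do not match it. The sketch below is one possible way to do that; the `(Type, Value)` tuples are hypothetical stand-ins for extraction results, not LM-Kit types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Format taken from the Guidance text: "MRN-" followed by exactly 8 digits.
// [0-9] is used instead of \d so that only ASCII digits are accepted.
var mrnPattern = new Regex(@"^MRN-[0-9]{8}\z");

// Hypothetical (Type, Value) pairs standing in for extraction results.
var extracted = new List<(string Type, string Value)>
{
    ("MRN", "MRN-12345678"),
    ("MRN", "MRN-1234"),
    ("Name", "Sarah Johnson"),
};

// Only MRN entities are checked; other types pass through untouched.
var suspect = extracted
    .Where(e => e.Type == "MRN" && !mrnPattern.IsMatch(e.Value))
    .ToList();

foreach (var e in suspect)
    Console.WriteLine($"Review needed: [{e.Type}] {e.Value}");
// Review needed: [MRN] MRN-1234
```

Values that fail the check are better routed to manual review than silently dropped, since a malformed identifier may still be sensitive.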
High Volume

Batch Processing for Scale

Process thousands of documents efficiently with async APIs and parallel execution. The batch PII extraction demo shows how to scan entire directories of files, generating comprehensive reports of detected PII across your document corpus.

  • Async/await pattern for non-blocking execution
  • Process entire directories with parallel scanning
  • CancellationToken support for graceful termination
  • MaxContextLength control for memory optimization
  • Generate audit reports with file-level findings
BatchPiiExtraction.cs
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("phi-3.5-mini");
var pii = new PiiExtraction(model);

// Configure for batch processing
pii.MaxContextLength = 4096;

var files = Directory.GetFiles(
    "./documents", "*.txt");

var results = new Dictionary<string, List<PiiEntity>>();

await Parallel.ForEachAsync(files, 
    async (file, ct) =>
{
    var content = await File.ReadAllTextAsync(
        file, ct);
    var entities = await pii.ExtractAsync(
        content, ct);
    
    lock (results)
        results[file] = entities.ToList();
});

// Generate compliance report
foreach (var (file, entities) in results)
{
    Console.WriteLine($"{file}: {entities.Count} PII");
}
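
For audit purposes, per-file totals are often less useful than a breakdown by entity type. The sketch below shows one way to turn batch findings into CSV-style report lines. It is independent of LM-Kit: the `BuildReport` helper and the dictionary of type labels per file are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: one "file,type,count" line per (file, type) pair.
// Simplification: no CSV escaping, so paths containing commas need extra handling.
static List<string> BuildReport(IReadOnlyDictionary<string, List<string>> byFile)
{
    var lines = new List<string> { "file,type,count" };
    foreach (var (file, types) in byFile.OrderBy(kv => kv.Key, StringComparer.Ordinal))
    {
        foreach (var g in types.GroupBy(t => t).OrderBy(g => g.Key, StringComparer.Ordinal))
            lines.Add($"{file},{g.Key},{g.Count()}");
    }
    return lines;
}

// Illustrative findings: file path -> entity type labels detected in that file.
var findings = new Dictionary<string, List<string>>
{
    ["a.txt"] = new List<string> { "Name", "Email", "Name" },
    ["b.txt"] = new List<string> { "SSN" },
    ["c.txt"] = new List<string>(),
};

var report = BuildReport(findings);
foreach (var line in report)
    Console.WriteLine(line);
// file,type,count
// a.txt,Email,1
// a.txt,Name,2
// b.txt,SSN,1
```

Files with no findings produce no rows in this sketch; if the audit needs to prove that a file was scanned, emit an explicit zero-count row for it instead.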

Built for Privacy Regulations

LM-Kit PII Extraction helps organizations meet data protection requirements across major regulatory frameworks.

EU General Data Protection Regulation

Detect personal data as defined by GDPR Article 4: names, email addresses, location data, online identifiers, and any information relating to an identified or identifiable person.

  • Data discovery for Subject Access Requests
  • Right to Erasure compliance
  • On-device processing for data sovereignty

California Consumer Privacy Act

Identify personal information under CCPA's broad definition: information that identifies, relates to, or could reasonably be linked with a particular consumer or household.

  • Consumer data disclosure requests
  • Right to deletion workflows
  • Audit trail generation

Health Insurance Portability and Accountability Act

Detect Protected Health Information (PHI) including patient names, medical record numbers, health plan IDs, and any individually identifiable health information.

  • PHI detection in medical records
  • Custom labels for MRN, insurance IDs
  • De-identification support

100% On-Device Processing

Your sensitive data never leaves your infrastructure. Process PII detection entirely locally with zero external API calls.

Zero Data Transmission

No cloud APIs. No external services. Sensitive content stays on your device.

Data Sovereignty

Meet jurisdictional requirements by keeping data within geographic boundaries.

Sub-50ms Latency

No network round trips. Instant detection for real-time redaction workflows.

Audit Ready

Demonstrate privacy compliance. No third-party data sharing to document.

Real-World Use Cases

Organizations across industries deploy LM-Kit PII Extraction to automate compliance, protect customer data, and reduce manual review overhead.

Document Redaction

Automatically mask PII in contracts, reports, and correspondence before sharing externally. Character offsets enable precise redaction without manual review.

Email Scanning

Scan outbound emails and attachments for accidental PII disclosure. Block or flag messages containing sensitive data before they leave your network.

Data Loss Prevention

Integrate with DLP systems to detect PII in file uploads, cloud storage, and database exports. Prevent sensitive data from leaving secure environments.

Training Data Sanitization

Remove PII from datasets before model training. Ensure AI systems are not trained on personally identifiable information that could be extracted later.
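
For training data, simple masking can destroy structure that is worth keeping, such as the fact that two mentions refer to the same person. A common alternative is consistent pseudonymization, sketched below under stated assumptions: the `Pseudonymize` helper and the `(Type, Start, End)` spans are hypothetical, `End` is treated as exclusive, and overlapping spans are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: the same (type, value) pair always maps to the same
// numbered placeholder, e.g. every "Alice" becomes "[NAME_1]".
static string Pseudonymize(string text, IEnumerable<(string Type, int Start, int End)> spans)
{
    var tokens = new Dictionary<(string, string), string>();
    var counters = new Dictionary<string, int>();
    var sb = new StringBuilder();
    int cursor = 0;

    foreach (var s in spans.OrderBy(x => x.Start))
    {
        // Skip malformed spans and spans overlapping one already handled.
        if (s.Start < cursor || s.End > text.Length || s.Start >= s.End)
            continue;

        string value = text.Substring(s.Start, s.End - s.Start);
        if (!tokens.TryGetValue((s.Type, value), out var token))
        {
            counters[s.Type] = counters.TryGetValue(s.Type, out var n) ? n + 1 : 1;
            token = $"[{s.Type.ToUpperInvariant()}_{counters[s.Type]}]";
            tokens[(s.Type, value)] = token;
        }

        // Copy the untouched text before the span, then the placeholder.
        sb.Append(text, cursor, s.Start - cursor).Append(token);
        cursor = s.End;
    }

    sb.Append(text, cursor, text.Length - cursor);
    return sb.ToString();
}

// Illustrative input; offsets were counted by hand for this string.
string input = "Alice emailed Bob. Bob replied to Alice.";
var nameSpans = new List<(string Type, int Start, int End)>
{
    ("Name", 0, 5), ("Name", 14, 17), ("Name", 19, 22), ("Name", 34, 39),
};

string sanitized = Pseudonymize(input, nameSpans);
Console.WriteLine(sanitized);
// [NAME_1] emailed [NAME_2]. [NAME_2] replied to [NAME_1].
```

Matching here is exact, so "Alice" and "alice" would receive different placeholders; normalize values before the lookup if that matters for your dataset.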

Customer Support Logs

Scan support tickets and chat transcripts for PII before archival or analytics. Protect customer privacy while maintaining useful service records.

Subject Access Requests

Locate all PII related to a specific individual across your document corpus. Accelerate GDPR Subject Access Request fulfillment from weeks to hours.
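
One way to support this kind of lookup is to index batch findings by normalized value, so that a request for a given individual becomes a dictionary lookup. The sketch below is illustrative only: the `(File, Type, Value)` tuples, the `Normalize` helper, and the file paths are hypothetical, and matching is limited to exact values after whitespace and case normalization.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Collapse whitespace and case so "sarah  johnson" matches "Sarah Johnson".
static string Normalize(string value) =>
    Regex.Replace(value.Trim(), @"\s+", " ").ToLowerInvariant();

// Hypothetical findings gathered from a batch scan.
var scanFindings = new List<(string File, string Type, string Value)>
{
    ("contracts/a.txt", "Name", "Sarah Johnson"),
    ("logs/b.txt", "Name", "sarah  johnson"),
    ("logs/b.txt", "Phone", "(555) 234-5678"),
    ("hr/c.txt", "Name", "Tom Lee"),
};

// Inverted index: normalized value -> sorted, de-duplicated list of files.
var index = scanFindings
    .GroupBy(f => Normalize(f.Value))
    .ToDictionary(
        g => g.Key,
        g => g.Select(f => f.File).Distinct()
              .OrderBy(p => p, StringComparer.Ordinal).ToList());

// Look up every file that mentions the data subject.
List<string> hits = index.TryGetValue(Normalize("Sarah Johnson"), out var found)
    ? found
    : new List<string>();

Console.WriteLine(string.Join(", ", hits));
// contracts/a.txt, logs/b.txt
```

Exact matching will miss nicknames, initials, and misspellings, so a real request workflow would likely pair an index like this with fuzzy matching or manual review.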

API Reference

Complete documentation for the PiiExtraction class and related types.

PiiExtraction

Main class for extracting PII from text and images. Initialize with a model and optional entity definitions.

Extract()

Synchronously extract PII entities from text content. Returns a list of detected entities with their positions.

ExtractAsync()

Asynchronously extract PII with cancellation support. Accepts text strings or image attachments.

PiiEntityDefinitions

Configure which PII types to detect. Combine built-in types with custom labels.

Guidance

Provide domain-specific instructions for extraction. Improve accuracy for specialized content.

Confidence

Get the confidence score of the last extraction operation to prioritize findings.

PreferredInferenceModality

Control the processing mode: text-only, image-only, or multimodal for different input types.

OcrEngine

Optional OCR engine for traditional text extraction from raster images before PII detection.

MaxContextLength

Control the token limit for processing. Optimize memory usage for large documents.

Protect Sensitive Data Today

Multimodal PII detection. Custom labels. 100% on-device. Start building privacy-compliant applications with LM-Kit.NET.