Multimodal PII Extraction: 100% On-Device
Automatically detect and extract personally identifiable information from text and images without sending data to external servers. Built-in support for names, emails, SSNs, addresses, and unlimited custom labels. Precise character offsets enable automated redaction and masking for GDPR, CCPA, and HIPAA compliance workflows.
Why PII Detection Matters
Organizations face growing pressure to identify and protect sensitive personal data across documents, emails, forms, and databases while maintaining operational efficiency.
Data Breach Risks
Undetected PII in documents, logs, and communications creates exposure to data breaches. Average breach costs exceed $4.5M, with healthcare and financial sectors facing the highest penalties.
Regulatory Complexity
GDPR, CCPA, HIPAA, and PCI DSS each define different PII categories and impose different requirements. Manual compliance across multiple frameworks is error-prone and costly.
Cloud Privacy Concerns
Sending sensitive data to external APIs for processing creates additional privacy risks, compliance headaches, and potential data sovereignty issues for regulated industries.
LM-Kit PII Extraction: Privacy-First Detection
Process sensitive documents entirely on-device with zero data transmission. Detect 15+ PII types with custom labels, get precise character offsets for automated redaction, and maintain full compliance without compromising privacy or performance.
Intelligent PII Detection Engine
The PiiExtraction class leverages advanced language models to identify and extract personally identifiable information from any content. Unlike regex-based solutions, LM-Kit understands context and semantics, accurately detecting PII even in varied formats, misspellings, and multi-language content.
- 15+ built-in PII entity types including Name, Email, Phone, SSN, Address
- Unlimited custom labels for organization-specific identifiers
- Precise character offsets for every detected entity
- Confidence scores to prioritize high-risk findings
- Guidance property for domain-specific extraction rules
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("phi-3.5-mini");
var pii = new PiiExtraction(model);

// Extract all PII from text
string content = "Contact Sarah Johnson at " +
    "[email protected] or (555) 234-5678. " +
    "SSN: 123-45-6789";

var entities = pii.Extract(content);

foreach (var entity in entities)
{
    Console.WriteLine($"[{entity.Type}] {entity.Value}");
    Console.WriteLine($" Position: {entity.Start}-{entity.End}");
}

// Output:
// [Name] Sarah Johnson
//  Position: 8-21
// [Email] [email protected]
//  Position: 25-42
// [Phone] (555) 234-5678
//  Position: 46-60
// [SSN] 123-45-6789
//  Position: 67-78
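The character offsets in the sample output above are what make automated masking possible. The following is a minimal sketch of that step, not LM-Kit's own redaction API: `PiiSpan` and `Redactor` are hypothetical names introduced for illustration, and the end offset is assumed to be exclusive, which is consistent with the sample output (8-21 for a 13-character name). Overlapping spans are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical span type for illustration; it is not an LM-Kit type.
// End is assumed to be exclusive.
public record PiiSpan(string Type, int Start, int End);

public static class Redactor
{
    // Replaces each span with a "[Type]" placeholder.
    public static string Redact(string text, IEnumerable<PiiSpan> spans)
    {
        var sb = new StringBuilder(text);

        // Work from the last span to the first so that earlier offsets
        // remain valid after each replacement changes the text length.
        foreach (var span in spans.OrderByDescending(s => s.Start))
        {
            sb.Remove(span.Start, span.End - span.Start);
            sb.Insert(span.Start, $"[{span.Type}]");
        }

        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "Contact Sarah Johnson. SSN: 123-45-6789";
        var spans = new List<PiiSpan>
        {
            new("Name", 8, 21),
            new("SSN", 28, 39),
        };

        Console.WriteLine(Redactor.Redact(text, spans));
        // prints: Contact [Name]. SSN: [SSN]
    }
}
```

Processing spans in descending order of start offset avoids recomputing positions after each replacement, since a placeholder is rarely the same length as the text it replaces.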
Extract PII from Images & Documents
Process scanned documents, screenshots, photos of IDs, and any image content with integrated OCR and vision model support. The PreferredInferenceModality property lets you choose text-only, image-only, or full multimodal processing based on your use case.
- Vision model support for direct image understanding
- Optional OcrEngine for traditional OCR on raster content
- Process scanned forms, IDs, receipts, and medical records
- Handle screenshots and photos with embedded PII
- Configurable modality: text, image, or multimodal
using LMKit.Model;
using LMKit.TextAnalysis;
using LMKit.Data;

// Load a vision-capable model
var model = LM.LoadFromModelID("gemma-3-4b");
var pii = new PiiExtraction(model);

// Configure for multimodal processing
pii.PreferredInferenceModality = InferenceModality.Multimodal;

// Extract PII from scanned document
var scan = new Attachment("patient_form.png");
var entities = await pii.ExtractAsync(scan);

foreach (var entity in entities)
{
    Console.WriteLine($"[{entity.Type}] {entity.Value}");
}

// Process ID card photo
var idCard = new Attachment("drivers_license.jpg");
var idPii = await pii.ExtractAsync(idCard);
// Output: Name, Address, DOB, License Number
Built-in PII Entity Types
Comprehensive detection for the most common personally identifiable information categories, with full support for custom labels.
- Name: Full names, first/last names
- Email: Email addresses
- Phone: Phone numbers, all formats
- SSN: Social Security Numbers
- Address: Physical addresses
- DateOfBirth: Birth dates
- URL: Personal URLs, profiles
- IP Address: IPv4 and IPv6 addresses
- Custom: Define your own labels
Define Custom PII Labels
Add organization-specific identifiers like patient IDs, account numbers, employee IDs, passport numbers, and more.
Custom PII Entity Definitions
Configure exactly which PII types to detect using PiiEntityDefinitions. Combine built-in types with custom labels tailored to your industry and compliance requirements. Use the Guidance property to provide domain-specific extraction instructions.
- Mix built-in types with unlimited custom labels
- PiiEntityDefinition class for full control over detection
- Guidance property for domain-specific rules
- HandleOther option for catching unexpected PII types
- Enable/disable specific types per workflow
using LMKit.Model;
using LMKit.TextAnalysis;
using static LMKit.TextAnalysis.PiiExtraction;

var model = LM.LoadFromModelID("phi-3.5-mini");

// Create with custom entity definitions
var definitions = new List<PiiEntityDefinition>
{
    // Built-in types
    new(PiiEntityType.Name),
    new(PiiEntityType.Email),
    new(PiiEntityType.SSN),

    // Custom labels for healthcare
    new("PatientID", "Patient identifiers"),
    new("InsuranceID", "Insurance policy numbers"),
    new("MRN", "Medical record numbers")
};

var pii = new PiiExtraction(model, definitions);

// Add domain-specific guidance
pii.Guidance = "Focus on healthcare identifiers. " +
    "MRN format is 'MRN-' followed by 8 digits.";

// medicalRecord is the text to scan, assumed to be loaded elsewhere
var entities = pii.Extract(medicalRecord);
// Results include both built-in and custom types
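The guidance string above describes the MRN format as 'MRN-' followed by 8 digits. One optional safeguard is to check extracted MRN values against that format afterward and send mismatches to manual review. The sketch below is an assumption for illustration only: `MrnFormat` and `IsWellFormed` are hypothetical helpers, not part of LM-Kit, and they operate on plain strings.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; it is not part of LM-Kit.
public static class MrnFormat
{
    // "MRN-" followed by exactly 8 ASCII digits.
    // \z anchors at the true end of input, so a trailing newline is rejected.
    private static readonly Regex Pattern =
        new Regex(@"^MRN-[0-9]{8}\z", RegexOptions.CultureInvariant);

    public static bool IsWellFormed(string value) => Pattern.IsMatch(value);
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(MrnFormat.IsWellFormed("MRN-12345678")); // True
        Console.WriteLine(MrnFormat.IsWellFormed("MRN-1234567"));  // False
        Console.WriteLine(MrnFormat.IsWellFormed("mrn-12345678")); // False
    }
}
```

A check like this complements context-aware detection and does not replace it: the model finds candidates in free text, and the pattern confirms that a candidate matches the format your organization actually issues.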
Batch Processing for Scale
Process thousands of documents efficiently with async APIs and parallel execution. The batch PII extraction demo shows how to scan entire directories of files, generating comprehensive reports of detected PII across your document corpus.
- Async/await pattern for non-blocking execution
- Process entire directories with parallel scanning
- CancellationToken support for graceful termination
- MaxContextLength control for memory optimization
- Generate audit reports with file-level findings
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("phi-3.5-mini");
var pii = new PiiExtraction(model);

// Configure for batch processing
pii.MaxContextLength = 4096;

var files = Directory.GetFiles("./documents", "*.txt");
var results = new Dictionary<string, List<PiiEntity>>();

await Parallel.ForEachAsync(files, async (file, ct) =>
{
    var content = await File.ReadAllTextAsync(file, ct);
    var entities = await pii.ExtractAsync(content, ct);

    // Dictionary is not thread-safe, so guard writes with a lock
    lock (results)
    {
        results[file] = entities.ToList();
    }
});

// Generate compliance report
foreach (var (file, entities) in results)
{
    Console.WriteLine($"{file}: {entities.Count} PII");
}
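For audit reports with file-level findings, the per-file results can be summarized by entity type. The following is a rough sketch under stated assumptions: `Finding` and `AuditReport` are hypothetical names, not LM-Kit types, and each finding is reduced to a file name and a type label. File names containing commas would need CSV escaping, which is omitted here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical finding type for illustration; it is not an LM-Kit type.
public record Finding(string File, string Type);

public static class AuditReport
{
    // Returns a header plus one CSV line per (file, type) pair with its count,
    // sorted by file and then by type so the output is stable across runs.
    public static List<string> ToCsvLines(IEnumerable<Finding> findings)
    {
        var lines = new List<string> { "file,type,count" };

        lines.AddRange(findings
            .GroupBy(f => (f.File, f.Type))
            .OrderBy(g => g.Key.File, StringComparer.Ordinal)
            .ThenBy(g => g.Key.Type, StringComparer.Ordinal)
            .Select(g => $"{g.Key.File},{g.Key.Type},{g.Count()}"));

        return lines;
    }
}

public static class Program
{
    public static void Main()
    {
        var findings = new List<Finding>
        {
            new("a.txt", "Name"),
            new("a.txt", "SSN"),
            new("b.txt", "Email"),
            new("a.txt", "Name"),
        };

        foreach (var line in AuditReport.ToCsvLines(findings))
        {
            Console.WriteLine(line);
        }
        // prints:
        // file,type,count
        // a.txt,Name,2
        // a.txt,SSN,1
        // b.txt,Email,1
    }
}
```

The report records counts and types only, not the detected values, so the audit file does not become a new store of PII.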
Built for Privacy Regulations
LM-Kit PII Extraction helps organizations meet data protection requirements across major regulatory frameworks.
EU General Data Protection Regulation
Detect personal data as defined by GDPR Article 4: names, email addresses, location data, online identifiers, and any information relating to an identified or identifiable person.
- Data discovery for Subject Access Requests
- Right to Erasure compliance
- On-device processing for data sovereignty
California Consumer Privacy Act
Identify personal information under CCPA's broad definition: information that identifies, relates to, or could reasonably be linked with a particular consumer or household.
- Consumer data disclosure requests
- Right to deletion workflows
- Audit trail generation
Health Insurance Portability & Accountability Act
Detect Protected Health Information (PHI) including patient names, medical record numbers, health plan IDs, and any individually identifiable health information.
- PHI detection in medical records
- Custom labels for MRN, insurance IDs
- De-identification support
100% On-Device Processing
Your sensitive data never leaves your infrastructure. Process PII detection entirely locally with zero external API calls.
Zero Data Transmission
No cloud APIs. No external services. Sensitive content stays on your device.
Data Sovereignty
Meet jurisdictional requirements by keeping data within geographic boundaries.
Sub-50ms Latency
No network round trips. Instant detection for real-time redaction workflows.
Audit Ready
Demonstrate privacy compliance. No third-party data sharing to document.
Real-World Use Cases
Organizations across industries deploy LM-Kit PII Extraction to automate compliance, protect customer data, and reduce manual review overhead.
Document Redaction
Automatically mask PII in contracts, reports, and correspondence before sharing externally. Character offsets enable precise redaction without manual review.
Email Scanning
Scan outbound emails and attachments for accidental PII disclosure. Block or flag messages containing sensitive data before they leave your network.
Data Loss Prevention
Integrate with DLP systems to detect PII in file uploads, cloud storage, and database exports. Prevent sensitive data from leaving secure environments.
Training Data Sanitization
Remove PII from datasets before model training. Ensure AI systems are not trained on personally identifiable information that could be extracted later.
Customer Support Logs
Scan support tickets and chat transcripts for PII before archival or analytics. Protect customer privacy while maintaining useful service records.
Subject Access Requests
Locate all PII related to a specific individual across your document corpus. Accelerate GDPR Subject Access Request fulfillment from weeks to hours.
API Reference
Complete documentation for the PiiExtraction class and related types.
PiiExtraction
Main class for extracting PII from text and images. Initialize with model and optional entity definitions.
Extract()
Synchronously extract PII entities from text content. Returns list of detected entities with positions.
ExtractAsync()
Asynchronously extract PII with cancellation support. Accepts text strings or image attachments.
PiiEntityDefinitions
Configure which PII types to detect. Combine built-in types with custom labels.
Guidance
Provide domain-specific instructions for extraction. Improve accuracy for specialized content.
Confidence
Get the confidence score of the last extraction operation to prioritize findings.
PreferredInferenceModality
Control processing mode: text-only, image-only, or multimodal for different input types.
OcrEngine
Optional OCR engine for traditional text extraction from raster images before PII detection.
MaxContextLength
Control token limit for processing. Optimize memory usage for large documents.
Protect Sensitive Data Today
Multimodal PII detection. Custom labels. 100% on-device. Start building privacy-compliant applications with LM-Kit.NET.