Privacy & Compliance

Multimodal PII Extraction: 100% On-Device

Automatically detect and extract personally identifiable information from text and images without sending data to external servers. Built-in support for names, emails, SSNs, addresses, and unlimited custom labels. Precise character offsets enable automated redaction and masking for GDPR, CCPA, and HIPAA compliance workflows.

Text + Images | Custom Labels | Batch Processing

Real-Time PII Detection

5 entities detected in 47ms

Contact Sarah Johnson at [email protected] or (555) 234-5678. SSN: 123-45-6789. Address: 742 Maple Street, Boston MA 02101

Name [8:21]

Email [25:42]

Phone [46:60]

SSN [67:78]

Address [89:122]

GDPR Ready

CCPA Compliant

HIPAA Support

15+

Built-in PII Types

100%

On-Device

<50ms

Avg. Latency

The Challenge

Why PII Detection Matters

Organizations face growing pressure to identify and protect sensitive personal data across documents, emails, forms, and databases while maintaining operational efficiency.

Data Breach Risks

Undetected PII in documents, logs, and communications creates exposure to data breaches. Average breach costs exceed $4.5M, with healthcare and financial sectors facing the highest penalties.

Regulatory Complexity

GDPR, CCPA, HIPAA, and PCI DSS each define different PII categories and impose different requirements. Manual compliance across multiple frameworks is error-prone and costly.

Cloud Privacy Concerns

Sending sensitive data to external APIs for processing creates additional privacy risks, compliance headaches, and potential data sovereignty issues for regulated industries.

LM-Kit PII Extraction: Privacy-First Detection

Process sensitive documents entirely on-device with zero data transmission. Detect 15+ PII types with custom labels, get precise character offsets for automated redaction, and maintain full compliance without compromising privacy or performance.

Core Extraction

Intelligent PII Detection Engine

The PiiExtraction class leverages advanced language models to identify and extract personally identifiable information from any content. Unlike regex-based solutions, LM-Kit understands context and semantics, accurately detecting PII even in varied formats, misspellings, and multi-language content.

15+ built-in PII entity types including Name, Email, Phone, SSN, Address
Unlimited custom labels for organization-specific identifiers
Precise character offsets for every detected entity
Confidence scores to prioritize high-risk findings
Guidance property for domain-specific extraction rules

PiiExtraction.cs

using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("phi-3.5-mini");
var pii = new PiiExtraction(model);

// Extract all PII from text
string content = "Contact Sarah Johnson at " +
                "[email protected] or (555) 234-5678. " +
                "SSN: 123-45-6789";

var entities = pii.Extract(content);

foreach (var entity in entities)
{
    Console.WriteLine(
        $"[{entity.Type}] {entity.Value}");
    Console.WriteLine(
        $"  Position: {entity.Start}-{entity.End}");
}

// Output:
// [Name] Sarah Johnson
//   Position: 8-21
// [Email] [email protected]
//   Position: 25-42
// [Phone] (555) 234-5678
//   Position: 46-60
// [SSN] 123-45-6789
//   Position: 67-78
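
The character offsets reported for each entity can drive redaction directly. The following is a minimal sketch, not part of the LM-Kit API: `PiiSpan` and `Redactor` are hypothetical names introduced for this example, and the sketch assumes end offsets are exclusive and spans do not overlap. In practice the spans would be built from the extraction results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Demo input with hand-computed offsets (end offsets are exclusive).
string content = "Call Sarah Johnson at (555) 234-5678.";
var spans = new List<PiiSpan>
{
    new("Name", 5, 18),
    new("Phone", 22, 36),
};

Console.WriteLine(Redactor.Redact(content, spans));
// Call [Name] at [Phone].

// Hypothetical span type for this sketch; it is not an LM-Kit type.
public record PiiSpan(string Label, int Start, int End);

public static class Redactor
{
    // Replaces each span with "[Label]". Spans are applied from the end of
    // the text backwards so earlier offsets stay valid after each edit.
    // Assumes spans do not overlap.
    public static string Redact(string text, IEnumerable<PiiSpan> spans)
    {
        var sb = new StringBuilder(text);
        foreach (var s in spans.OrderByDescending(s => s.Start))
        {
            if (s.Start < 0 || s.End > text.Length || s.Start >= s.End)
                continue; // skip spans that do not fit the input

            sb.Remove(s.Start, s.End - s.Start);
            sb.Insert(s.Start, $"[{s.Label}]");
        }
        return sb.ToString();
    }
}
```

Working backwards avoids recomputing offsets after each replacement, which matters because the mask text rarely has the same length as the value it replaces.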

Multimodal

Extract PII from Images & Documents

Process scanned documents, screenshots, photos of IDs, and any image content with integrated OCR and vision model support. The PreferredInferenceModality property lets you choose between text-only, image-only, or full multimodal processing based on your use case.

Vision model support for direct image understanding
Optional OcrEngine for traditional OCR on raster content
Process scanned forms, IDs, receipts, and medical records
Handle screenshots and photos with embedded PII
Configurable modality: text, image, or multimodal

MultimodalPii.cs

using LMKit.Model;
using LMKit.TextAnalysis;
using LMKit.Data;

// Load a vision-capable model
var model = LM.LoadFromModelID("gemma-3-4b");
var pii = new PiiExtraction(model);

// Configure for multimodal processing
pii.PreferredInferenceModality = 
    InferenceModality.Multimodal;

// Extract PII from scanned document
var scan = new Attachment("patient_form.png");
var entities = await pii.ExtractAsync(scan);

foreach (var entity in entities)
{
    Console.WriteLine(
        $"[{entity.Type}] {entity.Value}");
}

// Process ID card photo
var idCard = new Attachment("drivers_license.jpg");
var idPii = await pii.ExtractAsync(idCard);

// Output: Name, Address, DOB, License Number
                    

Detection Coverage

Built-in PII Entity Types

Comprehensive detection for the most common personally identifiable information categories, with full support for custom labels.

Name

Full names, first/last names

Email

Email addresses

Phone

Phone numbers, all formats

SSN

Social Security Numbers

Address

Physical addresses

DateOfBirth

Birth dates

URL

Personal URLs, profiles

IP Address

IPv4 and IPv6 addresses

Custom

Define your own labels

Define Custom PII Labels

Add organization-specific identifiers like patient IDs, account numbers, employee IDs, passport numbers, and more.

PatientID, AccountNumber, EmployeeID, PassportNumber, LicenseNumber, PolicyNumber, MRN

Customization

Custom PII Entity Definitions

Configure exactly which PII types to detect using PiiEntityDefinitions. Combine built-in types with custom labels tailored to your industry and compliance requirements. Use the Guidance property to provide domain-specific extraction instructions.

Mix built-in types with unlimited custom labels
PiiEntityDefinition class for full control over detection
Guidance property for domain-specific rules
HandleOther option for catching unexpected PII types
Enable/disable specific types per workflow

CustomPiiLabels.cs

using System.Collections.Generic;
using LMKit.Model;
using LMKit.TextAnalysis;
using static LMKit.TextAnalysis.PiiExtraction;

var model = LM.LoadFromModelID("phi-3.5-mini");

// Create with custom entity definitions
var definitions = new List<PiiEntityDefinition>
{
    // Built-in types
    new(PiiEntityType.Name),
    new(PiiEntityType.Email),
    new(PiiEntityType.SSN),

    // Custom labels for healthcare
    new("PatientID", "Patient identifiers"),
    new("InsuranceID", "Insurance policy numbers"),
    new("MRN", "Medical record numbers")
};

var pii = new PiiExtraction(model, definitions);

// Add domain-specific guidance
pii.Guidance = "Focus on healthcare identifiers. " +
    "MRN format is 'MRN-' followed by 8 digits.";

// Sample input; replace with your own document text
string medicalRecord = "Patient Jane Doe, MRN-12345678.";

var entities = pii.Extract(medicalRecord);

// Results include both built-in and custom types
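
Because the guidance above states a fixed MRN format ('MRN-' followed by 8 digits), extracted values can be sanity-checked after extraction. The sketch below is one possible post-processing step, not an LM-Kit feature: `MrnFormat` is a hypothetical helper, and the candidate strings stand in for values that would come from the extraction results.

```csharp
using System;
using System.Text.RegularExpressions;

// Stand-in values; in practice these would be the extracted MRN strings.
string[] candidates = { "MRN-12345678", "MRN-1234", "mrn-12345678" };
foreach (var value in candidates)
    Console.WriteLine($"{value}: {MrnFormat.IsValid(value)}");
// MRN-12345678: True
// MRN-1234: False
// mrn-12345678: False

// Hypothetical helper for this sketch; it is not an LM-Kit type.
public static class MrnFormat
{
    // "MRN-" followed by exactly 8 ASCII digits. \z anchors at the true end
    // of the string, so a trailing newline is rejected.
    private static readonly Regex Pattern =
        new(@"^MRN-[0-9]{8}\z", RegexOptions.CultureInvariant);

    public static bool IsValid(string value) =>
        value is not null && Pattern.IsMatch(value);
}
```

Values that fail the check can be routed to manual review rather than discarded, since a malformed identifier may still be sensitive.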
                    

High Volume

Batch Processing for Scale

Process thousands of documents efficiently with async APIs and parallel execution. The batch PII extraction demo shows how to scan entire directories of files, generating comprehensive reports of detected PII across your document corpus.

Async/await pattern for non-blocking execution
Process entire directories with parallel scanning
CancellationToken support for graceful termination
MaxContextLength control for memory optimization
Generate audit reports with file-level findings

BatchPiiExtraction.cs

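using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using LMKit.Model;
using LMKit.TextAnalysis;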

var model = LM.LoadFromModelID("phi-3.5-mini");
var pii = new PiiExtraction(model);

// Configure for batch processing
pii.MaxContextLength = 4096;

var files = Directory.GetFiles(
    "./documents", "*.txt");

var results = new Dictionary<string, List<PiiEntity>>();

await Parallel.ForEachAsync(files, 
    async (file, ct) =>
{
    var content = await File.ReadAllTextAsync(
        file, ct);
    var entities = await pii.ExtractAsync(
        content, ct);
    
    lock (results)
        results[file] = entities.ToList();
});

// Generate compliance report
foreach (var (file, entities) in results)
{
    Console.WriteLine($"{file}: {entities.Count} PII");
}
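
The per-file count printed above can be extended into a file-level audit report. The following is a rough sketch under stated assumptions: `AuditReport` is a hypothetical helper, and the input is simplified to a map from file name to the list of detected labels, which would be derived from the batch results. File names are written without CSV escaping, and files with no findings produce no rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in findings: file name -> labels detected in that file.
var findings = new Dictionary<string, List<string>>
{
    ["a.txt"] = new() { "Name", "Email", "Name" },
    ["b.txt"] = new() { "SSN" },
    ["c.txt"] = new(),
};

foreach (var line in AuditReport.ToCsvLines(findings))
    Console.WriteLine(line);
// file,label,count
// a.txt,Email,1
// a.txt,Name,2
// b.txt,SSN,1

// Hypothetical helper for this sketch; it is not an LM-Kit type.
public static class AuditReport
{
    // Emits a header row, then one row per (file, label) pair with the
    // number of occurrences, sorted by file name and then by label.
    public static IEnumerable<string> ToCsvLines(
        IReadOnlyDictionary<string, List<string>> findings)
    {
        yield return "file,label,count";
        foreach (var (file, labels) in findings.OrderBy(kv => kv.Key, StringComparer.Ordinal))
        {
            foreach (var group in labels.GroupBy(l => l).OrderBy(g => g.Key, StringComparer.Ordinal))
                yield return $"{file},{group.Key},{group.Count()}";
        }
    }
}
```

Reporting labels and counts rather than the detected values keeps the audit report itself free of PII.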
                    

Regulatory Compliance

Built for Privacy Regulations

LM-Kit PII Extraction helps organizations meet data protection requirements across major regulatory frameworks.

GDPR

EU General Data Protection Regulation

Detect personal data as defined by GDPR Article 4: names, email addresses, location data, online identifiers, and any information relating to an identified or identifiable person.

Data discovery for Subject Access Requests
Right to Erasure compliance
On-device processing for data sovereignty

CCPA

California Consumer Privacy Act

Identify personal information under CCPA's broad definition: information that identifies, relates to, or could reasonably be linked with a particular consumer or household.

Consumer data disclosure requests
Right to deletion workflows
Audit trail generation

HIPAA

Health Insurance Portability & Accountability

Detect Protected Health Information (PHI) including patient names, medical record numbers, health plan IDs, and any individually identifiable health information.

PHI detection in medical records
Custom labels for MRN, insurance IDs
De-identification support

Privacy by Design

100% On-Device Processing

Your sensitive data never leaves your infrastructure. Process PII detection entirely locally with zero external API calls.

Zero Data Transmission

No cloud APIs. No external services. Sensitive content stays on your device.

Data Sovereignty

Meet jurisdictional requirements by keeping data within geographic boundaries.

Sub-50ms Latency

No network round trips. Instant detection for real-time redaction workflows.

Audit Ready

Demonstrate privacy compliance. No third-party data sharing to document.

Applications

Real-World Use Cases

Organizations across industries deploy LM-Kit PII Extraction to automate compliance, protect customer data, and reduce manual review overhead.

Document Redaction

Automatically mask PII in contracts, reports, and correspondence before sharing externally. Character offsets enable precise redaction without manual review.

Email Scanning

Scan outbound emails and attachments for accidental PII disclosure. Block or flag messages containing sensitive data before they leave your network.

Data Loss Prevention

Integrate with DLP systems to detect PII in file uploads, cloud storage, and database exports. Prevent sensitive data from leaving secure environments.

Training Data Sanitization

Remove PII from datasets before model training. Ensure AI systems are not trained on personally identifiable information that could be extracted later.

Customer Support Logs

Scan support tickets and chat transcripts for PII before archival or analytics. Protect customer privacy while maintaining useful service records.

Subject Access Requests

Locate all PII related to a specific individual across your document corpus. Accelerate GDPR Subject Access Request fulfillment from weeks to hours.

Developer Resources

API Reference

Complete documentation for the PiiExtraction class and related types.

PiiExtraction

Main class for extracting PII from text and images. Initialize with model and optional entity definitions.

Extract()

Synchronously extract PII entities from text content. Returns list of detected entities with positions.

ExtractAsync()

Asynchronously extract PII with cancellation support. Accepts text strings or image attachments.

PiiEntityDefinitions

Configure which PII types to detect. Combine built-in types with custom labels.

Guidance

Provide domain-specific instructions for extraction. Improve accuracy for specialized content.

Confidence

Get the confidence score of the last extraction operation to prioritize findings.

PreferredInferenceModality

Control processing mode: text-only, image-only, or multimodal for different input types.

OcrEngine

Optional OCR engine for traditional text extraction from raster images before PII detection.

MaxContextLength

Control token limit for processing. Optimize memory usage for large documents.

Protect Sensitive Data Today

Multimodal PII detection. Custom labels. 100% on-device. Start building privacy-compliant applications with LM-Kit.NET.
