Automatically detect and extract personally identifiable information from text and images without sending data to external servers. Built-in support for names, emails, SSNs, addresses, and unlimited custom labels. Precise character offsets enable automated redaction and masking for GDPR, CCPA, and HIPAA compliance workflows.
GDPR
EU personal-data regulation. Local detection means no data leaves the box.
CCPA
California consumer privacy act. On-device PII redaction by default.
HIPAA
US healthcare PHI protection. Air-gapped processing supported.
PCI DSS
Payment card data security. Detect and mask card numbers locally.
Organizations face growing pressure to identify and protect sensitive personal data across documents, emails, forms, and databases while maintaining operational efficiency.
Risk
Undetected PII in documents, logs, and communications creates exposure to data breaches. Average breach costs exceed $4.5M, with healthcare and financial sectors facing the highest penalties.
Compliance
GDPR, CCPA, HIPAA, and PCI DSS each define different PII categories and impose different requirements. Manual compliance across multiple frameworks is error-prone and costly.
Privacy
Sending sensitive data to external APIs for processing creates additional privacy risks, compliance headaches, and potential data sovereignty issues for regulated industries.
Process sensitive documents entirely on-device with zero data transmission. Detect 15+ PII types with custom labels, get precise character offsets for automated redaction, and maintain full compliance without compromising privacy or performance.
The PiiExtraction class leverages advanced language models to identify and extract personally identifiable information from any content. Unlike regex-based solutions, LM-Kit understands context and semantics, accurately detecting PII even in varied formats, misspellings, and multi-language content.
Guidance property for domain-specific extraction rules

```csharp
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("qwen3.5:4b");
var pii = new PiiExtraction(model);

// Extract all PII from text (the email address is a placeholder)
string content = "Contact Sarah Johnson at " +
    "sarah.johnson@example.com or (555) 234-5678. " +
    "SSN: 123-45-6789";

var entities = pii.Extract(content);

foreach (var entity in entities)
{
    Console.WriteLine($"[{entity.Type}] {entity.Value}");
}
```
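Character offsets turn redaction into a plain string operation. The following is a minimal sketch, assuming each detected entity can be reduced to a start offset, a length, and a label, and that spans do not overlap. The tuple shape and the `Mask` helper are hypothetical illustrations, not part of the LM-Kit API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: replaces each span with "[Label]".
// Spans are applied right to left so that earlier offsets stay valid
// while the string changes length. Assumes spans do not overlap.
static string Mask(string input, IEnumerable<(int Start, int Length, string Label)> spans)
{
    var sb = new StringBuilder(input);
    foreach (var span in spans.OrderByDescending(x => x.Start))
    {
        // Skip spans that fall outside the text instead of throwing.
        if (span.Start < 0 || span.Length <= 0 || span.Start + span.Length > sb.Length)
            continue;

        sb.Remove(span.Start, span.Length);
        sb.Insert(span.Start, "[" + span.Label + "]");
    }
    return sb.ToString();
}

// Demo data: offsets are computed with IndexOf here; in a real pipeline
// they would come from the extraction results.
string sample = "Mail jane@example.com, SSN 123-45-6789.";
string email = "jane@example.com";
string ssn = "123-45-6789";

var detected = new List<(int Start, int Length, string Label)>
{
    (sample.IndexOf(email, StringComparison.Ordinal), email.Length, "Email"),
    (sample.IndexOf(ssn, StringComparison.Ordinal), ssn.Length, "SSN"),
};

Console.WriteLine(Mask(sample, detected)); // Mail [Email], SSN [SSN].
```

Working from the end of the string backwards avoids recomputing offsets after each replacement, which is the usual source of off-by-N errors in masking code.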
Process scanned documents, screenshots, photos of IDs, and any image content with integrated OCR and vision model support. The PreferredInferenceModality property lets you choose between text-only, image-only, or full multimodal processing based on your use case.
OcrEngine for traditional OCR on raster content

```csharp
using LMKit.Model;
using LMKit.TextAnalysis;
using LMKit.Data;

// Load a vision-capable model
var model = LM.LoadFromModelID("qwen3.5:4b");
var pii = new PiiExtraction(model);

// Configure for multimodal processing
pii.PreferredInferenceModality = InferenceModality.Multimodal;

// Extract PII from scanned document
var scan = new Attachment("patient_form.png");
var entities = await pii.ExtractAsync(scan);
```
Comprehensive detection for the most common personally identifiable information categories, with full support for custom labels.
Full names, first/last names
Email addresses
Phone numbers, all formats
Social Security Numbers
Physical addresses
Birth dates
Personal URLs, profiles
IPv4 and IPv6 addresses
Custom: define your own labels. Add organization-specific identifiers like patient IDs, account numbers, employee IDs, passport numbers, and more.
Configure exactly which PII types to detect using PiiEntityDefinitions. Combine built-in types with custom labels tailored to your industry and compliance requirements. Use the Guidance property to provide domain-specific extraction instructions.
PiiEntityDefinition class for full control over detection
Guidance property for domain-specific rules
HandleOther option for catching unexpected PII types

```csharp
using System.Collections.Generic;
using LMKit.Model;
using LMKit.TextAnalysis;
using static LMKit.TextAnalysis.PiiExtraction;

var model = LM.LoadFromModelID("qwen3.5:4b");

// Create with custom entity definitions
var definitions = new List<PiiEntityDefinition>
{
    // Built-in types
    new(PiiEntityType.Name),
    new(PiiEntityType.Email),
    new(PiiEntityType.SSN),

    // Custom labels for healthcare
    new("PatientID", "Medical record numbers")
};
```
Process thousands of documents efficiently with async APIs and parallel execution. The batch PII extraction demo shows how to scan entire directories of files, generating comprehensive reports of detected PII across your document corpus.
CancellationToken support for graceful termination
MaxContextLength control for memory optimization

```csharp
using System.Collections.Concurrent;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("qwen3.5:4b");
var pii = new PiiExtraction(model);

// Configure for batch processing
pii.MaxContextLength = 4096;

var files = Directory.GetFiles("./documents", "*.txt");

// ConcurrentDictionary: the loop body runs on several threads at once,
// and a plain Dictionary is not safe for concurrent writes.
var results = new ConcurrentDictionary<string, List<PiiEntity>>();

await Parallel.ForEachAsync(files, async (file, ct) =>
{
    var content = await File.ReadAllTextAsync(file, ct);
    var entities = await pii.ExtractAsync(content, ct);
    results[file] = entities.ToList();
});
```
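Once per-file results are collected, a corpus-level report is a small aggregation step. The sketch below assumes the findings have already been reduced to (label, value) pairs per file. The tuple shape and the `Summarize` helper are hypothetical and shown only to illustrate the reporting idea, not how the batch demo is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: one line per label with the total number of
// occurrences and the number of distinct files, most frequent label first.
static List<string> Summarize(
    IReadOnlyDictionary<string, List<(string Label, string Value)>> findingsByFile)
{
    return findingsByFile
        .SelectMany(kv => kv.Value.Select(f => (File: kv.Key, Label: f.Label)))
        .GroupBy(x => x.Label)
        .OrderByDescending(g => g.Count())
        .ThenBy(g => g.Key, StringComparer.Ordinal)
        .Select(g =>
            $"{g.Key}: {g.Count()} occurrence(s) in " +
            $"{g.Select(x => x.File).Distinct().Count()} file(s)")
        .ToList();
}

// Demo data standing in for real extraction results.
var demoFindings = new Dictionary<string, List<(string Label, string Value)>>
{
    ["a.txt"] = new() { ("Email", "jane@example.com"), ("SSN", "123-45-6789") },
    ["b.txt"] = new() { ("Email", "joe@example.com") },
    ["c.txt"] = new(), // file with no findings
};

foreach (var line in Summarize(demoFindings))
{
    Console.WriteLine(line);
}
// Email: 2 occurrence(s) in 2 file(s)
// SSN: 1 occurrence(s) in 1 file(s)
```

The report keeps counts and file names only. Leaving the extracted values out of the summary avoids creating a second copy of the sensitive data.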
LM-Kit PII Extraction helps organizations meet data protection requirements across major regulatory frameworks.
GDPR
Detect personal data as defined by GDPR Article 4: names, email addresses, location data, online identifiers, and any information relating to an identified or identifiable person.
CCPA
Identify personal information under CCPA's broad definition: information that identifies, relates to, or could reasonably be linked with a particular consumer or household.
HIPAA
Detect Protected Health Information (PHI) including patient names, medical record numbers, health plan IDs, and any individually identifiable health information.
Your sensitive data never leaves your infrastructure. Process PII detection entirely locally with zero external API calls.
01
No cloud APIs. No external services. Sensitive content stays on your device.
02
Meet jurisdictional requirements by keeping data within geographic boundaries.
03
No network round trips. Instant detection for real-time redaction workflows.
04
Demonstrate privacy compliance. No third-party data sharing to document.
Organizations across industries deploy LM-Kit PII Extraction to automate compliance, protect customer data, and reduce manual review overhead.
Redaction
Automatically mask PII in contracts, reports, and correspondence before sharing externally. Character offsets enable precise redaction without manual review.
Email
Scan outbound emails and attachments for accidental PII disclosure. Block or flag messages containing sensitive data before they leave your network.
DLP
Integrate with DLP systems to detect PII in file uploads, cloud storage, and database exports. Prevent sensitive data from leaving secure environments.
Training
Remove PII from datasets before model training. Ensure AI systems are not trained on personally identifiable information that could be extracted later.
Support
Scan support tickets and chat transcripts for PII before archival or analytics. Protect customer privacy while maintaining useful service records.
SAR
Locate all PII related to a specific individual across your document corpus. Accelerate GDPR Subject Access Request fulfillment from weeks to hours.
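The email and DLP scenarios above come down to a policy decision on the labels a scan returns. A minimal sketch of such a gate follows, assuming detection has already produced a list of label strings. The `Evaluate` helper and the label names are hypothetical, and a real deployment would define its own policy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical gate: a message is blocked when any detected label appears
// in the block list. Returns the decision plus the labels that caused it.
static (bool Block, IReadOnlyList<string> Reasons) Evaluate(
    IEnumerable<string> detectedLabels, ISet<string> blockList)
{
    var reasons = detectedLabels
        .Where(blockList.Contains)
        .Distinct(StringComparer.OrdinalIgnoreCase)
        .ToList();

    return (reasons.Count > 0, reasons);
}

// Example policy: identifiers that must never leave the network.
var policy = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "SSN", "CreditCard" };

var (block, why) = Evaluate(new[] { "Email", "SSN", "ssn" }, policy);
Console.WriteLine(block ? $"Blocked: {string.Join(", ", why)}" : "Allowed");
// Blocked: SSN
```

Returning the matching labels alongside the decision gives reviewers an audit trail without exposing the detected values themselves.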
Complete documentation for the PiiExtraction class and related types.
PiiExtraction
Main class for extracting PII from text and images. Initialize with model and optional entity definitions.
Extract
Synchronously extract PII entities from text content. Returns list of detected entities with positions.
ExtractAsync
Asynchronously extract PII with cancellation support. Accepts text strings or image attachments.
PiiEntityDefinitions
Configure which PII types to detect. Combine built-in types with custom labels.
Guidance
Provide domain-specific instructions for extraction. Improve accuracy for specialized content.
Confidence
Get the confidence score of the last extraction operation to prioritize findings.
PreferredInferenceModality
Control processing mode: text-only, image-only, or multimodal for different input types.
OcrEngine
Optional OCR engine for traditional text extraction from raster images before PII detection.
MaxContextLength
Control token limit for processing. Optimize memory usage for large documents.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Demo
Console demo: detect and offset-tag PII for redaction.
Demo
Console demo: high-throughput PII detection across a corpus.
How-to guide
Standard PII types, custom labels, redaction patterns.
API reference
API reference for the PiiExtraction class.
Multimodal PII detection. Custom labels. 100% on-device. Start building privacy-compliant applications with LM-Kit.NET.