Detect the language of plain text, image content, and document attachments
in milliseconds. The TextTranslation engine combines an LLM
backbone with deterministic script refiners (CJK, Cyrillic, Slavic) to
deliver accurate identification across 100+ languages, with confidence
scores you can trust.
DetectLanguage(text): Synchronous detection on a string.
DetectLanguage(attachment): Detection over PDF, Word, image, or any Attachment.
DetectLanguage(text, candidates): Constrained detection over a known candidate set, for speed.
DetectLanguageAsync(...): Async overloads for all of the above.
Generic LLM language detection is good enough for most European languages but
struggles with closely related languages and scripts: Simplified vs Traditional
Chinese, Japanese vs Korean, Russian vs Ukrainian, Croatian vs Serbian.
TextTranslation wraps the LLM with three deterministic refiners
that resolve those edge cases via Unicode block analysis and statistical
lexicon scoring.
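The idea of script-level refinement can be sketched in a few lines of plain C#. This is an illustrative approximation, not LM-Kit's implementation: the helper names `GuessCjkLanguage` and `GuessRussianOrUkrainian` are hypothetical, the marker-letter sets are a simplification, and telling Simplified from Traditional Chinese would additionally need a character-variant table because both share the same Unicode block.

```csharp
using System;

// Hypothetical helper (not an LM-Kit API): counts characters per Unicode block.
static string GuessCjkLanguage(string text)
{
    int kana = 0, hangul = 0, han = 0;
    foreach (char c in text)
    {
        if (c >= '\u3040' && c <= '\u30FF') kana++;        // Hiragana + Katakana blocks
        else if (c >= '\uAC00' && c <= '\uD7AF') hangul++; // Hangul Syllables block
        else if (c >= '\u4E00' && c <= '\u9FFF') han++;    // CJK Unified Ideographs block
    }
    if (hangul > 0 && hangul >= kana) return "Korean";
    if (kana > 0) return "Japanese";   // kana alongside Han characters implies Japanese
    if (han > 0) return "Chinese";     // variant (Simplified/Traditional) not resolved here
    return "Unknown";
}

// Hypothetical helper: two-way Russian/Ukrainian guess from marker letters.
// The sets are simplified; other Cyrillic languages share some of these letters.
static string GuessRussianOrUkrainian(string text)
{
    const string ukrainianMarkers = "\u0456\u0457\u0454\u0491"; // і ї є ґ
    const string russianMarkers = "\u044B\u044D\u044A\u0451";   // ы э ъ ё
    int uk = 0, ru = 0;
    foreach (char c in text.ToLowerInvariant())
    {
        if (ukrainianMarkers.IndexOf(c) >= 0) uk++;
        else if (russianMarkers.IndexOf(c) >= 0) ru++;
    }
    if (uk == ru) return "Undetermined";
    return uk > ru ? "Ukrainian" : "Russian";
}

Console.WriteLine(GuessCjkLanguage("こんにちは世界"));            // Japanese
Console.WriteLine(GuessCjkLanguage("안녕하세요"));                // Korean
Console.WriteLine(GuessRussianOrUkrainian("Привіт, як справи?")); // Ukrainian
```

Checks like these are deterministic and cheap, which is why they suit a refinement step after a probabilistic first guess.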
CJK. Disambiguates Simplified Chinese, Traditional Chinese, Japanese, and Korean by inspecting the Unicode block membership of each character. Resolves cases where the LLM predicts "Chinese" but cannot tell which variant.
Cyrillic. Distinguishes Russian, Ukrainian, Belarusian, Bulgarian, and other Cyrillic-script languages by counting language-specific characters and diacritics that don't appear in neighbouring orthographies.
Slavic. Lexicon-based scoring across Croatian, Serbian, Slovenian, Polish, Czech,
and Slovak. Configurable thresholds (minWordCount,
minWordSingleLanguageCount, score threshold) for tuning
precision vs recall.
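Lexicon-based scoring with thresholds could look roughly like the sketch below. The parameter names mirror the ones mentioned above, but their exact semantics here are an assumption, the `ScoreByLexicon` function is hypothetical, and the two tiny word lists are placeholders for lexicons that would realistically hold thousands of entries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder lexicons for illustration only.
var lexicons = new Dictionary<string, HashSet<string>>
{
    ["Polish"] = new HashSet<string> { "w", "jaki", "sposób", "się", "jest", "nie" },
    ["Czech"]  = new HashSet<string> { "v", "jaký", "způsob", "se", "je", "není" },
};

// Hypothetical scorer: returns a language name, or "Undetermined" when a threshold is not met.
string ScoreByLexicon(string text, int minWordCount, int minWordSingleLanguageCount, double scoreThreshold)
{
    string[] words = text.ToLowerInvariant()
        .Split(new[] { ' ', ',', '.', ';', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);

    // Assumed meaning of minWordCount: refuse to score very short inputs.
    if (words.Length < minWordCount) return "Undetermined";

    string best = "Undetermined";
    int bestHits = 0, totalHits = 0;
    foreach (var entry in lexicons)
    {
        int hits = words.Count(w => entry.Value.Contains(w));
        totalHits += hits;
        if (hits > bestHits) { best = entry.Key; bestHits = hits; }
    }

    // Assumed meaning of minWordSingleLanguageCount: the winner needs enough hits of its own.
    if (bestHits < minWordSingleLanguageCount) return "Undetermined";

    // Score = winner's share of all lexicon hits; compare against the score threshold.
    double score = (double)bestHits / totalHits;
    return score >= scoreThreshold ? best : "Undetermined";
}

Console.WriteLine(ScoreByLexicon(
    "Sztuczna inteligencja zmienia sposób, w jaki pracujemy.", 5, 2, 0.6)); // Polish
```

Raising any of the three thresholds trades recall for precision: more inputs come back undetermined, but the answers that do come back rest on more evidence.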
After every detection call, the TextTranslation instance exposes a
Confidence property (0.0 to 1.0). Use it to gate
downstream behaviour: route low-confidence documents to human review, fall back to
a default locale, or trigger a second pass with a candidate set.
Trust the result. Forward the document to language-specific pipelines (NER, sentiment, embeddings) without further checks.
Ambiguous content. Re-run with a narrow candidate set or process with both likely languages and merge results.
Likely too short, mixed-language, or noisy. Fall back to a default locale, flag for review, or skip translation entirely.
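A three-tier gate like the one described above can be written as a small pure function. The sketch below is only illustrative: `ChooseAction` and the action names are hypothetical, and the tier boundaries are passed in by the caller because this page does not state them. In real code the confidence value would come from the Confidence property after a detection call.

```csharp
using System;

// Hypothetical gate: maps a confidence score to one of three actions.
// trustAbove and reviewBelow are caller-chosen boundaries, not documented defaults.
static string ChooseAction(double confidence, double trustAbove, double reviewBelow)
{
    if (confidence >= trustAbove) return "forward";                 // trust the result
    if (confidence >= reviewBelow) return "retry-with-candidates";  // ambiguous content
    return "fallback-or-review";                                    // short, mixed, or noisy input
}

// Example boundaries chosen purely for illustration.
Console.WriteLine(ChooseAction(0.992, 0.90, 0.60)); // forward
Console.WriteLine(ChooseAction(0.75, 0.90, 0.60));  // retry-with-candidates
Console.WriteLine(ChooseAction(0.30, 0.90, 0.60));  // fallback-or-review
```

Keeping the boundaries as parameters makes it easy to tune them per corpus rather than hard-coding one policy.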
From a single string to a multi-page PDF, the same engine handles every input shape.
using LMKit.Model;
using LMKit.Translation;
using LMKit.TextGeneration;

var model = LM.LoadFromModelID("qwen3.5:4b");
var translator = new TextTranslation(model);

string text = "L'intelligenza artificiale sta trasformando il nostro modo di lavorare.";
Language detected = translator.DetectLanguage(text);

Console.WriteLine($"Detected: {detected}");
Console.WriteLine($"Confidence: {translator.Confidence:P1}");
// Detected: Italian
// Confidence: 99.2%
using System.Collections.Generic;
using LMKit.Model;
using LMKit.Translation;
using LMKit.TextGeneration;

// Assumes `model` has already been loaded, as in the previous example.
var translator = new TextTranslation(model);

// We know the document is one of three Slavic languages.
// Constrain detection to skip the wider scan.
var candidates = new List<Language> { Language.Polish, Language.Czech, Language.Slovak };

string text = "Sztuczna inteligencja zmienia sposób, w jaki pracujemy.";
Language result = translator.DetectLanguage(text, candidates);
// result == Language.Polish
using LMKit.Model;
using LMKit.Translation;
using LMKit.TextGeneration; // for the Language type, as imported in the other examples
using LMKit.Data;

// Vision-capable model required for image input.
var model = LM.LoadFromModelID("gemma4:e4b");
var translator = new TextTranslation(model);

var attachment = new Attachment("document.pdf");
Language detected = await translator.DetectLanguageAsync(attachment);

Console.WriteLine($"Document language: {detected}");
Console.WriteLine($"Confidence: {translator.Confidence:P1}");
Route incoming documents to language-specific NER, sentiment, and OCR models so each pipeline runs on optimised tooling.
Auto-detect the language of incoming tickets and chat messages so triage and response generation use the right model and prompt template.
Apply locale-specific moderation rules (legal frameworks, vocabulary lists, profanity dictionaries) only after detecting the user's language.
Skip translation when the detected language already matches the target. Trim cloud-API spend by 30 to 60% in mixed corpora.
Index multilingual documents into language-specific shards. Apply BM25 with language-aware tokenizers for precision retrieval.
Detect language locally without sending content to a cloud detector. Critical for HIPAA, GDPR, and air-gapped deployments.
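Two of the patterns above, routing by language and skipping redundant translation, reduce to a lookup and a comparison once the language is known. The following is a minimal sketch under stated assumptions: language names are plain strings standing in for the detected Language value, and the pipeline identifiers (`ner-it`, `ner-pl`, `ner-default`) and helper names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mapping from detected language to a language-specific pipeline.
var pipelines = new Dictionary<string, string>
{
    ["Italian"] = "ner-it",
    ["Polish"] = "ner-pl",
};

// Picks the language-specific pipeline, or a default when none is registered.
string RoutePipeline(string detected) =>
    pipelines.TryGetValue(detected, out var name) ? name : "ner-default";

// Translation is only needed when the detected language differs from the target.
static bool NeedsTranslation(string detected, string target) =>
    !string.Equals(detected, target, StringComparison.OrdinalIgnoreCase);

Console.WriteLine(RoutePipeline("Italian"));               // ner-it
Console.WriteLine(RoutePipeline("Korean"));                // ner-default
Console.WriteLine(NeedsTranslation("English", "English")); // False
```

In practice both checks would sit behind a confidence gate, so that a low-confidence detection does not silently route a document to the wrong pipeline.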
TextTranslation: Main class hosting both detection and translation. Provides DetectLanguage and DetectLanguageAsync with overloads for text, candidate sets, and attachments.
Language: Enumeration of all supported languages. Returned by DetectLanguage and accepted by Translate as the target.
Confidence: The confidence (0.0 to 1.0) of the most recent detection, read from the TextTranslation instance. Use it to gate downstream actions.
Attachment: Input container for documents and images. Pass it to DetectLanguage for multimodal detection over PDF, Word, Excel, PowerPoint, HTML, and images.
Need translation too? See the Text Translation page for the same engine's translation capabilities.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Zero cloud calls, zero per-request fees. One NuGet package, every platform.