Identify the language of any input.

Detect the language of plain text, image content, and document attachments in milliseconds. The TextTranslation engine combines an LLM backbone with deterministic refiners (CJK, Cyrillic, Slavic) to identify 100+ languages accurately, and reports a confidence score with every result.

100+ languages · Multimodal input · Confidence scoring

DetectLanguage(text)

Synchronous detection on a string.

DetectLanguage(attachment)

Detection over a PDF, Word document, image, or any other Attachment.

DetectLanguage(text, candidates)

Detection constrained to a known set of candidate languages, which speeds up the call.

DetectLanguageAsync(...)

Async overloads for all of the above.

Engine overview

An LLM detector hardened by deterministic refiners.

Generic LLM language detection handles clearly distinct languages well but struggles with closely related languages and shared writing systems: Simplified vs Traditional Chinese, Japanese vs Korean, Russian vs Ukrainian, Croatian vs Serbian. TextTranslation wraps the LLM with three deterministic refiners that resolve those edge cases via Unicode block analysis and statistical lexicon scoring.

CJK refiner

Disambiguates Simplified Chinese, Traditional Chinese, Japanese, and Korean by inspecting Unicode block membership of each character. Resolves cases where the LLM predicts "Chinese" but cannot tell which variant.
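
To make the idea concrete, here is a minimal sketch of Unicode block counting in plain C#. It is not LM-Kit's CJK refiner: the class name CjkScriptSketch and its decision rules are assumptions made for illustration. The sketch separates Japanese (kana) and Korean (Hangul) from Han-only text. It leaves the Simplified vs Traditional question undetermined, because both variants are encoded in the same CJK Unified Ideographs block and a block check alone cannot tell them apart in this simplified form.

```csharp
using System;

// Usage: top-level statements come before the type declaration.
Console.WriteLine(CjkScriptSketch.Classify("こんにちは世界")); // Japanese
Console.WriteLine(CjkScriptSketch.Classify("안녕하세요"));       // Korean
Console.WriteLine(CjkScriptSketch.Classify("你好世界"));         // Chinese (variant undetermined)

// Hypothetical helper for illustration only; not part of LM-Kit.
static class CjkScriptSketch
{
    public static string Classify(string text)
    {
        int kana = 0, hangul = 0, han = 0;

        foreach (char c in text)
        {
            if (c >= '\u3040' && c <= '\u30FF')
                kana++;    // Hiragana (U+3040-309F) and Katakana (U+30A0-30FF)
            else if ((c >= '\uAC00' && c <= '\uD7AF') ||   // Hangul Syllables
                     (c >= '\u1100' && c <= '\u11FF') ||   // Hangul Jamo
                     (c >= '\u3130' && c <= '\u318F'))     // Hangul Compatibility Jamo
                hangul++;
            else if (c >= '\u4E00' && c <= '\u9FFF')
                han++;     // CJK Unified Ideographs (shared by Chinese and Japanese)
        }

        // Assumed decision order: Hangul or kana identify the language even
        // when Han characters are also present.
        if (hangul > kana) return "Korean";
        if (kana > 0) return "Japanese";
        if (han > 0) return "Chinese (variant undetermined)";
        return "Unknown";
    }
}
```

A check like this is deterministic and cheap, which is why it suits the role of a post-processing step behind a model prediction.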

Cyrillic refiner

Distinguishes Russian, Ukrainian, Belarusian, Bulgarian, and other Cyrillic-script languages by counting language-specific characters and diacritics that don't appear in neighbouring orthographies.
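
The counting approach can be sketched in a few lines of C#. This is an illustration under stated assumptions, not the library's Cyrillic refiner: the class name CyrillicMarkerSketch, the marker sets, and the tie-breaking order are all hypothetical. It relies on three orthographic facts: ї, є, and ґ are used in Ukrainian, ў is used in Belarusian, and ы, э, and ё occur in Russian and Belarusian but not in Ukrainian or Bulgarian.

```csharp
using System;

Console.WriteLine(CyrillicMarkerSketch.Classify("Привіт, як справи? Їжа смачна.")); // Ukrainian
Console.WriteLine(CyrillicMarkerSketch.Classify("Это очень хороший пример."));      // Russian
Console.WriteLine(CyrillicMarkerSketch.Classify("Ён быў тут учора."));              // Belarusian

// Hypothetical helper for illustration only; not part of LM-Kit.
static class CyrillicMarkerSketch
{
    public static string Classify(string text)
    {
        int ukrainian = 0, belarusian = 0, russianLike = 0;

        foreach (char raw in text)
        {
            switch (char.ToLowerInvariant(raw))
            {
                case 'ї': case 'є': case 'ґ': ukrainian++; break;   // Ukrainian markers
                case 'ў': belarusian++; break;                      // Belarusian marker
                case 'ы': case 'э': case 'ё': russianLike++; break; // Russian and Belarusian
            }
        }

        // Assumed order: ў is checked first because ы, э and ё are shared
        // between Russian and Belarusian.
        if (belarusian > 0) return "Belarusian";
        if (ukrainian > russianLike) return "Ukrainian";
        if (russianLike > 0) return "Russian";
        return "Undetermined"; // e.g. Bulgarian, or text too short to contain a marker
    }
}
```

Short inputs often contain no marker letter at all, so a sketch like this needs an "undetermined" outcome that hands the decision back to the model.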

Slavic refiner

Lexicon-based scoring across Croatian, Serbian, Slovenian, Polish, Czech, and Slovak. Configurable thresholds (minWordCount, minWordSingleLanguageCount, score threshold) for tuning precision vs recall.
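
One way to picture lexicon scoring with these thresholds is the sketch below. The parameter names minWordCount and minWordSingleLanguageCount come from this page, but the semantics given to them here are an assumption, as are the class name SlavicLexiconSketch, the score formula, and the toy five-word lexicons (a real lexicon would be far larger and cover all six languages).

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

Console.WriteLine(SlavicLexiconSketch.Classify("Dziękuję bardzo, jest dobrze") ?? "no decision"); // Polish
Console.WriteLine(SlavicLexiconSketch.Classify("Ďakujem, veľmi dobre") ?? "no decision");         // Slovak
Console.WriteLine(SlavicLexiconSketch.Classify("Děkuji") ?? "no decision");                       // no decision

// Hypothetical helper for illustration only; not part of LM-Kit.
static class SlavicLexiconSketch
{
    static readonly char[] Separators = { ' ', ',', '.', '!', '?', ';', ':' };

    // Toy lexicons, far too small for real use.
    static readonly Dictionary<string, HashSet<string>> Lexicons = new()
    {
        ["Polish"] = new() { "jest", "się", "dziękuję", "bardzo", "dobrze" },
        ["Czech"]  = new() { "jsem", "děkuji", "velmi", "dobře", "který" },
        ["Slovak"] = new() { "som", "ďakujem", "veľmi", "dobre", "ktorý" },
    };

    public static string? Classify(
        string text,
        int minWordCount = 3,               // assumed: minimum words in the input
        int minWordSingleLanguageCount = 2, // assumed: minimum lexicon hits for the winner
        double scoreThreshold = 0.6)        // assumed: winner's share of all lexicon hits
    {
        string[] words = text.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length < minWordCount) return null;

        // Count how many input words appear in each language's lexicon.
        var hits = Lexicons.ToDictionary(
            kv => kv.Key,
            kv => words.Count(w => kv.Value.Contains(w)));

        var best = hits.OrderByDescending(kv => kv.Value).First();
        int totalHits = hits.Values.Sum();
        if (totalHits == 0 || best.Value < minWordSingleLanguageCount) return null;

        double score = (double)best.Value / totalHits;
        return score >= scoreThreshold ? best.Key : null;
    }
}
```

Under this reading, raising any of the three thresholds makes the sketch abstain more often (higher precision), while lowering them makes it decide on shorter or noisier text (higher recall).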

Confidence

Trust the result, or verify by score.

After each detection call, the TextTranslation instance exposes a Confidence property (0.0 to 1.0) for that result. Use it to gate downstream behaviour: route uncertain documents to human review, fall back to a default locale, or trigger a second pass with a candidate set.

High confidence (> 0.85)

Trust the result. Forward the document to language-specific pipelines (NER, sentiment, embeddings) without further checks.

Medium confidence (0.55 to 0.85)

Ambiguous content. Re-run with a narrow candidate set or process with both likely languages and merge results.

Low confidence (< 0.55)

Likely too short, mixed-language, or noisy. Fall back to a default locale, flag for review, or skip translation entirely.
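
The three tiers above translate directly into a small gating function. The sketch below is plain C# with no LM-Kit dependency; the enum DetectionAction and the class ConfidenceGate are hypothetical names, and only the 0.55 and 0.85 boundaries come from this page. In an application, the input would be the Confidence value read after a detection call.

```csharp
using System;

Console.WriteLine(ConfidenceGate.Decide(0.992)); // Trust
Console.WriteLine(ConfidenceGate.Decide(0.70));  // RecheckWithCandidates
Console.WriteLine(ConfidenceGate.Decide(0.30));  // FallBackOrReview

// Hypothetical names for illustration only; not part of LM-Kit.
enum DetectionAction { Trust, RecheckWithCandidates, FallBackOrReview }

static class ConfidenceGate
{
    public const double High = 0.85; // above this: trust the result
    public const double Low = 0.55;  // below this: fall back or review

    public static DetectionAction Decide(double confidence)
    {
        if (double.IsNaN(confidence) || confidence < 0.0 || confidence > 1.0)
            throw new ArgumentOutOfRangeException(nameof(confidence));

        if (confidence > High) return DetectionAction.Trust;
        if (confidence >= Low) return DetectionAction.RecheckWithCandidates; // 0.55 to 0.85 inclusive
        return DetectionAction.FallBackOrReview;
    }
}
```

Keeping the thresholds as named constants makes them easy to tune per corpus; the values shown are starting points, not guarantees.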

Code samples

Three calling patterns.

From a single string to a multi-page PDF, the same engine handles every input shape.

DetectFromText.cs

using System;
using LMKit.Model;
using LMKit.Translation;
using LMKit.TextGeneration;

// Load a local model and create the engine that hosts detection.
var model = LM.LoadFromModelID("qwen3.5:4b");
var translator = new TextTranslation(model);

string text = "L'intelligenza artificiale sta trasformando il nostro modo di lavorare.";
Language detected = translator.DetectLanguage(text);

// Confidence reflects the most recent detection on this instance.
Console.WriteLine($"Detected: {detected}");
Console.WriteLine($"Confidence: {translator.Confidence:P1}");
// Example output:
// Detected: Italian
// Confidence: 99.2%

Applications

When language detection matters.

Multilingual ingestion pipelines

Route incoming documents to language-specific NER, sentiment, and OCR models so each pipeline runs on optimised tooling.

Customer support automation

Auto-detect the language of incoming tickets and chat messages so triage and response generation use the right model and prompt template.

Content moderation

Apply locale-specific moderation rules (legal frameworks, vocabulary lists, profanity dictionaries) only after detecting the user's language.

Translation orchestration

Skip translation when the detected language already matches the target. Trim cloud-API spend by 30 to 60% in mixed corpora.
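
A minimal sketch of that skip logic follows. It is independent of any translation backend: the class TranslationRouter is a hypothetical name, the translator is passed in as a delegate, and languages are plain strings for simplicity. The extra confidence check, which reuses the 0.85 high-confidence boundary from this page, is a design assumption so that a shaky detection never suppresses a needed translation.

```csharp
using System;

int translateCalls = 0;
// Stand-in for a real (possibly paid) translation call.
Func<string, string> fakeTranslate = s => { translateCalls++; return "[en] " + s; };

Console.WriteLine(TranslationRouter.TranslateIfNeeded("Hello", "English", 0.97, "English", fakeTranslate));  // Hello
Console.WriteLine(TranslationRouter.TranslateIfNeeded("Bonjour", "French", 0.95, "English", fakeTranslate)); // [en] Bonjour
Console.WriteLine(translateCalls); // 1

// Hypothetical helper for illustration only; not part of LM-Kit.
static class TranslationRouter
{
    public static string TranslateIfNeeded(
        string text,
        string detected,
        double confidence,
        string target,
        Func<string, string> translate,
        double minConfidence = 0.85)
    {
        // Skip only when the detection is confident AND already matches the target.
        bool alreadyInTarget =
            confidence > minConfidence &&
            string.Equals(detected, target, StringComparison.OrdinalIgnoreCase);

        return alreadyInTarget ? text : translate(text);
    }
}
```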

Search relevance

Index multilingual documents into language-specific shards. Apply BM25 with language-aware tokenizers for more precise retrieval.

Data residency compliance

Detect language locally without sending content to a cloud detector. Critical for HIPAA, GDPR, and air-gapped deployments.

Developer Resources

API reference.

TextTranslation

Main class hosting both detection and translation. Exposes DetectLanguage and DetectLanguageAsync, with overloads for text, candidate sets, and attachments.

Language

Enumeration of all supported languages. Returned by DetectLanguage and accepted by Translate as the target.

Confidence

Read the confidence (0.0 to 1.0) of the most recent detection from the TextTranslation instance. Use it to gate downstream actions.

Attachment

Input container for documents and images. Pass to DetectLanguage for multimodal detection over PDF, Word, Excel, PowerPoint, HTML, and images.

Need translation too? See the Text Translation page for the same engine's translation capabilities.

Demos & docs

Build it. Read it. Try it.

Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.

Detect on-device. Ship anywhere.

Zero cloud calls, zero per-request fees. One NuGet package, every platform.
