Language detection

Identify the language of any input.

Detect the language of plain text, image content, and document attachments in milliseconds. The TextTranslation engine combines an LLM backbone with deterministic refiners (CJK, Cyrillic, Slavic) that apply script and lexicon checks. It identifies 100+ languages and reports a confidence score with every result.


100+ languages · Multimodal input · Confidence scoring

`DetectLanguage(text)`

Synchronous detection on a string.

`DetectLanguage(attachment)`

Detection over PDF, Word, image, or any Attachment.

`DetectLanguage(text, candidates)`

Constrained detection over a known candidate set for speed.

`DetectLanguageAsync(...)`

Async overloads for all of the above.

Engine overview

An LLM detector hardened by script refiners.

Generic LLM language detection handles most European languages well but struggles to separate closely related languages and languages that share characters: Simplified vs Traditional Chinese, Japanese vs Korean, Russian vs Ukrainian, Croatian vs Serbian. TextTranslation wraps the LLM with three deterministic refiners that resolve those edge cases using Unicode block analysis, counts of language-specific characters, and statistical lexicon scoring.

CJK refiner

Disambiguates Simplified Chinese, Traditional Chinese, Japanese, and Korean by inspecting Unicode block membership of each character. Resolves cases where the LLM predicts "Chinese" but cannot tell which variant.
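To make the CJK idea concrete, here is a minimal C# sketch of block-based disambiguation. It is an illustration, not LM-Kit's implementation. The ranges are standard Unicode blocks (Hiragana and Katakana U+3040 to U+30FF, Hangul Syllables U+AC00 to U+D7AF, Hangul Jamo U+1100 to U+11FF, CJK Unified Ideographs U+4E00 to U+9FFF). The ten-character Simplified and Traditional marker sets are hypothetical stand-ins. Simplified and Traditional characters share the CJK Unified Ideographs block, so block membership alone cannot split them, and the sketch falls back to counting marker characters.

```csharp
using System;

// Hypothetical marker sets: ten common characters whose Simplified and
// Traditional forms differ. A real refiner would need a far larger table.
const string SimplifiedMarkers = "们这国说学个来时会对";
const string TraditionalMarkers = "們這國說學個來時會對";

// Classifies CJK text by the Unicode block of each character.
string ClassifyCjk(string text)
{
    int kana = 0, hangul = 0, han = 0, simplified = 0, traditional = 0;
    foreach (char c in text)
    {
        if (c >= '\u3040' && c <= '\u30FF') kana++;                 // Hiragana + Katakana
        else if ((c >= '\uAC00' && c <= '\uD7AF') ||                  // Hangul Syllables
                 (c >= '\u1100' && c <= '\u11FF')) hangul++;          // Hangul Jamo
        else if (c >= '\u4E00' && c <= '\u9FFF')                      // CJK Unified Ideographs
        {
            han++;
            if (SimplifiedMarkers.IndexOf(c) >= 0) simplified++;
            else if (TraditionalMarkers.IndexOf(c) >= 0) traditional++;
        }
    }
    if (kana > 0) return "Japanese";       // kana is specific to Japanese
    if (hangul > 0) return "Korean";       // Hangul is specific to Korean
    if (han == 0) return "Unknown";        // no CJK characters at all
    if (simplified > traditional) return "Chinese (Simplified)";
    if (traditional > simplified) return "Chinese (Traditional)";
    return "Chinese (variant unresolved)"; // leave the decision to the LLM
}

Console.WriteLine(ClassifyCjk("私は学生です")); // Japanese
```

Kana is checked first because Japanese text routinely mixes kanji with kana. Text made only of Han characters therefore falls through to the Chinese branch.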

Cyrillic refiner

Distinguishes Russian, Ukrainian, Belarusian, Bulgarian, and other Cyrillic-script languages by counting language-specific characters and diacritics that don't appear in neighbouring orthographies.
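A comparable sketch for the Cyrillic refiner is a rule cascade over marker letters. The letter facts follow the standard orthographies: ў appears only in Belarusian; ї, є, and ґ in Ukrainian; і in Ukrainian and Belarusian; ы and э in Russian and Belarusian; and ъ is a frequent vowel in Bulgarian. The cascade itself and its order are assumptions for illustration, not the library's rules. The marker letters are written as \u escapes so that a Latin i cannot be mistaken for Cyrillic і.

```csharp
using System;

// Illustrative rule cascade over marker letters; the order and rules are
// assumptions, not LM-Kit's refiner. Letters are \u escapes to avoid
// confusing Cyrillic і (U+0456) with Latin i.
string RefineCyrillic(string text)
{
    string t = text.ToLowerInvariant();
    bool Has(string letters) => t.IndexOfAny(letters.ToCharArray()) >= 0;

    if (Has("\u045E")) return "Belarusian";               // ў appears only in Belarusian
    if (Has("\u0457\u0454\u0491")) return "Ukrainian";    // ї, є, ґ
    bool dottedI = Has("\u0456");                         // і: Ukrainian and Belarusian
    bool yeryOrE = Has("\u044B\u044D");                   // ы, э: Russian and Belarusian
    if (dottedI && yeryOrE) return "Belarusian";          // only Belarusian uses both
    if (dottedI) return "Ukrainian";
    if (yeryOrE || Has("\u0451")) return "Russian";       // ё
    if (Has("\u044A")) return "Bulgarian";                // ъ is a common vowel in Bulgarian
    return "Unresolved";                                  // defer to the LLM's prediction
}

Console.WriteLine(RefineCyrillic("Вы говорите по-русски?")); // Russian
```

Text with none of the markers stays "Unresolved". In that case the LLM's own prediction would stand, which matches the refiner's role as a tie-breaker rather than a full detector.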

Slavic refiner

Lexicon-based scoring across Croatian, Serbian, Slovenian, Polish, Czech, and Slovak. Configurable thresholds (minWordCount, minWordSingleLanguageCount, score threshold) for tuning precision vs recall.
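One way to read these thresholds is shown in the minimal C# sketch below. The names minWordCount and minWordSingleLanguageCount come from this page. Their exact semantics, the defaults, the scoreThreshold name, and the tiny function-word lexicons (Polish, Czech, and Slovak only) are assumptions. In the sketch, a word counts for every lexicon that contains it. The winning language must explain at least scoreThreshold of the matched words, and at least minWordSingleLanguageCount of its hits must be words that no other lexicon contains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mini-lexicons of frequent function words (three of the six languages).
var lexicons = new Dictionary<string, HashSet<string>>
{
    ["Polish"] = new() { "jest", "nie", "się", "jak", "że", "który", "oraz", "w", "i", "to" },
    ["Czech"] = new() { "je", "není", "se", "jak", "že", "který", "a", "v", "to" },
    ["Slovak"] = new() { "je", "nie", "sa", "ako", "že", "ktorý", "a", "v", "to" },
};

// Returns the best-scoring language, or null when the evidence is too thin.
string? RefineSlavic(string text, int minWordCount = 3,
                     int minWordSingleLanguageCount = 1, double scoreThreshold = 0.6)
{
    string[] words = text.ToLowerInvariant().Split(
        new[] { ' ', ',', '.', ';', ':', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);
    if (words.Length < minWordCount) return null;                // too short to judge

    var hits = lexicons.Keys.ToDictionary(k => k, _ => 0);       // matched words per lexicon
    var unique = lexicons.Keys.ToDictionary(k => k, _ => 0);     // words found in ONLY that lexicon
    int matched = 0;                                              // words found in any lexicon
    foreach (string w in words)
    {
        var owners = lexicons.Where(kv => kv.Value.Contains(w)).Select(kv => kv.Key).ToList();
        if (owners.Count == 0) continue;
        matched++;
        foreach (string lang in owners) hits[lang]++;
        if (owners.Count == 1) unique[owners[0]]++;
    }
    if (matched == 0) return null;

    var best = hits.OrderByDescending(kv => kv.Value).First();
    double score = (double)best.Value / matched;                  // share of matched words the winner explains
    if (unique[best.Key] < minWordSingleLanguageCount || score < scoreThreshold) return null;
    return best.Key;
}

Console.WriteLine(RefineSlavic("To je jak se to dělá.")); // Czech
```

Raising minWordSingleLanguageCount favours precision over recall. For example, "to je to" matches Czech and Slovak equally and contains no language-unique word, so the sketch returns null rather than guessing.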

Confidence

Trust the result, or verify by score.

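After each detection call, the TextTranslation instance exposes the result's confidence (0.0 to 1.0) through its Confidence property. Because the property always reflects the most recent detection, read it right after the call that produced the result. Use it to gate downstream behaviour: route certain documents to human review, fall back to a default locale, or trigger a second pass with a candidate set.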

High confidence (> 0.85)

Trust the result. Forward the document to language-specific pipelines (NER, sentiment, embeddings) without further checks.

Medium confidence (0.55 to 0.85)

Ambiguous content. Re-run with a narrow candidate set or process with both likely languages and merge results.

Low confidence (< 0.55)

Likely too short, mixed-language, or noisy. Fall back to a default locale, flag for review, or skip translation entirely.
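The three tiers can be captured in a small helper. The thresholds come from this page. The helper name and the tier labels are hypothetical, and the sketch assumes Confidence is a float or double (a float converts implicitly to double). The boundary values 0.85 and 0.55 land in the medium tier, matching the "0.55 to 0.85" range.

```csharp
using System;

// Hypothetical helper mapping a confidence in [0, 1] to the three tiers above.
// 0.85 and 0.55 themselves fall in the medium tier ("0.55 to 0.85").
string RouteByConfidence(double confidence) => confidence switch
{
    > 0.85 => "trust",    // forward to language-specific pipelines
    >= 0.55 => "verify",  // re-run with a candidate set, or process both languages
    _ => "fallback",      // default locale, human review, or skip translation
};

Console.WriteLine(RouteByConfidence(0.92)); // trust
Console.WriteLine(RouteByConfidence(0.70)); // verify
Console.WriteLine(RouteByConfidence(0.30)); // fallback
```

In a pipeline, you would call RouteByConfidence(translator.Confidence) immediately after DetectLanguage. A NaN value fails both comparisons and falls through to "fallback", which is the safe default.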

Code samples

Three calling patterns.

From a single string to a multi-page PDF, the same engine handles every input shape.

DetectFromText.cs

using LMKit.Model;
using LMKit.Translation;
using LMKit.TextGeneration;

var model = LM.LoadFromModelID("qwen3.5:4b");
var translator = new TextTranslation(model);

string text = "L'intelligenza artificiale sta trasformando il nostro modo di lavorare.";
Language detected = translator.DetectLanguage(text);

Console.WriteLine($"Detected: {detected}");
Console.WriteLine($"Confidence: {translator.Confidence:P1}");
// Detected: Italian
// Confidence: 99.2%

ConstrainedDetection.cs

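using System.Collections.Generic;
using LMKit.Model;
using LMKit.Translation;
using LMKit.TextGeneration;

// Load a model first; the original sample used `model` without defining it.
var model = LM.LoadFromModelID("qwen3.5:4b");
var translator = new TextTranslation(model);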

// We know the document is one of three Slavic languages.
// Constrain detection to skip the wider scan.
var candidates = new List<Language>
{
    Language.Polish,
    Language.Czech,
    Language.Slovak
};

string text = "Sztuczna inteligencja zmienia sposób, w jaki pracujemy.";
Language result = translator.DetectLanguage(text, candidates);
// result == Language.Polish

DetectFromAttachment.cs

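using LMKit.Model;
using LMKit.Translation;
using LMKit.TextGeneration; // for the Language enum
using LMKit.Data;

// Image input requires a vision-capable model, so this sample loads one;
// the same code then handles both documents and images.
var model = LM.LoadFromModelID("gemma4:e4b");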
var translator = new TextTranslation(model);

var attachment = new Attachment("document.pdf");
Language detected = await translator.DetectLanguageAsync(attachment);

Console.WriteLine($"Document language: {detected}");
Console.WriteLine($"Confidence: {translator.Confidence:P1}");

Applications

When language detection matters.

Multilingual ingestion pipelines

Route incoming documents to language-specific NER, sentiment, and OCR models so each pipeline runs on optimised tooling.

Customer support automation

Auto-detect the language of incoming tickets and chat messages so triage and response generation use the right model and prompt template.

Content moderation

Apply locale-specific moderation rules (legal frameworks, vocabulary lists, profanity dictionaries) only after detecting the user's language.

Translation orchestration

Skip translation when the detected language already matches the target. In mixed corpora, this can trim cloud-API spend by 30 to 60%, depending on how much of the corpus is already in the target language.

Search relevance

Index multilingual documents into language-specific shards. Apply BM25 with language-aware tokenizers for precision retrieval.

Data residency compliance

Detect language locally without sending content to a cloud detector. Critical for HIPAA, GDPR, and air-gapped deployments.
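For the translation-orchestration case above, the skip logic reduces to a partition step. This C# sketch is hypothetical. The detector is passed in as a delegate so that the sketch runs without a model. The generic TLang parameter lets the same code work with strings in the demo or with the Language enum in real code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits documents into those that need translation and
// those already in the target language. detect is any function that returns a
// language for a text.
(List<string> ToTranslate, List<string> AlreadyInTarget) Partition<TLang>(
    IEnumerable<string> docs, Func<string, TLang> detect, TLang target)
{
    var toTranslate = new List<string>();
    var alreadyInTarget = new List<string>();
    foreach (string doc in docs)
    {
        // EqualityComparer works for strings, enums, or any other language type.
        if (EqualityComparer<TLang>.Default.Equals(detect(doc), target))
            alreadyInTarget.Add(doc);
        else
            toTranslate.Add(doc);
    }
    return (toTranslate, alreadyInTarget);
}

// Demo with a toy stand-in detector (not a real language detector).
var docs = new[] { "read the docs", "leggi la documentazione" };
var (todo, skipped) = Partition(docs, text => text.Contains(" the ") ? "English" : "Other", "English");
Console.WriteLine($"{todo.Count} to translate, {skipped.Count} skipped"); // 1 to translate, 1 skipped
```

With LM-Kit, the delegate would presumably be text => translator.DetectLanguage(text), and target a Language value such as Language.English (member name assumed). Only the ToTranslate list would then go on to translation.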

Developer Resources

API reference.

`TextTranslation`

Main class for both detection and translation. Exposes DetectLanguage and DetectLanguageAsync, with overloads for text, text plus a candidate set, and attachments.


`Language`

Enumeration of all supported languages. Returned by DetectLanguage and accepted by Translate as the target.


`Confidence`

Property of TextTranslation holding the confidence (0.0 to 1.0) of the most recent detection on that instance. Use it to gate downstream actions.


`Attachment`

Input container for documents and images. Pass to DetectLanguage for multimodal detection over PDF, Word, Excel, PowerPoint, HTML, and images.


Need translation too? See the Text Translation page for the same engine's translation capabilities.

Demos & docs

Build it. Read it. Try it.

Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.

Language detection from document

Language detection from document walkthrough

Build a multi-language document pipeline

Detect on-device. Ship anywhere.

Zero cloud calls, zero per-request fees. One NuGet package, every platform.