
Surface what matters from any content.

Extract the most relevant keywords and key phrases from text and images with the KeywordExtraction engine. Configure keyword count, n-gram size, and target language. Handle documents that exceed the model's context window with configurable text shrinking strategies. Run 100% on-device with Dynamic Sampling for fast, accurate results, including on CPU-only machines.


50+ languages · Multimodal · Configurable n-grams

`KeywordCount`: how many keywords to extract per document, capped by model capacity.

`MaxNgramSize`: maximum phrase length in words. Set to 1 for single-word keywords.

`TargetLanguage`: output language for generated keywords. When left as Undefined, the input language is auto-detected.

`TextShrinkingStrategy`: how the engine reduces oversized documents to fit the context window.

Problem

Why keyword extraction matters.

Every document, article, and customer message contains critical terms buried in noise. Extracting them manually is slow, inconsistent, and impossible at scale.

Manual tagging is unreliable

Human taggers produce inconsistent results, miss key terms, and cannot keep up with content volume.

Cloud APIs leak your data

Sending sensitive documents to third-party endpoints can expose intellectual property and may violate compliance requirements.

Regex and TF-IDF fall short

Rule-based and frequency-based approaches struggle to capture semantic meaning, meaningful multi-word phrases, and context-dependent importance.

Per-call pricing adds up fast

Processing thousands of documents daily through cloud APIs creates unpredictable and growing costs.

Solution

LLM-powered keyword extraction.

LLM-powered precision

Unlike rule-based methods, the KeywordExtraction engine uses language models to understand context, capturing multi-word phrases and semantically important terms.

Dynamic Sampling

LM-Kit's proprietary sampling technology delivers high accuracy even with smaller models running on CPU. Enterprise results without GPU requirements.

100% on-device privacy

All processing stays on your infrastructure. No API calls, no data exposure. Process sensitive legal, medical, or financial documents with confidence.

Unlimited scale

Process millions of documents with no per-call fees. Fixed licensing cost regardless of volume. Predictable budgets, unlimited throughput.

Class detail

`KeywordExtraction` class.

The foundation of topic discovery in your .NET applications. KeywordExtraction provides a high-level API to extract the most important keywords and phrases from any content. Configure the number of keywords, control n-gram size, set a target language, and handle documents that exceed model context limits with automatic text shrinking strategies. Works with both text and image inputs.

KeywordCount property sets how many keywords to extract (default: 5)
MaxNgramSize controls maximum phrase length (default: 3 words)
TargetLanguage for multilingual extraction with auto-detection
TextShrinkingStrategy handles oversized documents automatically
Guidance property steers extraction toward specific themes
Confidence score for every extraction operation
Sync and async methods: ExtractKeywords / ExtractKeywordsAsync

KeywordExtraction.cs

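using System;
using System.IO;
using LMKit.Model;
using LMKit.TextAnalysis;
using LMKit.TextGeneration; // assumed namespace of the Language enum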

var model = LM.LoadFromModelID("qwen3.5:4b");
var extractor = new KeywordExtraction(model)
{
    KeywordCount = 8,
    MaxNgramSize = 3,
    TargetLanguage = Language.English
};

string article = File.ReadAllText("report.txt");
var keywords = extractor.ExtractKeywords(article);

Console.WriteLine($"Confidence: {extractor.Confidence:P1}");
foreach (var kw in keywords)
{
    Console.WriteLine($"- {kw.Value}");
}
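
The class summary lists an async counterpart, ExtractKeywordsAsync, that the sample above does not exercise. The following is a minimal sketch of processing a folder of text files without blocking the calling thread. It assumes ExtractKeywordsAsync accepts the same string input and returns the same keyword collection as ExtractKeywords; the `reports` folder is hypothetical, and the exact signature should be checked against the API reference.

```csharp
using System;
using System.IO;
using System.Linq;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("qwen3.5:4b");
var extractor = new KeywordExtraction(model) { KeywordCount = 8 };

// "reports" is a hypothetical folder of .txt files.
foreach (string path in Directory.EnumerateFiles("reports", "*.txt"))
{
    string text = await File.ReadAllTextAsync(path);

    // Assumption: ExtractKeywordsAsync takes the same string input as
    // ExtractKeywords and returns the same collection of keyword items.
    var keywords = await extractor.ExtractKeywordsAsync(text);

    string joined = string.Join(", ", keywords.Select(kw => kw.Value));
    Console.WriteLine($"{Path.GetFileName(path)} ({extractor.Confidence:P1}): {joined}");
}
```

Files are processed one at a time on a single extractor instance, because this page does not state whether one instance can serve concurrent calls.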

Multimodal

Extract from text and images.

The KeywordExtraction engine works with both text content and image attachments through a unified API. Pass an Attachment object containing an image and the engine automatically applies OCR to extract visible text before identifying the most relevant keywords. Process scanned documents, screenshots, infographics, and photographs alongside plain text content.

Same ExtractKeywords method for both text and image inputs
Automatic OCR integration for scanned documents
Process screenshots, infographics, and photographs
Async variants for non-blocking image processing
Requires a vision-capable model for image extraction

KeywordsFromImage.cs

using System;
using LMKit.Data;
using LMKit.Model;
using LMKit.TextAnalysis;

// Image extraction requires a vision-capable model
var model = LM.LoadFromModelID("qwen3.5:4b");
var extractor = new KeywordExtraction(model)
{
    KeywordCount = 6,
    MaxNgramSize = 3
};

// Extract from an image (OCR is automatic)
var attachment = new Attachment("infographic.png");
var imgKeywords = extractor.ExtractKeywords(attachment);

foreach (var kw in imgKeywords)
    Console.WriteLine($"- {kw.Value}");

Configuration

Precision control for every scenario.

Fine-tune extraction behavior with a comprehensive set of properties. Control how many keywords to extract, the maximum phrase length, target language, context window management, and how the engine handles documents that exceed the model's capacity. Use the Guidance property to steer extraction toward specific themes or terminology.

KeywordCount

Target keyword count

Sets the desired number of keywords. The actual count depends on model capacity and input data, but will never exceed this value.

MaxNgramSize

Maximum phrase length

Controls the maximum n-gram size. Set to 1 for single words, or higher for multi-word phrases like "machine learning" or "interest rate adjustment".

TargetLanguage

Output language

Specifies the language for generated keywords. When set to Undefined, the engine auto-detects the input language.

TextShrinkingStrategy

Large document handling

Determines how content is reduced when it exceeds MaximumContextLength. Each strategy gives up some semantic integrity in exchange for a shorter input.

Guidance

Custom instructions

Optional text that steers the extraction process toward specific themes, constraints, or terminology. Useful for domain-specific applications.

MaximumContextLength

Context window control

Limits the token count for model input. Reducing this value increases inference speed on CPU at the cost of some quality.

KeywordsAdvanced.cs

using LMKit.Model;
using LMKit.TextAnalysis;
using LMKit.TextGeneration; // assumed namespace of the Language enum

var model = LM.LoadFromModelID("qwen3.5:4b");
var extractor = new KeywordExtraction(model)
{
    // Extract up to 10 keywords
    KeywordCount = 10,
    // Allow phrases up to 4 words
    MaxNgramSize = 4,
    // Generate keywords in French
    TargetLanguage = Language.French,
    // Steer toward financial terms
    Guidance = "Focus on financial and economic terminology",
    // Limit context for faster CPU inference
    MaximumContextLength = 2048
};
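
TextShrinkingStrategy is described above but not set in KeywordsAdvanced.cs. The lines below are a small configuration fragment under stated assumptions: it reuses the `extractor` instance from that sample, it uses the `Auto` member that the API reference section names as one of the enum's options, and it leaves out the enum's namespace import because this page does not show it.

```csharp
// Fragment: assumes `extractor` is the KeywordExtraction instance from
// KeywordsAdvanced.cs and that the TextShrinkingStrategy enum is in scope
// (its namespace is not shown on this page).

// Cap the model input at 2048 tokens.
extractor.MaximumContextLength = 2048;

// Let the engine choose how to reduce documents that exceed that cap.
extractor.TextShrinkingStrategy = TextShrinkingStrategy.Auto;
```

Per the property description, the strategy only comes into play once a document exceeds MaximumContextLength.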

Multilingual

50+ languages, zero configuration.

Extract keywords from content in any language supported by the underlying model. The TargetLanguage property lets you explicitly set the output language or leave it as Undefined for automatic detection. Process multilingual content and generate keywords in a target language different from the source, enabling cross-language content analysis and indexing.

Auto-detect input language when TargetLanguage is Undefined
Cross-language extraction: input in one language, keywords in another
Process mixed-language documents seamlessly
Full support for CJK, Cyrillic, Arabic, and Latin scripts
Model-dependent: quality scales with the model's language training

Multilingual.cs

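using LMKit.Model;
using LMKit.TextAnalysis;
using LMKit.TextGeneration; // assumed namespace of the Language enum

var model = LM.LoadFromModelID("qwen3.5:4b");

// Auto-detect language
var autoExtractor = new KeywordExtraction(model)
{
    KeywordCount = 5
};

string germanText = "Die Europäische Zentralbank hat " +
    "neue Maßnahmen zur Inflationsbekämpfung " +
    "und Zinspolitik angekündigt.";

var deKeywords = autoExtractor.ExtractKeywords(germanText);
// Example output: Zentralbank, Inflationsbekämpfung...

// Cross-language: input German, output English
var crossExtractor = new KeywordExtraction(model)
{
    KeywordCount = 5,
    TargetLanguage = Language.English
};

var enKeywords = crossExtractor.ExtractKeywords(germanText);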

Ready-to-run demo application.

Clone the sample, run it, and see keyword extraction in action on your own data in minutes.

Console demo

Keyword extraction console demo

The Keyword Extraction Demo is a standalone console application that lets you point the engine at any text file and instantly surface the most relevant keywords. Choose from multiple pre-trained models, configure extraction parameters, and view results with timing and confidence metrics.

Select from pre-trained models or provide a custom model URI
Provide any text file as input for extraction
View extracted keywords with elapsed time and confidence score
Open the .csproj file directly, no extra installations needed
Uses Dynamic Sampling for fast, accurate results even on CPU


Applications

Real-world use cases.

Organizations across industries leverage LM-Kit's keyword extraction to power search, automate tagging, and unlock insights from unstructured content.

SEO

SEO and search optimization

Automatically identify the most relevant terms from web pages, articles, and product descriptions. Power SEO analysis, meta tag generation, and search relevance scoring.

CMS

Content management

Auto-tag articles, reports, and knowledge base entries with relevant keywords. Improve content discoverability and enable faceted search across document repositories.

Topics

Topic identification

Surface the main themes from large document sets. Identify trending topics in customer feedback, survey responses, and social media streams.

Insights

Customer insight mining

Extract key terms from support tickets, reviews, and survey answers. Identify the language your customers use to describe products, features, and issues.

i18n

Multilingual content indexing

Process documents in 50+ languages and generate keywords in a unified target language. Build cross-language search indexes and content catalogs.

Pipelines

Document intelligence pipelines

Feed extracted keywords into downstream systems: classification engines, RAG pipelines, recommendation algorithms, and analytics dashboards.
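
As one way to picture that hand-off, the sketch below turns keyword strings into a small JSON record that a search index or analytics job could ingest. It is plain .NET and does not call LM-Kit; the `KeywordExport` class, the `ToJson` method, and the `id`/`tags` field names are hypothetical choices for illustration, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class KeywordExport
{
    // Trims keyword strings, drops empty ones, removes case-insensitive
    // duplicates (keeping the first spelling), and serializes the result
    // together with a document id as a single JSON object.
    public static string ToJson(string documentId, IEnumerable<string> keywords)
    {
        string[] tags = keywords
            .Select(k => k.Trim())
            .Where(k => k.Length > 0)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();

        return JsonSerializer.Serialize(new { id = documentId, tags });
    }
}
```

With an extraction result in hand, the call would look like `KeywordExport.ToJson("report-42", keywords.Select(kw => kw.Value))`, assuming `Value` is a string as the console samples on this page suggest.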

Integration

Integrate in minutes.

From NuGet install to keyword extraction in production in under 10 minutes. No cloud keys, no API limits, no surprises.

Quick start

Install via NuGet: dotnet add package LM-Kit.NET
Load a model: LM.LoadFromModelID("qwen3.5:4b")
Create the extractor: new KeywordExtraction(model)
Extract keywords: extractor.ExtractKeywords(text)

Production checklist

Choose model size based on hardware: 1B-4B for CPU, 8B+ for GPU deployments
Set MaximumContextLength for optimal speed vs. quality tradeoff on your hardware
Configure TextShrinkingStrategy for documents longer than the model context window
Use Guidance to focus extraction on your domain's terminology and priorities

Developer Resources

API reference.

Complete documentation for the KeywordExtraction class, supporting types, and related APIs.

`KeywordExtraction`

Core class for extracting keywords from text and images. Includes all configuration properties and extraction methods.


`KeywordItem`

Read-only value container representing a single extracted keyword. Returned as a collection by ExtractKeywords methods.


`TextShrinkingStrategy`

Enum defining strategies for handling content that exceeds the model context window. Options include Auto, Truncation, and more.


`Categorization`

Combine keyword extraction with classification. Use extracted keywords to inform custom content categorization workflows.


`Embedder`

Generate embeddings from extracted keywords for semantic search, clustering, and RAG applications.


`Attachment`

Data class for passing image content to the extraction engine. Used for multimodal keyword extraction from images.


Demos & docs

Build it. Read it. Try it.

Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.


Ready to extract what matters?

Powerful keyword extraction. Multimodal support. Multilingual. 100% on-device. Start building intelligent .NET applications today.
