Keyword Extraction Engine

Surface What Matters From Any Content

Extract the most relevant keywords and key phrases from text and images with the KeywordExtraction engine. Configure keyword count, n-gram size, and target language. Handle large documents with intelligent shrinking strategies. Run 100% on-device with Dynamic Sampling for fast, accurate results on any hardware.


Multimodal, Multilingual, Configurable, 100% On-Device

Live Extraction Preview On-Device

"The European Central Bank announced new monetary policy measures today, targeting inflation reduction through interest rate adjustments. Analysts expect these quantitative tightening steps to impact bond markets and currency exchange rates across the eurozone by Q3."

Extracted Keywords

monetary policy, European Central Bank, quantitative tightening, interest rate, inflation, bond markets, currency exchange

Confidence: 98.2%

Extraction Time: <1s

Languages: 50+

On-Device: 100%

The Challenge

Why Keyword Extraction Matters

Every document, article, and customer message contains critical terms buried in noise. Extracting them manually is slow, inconsistent, and impossible at scale.

Manual tagging is unreliable: Human taggers produce inconsistent results, miss key terms, and cannot keep up with content volume.

Cloud APIs put your data at risk: Sending sensitive documents to third-party endpoints can expose intellectual property and may violate compliance requirements.

Regex and TF-IDF fall short: Rule-based and purely statistical approaches cannot capture semantic meaning, multi-word phrases, or context-dependent importance.

Per-call pricing adds up fast: Processing thousands of documents daily through cloud APIs creates unpredictable and growing costs.

LLM-Powered Precision

Unlike rule-based or purely statistical methods, the KeywordExtraction engine uses language models to understand context, capturing multi-word phrases and semantically important terms.

Dynamic Sampling

LM-Kit's proprietary sampling technology delivers high accuracy even with smaller models running on CPU. Enterprise results without GPU requirements.

100% On-Device Privacy

All processing stays on your infrastructure. No API calls, no data exposure. Process sensitive legal, medical, or financial documents with confidence.

Unlimited Scale

Process millions of documents with no per-call fees. Fixed licensing cost regardless of volume. Predictable budgets, unlimited throughput.

Core Engine

KeywordExtraction Class

The foundation of topic discovery in your .NET applications. KeywordExtraction provides a high-level API to extract the most important keywords and phrases from any content. Configure the number of keywords, control n-gram size, set a target language, and handle documents that exceed model context limits with automatic text shrinking strategies. Works with both text and image inputs.

KeywordCount property sets how many keywords to extract (default: 5)
MaxNgramSize controls maximum phrase length (default: 3 words)
TargetLanguage for multilingual extraction with auto-detection
TextShrinkingStrategy handles oversized documents automatically
Guidance property steers extraction toward specific themes
Confidence score for every extraction operation
Sync and async methods: ExtractKeywords / ExtractKeywordsAsync

KeywordExtraction.cs

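using System;
using System.IO;
using LMKit.Model;          // LM
using LMKit.TextAnalysis;   // KeywordExtraction
using LMKit.TextGeneration; // Language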

var model = LM.LoadFromModelID("phi-3.5-mini");

var extractor = new KeywordExtraction(model)
{
    KeywordCount = 8,
    MaxNgramSize = 3,
    TargetLanguage = Language.English
};

string article = File.ReadAllText("report.txt");

var keywords = extractor.ExtractKeywords(article);

Console.WriteLine(
    $"Confidence: {extractor.Confidence:P1}");

foreach (var kw in keywords)
{
    Console.WriteLine($"  - {kw.Value}");
}
// Output:
//   Confidence: 97.8%
//   - monetary policy
//   - central bank
//   - interest rate
//   - quantitative tightening
//   ...
                    

Multimodal

Extract From Text and Images

The KeywordExtraction engine works with both text content and image attachments through a unified API. Pass an Attachment object containing an image and the engine automatically applies OCR to extract visible text before identifying the most relevant keywords. Process scanned documents, screenshots, infographics, and photographs alongside plain text content.

Same ExtractKeywords method for both text and image inputs
Automatic OCR integration for scanned documents
Process screenshots, infographics, and photographs
Async variants for non-blocking image processing
Requires a vision-capable model for image extraction

ImageKeywords.cs

using System;
using LMKit.Data;         // Attachment
using LMKit.Model;        // LM
using LMKit.TextAnalysis; // KeywordExtraction

// Use a vision-capable model for images
var model = LM.LoadFromModelID("gemma-3-4b");

var extractor = new KeywordExtraction(model)
{
    KeywordCount = 6,
    MaxNgramSize = 3
};

// Extract from an image (OCR is automatic)
var attachment = new Attachment("infographic.png");
var imgKeywords = extractor.ExtractKeywords(
    attachment);

foreach (var kw in imgKeywords)
{
    Console.WriteLine($"  - {kw.Value}");
}

// Async variant for non-blocking processing
var screenshot = new Attachment("dashboard.png");
var asyncResult = await extractor
    .ExtractKeywordsAsync(screenshot);
                    

Configuration

Precision Control for Every Scenario

Fine-tune extraction behavior with a comprehensive set of properties. Control how many keywords to extract, the maximum phrase length, target language, context window management, and how the engine handles documents that exceed the model's capacity. Use the Guidance property to steer extraction toward specific themes or terminology.

KeywordCount

Target Keyword Count

Sets the desired number of keywords. The actual count depends on model capacity and input data, but will never exceed this value.

Default: 5

MaxNgramSize

Maximum Phrase Length

Controls the maximum n-gram size. Set to 1 for single words, or higher for multi-word phrases like "machine learning" or "interest rate adjustment".

Default: 3
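
To make the n-gram limit concrete, here is a minimal, self-contained sketch of what "phrase length" means. It does not call LM-Kit and is not how the engine enforces the limit internally; the `NgramDemo` class and the candidate phrases are hypothetical, and words are counted by splitting on spaces, which is an assumption that does not hold for languages written without spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical candidate phrases; in practice these would come from the extractor.
string[] candidates =
{
    "inflation",                      // 1 word
    "interest rate",                  // 2 words
    "quantitative tightening steps",  // 3 words
    "European Central Bank policy"    // 4 words
};

// Keep only phrases that fit within a maximum n-gram size of 3.
foreach (string phrase in NgramDemo.WithinLimit(candidates, maxNgramSize: 3))
{
    Console.WriteLine($"{NgramDemo.WordCount(phrase)}-gram: {phrase}");
}

static class NgramDemo
{
    // Counts space-separated words; a phrase of n words is an n-gram.
    public static int WordCount(string phrase) =>
        phrase.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;

    // Filters out phrases longer than the given n-gram size.
    public static IEnumerable<string> WithinLimit(
        IEnumerable<string> phrases, int maxNgramSize) =>
        phrases.Where(p => WordCount(p) <= maxNgramSize);
}
```

With a limit of 3, the four-word phrase is dropped and the other three are kept; with a limit of 1, only "inflation" would remain.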

TargetLanguage

Output Language

Specifies the language for generated keywords. When set to Undefined, the engine auto-detects the input language.

Default: Undefined (auto-detect)

TextShrinkingStrategy

Large Document Handling

Determines how content is reduced when it exceeds the MaximumContextLength. Different strategies trade semantic integrity for length reduction.

Default: Auto
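
As an illustration of the tradeoff, here is a minimal sketch of the simplest shrinking approach, plain truncation to a token budget. This is not LM-Kit's implementation: the `ShrinkDemo` class is hypothetical, and tokens are approximated by space-separated words, whereas a real model counts tokens with its own tokenizer.

```csharp
using System;
using System.Linq;

string document =
    "The European Central Bank announced new monetary policy measures today";

// Keep only the first 5 "tokens" (approximated here as words).
Console.WriteLine(ShrinkDemo.Truncate(document, maxTokens: 5));

static class ShrinkDemo
{
    // Returns the text unchanged if it fits the budget,
    // otherwise keeps only the leading maxTokens words.
    public static string Truncate(string text, int maxTokens)
    {
        if (maxTokens < 0)
            throw new ArgumentOutOfRangeException(nameof(maxTokens));

        string[] words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        return words.Length <= maxTokens
            ? text
            : string.Join(" ", words.Take(maxTokens));
    }
}
```

Truncation is cheap but discards everything past the cutoff, so terms that appear only late in a document are lost. That loss is the kind of semantic cost a shrinking strategy has to weigh against length.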

Guidance

Custom Instructions

Optional text that steers the extraction process toward specific themes, constraints, or terminology. Useful for domain-specific applications.

Default: null

MaximumContextLength

Context Window Control

Limits the token count for model input. Reducing this value increases inference speed on CPU at the cost of some quality.

Default: Auto (2048-8192)

AdvancedConfig.cs

var model = LM.LoadFromModelID("phi-3.5-mini");

var extractor = new KeywordExtraction(model)
{
    // Extract up to 10 keywords
    KeywordCount = 10,

    // Allow phrases up to 4 words
    MaxNgramSize = 4,

    // Generate keywords in French
    TargetLanguage = Language.French,

    // Steer toward financial terms
    Guidance = "Focus on financial and"
             + " economic terminology",

    // Limit context for faster CPU inference
    MaximumContextLength = 4096,

    // Handle oversized documents
    TextShrinkingStrategy =
        TextShrinkingStrategy.Auto
};

string longReport = File.ReadAllText(
    "annual-report.txt");

var keywords = await extractor
    .ExtractKeywordsAsync(longReport);

Console.WriteLine(
    $"Confidence: {extractor.Confidence:P1}");

foreach (var kw in keywords)
    Console.WriteLine($"  - {kw.Value}");
                    

Multilingual

50+ Languages, Zero Configuration

Extract keywords from content in any language supported by the underlying model. The TargetLanguage property lets you explicitly set the output language or leave it as Undefined for automatic detection. Process multilingual content and generate keywords in a target language different from the source, enabling cross-language content analysis and indexing.

Auto-detect input language when TargetLanguage is Undefined
Cross-language extraction: input in one language, keywords in another
Process mixed-language documents seamlessly
Full support for CJK, Cyrillic, Arabic, and Latin scripts
Model-dependent: quality scales with the model's language training

MultilingualExtraction.cs

var model = LM.LoadFromModelID("phi-3.5-mini");

// Auto-detect language
var autoExtractor = new KeywordExtraction(model)
{
    KeywordCount = 5
};

string germanText = "Die Europäische Zentralbank hat "
    + "neue Maßnahmen zur Inflationsbekämpfung "
    + "und Zinspolitik angekündigt.";

var deKeywords = autoExtractor.ExtractKeywords(
    germanText);
// Output: Zentralbank, Inflationsbekämpfung...

// Cross-language: German input, English output
var crossExtractor = new KeywordExtraction(model)
{
    KeywordCount = 5,
    TargetLanguage = Language.English
};

var enKeywords = crossExtractor.ExtractKeywords(
    germanText);
// Output: central bank, inflation, ...

// Japanese content
string jpText = "人工知能の研究開発が進み、"
    + "自然言語処理の精度が向上している。";

var jpKeywords = autoExtractor.ExtractKeywords(
    jpText);
// Output: 人工知能, 自然言語処理, ...
                    

Try It Now

Ready-to-Run Demo Application

Clone the sample, run it, and see keyword extraction in action on your own data in minutes.

Keyword Extraction Console Demo

The Keyword Extraction Demo is a standalone console application that lets you point the engine at any text file and instantly surface the most relevant keywords. Choose from multiple pre-trained models, configure extraction parameters, and view results with timing and confidence metrics.

Select from pre-trained models or provide a custom model URI
Provide any text file as input for extraction
View extracted keywords with elapsed time and confidence score
Open the .csproj file directly, no extra installations needed
Uses Dynamic Sampling for fast, accurate results even on CPU


keyword_extraction.exe

Select the model you want to use:
0 - Mistral Nemo 2407 12.2B
1 - Meta Llama 3.1 8B
2 - Google Gemma 2 9B
3 - Phi-3.5 Mini 3.8B
Or enter a custom model URI:
> 3

Loading Phi-3.5 Mini...
Model loaded in 2.14s

Please enter the path to the text file:
> article.txt

Extracting keywords...

Extracted elements:
- monetary policy
- central bank
- interest rate
- quantitative tightening
- inflation

Extraction done in 0.87s | Confidence: 97.8%

Applications

Real-World Use Cases

Organizations across industries leverage LM-Kit's keyword extraction to power search, automate tagging, and unlock insights from unstructured content.

SEO and Search Optimization

Automatically identify the most relevant terms from web pages, articles, and product descriptions. Power SEO analysis, meta tag generation, and search relevance scoring.

Content Management

Auto-tag articles, reports, and knowledge base entries with relevant keywords. Improve content discoverability and enable faceted search across document repositories.
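
One way to turn extracted keywords into tags for faceted search is an inverted index from keyword to documents. The sketch below is self-contained and does not call LM-Kit; the `TagIndex` class, the file names, and the sample keywords are all hypothetical stand-ins for extractor output.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-document keywords, standing in for extractor output.
var keywordsByDocument = new Dictionary<string, string[]>
{
    ["report-q3.txt"] = new[] { "monetary policy", "interest rate" },
    ["ecb-brief.txt"] = new[] { "interest rate", "inflation" }
};

var index = TagIndex.Build(keywordsByDocument);

// List every document tagged with "interest rate".
Console.WriteLine(string.Join(", ", index["interest rate"]));

static class TagIndex
{
    // Maps each keyword (case-insensitive) to the sorted set of documents tagged with it.
    public static Dictionary<string, SortedSet<string>> Build(
        IReadOnlyDictionary<string, string[]> keywordsByDocument)
    {
        var index = new Dictionary<string, SortedSet<string>>(
            StringComparer.OrdinalIgnoreCase);

        foreach (var (documentId, keywords) in keywordsByDocument)
        {
            foreach (string keyword in keywords)
            {
                if (!index.TryGetValue(keyword, out var documents))
                {
                    documents = new SortedSet<string>(StringComparer.Ordinal);
                    index[keyword] = documents;
                }
                documents.Add(documentId);
            }
        }
        return index;
    }
}
```

Looking up a keyword then returns every document carrying that tag, which is the basic operation behind a facet filter.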

Topic Identification

Surface the main themes from large document sets. Identify trending topics in customer feedback, survey responses, and social media streams.
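
A simple way to surface trending topics is to count how many documents share each extracted keyword. The following sketch assumes keywords have already been extracted per document; the `TopicCounter` class and the sample feedback keywords are hypothetical, and no LM-Kit API is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical keywords extracted from three pieces of customer feedback.
string[][] keywordsPerDocument =
{
    new[] { "battery life", "shipping delay" },
    new[] { "battery life", "screen quality" },
    new[] { "Battery Life", "shipping delay" }
};

foreach (var (keyword, count) in TopicCounter.Top(keywordsPerDocument, limit: 2))
{
    Console.WriteLine($"{keyword}: {count}");
}

static class TopicCounter
{
    // Counts in how many documents each keyword appears (case-insensitive),
    // most frequent first, ties broken alphabetically.
    public static List<(string Keyword, int Count)> Top(
        IEnumerable<IEnumerable<string>> keywordsPerDocument, int limit) =>
        keywordsPerDocument
            .SelectMany(doc => doc.Distinct(StringComparer.OrdinalIgnoreCase))
            .GroupBy(k => k, StringComparer.OrdinalIgnoreCase)
            .Select(g => (Keyword: g.Key, Count: g.Count()))
            .OrderByDescending(t => t.Count)
            .ThenBy(t => t.Keyword, StringComparer.OrdinalIgnoreCase)
            .Take(limit)
            .ToList();
}
```

Counting documents rather than raw occurrences keeps one long document from dominating the ranking.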

Customer Insight Mining

Extract key terms from support tickets, reviews, and survey answers. Identify the language your customers use to describe products, features, and issues.

Multilingual Content Indexing

Process documents in 50+ languages and generate keywords in a unified target language. Build cross-language search indexes and content catalogs.

Document Intelligence Pipelines

Feed extracted keywords into downstream systems: classification engines, RAG pipelines, recommendation algorithms, and analytics dashboards.
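
Handing keywords to a downstream system usually means serializing them into an agreed message shape. Here is a minimal sketch using System.Text.Json (requires .NET 5 or later for record support); the `KeywordPayload` record and its field names are hypothetical, and the values stand in for real extractor output.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical extraction result; in practice the values would come from the extractor.
var payload = new KeywordPayload(
    Document: "report.txt",
    Keywords: new[] { "monetary policy", "interest rate" },
    Confidence: 0.978);

// Serialize for a queue, HTTP call, or file drop.
string json = JsonSerializer.Serialize(payload);
Console.WriteLine(json);

// A downstream consumer can read the same shape back.
var restored = JsonSerializer.Deserialize<KeywordPayload>(json);
Console.WriteLine(restored.Keywords.Count);

// Hypothetical message shape for handing keywords to another system.
record KeywordPayload(
    string Document,
    IReadOnlyList<string> Keywords,
    double Confidence);
```

Keeping the confidence score in the payload lets the consumer decide whether to index low-confidence results or route them for review.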

Getting Started

Integrate in Minutes

From NuGet install to keyword extraction in production in under 10 minutes. No cloud keys, no API limits, no surprises.

Quick Start

1. Install via NuGet: dotnet add package LM-Kit.NET

2. Load a model: LM.LoadFromModelID("phi-3.5-mini")

3. Create the extractor: new KeywordExtraction(model)

4. Extract keywords: extractor.ExtractKeywords(text)

Production Checklist

1. Choose model size based on hardware: 1B-4B for CPU, 8B+ for GPU deployments

2. Set MaximumContextLength for optimal speed vs. quality tradeoff on your hardware

3. Configure TextShrinkingStrategy for documents longer than the model context window

4. Use Guidance to focus extraction on your domain's terminology and priorities

Developer Resources

API Reference

Complete documentation for the KeywordExtraction class, supporting types, and related APIs.

KeywordExtraction

Core class for extracting keywords from text and images. Includes all configuration properties and extraction methods.


KeywordItem

Read-only value container representing a single extracted keyword. Returned as a collection by ExtractKeywords methods.


TextShrinkingStrategy

Enum defining strategies for handling content that exceeds the model context window. Options include Auto, Truncation, and more.


Categorization

Combine keyword extraction with classification. Use extracted keywords to inform custom content categorization workflows.


Embedder

Generate embeddings from extracted keywords for semantic search, clustering, and RAG applications.


Attachment

Data class for passing image content to the extraction engine. Used for multimodal keyword extraction from images.


Ready to Extract What Matters?

Powerful keyword extraction. Multimodal support. Multilingual. 100% on-device. Start building intelligent .NET applications today.

