Surface What Matters From Any Content
Extract the most relevant keywords and key phrases from text and images with the KeywordExtraction engine. Configure keyword count, n-gram size, and target language. Handle large documents with intelligent shrinking strategies. Run 100% on-device with Dynamic Sampling for fast, accurate results on any hardware.
Why Keyword Extraction Matters
Every document, article, and customer message contains critical terms buried in noise. Extracting them manually is slow, inconsistent, and impossible at scale.
Manual tagging is unreliable: Human taggers produce inconsistent results, miss key terms, and cannot keep up with content volume.
Cloud APIs leak your data: Sending sensitive documents to third-party endpoints exposes intellectual property and violates compliance requirements.
Regex and TF-IDF fall short: Rule-based approaches cannot capture semantic meaning, multi-word phrases, or context-dependent importance.
Per-call pricing adds up fast: Processing thousands of documents daily through cloud APIs creates unpredictable and growing costs.
LLM-Powered Precision
Unlike rule-based methods, the KeywordExtraction engine uses language models to understand context, capturing multi-word phrases and semantically important terms.
Dynamic Sampling
LM-Kit's proprietary sampling technology delivers high accuracy even with smaller models running on CPU. Enterprise results without GPU requirements.
100% On-Device Privacy
All processing stays on your infrastructure. No API calls, no data exposure. Process sensitive legal, medical, or financial documents with confidence.
Unlimited Scale
Process millions of documents with no per-call fees. Fixed licensing cost regardless of volume. Predictable budgets, unlimited throughput.
KeywordExtraction Class
The foundation of topic discovery in your .NET applications. KeywordExtraction provides a high-level API to extract the most important keywords and phrases from any content. Configure the number of keywords, control n-gram size, set a target language, and handle documents that exceed model context limits with automatic text shrinking strategies. Works with both text and image inputs.
- KeywordCount property sets how many keywords to extract (default: 5)
- MaxNgramSize controls maximum phrase length (default: 3 words)
- TargetLanguage for multilingual extraction with auto-detection
- TextShrinkingStrategy handles oversized documents automatically
- Guidance property steers extraction toward specific themes
- Confidence score for every extraction operation
- Sync and async methods: ExtractKeywords / ExtractKeywordsAsync
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("phi-3.5-mini");

var extractor = new KeywordExtraction(model)
{
    KeywordCount = 8,
    MaxNgramSize = 3,
    TargetLanguage = Language.English
};

string article = File.ReadAllText("report.txt");
var keywords = extractor.ExtractKeywords(article);

Console.WriteLine($"Confidence: {extractor.Confidence:P1}");

foreach (var kw in keywords)
{
    Console.WriteLine($" - {kw.Value}");
}

// Output:
// Confidence: 97.8%
// - monetary policy
// - central bank
// - interest rate
// - quantitative tightening
// ...
Extract From Text and Images
The KeywordExtraction engine works with both text content and image attachments through a unified API. Pass an Attachment object containing an image and the engine automatically applies OCR to extract visible text before identifying the most relevant keywords. Process scanned documents, screenshots, infographics, and photographs alongside plain text content.
- Same ExtractKeywords method for both text and image inputs
- Automatic OCR integration for scanned documents
- Process screenshots, infographics, and photographs
- Async variants for non-blocking image processing
- Requires a vision-capable model for image extraction
using LMKit.TextAnalysis;
using LMKit.Data;

// Use a vision-capable model for images
var model = LM.LoadFromModelID("gemma-3-4b");

var extractor = new KeywordExtraction(model)
{
    KeywordCount = 6,
    MaxNgramSize = 3
};

// Extract from an image (OCR is automatic)
var attachment = new Attachment("infographic.png");
var imgKeywords = extractor.ExtractKeywords(attachment);

foreach (var kw in imgKeywords)
{
    Console.WriteLine($" - {kw.Value}");
}

// Async variant for non-blocking processing
var screenshot = new Attachment("dashboard.png");
var asyncResult = await extractor.ExtractKeywordsAsync(screenshot);
Precision Control for Every Scenario
Fine-tune extraction behavior with a comprehensive set of properties. Control how many keywords to extract, the maximum phrase length, target language, context window management, and how the engine handles documents that exceed the model's capacity. Use the Guidance property to steer extraction toward specific themes or terminology.
KeywordCount
Target Keyword Count
Sets the desired number of keywords. The actual count depends on model capacity and input data, but will never exceed this value.
Default: 5
MaxNgramSize
Maximum Phrase Length
Controls the maximum n-gram size. Set to 1 for single words, or higher for multi-word phrases like "machine learning" or "interest rate adjustment".
Default: 3
TargetLanguage
Output Language
Specifies the language for generated keywords. When set to Undefined, the engine auto-detects the input language.
Default: Undefined (auto-detect)
TextShrinkingStrategy
Large Document Handling
Determines how content is reduced when it exceeds the MaximumContextLength. Different strategies trade semantic integrity for length reduction.
Default: Auto
Guidance
Custom Instructions
Optional text that steers the extraction process toward specific themes, constraints, or terminology. Useful for domain-specific applications.
Default: null
MaximumContextLength
Context Window Control
Limits the token count for model input. Reducing this value increases inference speed on CPU at the cost of some quality.
Default: Auto (2048-8192)
var model = LM.LoadFromModelID("phi-3.5-mini");

var extractor = new KeywordExtraction(model)
{
    // Extract up to 10 keywords
    KeywordCount = 10,
    // Allow phrases up to 4 words
    MaxNgramSize = 4,
    // Generate keywords in French
    TargetLanguage = Language.French,
    // Steer toward financial terms
    Guidance = "Focus on financial and economic terminology",
    // Limit context for faster CPU inference
    MaximumContextLength = 4096,
    // Handle oversized documents
    TextShrinkingStrategy = TextShrinkingStrategy.Auto
};

string longReport = File.ReadAllText("annual-report.txt");
var keywords = await extractor.ExtractKeywordsAsync(longReport);

Console.WriteLine($"Confidence: {extractor.Confidence:P1}");

foreach (var kw in keywords)
    Console.WriteLine($" - {kw.Value}");
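To make the MaxNgramSize setting concrete, the sketch below lists every contiguous word sequence of up to a given length in a short phrase. It illustrates the general idea of word n-grams only; it is not how the KeywordExtraction engine selects phrases, and the NgramDemo class and WordNgrams helper are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;

public static class NgramDemo
{
    // Hypothetical helper: returns every contiguous word sequence
    // of length 1..maxNgramSize found in the input text.
    public static List<string> WordNgrams(string text, int maxNgramSize)
    {
        var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string>();

        for (int size = 1; size <= maxNgramSize; size++)
        {
            for (int start = 0; start + size <= words.Length; start++)
            {
                // Join `size` consecutive words beginning at `start`.
                result.Add(string.Join(" ", words, start, size));
            }
        }

        return result;
    }

    public static void Main()
    {
        // With a limit of 3, single words, pairs, and the full
        // three-word phrase are all candidates.
        foreach (var phrase in WordNgrams("interest rate adjustment", 3))
        {
            Console.WriteLine(phrase);
        }
    }
}
```

With a limit of 1 the same input yields only the three single words, which is why a higher value is needed before a phrase such as "interest rate adjustment" can appear as a single keyword.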
50+ Languages, Zero Configuration
Extract keywords from content in any language supported by the underlying model. The TargetLanguage property lets you explicitly set the output language or leave it as Undefined for automatic detection. Process multilingual content and generate keywords in a target language different from the source, enabling cross-language content analysis and indexing.
- Auto-detect input language when TargetLanguage is Undefined
- Cross-language extraction: input in one language, keywords in another
- Process mixed-language documents seamlessly
- Full support for CJK, Cyrillic, Arabic, and Latin scripts
- Model-dependent: quality scales with the model's language training
var model = LM.LoadFromModelID("phi-3.5-mini");

// Auto-detect language
var autoExtractor = new KeywordExtraction(model)
{
    KeywordCount = 5
};

string germanText = "Die Europäische Zentralbank hat " +
    "neue Maßnahmen zur Inflationsbekämpfung " +
    "und Zinspolitik angekündigt.";

var deKeywords = autoExtractor.ExtractKeywords(germanText);
// Output: Zentralbank, Inflationsbekämpfung...

// Cross-language: German input, English output
var crossExtractor = new KeywordExtraction(model)
{
    KeywordCount = 5,
    TargetLanguage = Language.English
};

var enKeywords = crossExtractor.ExtractKeywords(germanText);
// Output: central bank, inflation, ...

// Japanese content
string jpText = "人工知能の研究開発が進み、" +
    "自然言語処理の精度が向上している。";

var jpKeywords = autoExtractor.ExtractKeywords(jpText);
// Output: 人工知能, 自然言語処理, ...
Ready-to-Run Demo Application
Clone the sample, run it, and see keyword extraction in action on your own data in minutes.
Keyword Extraction Console Demo
The Keyword Extraction Demo is a standalone console application that lets you point the engine at any text file and instantly surface the most relevant keywords. Choose from multiple pre-trained models, configure extraction parameters, and view results with timing and confidence metrics.
- Select from pre-trained models or provide a custom model URI
- Provide any text file as input for extraction
- View extracted keywords with elapsed time and confidence score
- Open the .csproj file directly, no extra installations needed
- Uses Dynamic Sampling for fast, accurate results even on CPU
0 - Mistral Nemo 2407 12.2B
1 - Meta Llama 3.1 8B
2 - Google Gemma 2 9B
3 - Phi-3.5 Mini 3.8B
Or enter a custom model URI:
> 3
Loading Phi-3.5 Mini...
Model loaded in 2.14s
Please enter the path to the text file:
> article.txt
Extracting keywords...
Extracted elements:
- monetary policy
- central bank
- interest rate
- quantitative tightening
- inflation
Extraction done in 0.87s | Confidence: 97.8%
Real-World Use Cases
Organizations across industries leverage LM-Kit's keyword extraction to power search, automate tagging, and unlock insights from unstructured content.
SEO and Search Optimization
Automatically identify the most relevant terms from web pages, articles, and product descriptions. Power SEO analysis, meta tag generation, and search relevance scoring.
Content Management
Auto-tag articles, reports, and knowledge base entries with relevant keywords. Improve content discoverability and enable faceted search across document repositories.
Topic Identification
Surface the main themes from large document sets. Identify trending topics in customer feedback, survey responses, and social media streams.
Customer Insight Mining
Extract key terms from support tickets, reviews, and survey answers. Identify the language your customers use to describe products, features, and issues.
Multilingual Content Indexing
Process documents in 50+ languages and generate keywords in a unified target language. Build cross-language search indexes and content catalogs.
Document Intelligence Pipelines
Feed extracted keywords into downstream systems: classification engines, RAG pipelines, recommendation algorithms, and analytics dashboards.
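As one possible shape for the auto-tagging and downstream-pipeline use cases above, the sketch below builds a small inverted index from keyword to document IDs, which is the kind of structure a faceted search or tag lookup could sit on. It uses only standard .NET collections. The KeywordTagIndex class, the file names, and the keyword lists are hypothetical placeholders standing in for per-document extraction results, not real engine output.

```csharp
using System;
using System.Collections.Generic;

public static class KeywordTagIndex
{
    // Builds an inverted index: keyword -> sorted set of document IDs.
    // Keywords are trimmed and matched case-insensitively.
    public static Dictionary<string, SortedSet<string>> Build(
        IReadOnlyDictionary<string, IReadOnlyList<string>> keywordsByDocument)
    {
        var index = new Dictionary<string, SortedSet<string>>(
            StringComparer.OrdinalIgnoreCase);

        foreach (var entry in keywordsByDocument)
        {
            foreach (var keyword in entry.Value)
            {
                var key = keyword.Trim();
                if (key.Length == 0)
                {
                    continue; // skip empty keywords
                }

                if (!index.TryGetValue(key, out var documents))
                {
                    documents = new SortedSet<string>(StringComparer.Ordinal);
                    index[key] = documents;
                }

                documents.Add(entry.Key);
            }
        }

        return index;
    }

    public static void Main()
    {
        // Placeholder data: in a real pipeline these lists would come
        // from the keyword values extracted for each document.
        var keywordsByDocument = new Dictionary<string, IReadOnlyList<string>>
        {
            ["report-a.txt"] = new[] { "monetary policy", "interest rate" },
            ["report-b.txt"] = new[] { "Interest Rate", "inflation" },
        };

        var index = Build(keywordsByDocument);

        // Lists both documents, because matching ignores case.
        foreach (var documentId in index["interest rate"])
        {
            Console.WriteLine(documentId);
        }
    }
}
```

Case-insensitive matching is a design choice here so that "Interest Rate" and "interest rate" land under the same tag; a stricter pipeline might normalize keywords differently.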
Integrate in Minutes
From NuGet install to keyword extraction in production in under 10 minutes. No cloud keys, no API limits, no surprises.
Quick Start
dotnet add package LMKit.NET
LM.LoadFromModelID("phi-3.5-mini")
new KeywordExtraction(model)
extractor.ExtractKeywords(text)
Production Checklist
Set MaximumContextLength for the best speed vs. quality tradeoff on your hardware
Choose a TextShrinkingStrategy for documents longer than the model context window
Use Guidance to focus extraction on your domain's terminology and priorities
API Reference
Complete documentation for the KeywordExtraction class, supporting types, and related APIs.
KeywordExtraction
Core class for extracting keywords from text and images. Includes all configuration properties and extraction methods.
View Docs
KeywordItem
Read-only value container representing a single extracted keyword. Returned as a collection by ExtractKeywords methods.
View Docs
TextShrinkingStrategy
Enum defining strategies for handling content that exceeds the model context window. Options include Auto, Truncation, and more.
View Docs
Categorization
Combine keyword extraction with classification. Use extracted keywords to inform custom content categorization workflows.
View Docs
Embedder
Generate embeddings from extracted keywords for semantic search, clustering, and RAG applications.
View Docs
Attachment
Data class for passing image content to the extraction engine. Used for multimodal keyword extraction from images.
View Docs
Ready to Extract What Matters?
Powerful keyword extraction. Multimodal support. Multilingual. 100% on-device. Start building intelligent .NET applications today.