Transform text, images, and documents into high-dimensional vectors that capture semantic meaning. Power semantic search, RAG pipelines, clustering, and recommendations with multimodal embeddings generated 100% on-device. Choose from multiple models, store vectors with the built-in database or Qdrant, and rerank results for precision retrieval.
nomic-embed-text: General-purpose text embeddings, strong on retrieval benchmarks.
nomic-embed-vision: Image embeddings in the same vector space as nomic-embed-text.
bge-m3: Multilingual retrieval across 100+ languages, dense and sparse output.
embeddinggemma-300m: Compact 300M-parameter multilingual embeddings, runs on CPU.
Generate, store, search, and rerank semantic vectors from text and images with a unified API that runs entirely on your hardware.
Multimodal
Embed text and images through the same Embedder API. Use nomic-embed-text for text vectors and nomic-embed-vision for image vectors, both aligned in the same semantic space for cross-modal search.
Storage
Store and query vectors with the DataSource abstraction. Switch between in-memory, built-in file-based database, Qdrant, or your own IVectorStore backend without changing application logic.
Models
Choose from nomic-embed-text, bge-m3, embeddinggemma-300m, Qwen3 Embedding models, and more. Load any model from the LM-Kit catalog or directly from Hugging Face with a single line of code.
Convert any text into dense vector representations that capture semantic meaning with the Embedder class. A single API handles individual strings, batch processing for multiple inputs, and pre-tokenized content. Compare vectors with VectorOperations.CosineSimilarity to measure how closely two pieces of content relate, powering semantic search, duplicate detection, and intelligent routing.
Key APIs:

- GetEmbeddings for single strings, batches, and pre-tokenized content, plus an async variant (GetEmbeddingsAsync) for non-blocking processing
- Works with any catalog embedding model, such as bge-m3
- EmbeddingSize property to query model vector dimensions
- VectorOperations.CosineSimilarity for measuring semantic similarity

```csharp
var model = LM.LoadFromModelID("embeddinggemma-300m");
var embedder = new Embedder(model);

// Single text embedding
float[] vector = embedder.GetEmbeddings("Machine learning is fascinating.");
Console.WriteLine($"Dimensions: {vector.Length}");

// Batch embedding for similarity
var texts = new[]
{
    "The cat sat on the mat.",
    "A feline rested on the rug.",
    "Stock market closed higher."
};
float[][] vectors = embedder.GetEmbeddings(texts);

// Compute cosine similarity
float sim = VectorOperations.CosineSimilarity(vectors[0], vectors[1]);
```
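Under the hood, the similarity comparison is plain cosine similarity over the returned float[] vectors, which is also the basis of the "intelligent routing" use case. As a self-contained illustration, the following sketch routes a query to the closest of several intents; the toy 3-dimensional vectors stand in for real Embedder output, and the local Cosine helper stands in for VectorOperations.CosineSimilarity:

```csharp
using System;

// Toy vectors standing in for real embedding output.
// In practice these would come from embedder.GetEmbeddings(...).
float[] queryVec = { 0.9f, 0.1f, 0.0f };
float[][] intents =
{
    new float[] { 1.0f, 0.0f, 0.0f },  // "billing"
    new float[] { 0.0f, 1.0f, 0.0f },  // "support"
    new float[] { 0.0f, 0.0f, 1.0f },  // "sales"
};
string[] labels = { "billing", "support", "sales" };

// Route to the intent whose vector is most similar to the query.
int best = 0;
for (int i = 1; i < intents.Length; i++)
    if (Cosine(queryVec, intents[i]) > Cosine(queryVec, intents[best]))
        best = i;

Console.WriteLine($"Routed to: {labels[best]}");

// Local cosine-similarity helper (stand-in for VectorOperations.CosineSimilarity).
static float Cosine(float[] a, float[] b)
{
    float dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
}
```

With real embeddings the same argmax-over-cosine loop works unchanged; only the vector source differs.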
Generate vector representations from images using vision-enabled embedding models like nomic-embed-vision. Image vectors are aligned in the same semantic space as text vectors from nomic-embed-text, enabling cross-modal search where a text query finds relevant images and vice versa. Supports PNG, JPEG, TIFF, BMP, GIF, WEBP, and more through the ImageBuffer or Attachment APIs.
Key APIs:

- GetEmbeddings(IEnumerable<ImageBuffer>) overload for batch image embedding
- The same Embedder class handles both text and images

```csharp
// Load vision-enabled embedding model
var model = LM.LoadFromModelID("nomic-embed-vision");
var embedder = new Embedder(model);

// Generate image embedding
var image = ImageBuffer.LoadAsRGB("photo.jpg");
float[] imgVector = embedder.GetEmbeddings(image);
Console.WriteLine($"Image vector: {imgVector.Length} dims");

// Cross-modal: compare text to image
var textModel = LM.LoadFromModelID("nomic-embed-text");
var textEmbedder = new Embedder(textModel);
float[] textVec = textEmbedder.GetEmbeddings("a sunset over the ocean");
float sim = VectorOperations.CosineSimilarity(imgVector, textVec);
Console.WriteLine($"Text-Image sim: {sim:F4}");

// Batch embed multiple images
var images = Directory.GetFiles("./photos", "*.jpg")
    .Select(f => ImageBuffer.LoadAsRGB(f));
float[][] allVecs = embedder.GetEmbeddings(images);
```
Store, persist, and query embedding vectors with the DataSource abstraction. LM-Kit supports four storage patterns that share a single API: in-memory for rapid prototyping, a built-in file-based vector database for offline applications, Qdrant integration for distributed deployments, and a custom IVectorStore interface for proprietary backends. Switch between storage strategies without rewriting code.
Zero setup
RAM-based storage with optional Serialize() to disk. Ideal for prototyping and real-time processing.
Recommended
Self-contained file-based engine for desktop tools and offline apps. Handles millions of embeddings without external services.
Production
Out-of-the-box Qdrant integration via QdrantEmbeddingStore for HNSW indexing and distributed workloads.
Custom
Implement the IVectorStore interface to plug in any proprietary backend or existing database system.
```csharp
var embedModel = LM.LoadFromModelID("embeddinggemma-300m");

// Create in-memory DataSource
var dataSource = DataSource.CreateInMemoryDataSource("my-collection", embedModel);

// Import content with automatic chunking
var ragEngine = new RagEngine(embedModel);
ragEngine.AddDataSource(dataSource);
ragEngine.ImportText(
    File.ReadAllText("document.txt"),
    new TextChunking() { MaxChunkSize = 500 });
```
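The in-memory store's optional Serialize() handles persistence for you. Purely as an illustration of what persisting raw vectors involves, here is a self-contained round-trip of float[] vectors through a small binary format; the file location and on-disk layout are made up for this sketch and are not LM-Kit's actual format:

```csharp
using System;
using System.IO;

// Toy vectors to persist (would be embedder output in practice).
float[][] vectors =
{
    new float[] { 0.1f, 0.2f, 0.3f },
    new float[] { 0.4f, 0.5f, 0.6f },
};

// Hypothetical file location for this sketch.
string path = Path.Combine(Path.GetTempPath(), "vectors.bin");

// Write: vector count, then each vector's length followed by its values.
using (var w = new BinaryWriter(File.Create(path)))
{
    w.Write(vectors.Length);
    foreach (var v in vectors)
    {
        w.Write(v.Length);
        foreach (var x in v) w.Write(x);
    }
}

// Read back in the same order.
float[][] loaded;
using (var r = new BinaryReader(File.OpenRead(path)))
{
    loaded = new float[r.ReadInt32()][];
    for (int i = 0; i < loaded.Length; i++)
    {
        loaded[i] = new float[r.ReadInt32()];
        for (int j = 0; j < loaded[i].Length; j++)
            loaded[i][j] = r.ReadSingle();
    }
}

Console.WriteLine($"Round-tripped {loaded.Length} vectors");
```

Because floats are written and read as raw 32-bit values, the round-trip is exact, which is the property any file-based vector store needs.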
Embeddings are the backbone of retrieval-augmented generation. LM-Kit's RagEngine handles the complete pipeline from document ingestion and chunking to vector search and result reranking with the Reranker class. Use SearchSimilar for top-K retrieval, apply metadata filters for precision, and rerank results by semantic relevance to deliver the most accurate context to your LLM.
Key APIs:

- RagEngine: end-to-end document import, chunking, and retrieval
- SearchSimilar with configurable top-K and minimum score thresholds
- Reranker class for semantic result reranking with cross-encoder models
- DataSource and Section structures for organizing indexed content

```csharp
var embedModel = LM.LoadFromModelID("embeddinggemma-300m");
var chatModel = LM.LoadFromModelID("qwen3.5:4b");
var rerankerModel = LM.LoadFromModelID("bge-m3-reranker");

// Build the RAG pipeline
var rag = new RagEngine(embedModel);
var ds = DataSource.LoadFromFile("knowledge-base.dat");
rag.AddDataSource(ds);

// Search with reranking
var reranker = new Reranker(rerankerModel);
// ...retrieve candidates with rag.SearchSimilar, then reorder them with the reranker
```
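Conceptually, top-K retrieval with a minimum score threshold reduces to: score every stored chunk against the query vector, filter by the threshold, sort descending, take K. The following self-contained sketch shows that logic with toy 2-dimensional vectors and a local Cosine helper; real code would call rag.SearchSimilar and then pass candidates to the Reranker instead:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy corpus of (chunk, vector) pairs; real vectors would come from Embedder.
var chunks = new List<(string Text, float[] Vec)>
{
    ("Cats are small felines.",   new float[] { 0.9f, 0.1f }),
    ("The stock market rallied.", new float[] { 0.1f, 0.9f }),
    ("Kittens grow into cats.",   new float[] { 0.8f, 0.2f }),
};

float[] queryVec = { 1f, 0f };   // stand-in for an embedded query
int topK = 2;
float minScore = 0.5f;

// Score, filter by threshold, sort descending, take K: the essence of top-K retrieval.
var results = chunks
    .Select(c => (c.Text, Score: Cosine(queryVec, c.Vec)))
    .Where(r => r.Score >= minScore)
    .OrderByDescending(r => r.Score)
    .Take(topK)
    .ToList();

foreach (var r in results)
    Console.WriteLine($"{r.Score:F3}  {r.Text}");

// Local cosine-similarity helper (stand-in for VectorOperations.CosineSimilarity).
static float Cosine(float[] a, float[] b)
{
    float dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
}
```

The stock-market chunk scores far below the 0.5 threshold and is dropped; the two cat-related chunks survive and are ordered by similarity.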
LM-Kit ships with a curated catalog of embedding models optimized for different use cases and hardware profiles. Load any model with a single LoadFromModelID call, or pull directly from Hugging Face repositories. From lightweight CPU-friendly models to state-of-the-art multilingual embeddings with 100+ language support, choose the right trade-off between accuracy, speed, and memory footprint for your application.
- Multi-purpose models (such as lmkit-4b) that combine embeddings with other capabilities

```csharp
// Load from LM-Kit model catalog
var gemma = LM.LoadFromModelID("embeddinggemma-300m");

// Multilingual text embeddings
var bgeM3 = LM.LoadFromModelID("bge-m3");

// Vision embeddings
var vision = LM.LoadFromModelID("nomic-embed-vision");

// Reranking model
var reranker = LM.LoadFromModelID("bge-m3-reranker");

// Or load from Hugging Face directly
var custom = new LM("https://huggingface.co/...");
```
Embeddings are the foundation for a wide range of intelligent applications. Here is how teams use LM-Kit's embedding infrastructure in production.
Search
Replace keyword search with meaning-based retrieval. Find relevant documents, code snippets, and knowledge base articles even when the query uses different terminology than the source.
RAG
Ground LLM responses in your data. Chunk documents, embed them into a vector store, retrieve the most relevant context, and feed it to the model for accurate, sourced answers.
Vision
Find visually and semantically similar images across large collections. Enable "shop the look" features, detect near-duplicates, and build visual recommendation engines.
Cluster
Group similar content automatically. Discover themes in customer feedback, organize document collections, and segment user behavior based on semantic similarity.
Dedup
Identify semantically similar content regardless of wording. Detect duplicate support tickets, flag plagiarism, and deduplicate datasets before training.
Recommend
Build content and product recommendation engines based on semantic similarity between users, items, and behaviors.
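The dedup pattern above boils down to a pairwise similarity check over embeddings with a threshold. A self-contained sketch with toy vectors (in practice the vectors come from Embedder.GetEmbeddings and the comparison from VectorOperations.CosineSimilarity; here a local Cosine helper stands in, and the 0.995 threshold is an arbitrary example value):

```csharp
using System;
using System.Collections.Generic;

// Toy embeddings for four support tickets; two are near-duplicates.
var tickets = new (string Id, float[] Vec)[]
{
    ("T1", new float[] { 0.99f, 0.10f }),
    ("T2", new float[] { 0.98f, 0.12f }),  // near-duplicate of T1
    ("T3", new float[] { 0.05f, 0.99f }),
    ("T4", new float[] { 0.70f, 0.70f }),
};

const float Threshold = 0.995f;
var duplicates = new List<(string, string)>();

// Pairwise comparison; for large sets an ANN index (e.g. Qdrant) replaces this O(n^2) loop.
for (int i = 0; i < tickets.Length; i++)
    for (int j = i + 1; j < tickets.Length; j++)
        if (Cosine(tickets[i].Vec, tickets[j].Vec) >= Threshold)
            duplicates.Add((tickets[i].Id, tickets[j].Id));

foreach (var (a, b) in duplicates)
    Console.WriteLine($"{a} ~ {b}");

// Local cosine-similarity helper (stand-in for VectorOperations.CosineSimilarity).
static float Cosine(float[] a, float[] b)
{
    float dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
}
```

Only the T1/T2 pair clears the threshold; the cosine metric ignores vector magnitude, so paraphrases with similar direction score as duplicates regardless of wording.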
Every vector is generated on your hardware. No API keys, no rate limits, no data leaving your network.
01
Your text and images never leave your infrastructure. Embeddings are generated entirely on-device, making compliance with GDPR, HIPAA, and data residency requirements straightforward.
02
No API fees, no rate limits, no billing surprises. Embed millions of documents at CPU speed with a fixed infrastructure cost. Scale horizontally with no marginal expense per vector.
03
Leverage CUDA (NVIDIA), Vulkan (AMD/Intel), Metal (Apple Silicon), or multi-GPU configurations. LM-Kit automatically selects the fastest available backend for your hardware.
04
Deploy on Windows, macOS, Linux, iOS, and Android. Target .NET 4.6.2 through .NET 9.0. One NuGet package, zero external dependencies, every platform supported.
Complete documentation for all embedding, storage, and retrieval classes.
Embedder: Generate text and image embeddings with single, batch, and async methods. The core class for all embedding operations.
DataSource: Unified vector storage abstraction supporting in-memory, file-based, Qdrant, and custom backends with metadata management.
RagEngine: End-to-end retrieval-augmented generation pipeline with document import, chunking, and semantic search.
Reranker: Rerank search results by semantic relevance using cross-encoder models for improved retrieval precision.
VectorOperations: Static methods for cosine similarity and other vector operations to measure semantic distance between embeddings.
IVectorStore: Interface abstraction for plugging in any custom vector backend. Implement to connect proprietary storage systems.
TextChunking: Configure document chunking with max chunk size, overlap, and strategy options for optimal embedding quality.
QdrantEmbeddingStore: Out-of-the-box Qdrant integration with HNSW indexing and payload filtering for cloud-scale deployments.
ImageBuffer: Load and process images for embedding generation. Supports PNG, JPEG, TIFF, BMP, GIF, WEBP, and more.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Demo
Console demo: cross-modal embeddings + retrieval with reranking.
Open on GitHub →

Demo
Console demo: text embeddings driving a full RAG pipeline.
Open on GitHub →

How-to guide
Index, embed, query by meaning, rerank.
Read the guide →

API reference
Unified embedder API for text and image inputs.
Open the reference →

Multimodal vectors. Four storage backends. RAG with reranking. All running 100% on your hardware.