Text & Image Embeddings for .NET
Transform text, images, and documents into high-dimensional vectors that capture semantic meaning. Power semantic search, RAG pipelines, clustering, and recommendations with multimodal embeddings generated 100% on-device. Choose from multiple models, store vectors with the built-in database or Qdrant, and rerank results for precision retrieval.
Complete Embedding Infrastructure for .NET
Generate, store, search, and rerank semantic vectors from text and images with a unified API that runs entirely on your hardware.
Multimodal Embeddings
Embed text and images through the same Embedder API. Use nomic-embed-text for text vectors and nomic-embed-vision for image vectors, both aligned in the same semantic space for cross-modal search.
Unified Vector Storage
Store and query vectors with the DataSource abstraction. Switch between in-memory, built-in file-based database, Qdrant, or your own IVectorStore backend without changing application logic.
Multiple Embedding Models
Choose from nomic-embed-text, bge-m3, embeddinggemma-300m, Qwen3 Embedding models, and more. Load any model from the LM-Kit catalog or directly from Hugging Face with a single line of code.
Text Embeddings
Convert any text into dense vector representations that capture semantic meaning with the Embedder class. A single API handles individual strings, batch processing for multiple inputs, and pre-tokenized content. Compare vectors with VectorOperations.CosineSimilarity to measure how closely two pieces of content relate, powering semantic search, duplicate detection, and intelligent routing.
- Single text, batch, and tokenized input methods
- Async variants (GetEmbeddingsAsync) for non-blocking processing
- Multilingual support across 100+ languages with bge-m3
- EmbeddingSize property to query model vector dimensions
- VectorOperations.CosineSimilarity for measuring semantic similarity
```csharp
var model = LM.LoadFromModelID("embeddinggemma-300m");
var embedder = new Embedder(model);

// Single text embedding
float[] vector = embedder.GetEmbeddings(
    "Machine learning is fascinating.");
Console.WriteLine($"Dimensions: {vector.Length}");

// Batch embedding for similarity
var texts = new[]
{
    "The cat sat on the mat.",
    "A feline rested on the rug.",
    "Stock market closed higher today."
};
float[][] vectors = embedder.GetEmbeddings(texts);

// Compare semantic similarity
float similar = VectorOperations.CosineSimilarity(
    vectors[0], vectors[1]);
Console.WriteLine($"Similar: {similar:F4}");     // Output: 0.9247

float different = VectorOperations.CosineSimilarity(
    vectors[0], vectors[2]);
Console.WriteLine($"Different: {different:F4}"); // Output: 0.1832
```
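For reference, cosine similarity is just the dot product of two vectors divided by the product of their magnitudes, yielding a score in [-1, 1]. LM-Kit ships this as VectorOperations.CosineSimilarity, so you would not normally write it yourself, but a minimal stand-alone sketch makes the math concrete:

```csharp
using System;

static class CosineDemo
{
    // cos(a, b) = dot(a, b) / (|a| * |b|)
    static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vector dimensions must match.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }

    static void Main()
    {
        var a = new float[] { 1f, 0f };
        var b = new float[] { 1f, 1f };
        Console.WriteLine(CosineSimilarity(a, a)); // identical direction: 1
        Console.WriteLine(CosineSimilarity(a, b)); // 45 degrees apart: ~0.7071
    }
}
```

Because embedding vectors encode meaning as direction, two paraphrases point the same way (score near 1) while unrelated texts are closer to orthogonal (score near 0).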
Image Embeddings
Generate vector representations from images using vision-enabled embedding models like nomic-embed-vision. Image vectors are aligned in the same semantic space as text vectors from nomic-embed-text, enabling cross-modal search where a text query finds relevant images and vice versa. Supports PNG, JPEG, TIFF, BMP, GIF, WEBP, and more through the ImageBuffer or Attachment APIs.
- Cross-modal alignment: text and image vectors in the same space
- Batch image embedding with GetEmbeddings(IEnumerable<ImageBuffer>)
- Supported formats: PNG, JPEG, TIFF, BMP, GIF, WEBP, PSD, HDR, TGA
- File attachment support for document-based workflows
- Unified API: same Embedder class for text and images
```csharp
// Load vision-enabled embedding model
var model = LM.LoadFromModelID("nomic-embed-vision");
var embedder = new Embedder(model);

// Generate image embedding
var image = ImageBuffer.LoadAsRGB("photo.jpg");
float[] imgVector = embedder.GetEmbeddings(image);
Console.WriteLine($"Image vector: {imgVector.Length} dims");

// Cross-modal: compare text to image
var textModel = LM.LoadFromModelID("nomic-embed-text");
var textEmbedder = new Embedder(textModel);
float[] textVec = textEmbedder.GetEmbeddings(
    "a sunset over the ocean");
float sim = VectorOperations.CosineSimilarity(
    imgVector, textVec);
Console.WriteLine($"Text-Image sim: {sim:F4}");

// Batch embed multiple images
var images = Directory.GetFiles("./photos", "*.jpg")
    .Select(f => ImageBuffer.LoadAsRGB(f));
float[][] allVecs = embedder.GetEmbeddings(images);
```
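Putting cross-modal alignment to work: the sketch below ranks a folder of images against a text query, combining the Embedder and VectorOperations calls shown above. The folder path and query string are placeholders, and the top-5 cutoff is an arbitrary choice for illustration:

```csharp
using System;
using System.IO;
using System.Linq;

// Embed the query with the text model and each image with the aligned
// vision model, then sort images by cosine similarity to the query.
var textEmbedder  = new Embedder(LM.LoadFromModelID("nomic-embed-text"));
var imageEmbedder = new Embedder(LM.LoadFromModelID("nomic-embed-vision"));

float[] query = textEmbedder.GetEmbeddings("a sunset over the ocean");

var ranked = Directory.GetFiles("./photos", "*.jpg")
    .Select(path => new
    {
        Path  = path,
        Score = VectorOperations.CosineSimilarity(
            query,
            imageEmbedder.GetEmbeddings(ImageBuffer.LoadAsRGB(path)))
    })
    .OrderByDescending(x => x.Score)
    .Take(5);

foreach (var hit in ranked)
    Console.WriteLine($"{hit.Score:F4}  {hit.Path}");
```

For large collections, embed the images once and store the vectors in a DataSource rather than re-embedding per query.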
Unified Vector Storage
Store, persist, and query embedding vectors with the DataSource abstraction. LM-Kit supports four storage patterns that share a single API: in-memory for rapid prototyping, a built-in file-based vector database for offline applications, Qdrant integration for distributed deployments, and a custom IVectorStore interface for proprietary backends. Switch between storage strategies without rewriting code.
In-Memory
Zero Setup: RAM-based storage with optional Serialize() to disk. Ideal for prototyping and real-time processing.
Built-in File DB
Local: Self-contained file-based engine for desktop tools and offline apps. Handles millions of embeddings without external services.
Qdrant
Cloud-Scale: Out-of-the-box Qdrant integration via QdrantEmbeddingStore for HNSW indexing and distributed workloads.
Custom IVectorStore
Extensible: Implement the IVectorStore interface to plug in any proprietary backend or existing database system.
```csharp
var embedModel = LM.LoadFromModelID("embeddinggemma-300m");

// Create in-memory DataSource
var dataSource = DataSource.CreateInMemoryDataSource(
    "my-collection", embedModel);

// Import content with automatic chunking
var ragEngine = new RagEngine(embedModel);
ragEngine.AddDataSource(dataSource);
ragEngine.ImportText(
    File.ReadAllText("document.txt"),
    new TextChunking() { MaxChunkSize = 500 },
    "my-collection",
    "document-section");

// Serialize to disk for later reuse
dataSource.Serialize("./cache/collection.bin");

// Or use the built-in file-based DB
var fileDb = DataSource.CreateFileDataSource(
    "embeddings.dat", "my-collection", embedModel);

// Same API, different backend
ragEngine.AddDataSource(fileDb);
```
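Because every backend sits behind the same DataSource abstraction, the storage decision can collapse to a single factory call chosen at startup. A sketch using only the factory methods shown above — the useFileDb flag and environment variable are illustrative assumptions, not LM-Kit API:

```csharp
using System;

// Pick the storage backend at startup; everything downstream is unchanged.
bool useFileDb =
    Environment.GetEnvironmentVariable("USE_FILE_DB") == "1";

var embedModel = LM.LoadFromModelID("embeddinggemma-300m");

DataSource store = useFileDb
    ? DataSource.CreateFileDataSource(
          "embeddings.dat", "my-collection", embedModel)
    : DataSource.CreateInMemoryDataSource(
          "my-collection", embedModel);

// Application logic never needs to know which backend is active.
var rag = new RagEngine(embedModel);
rag.AddDataSource(store);
```

The same pattern extends to QdrantEmbeddingStore or a custom IVectorStore implementation: only the construction site changes, not the import or search code.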
RAG Pipelines & Reranking
Embeddings are the backbone of retrieval-augmented generation. LM-Kit's RagEngine handles the complete pipeline from document ingestion and chunking to vector search and result reranking with the Reranker class. Use SearchSimilar for top-K retrieval, apply metadata filters for precision, and rerank results by semantic relevance to deliver the most accurate context to your LLM.
- RagEngine: end-to-end document import, chunking, and retrieval
- SearchSimilar with configurable top-K and minimum score thresholds
- Reranker class for semantic result reranking with cross-encoder models
- Metadata filtering on DataSource and Section structures
- Markdown-aware, semantic, and layout-based chunking strategies
```csharp
var embedModel = LM.LoadFromModelID("embeddinggemma-300m");
var chatModel = LM.LoadFromModelID("phi-3.5-mini");
var rerankerModel = LM.LoadFromModelID("bge-reranker-v2-m3");

// Build the RAG pipeline
var rag = new RagEngine(embedModel);
var ds = DataSource.LoadFromFile("knowledge-base.dat");
rag.AddDataSource(ds);

// Search with reranking
var reranker = new Reranker(rerankerModel);
var results = rag.SearchSimilar(
    "How does embeddings storage work?", topK: 10);
var reranked = reranker.Rerank(
    "How does embeddings storage work?", results);

// Feed top results into LLM
var chat = new MultiTurnConversation(chatModel);
chat.SystemPrompt = "Answer using this context: "
    + string.Join("\n", reranked.Take(3));
```
Multiple Embedding Models
LM-Kit ships with a curated catalog of embedding models optimized for different use cases and hardware profiles. Load any model with a single LoadFromModelID call, or pull directly from Hugging Face repositories. From lightweight CPU-friendly models to state-of-the-art multilingual embeddings with 100+ language support, choose the right trade-off between accuracy, speed, and memory footprint for your application.
- One-line model loading from the LM-Kit catalog
- Direct Hugging Face URL loading for any compatible GGUF model
- Dedicated text, vision, and reranking model categories
- Multi-task models (lmkit-4b) that combine embeddings with other capabilities
```csharp
// Load from LM-Kit model catalog
var gemma = LM.LoadFromModelID("embeddinggemma-300m");

// Multilingual text embeddings
var bgeM3 = LM.LoadFromModelID("bge-m3");

// Vision embeddings
var vision = LM.LoadFromModelID("nomic-embed-vision");

// Reranking model
var reranker = LM.LoadFromModelID("bge-reranker-v2-m3");

// Or load from Hugging Face directly
var custom = new LM(
    "https://huggingface.co/lm-kit/" +
    "nomic-embed-text-1.5/resolve/main/" +
    "nomic-embed-text-1.5-F16.gguf");

// Check embedding dimensions
Console.WriteLine($"Dimensions: {gemma.EmbeddingSize}");

// All models use the same Embedder API
var embedder = new Embedder(gemma);
float[] vec = embedder.GetEmbeddings("Hello world");
```
Real-World Use Cases
Embeddings are the foundation for a wide range of intelligent applications. Here is how teams use LM-Kit's embedding infrastructure in production.
Semantic Search
Replace keyword search with meaning-based retrieval. Find relevant documents, code snippets, and knowledge base articles even when the query uses different terminology than the source.
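A minimal semantic search over a small document folder can be assembled from the RagEngine APIs shown earlier on this page. The paths, collection name, and query below are placeholder assumptions:

```csharp
using System.IO;

// Index a folder of text files, then retrieve by meaning rather than
// keywords: "refund policy" matches chunks about reimbursements even
// when the word "refund" never appears in them.
var model = LM.LoadFromModelID("embeddinggemma-300m");
var rag = new RagEngine(model);
rag.AddDataSource(DataSource.CreateInMemoryDataSource("kb", model));

foreach (var file in Directory.GetFiles("./docs", "*.txt"))
    rag.ImportText(
        File.ReadAllText(file),
        new TextChunking() { MaxChunkSize = 500 },
        "kb",
        Path.GetFileName(file));

var hits = rag.SearchSimilar("refund policy", topK: 3);
```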
Retrieval-Augmented Generation
Ground LLM responses in your data. Chunk documents, embed them into a vector store, retrieve the most relevant context, and feed it to the model for accurate, sourced answers.
Image Similarity Search
Find visually and semantically similar images across large collections. Enable "shop the look" features, detect near-duplicates, and build visual recommendation engines.
Clustering & Topic Modeling
Group similar content automatically. Discover themes in customer feedback, organize document collections, and segment user behavior based on semantic similarity.
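One simple way to surface themes without picking a cluster count up front is greedy threshold clustering over the embedding vectors. This is a generic technique sketched with the Embedder and VectorOperations APIs from this page, not an LM-Kit feature; the 0.75 threshold and input file are illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Assign each text to the first cluster whose seed item is similar
// enough; otherwise start a new cluster with this text as its seed.
var embedder = new Embedder(LM.LoadFromModelID("embeddinggemma-300m"));
string[] feedback = File.ReadAllLines("feedback.txt");
float[][] vectors = embedder.GetEmbeddings(feedback);

var clusters = new List<List<int>>();
for (int i = 0; i < vectors.Length; i++)
{
    var home = clusters.FirstOrDefault(c =>
        VectorOperations.CosineSimilarity(vectors[c[0]], vectors[i]) > 0.75f);
    if (home != null) home.Add(i);
    else clusters.Add(new List<int> { i });
}
Console.WriteLine($"{clusters.Count} topics found");
```

For tighter clusters on large datasets, swap the greedy pass for k-means or HDBSCAN over the same vectors.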
Duplicate Detection
Identify semantically similar content regardless of wording. Detect duplicate support tickets, flag plagiarism, and deduplicate datasets before training.
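A straightforward duplicate pass compares every pair of embeddings and flags those above a similarity threshold. The sketch below uses the Embedder and VectorOperations APIs from this page; the 0.9 threshold and input file are illustrative assumptions, and the O(n²) loop is only suitable for small batches — use a DataSource or vector index for large corpora:

```csharp
using System;
using System.IO;

// Embed all tickets once, then flag highly similar pairs as
// candidate duplicates for human review.
var embedder = new Embedder(LM.LoadFromModelID("embeddinggemma-300m"));
string[] tickets = File.ReadAllLines("tickets.txt");
float[][] vecs = embedder.GetEmbeddings(tickets);

for (int i = 0; i < vecs.Length; i++)
    for (int j = i + 1; j < vecs.Length; j++)
        if (VectorOperations.CosineSimilarity(vecs[i], vecs[j]) > 0.9f)
            Console.WriteLine($"Possible duplicate: #{i} and #{j}");
```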
Recommendation Systems
Build content-based recommendations by comparing user preferences with item embeddings. Suggest articles, products, or media based on semantic affinity rather than collaborative filtering alone.
Local Embeddings, Zero Compromises
Every vector is generated on your hardware. No API keys, no rate limits, no data leaving your network.
Complete Data Privacy
Your text and images never leave your infrastructure. Embeddings are generated entirely on-device, making compliance with GDPR, HIPAA, and data residency requirements straightforward.
Zero Per-Token Costs
No API fees, no rate limits, no billing surprises. Embed millions of documents at CPU speed with a fixed infrastructure cost. Scale horizontally with no marginal expense per vector.
Hardware-Accelerated
Leverage CUDA (NVIDIA), Vulkan (AMD/Intel), Metal (Apple Silicon), or multi-GPU configurations. LM-Kit automatically selects the fastest available backend for your hardware.
Cross-Platform .NET
Deploy on Windows, macOS, Linux, iOS, and Android. Target .NET 4.6.2 through .NET 9.0. One NuGet package, zero external dependencies, every platform supported.
API Reference
Complete documentation for all embedding, storage, and retrieval classes.
Embedder
Generate text and image embeddings with single, batch, and async methods. The core class for all embedding operations.
DataSource
Unified vector storage abstraction supporting in-memory, file-based, Qdrant, and custom backends with metadata management.
RagEngine
End-to-end retrieval-augmented generation pipeline with document import, chunking, and semantic search.
Reranker
Rerank search results by semantic relevance using cross-encoder models for improved retrieval precision.
VectorOperations
Static methods for cosine similarity and other vector operations to measure semantic distance between embeddings.
IVectorStore
Interface abstraction for plugging in any custom vector backend. Implement to connect proprietary storage systems.
TextChunking
Configure document chunking with max chunk size, overlap, and strategy options for optimal embedding quality.
QdrantEmbeddingStore
Out-of-the-box Qdrant integration with HNSW indexing and payload filtering for cloud-scale deployments.
ImageBuffer
Load and process images for embedding generation. Supports PNG, JPEG, TIFF, BMP, GIF, WEBP, and more.
Ready to Build with Embeddings?
Multimodal vectors. Four storage backends. RAG with reranking. All running 100% on your hardware.