nomic-embed-vision
Image vectors aligned with the text embedder.
Generate dense vector representations of images and reuse them
in any vector store. Nomic Embed Vision produces vectors
aligned with nomic-embed-text, so a text query
can retrieve an image and vice versa.
nomic-embed-vision: Image vectors aligned with the text embedder.
nomic-embed-text: Text side of the same vector space.
Same vector stores used by text RAG.
01
Index images, query with a text prompt. The aligned vector space delivers semantic matches without OCR or captioning.
02
Find visually similar images at scale. Useful for duplicate detection, style search, near-duplicate consolidation.
03
Mix text passages and images in a single vector store. Retrieve diagrams by text questions; ground answers in both modalities.
04
The Embedder class accepts text strings AND image attachments. One API, one inference engine.
using System;
using System.IO;
using LMKit.Model;
using LMKit.Embeddings;
using LMKit.Graphics;
using LMKit.Data.Storage;

// 1. Load the vision and text embedders. Same vector space.
var visionModel = LM.LoadFromModelID("nomic-embed-vision");
var textModel = LM.LoadFromModelID("nomic-embed-text");
var imageEmbedder = new Embedder(visionModel);
var textEmbedder = new Embedder(textModel);

// 2. Embed images and store them in a vector store.
var store = new FileSystemVectorStore("./image-index");

foreach (var path in Directory.EnumerateFiles("./assets", "*.jpg"))
{
    float[] vec = imageEmbedder.GetEmbedding(Attachment.FromFile(path));
    await store.UpsertAsync(path, vec, new() { ["file"] = path });
}

// 3. Query with text; cross-modal retrieval works.
float[] query = textEmbedder.GetEmbedding("forklift on a warehouse floor");
var hits = await store.SearchAsync(query, topK: 5);

foreach (var hit in hits)
    Console.WriteLine($"{hit.Score:F3} {hit.Id}");
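Ranking in the snippet above boils down to vector similarity: a text query vector and an image vector score high when they describe the same content. As a minimal sketch of what a similarity search computes (assuming cosine scoring; the VectorMath helper below is illustrative plain C#, not part of the LM-Kit API):

```csharp
using System;

static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
    // Aligned text and image embeddings that describe the same
    // content produce scores close to 1.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

In practice the vector store does this for you; the sketch is only meant to show why one index can serve both text and image queries once the two embedders share a vector space.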
Product image search, stock-photo discovery, real estate listings. Text or image query, same index.
Consolidate large media libraries. Spot reuploads, copyright violations, slightly-edited variants.
Mixed corpora (text + images). Diagrams and figures become first-class retrieval targets.
Group large image sets without labels. Use cluster centroids as auto-discovered categories.
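The last use case, grouping images without labels, can be prototyped with a standard k-means pass over the embedding vectors. A minimal sketch (the KMeans class below is illustrative plain C#, not an LM-Kit API; any embedding vectors, such as those returned by Embedder.GetEmbedding, can be fed in):

```csharp
using System;
using System.Linq;

static class KMeans
{
    // Minimal k-means: returns a cluster index per vector. The centroids of
    // the resulting clusters can serve as auto-discovered categories.
    public static int[] Cluster(float[][] vectors, int k, int iterations = 20)
    {
        var rng = new Random(42);
        // Initialize centroids from k randomly chosen input vectors.
        float[][] centroids = vectors.OrderBy(_ => rng.Next()).Take(k)
                                     .Select(v => (float[])v.Clone()).ToArray();
        int[] assignment = new int[vectors.Length];

        for (int iter = 0; iter < iterations; iter++)
        {
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int i = 0; i < vectors.Length; i++)
            {
                int best = 0;
                double bestDist = double.MaxValue;
                for (int c = 0; c < k; c++)
                {
                    double d = 0;
                    for (int j = 0; j < vectors[i].Length; j++)
                    {
                        double diff = vectors[i][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                assignment[i] = best;
            }

            // Update step: each centroid moves to the mean of its members.
            for (int c = 0; c < k; c++)
            {
                var members = Enumerable.Range(0, vectors.Length)
                                        .Where(i => assignment[i] == c).ToArray();
                if (members.Length == 0) continue; // keep empty clusters in place
                for (int j = 0; j < centroids[c].Length; j++)
                    centroids[c][j] = members.Sum(i => vectors[i][j]) / members.Length;
            }
        }
        return assignment;
    }
}
```

For production-scale libraries a dedicated clustering library or the vector store's own grouping features would be a better fit; the point here is only that label-free categorization needs nothing beyond the embeddings themselves.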
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Console demo: Index images, query by text, retrieve by similarity. Open on GitHub →
How-to guide: Cross-modal retrieval with Nomic Embed Vision. Read the guide →
How-to guide: Foundational guide; image embeddings plug into the same store. Read the guide →
API reference: Reference for the unified text + image embedder. Open the reference →
The seven pillars of LM-Kit.NET, plus the local runtime they share. The highlighted card is where you are now.
01 · AI Agents
ReAct planning, supervisors, parallel and pipeline orchestrators, persistent memory, MCP clients, custom tools.
02 · Document Intelligence
PDF text and table extraction, on-device OCR reaching SOTA benchmark scores, structured field extraction with grammar-constrained generation.
03 · Vision & Multimodal
Image understanding, classification, labeling, multimodal chat, image embeddings, VLM-OCR, background removal. Same conversation surface as LLMs.
04 · RAG & Knowledge
Built-in vector store, Qdrant connector, embeddings, hybrid retrieval, document chunking, source citations.
05 · Text Analysis
Built-in classifiers and an extractor that emits typed C# objects via grammar-constrained sampling. Sentiment, keywords, language detection.
06 · Speech & Audio
A growing local speech-to-text stack: hallucination suppression, Voice Activity Detection, real-time translation, streaming output, 100+ languages.
07 · Text Generation
Single-turn, multi-turn, and stateless conversation primitives. Translate, correct, rewrite, summarise. Prompt templates, streaming, grammar-constrained outputs.
The foundation
Every capability above runs on this runtime.
Foundation
The runtime all seven pillars sit on. The LM-Kit.NET NuGet package ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR, and classifiers, accelerated on CPU (AVX2), CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, full data and technology sovereignty.