
Image embeddings, aligned with text.

Generate dense vector representations of images and reuse them in any vector store. Nomic Embed Vision produces vectors aligned with nomic-embed-text, so a text query can retrieve an image and vice versa.

Cross-modal · Aligned vector space · Built-in vector stores
Model

nomic-embed-vision

Image vectors aligned with the text embedder.

Companion

nomic-embed-text

The text side of the same vector space.

Stores

In-memory, file, Qdrant

Same vector stores used by text RAG.

What you get

Cross-modal retrieval, without two pipelines.

01

Text-to-image search

Index images, query with a text prompt. The aligned vector space yields semantic matches without OCR or captioning.

02

Image-to-image search

Find visually similar images at scale. Useful for duplicate detection, style search, and near-duplicate consolidation.

03

Multimodal RAG

Mix text passages and images in a single vector store. Retrieve diagrams with text questions; ground answers in both modalities.

04

Same API as text

The Embedder class accepts both text strings and image attachments. One API, one inference engine.
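Because both embedders map into one aligned space, a single similarity function scores any pair of vectors: text-to-text, text-to-image, or image-to-image. A minimal, self-contained sketch of that scoring with cosine similarity (the vectors are made up for illustration; real embeddings are much higher-dimensional):

```csharp
using System;

// Cosine similarity: dot product over the product of magnitudes.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Illustrative vectors standing in for a text embedding and an image embedding.
float[] textVec  = { 0.9f, 0.1f, 0.3f };
float[] imageVec = { 0.8f, 0.2f, 0.4f };

Console.WriteLine($"{CosineSimilarity(textVec, imageVec):F3}");
```

In an aligned space, a high score between a text vector and an image vector means the caption-like query and the image are semantically close, which is what makes one index serve both modalities.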

How it works

Index an image, query with text.

ImageEmbeddings.cs
using System;
using System.IO;
using LMKit.Model;
using LMKit.Embeddings;
using LMKit.Graphics;
using LMKit.Data.Storage;

// 1. Load the vision and text embedders. Same vector space.
var visionModel = LM.LoadFromModelID("nomic-embed-vision");
var textModel   = LM.LoadFromModelID("nomic-embed-text");

var imageEmbedder = new Embedder(visionModel);
var textEmbedder  = new Embedder(textModel);

// 2. Embed images and store in a vector store.
var store = new FileSystemVectorStore("./image-index");
foreach (var path in Directory.EnumerateFiles("./assets", "*.jpg"))
{
    float[] vec = imageEmbedder.GetEmbedding(Attachment.FromFile(path));
    await store.UpsertAsync(path, vec, new() { ["file"] = path });
}

// 3. Query with text; cross-modal retrieval works.
float[] query = textEmbedder.GetEmbedding("forklift on a warehouse floor");
var hits = await store.SearchAsync(query, topK: 5);
foreach (var hit in hits)
    Console.WriteLine($"{hit.Score:F3}  {hit.Id}");
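Image-to-image search (card 02 above) is the same flow with the vision embedder on the query side. A sketch that extends the sample above, reusing its `imageEmbedder` and `store` and assuming a `./query.jpg` exists:

```csharp
// 4. Query the same index with an image instead of text.
float[] imageQuery = imageEmbedder.GetEmbedding(Attachment.FromFile("./query.jpg"));
var similar = await store.SearchAsync(imageQuery, topK: 5);
foreach (var match in similar)
    Console.WriteLine($"{match.Score:F3}  {match.Id}");
```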

Use cases

Where image embeddings belong.

Visual search

Product image search, stock-photo discovery, real estate listings. Text or image query, same index.

Duplicate & near-duplicate

Consolidate large media libraries. Spot re-uploads, copyright violations, and slightly edited variants.

Multimodal RAG

Mixed corpora (text + images). Diagrams and figures become first-class retrieval targets.

Clustering & topic discovery

Group large image sets without labels. Use cluster centroids as auto-discovered categories.
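A cluster centroid is just the element-wise mean of its member vectors, and assigning a new image to the nearest centroid gives it an auto-discovered category. A minimal, self-contained sketch of that assignment step (the two-dimensional vectors and clusters are made up for illustration):

```csharp
using System;
using System.Linq;

// Centroid: element-wise mean of a group of embedding vectors.
static float[] Centroid(float[][] vectors)
{
    var mean = new float[vectors[0].Length];
    foreach (var v in vectors)
        for (int i = 0; i < mean.Length; i++)
            mean[i] += v[i] / vectors.Length;
    return mean;
}

// Squared Euclidean distance; sufficient for nearest-centroid assignment.
static float Distance2(float[] a, float[] b) =>
    a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();

// Two made-up clusters of image embeddings.
float[][] clusterA = { new[] { 0.9f, 0.1f }, new[] { 0.8f, 0.2f } };
float[][] clusterB = { new[] { 0.1f, 0.9f }, new[] { 0.2f, 0.8f } };

float[][] centroids = { Centroid(clusterA), Centroid(clusterB) };

// Assign a new embedding to its nearest centroid.
float[] newImage = { 0.85f, 0.15f };
int nearest = Enumerable.Range(0, centroids.Length)
    .OrderBy(i => Distance2(newImage, centroids[i]))
    .First();
Console.WriteLine($"assigned to cluster {nearest}");
```

A full pipeline would iterate assignment and centroid updates (k-means), but the nearest-centroid step above is the part that turns unlabeled embeddings into categories.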

LM-Kit.NET pillars

Seven pillars, one foundation.

The seven pillars of LM-Kit.NET, plus the local runtime they share.

The foundation

Every capability above runs on this runtime.

Foundation

Local Inference

The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR, and classifiers, accelerated on CPU (AVX2), CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, and full data and technology sovereignty.

Explore the foundation

Cross-modal, one vector space.

Start in 5 minutes · Back to Vision hub