
Image embeddings, aligned with text.

Generate dense vector representations of images and reuse them in any vector store. Nomic Embed Vision produces vectors aligned with nomic-embed-text, so a text query can retrieve an image and vice versa.

Cross-modal · Aligned vector space · Built-in vector stores
Model

nomic-embed-vision

Image vectors aligned with the text embedder.

Companion

nomic-embed-text

The text side of the same vector space.

Stores

In-memory, file, Qdrant

Same vector stores used by text RAG.

What you get

Cross-modal retrieval, without two pipelines.

01

Text-to-image search

Index images, query with a text prompt. The aligned vector space yields semantic matches without OCR or captioning.

02

Image-to-image search

Find visually similar images at scale. Useful for duplicate detection, style search, and near-duplicate consolidation.

03

Multimodal RAG

Mix text passages and images in a single vector store. Retrieve diagrams with text questions; ground answers in both modalities.

04

Same API as text

The Embedder class accepts both text strings and image attachments. One API, one inference engine.
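Because both embedders map into one aligned space, a single similarity function scores any pair of vectors: text-to-text, text-to-image, or image-to-image. A minimal, self-contained sketch of that scoring with cosine similarity (the vectors are made up for illustration; real embeddings are much higher-dimensional):

```csharp
using System;

// Cosine similarity: dot product over the product of magnitudes.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Illustrative vectors standing in for a text embedding and an image embedding.
float[] textVec  = { 0.9f, 0.1f, 0.3f };
float[] imageVec = { 0.8f, 0.2f, 0.4f };

Console.WriteLine($"{CosineSimilarity(textVec, imageVec):F3}");
```

In an aligned space, a high score between a text vector and an image vector means the caption-like query and the image are semantically close, which is what makes one index serve both modalities.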

How it works

Index an image, query with text.

ImageEmbeddings.cs
using System;
using System.IO;
using LMKit.Model;
using LMKit.Embeddings;
using LMKit.Graphics;
using LMKit.Data.Storage;

// 1. Load the vision and text embedders. Same vector space.
var visionModel = LM.LoadFromModelID("nomic-embed-vision");
var textModel   = LM.LoadFromModelID("nomic-embed-text");

var imageEmbedder = new Embedder(visionModel);
var textEmbedder  = new Embedder(textModel);

// 2. Embed images and store in a vector store.
var store = new FileSystemVectorStore("./image-index");
foreach (var path in Directory.EnumerateFiles("./assets", "*.jpg"))
{
    float[] vec = imageEmbedder.GetEmbedding(Attachment.FromFile(path));
    await store.UpsertAsync(path, vec, new() { ["file"] = path });
}

// 3. Query with text; cross-modal retrieval works.
float[] query = textEmbedder.GetEmbedding("forklift on a warehouse floor");
var hits = await store.SearchAsync(query, topK: 5);
foreach (var hit in hits)
    Console.WriteLine($"{hit.Score:F3}  {hit.Id}");
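Image-to-image search (card 02 above) is the same flow with the vision embedder on the query side. A sketch that extends the sample above, reusing its `imageEmbedder` and `store` and assuming a `./query.jpg` exists:

```csharp
// 4. Query the same index with an image instead of text.
float[] imageQuery = imageEmbedder.GetEmbedding(Attachment.FromFile("./query.jpg"));
var similar = await store.SearchAsync(imageQuery, topK: 5);
foreach (var match in similar)
    Console.WriteLine($"{match.Score:F3}  {match.Id}");
```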

Use cases

Where image embeddings belong.

Visual search

Product image search, stock-photo discovery, real estate listings. Text or image query, same index.

Duplicate & near-duplicate

Consolidate large media libraries. Spot re-uploads, copyright violations, and slightly edited variants.

Multimodal RAG

Mixed corpora (text + images). Diagrams and figures become first-class retrieval targets.

Clustering & topic discovery

Group large image sets without labels. Use cluster centroids as auto-discovered categories.
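A cluster centroid is just the element-wise mean of its member vectors, and assigning a new image to the nearest centroid gives it an auto-discovered category. A minimal, self-contained sketch of that assignment step (the two-dimensional vectors and clusters are made up for illustration):

```csharp
using System;
using System.Linq;

// Centroid: element-wise mean of a group of embedding vectors.
static float[] Centroid(float[][] vectors)
{
    var mean = new float[vectors[0].Length];
    foreach (var v in vectors)
        for (int i = 0; i < mean.Length; i++)
            mean[i] += v[i] / vectors.Length;
    return mean;
}

// Squared Euclidean distance; sufficient for nearest-centroid assignment.
static float Distance2(float[] a, float[] b) =>
    a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();

// Two made-up clusters of image embeddings.
float[][] clusterA = { new[] { 0.9f, 0.1f }, new[] { 0.8f, 0.2f } };
float[][] clusterB = { new[] { 0.1f, 0.9f }, new[] { 0.2f, 0.8f } };

float[][] centroids = { Centroid(clusterA), Centroid(clusterB) };

// Assign a new embedding to its nearest centroid.
float[] newImage = { 0.85f, 0.15f };
int nearest = Enumerable.Range(0, centroids.Length)
    .OrderBy(i => Distance2(newImage, centroids[i]))
    .First();
Console.WriteLine($"assigned to cluster {nearest}");
```

A full pipeline would iterate assignment and centroid updates (k-means), but the nearest-centroid step above is the part that turns unlabeled embeddings into categories.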

LM-Kit.NET pillars

Seven pillars, one foundation.

The seven pillars of LM-Kit.NET, plus the local runtime they share.

The foundation

Every capability above runs on this runtime.

Foundation

Local Inference

The runtime all seven pillars sit on. The LM-Kit.NET NuGet ships the complete inference system: open-weight LLMs, vision-language models, embeddings, on-device speech-to-text, OCR, and classifiers, accelerated on CPU (AVX2), CUDA 12/13, Vulkan, or Metal. One package, zero cloud calls, predictable latency, and full data and technology sovereignty.

Explore the foundation

Cross-modal, one vector space.

Start in 5 minutes · Back to Vision hub