
Three primitives. Every interaction.

Chatbots, RAG, agents, document Q&A, and structured extraction all sit on top of the same three classes: SingleTurnConversation for one-shot calls, MultiTurnConversation for stateful chat with KV-cache reuse, and StatelessConversation for parallel-safe deterministic invocation. Pick the right one once, and the rest of the SDK composes naturally on top.


SingleTurn

One prompt in, one response out. No history. Useful for classification, extraction, and summarisation calls.

MultiTurn

Stateful chat. Tracks history, reuses KV-cache between turns, supports streaming, hibernation, and tools.

Stateless

No shared state. Safe to invoke in parallel from many threads. Ideal for batch and high-throughput servers.

Why three primitives

One model, three calling patterns.

Every workload that touches an LLM falls into one of three shapes. A request that has no history. A conversation that does. A throughput-oriented call that needs to run many times in parallel without contaminating other invocations. Mixing the three behind one class produces leaky abstractions and surprising bugs; splitting them keeps each path simple and predictable.

Honest about state

Each class names its lifetime: nothing carried across calls, full history with KV-cache, or explicitly parallel-safe. Reading the type tells you what to expect.

KV-cache reuse

MultiTurnConversation reuses the KV-cache between turns. The model never re-tokenises the prior history. Latency stays low across long sessions.

Deterministic parallelism

StatelessConversation guarantees no shared mutable state. Hundreds of concurrent calls do not contaminate each other. The right shape for a server endpoint.

Streaming everywhere

All three primitives support token-by-token streaming via async enumerables. Render in real time without changing classes.

Cancellation native

CancellationToken on every async call. Cancel a slow generation cleanly; the engine respects it within the next token.

Composes upward

RAG chat, PDF chat, agents, function calling, and skills all sit on top of one of these three. Choosing the right primitive at the bottom keeps the higher layers honest.

Which one to use

A short decision tree.

No history needed

SingleTurnConversation

Each call is independent. Classifying a string, extracting fields from one document, summarising one passage. Lower memory, simpler call site, no cleanup. Fastest for one-off operations.

Conversation context matters

MultiTurnConversation

A user is talking to the model over multiple turns. History is maintained; the KV-cache is reused between turns to keep latency low; tools, skills, agents, and hibernation hook in here.

Throughput & parallelism

StatelessConversation

A server endpoint receiving many concurrent requests. No history sharing between calls; each call independent and parallel-safe. Maximises throughput on a shared GPU.

Real code

Four patterns, four call sites.

The three conversation primitives plus a streaming-and-cancellation pattern that applies to all of them, each shown in production C#.

SingleTurnConversation is the cheapest path: one prompt in, one reply out, no history, no follow-up. Use it for translation, classification, one-shot extraction, and any RPC-style call.

SingleTurn.cs
using LMKit.TextGeneration;

// One-shot. No history, no follow-up. Cheapest path.
var chat = new SingleTurnConversation(model)
{
    SystemMessage = "You translate technical English to French."
};

string reply = await chat.SubmitAsync("What is a kernel panic?");
Console.WriteLine(reply);
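
The stateful pattern follows the same shape. A sketch of a multi-turn session, assuming MultiTurnConversation mirrors the constructor and SubmitAsync surface shown above for SingleTurnConversation:

```csharp
using LMKit.TextGeneration;

// Stateful chat. History and the KV-cache persist between turns,
// so the second call does not re-tokenise the first exchange.
var chat = new MultiTurnConversation(model)
{
    SystemMessage = "You are a concise technical assistant."
};

string first = await chat.SubmitAsync("What is a kernel panic?");
string followUp = await chat.SubmitAsync("How do I recover from one?"); // "one" resolves via history
```

The follow-up question only makes sense because the conversation remembers the first turn; that memory is exactly what SingleTurnConversation and StatelessConversation deliberately omit.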

Shared surface

What every primitive does.

Six capabilities show up on all three classes. Switching primitives does not mean re-learning the API.

System prompt

Set SystemMessage before any submission. The conversation honours it for every turn.

Sync and async

Use Submit for synchronous code, SubmitAsync with a cancellation token for async paths.

Streaming

StreamAsync yields tokens via IAsyncEnumerable. Render in real time, abort cleanly, log per-token.
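
As a sketch, assuming StreamAsync yields token strings through an IAsyncEnumerable as described above, a real-time render loop looks like:

```csharp
// Hypothetical streaming loop over an existing conversation instance.
await foreach (string token in chat.StreamAsync("Explain KV-cache reuse."))
{
    Console.Write(token); // print each token as soon as it is generated
}
Console.WriteLine();
```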

Sampling controls

Per-instance SamplingMode, LogitBias, TokenPenaltyPolicy, SpeculativeDecoding. Tune output without changing the call site.

Structured output

Pair with grammar-constrained generation to force JSON, schema, or whitelist output. The conversation type does not change.

Cancellation

CancellationToken on every async method. Honoured at the next-token boundary.
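
A minimal sketch of that contract, assuming SubmitAsync takes the CancellationToken as its second argument:

```csharp
// Give the generation a 10-second budget; cancelled generations
// stop at the next token boundary.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));

try
{
    string reply = await chat.SubmitAsync("Summarise the incident report.", cts.Token);
    Console.WriteLine(reply);
}
catch (OperationCanceledException)
{
    Console.WriteLine("Generation cancelled after 10 seconds.");
}
```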

Where each one ships

Concrete workloads.

SingleTurn: classification

Sentiment, intent, language detection. Each input independent; lower memory than a stateful session.

SingleTurn: extraction

Pull fields from one document. The conversation does not need to remember prior documents; each call is a clean slate.

MultiTurn: chatbots

Customer support, internal assistants, code copilots. KV-cache reuse keeps follow-up turns fast.

MultiTurn: agents

Agent loops require iteration with shared state. MultiTurnConversation backs the agent runtime.

Stateless: server endpoints

Hundreds of parallel calls per second through one model. Stateless guarantees no contamination between concurrent requests.

Stateless: batch jobs

Process a queue of independent inputs. Maximise throughput; the model is shared, the conversation state is not.
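
A batch job over independent inputs might look like this sketch, assuming StatelessConversation exposes the same SubmitAsync surface as the other two primitives (LoadReviews is a placeholder for your own input source):

```csharp
using System.Linq;

// One model, no shared mutable state: fan out independent
// classification calls without contaminating each other.
var chat = new StatelessConversation(model)
{
    SystemMessage = "Classify sentiment as positive, negative, or neutral."
};

string[] reviews = LoadReviews(); // hypothetical helper: your input queue
string[] labels = await Task.WhenAll(
    reviews.Select(r => chat.SubmitAsync(r)));
```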

Related capabilities

Conversations plus the rest.

Prompt templates

Compose dynamic system prompts and turn content through the templating engine before submitting.


Sampling controls

Per-conversation sampling parameters: temperature, logit bias, repetition penalty, speculative decoding.


Context hibernation

Pause an idle MultiTurnConversation; free the GPU; resume hours later transparently.


Chatbots

The chatbot story end-to-end: memory, RAG, tools, skills, all on top of MultiTurnConversation.

