Chatbots, RAG, agents, document Q&A, and structured extraction all sit on top of the same three classes: SingleTurnConversation for one-shot calls, MultiTurnConversation for stateful chat with KV-cache reuse, and StatelessConversation for parallel-safe deterministic invocation. Pick the right one once, and the rest of the SDK composes naturally on top.
One prompt in, one response out. No history. Useful for classification, extraction, summarisation calls.
Stateful chat. Tracks history, reuses KV-cache between turns, supports streaming, hibernation, and tools.
No shared state. Safe to invoke in parallel from many threads. Ideal for batch and high-throughput servers.
Every workload that touches an LLM falls into one of three shapes. A request that has no history. A conversation that does. A throughput-oriented call that needs to run many times in parallel without contaminating other invocations. Mixing the three behind one class produces leaky abstractions and surprising bugs; splitting them keeps each path simple and predictable.
Each class names its lifetime: nothing carried across calls, full history with KV-cache, or explicitly parallel-safe. Reading the type tells you what to expect.
MultiTurnConversation reuses the KV-cache between turns. The model never re-tokenises the prior history. Latency stays low across long sessions.
StatelessConversation guarantees no shared mutable state. Hundreds of concurrent calls do not contaminate each other. The right shape for a server endpoint.
All three primitives expose StreamAsync with token-by-token streaming via async enumerables. Render in real time without changing classes.
CancellationToken on every async call. Cancel a slow generation cleanly; the engine respects it within the next token.
RAG chat, PDF chat, agents, function calling, and skills all sit on top of one of these three. Choosing the right primitive at the bottom keeps the higher layers honest.
No history needed
Each call is independent. Classifying a string, extracting fields from one document, summarising one passage. Lower memory, simpler call site, no cleanup. Fastest for one-off operations.
Conversation context matters
A user is talking to the model over multiple turns. History is maintained; the KV-cache is reused between turns to keep latency low; tools, skills, agents, and hibernation hook in here.
Throughput & parallelism
A server endpoint receiving many concurrent requests. No history sharing between calls; each call independent and parallel-safe. Maximises throughput on shared GPU.
The three conversation primitives plus a streaming-and-cancellation snippet that applies to all of them. Pick a tab to see the matching pattern in production C#.
SingleTurnConversation is the cheapest path: one prompt in, one reply out, no history, no follow-up. Use it for translation, classification, one-shot extraction, and any RPC-style call.
using LMKit.TextGeneration;

// One-shot. No history, no follow-up. Cheapest path.
var chat = new SingleTurnConversation(model)
{
    SystemMessage = "You translate technical English to French."
};

string reply = await chat.SubmitAsync("The server hit a kernel panic and rebooted.");
Console.WriteLine(reply);
MultiTurnConversation keeps history and the KV-cache across calls. Follow-up turns reuse the cache for near-instant first tokens. Hibernate when the user goes idle; rehydrate on the next message.
// Stateful chat with token-by-token streaming.
var chat = new MultiTurnConversation(model)
{
    SystemMessage = "You are a helpful assistant for the GIS team."
};

await foreach (var token in chat.StreamAsync("What does WGS84 stand for?"))
{
    Console.Write(token.Text);
}

// Follow-up reuses the KV-cache.
var reply = await chat.SubmitAsync("And the difference vs ED50?");

// Hibernate when the user goes idle. Resume hours later transparently.
if (chat is IKVCache cache)
{
    _ = cache.HibernateAsync();
}
StatelessConversation is built for server endpoints handling many parallel calls. No history, no shared state, no cross-request contamination. Pair with Task.WhenAll for batch throughput.
// Server endpoint. Many parallel calls. No state shared between them.
var chat = new StatelessConversation(model)
{
    SystemMessage = "Classify each input as Spam, Ham, or Promotional."
};

// Each submission is independent: no contamination, no shared history.
var classified = await Task.WhenAll(
    inputs.Select(input => chat.SubmitAsync(input, ct)));
Cancellation is honoured at the next token boundary. Same shape on all three primitives: pass a CancellationToken, catch OperationCanceledException, the partial reply is yours.
// User stops generation mid-response. Cancellation is honoured at the next token.
var cts = new CancellationTokenSource();
cancelButton.Click += (_, _) => cts.Cancel();

try
{
    await foreach (var token in chat.StreamAsync(prompt, cts.Token))
    {
        outputBox.Append(token.Text);
    }
}
catch (OperationCanceledException)
{
    outputBox.Append("\n[stopped]");
}
Six capabilities show up on all three classes. Switching primitives does not mean re-learning the API.
Set SystemMessage before any submission. The conversation honours it for every turn.
Submit for synchronous code, SubmitAsync with cancellation token for async paths.
StreamAsync yields tokens via IAsyncEnumerable. Render in real time, abort cleanly, log per-token.
Per-instance SamplingMode, LogitBias, TokenPenaltyPolicy, SpeculativeDecoding. Tune output without changing the call site.
Pair with grammar-constrained generation to force JSON, schema, or whitelist output. The conversation type does not change.
CancellationToken on every async method. Honoured at the next-token boundary.
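As a sketch of how the per-instance control surface composes: the class and property names below come from this page, while the RandomSampling type and its Temperature option are assumptions to be checked against the API reference.

```csharp
// Hypothetical sketch: tune sampling on one conversation instance
// without touching the call site. SamplingMode is named on this page;
// RandomSampling and its options are assumed, not confirmed.
var chat = new MultiTurnConversation(model)
{
    SystemMessage = "You write terse release notes.",
    SamplingMode = new RandomSampling { Temperature = 0.2f }
};

// The call site is unchanged: same SubmitAsync, same cancellation token.
string notes = await chat.SubmitAsync(diffSummary, ct);
```

Because tuning lives on the instance, two conversations over the same model can run with different temperatures or logit biases side by side.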
Sentiment, intent, language detection. Each input independent; lower memory than a stateful session.
Pull fields from one document. The conversation does not need to remember prior documents; each call is a clean slate.
Customer support, internal assistants, code copilots. KV-cache reuse keeps follow-up turns fast.
Agent loops require iteration with shared state. MultiTurnConversation backs the agent runtime.
Hundreds of parallel calls per second through one model. Stateless guarantees no contamination between concurrent requests.
Process a queue of independent inputs. Maximise throughput; the model is shared, the conversation state is not.
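One way to keep a batch from saturating the shared GPU is to throttle concurrency around the stateless calls. A sketch: StatelessConversation and SubmitAsync are from this page; the SemaphoreSlim throttling, and the chat, inputs, ct, and maxParallel names, are plain .NET assumptions.

```csharp
// Cap the shared model at maxParallel concurrent requests while the
// batch itself stays fully parallel. Results align with inputs.
var gate = new SemaphoreSlim(maxParallel);

var results = await Task.WhenAll(inputs.Select(async input =>
{
    await gate.WaitAsync(ct);
    try
    {
        // Stateless: each submission is independent and parallel-safe.
        return await chat.SubmitAsync(input, ct);
    }
    finally
    {
        gate.Release();
    }
}));
```

Task.WhenAll preserves input order, so results[i] is the reply for inputs[i] even though completion order varies.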
Compose dynamic system prompts and turn content through the templating engine before submitting.
Per-conversation sampling parameters: temperature, logit bias, repetition penalty, speculative decoding.
Pause an idle MultiTurnConversation; free the GPU; resume hours later transparently.
The chatbot story end-to-end: memory, RAG, tools, skills, all on top of MultiTurnConversation.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Console demo: one-shot prompt with a local LLM. Open on GitHub →
Console demo: history-aware chat with streaming. Open on GitHub →
Console demo: save and restore conversation state across runs. Open on GitHub →
How-to guide: hibernate and rehydrate KV-cache + history. Read the guide →