Chatbots, RAG, agents, document Q&A, and structured extraction all sit on top of the same three classes: SingleTurnConversation for one-shot calls, MultiTurnConversation for stateful chat with KV-cache reuse, and StatelessConversation for parallel-safe deterministic invocation. Pick the right one once, and the rest of the SDK composes naturally on top.
One prompt in, one response out. No history. Useful for classification, extraction, summarisation calls.
Stateful chat. Tracks history, reuses KV-cache between turns, supports streaming, hibernation, and tools.
No shared state. Safe to invoke in parallel from many threads. Ideal for batch and high-throughput servers.
Every workload that touches an LLM falls into one of three shapes. A request that has no history. A conversation that does. A throughput-oriented call that needs to run many times in parallel without contaminating other invocations. Mixing the three behind one class produces leaky abstractions and surprising bugs; splitting them keeps each path simple and predictable.
Each class names its lifetime: nothing carried across calls, full history with KV-cache, or explicitly parallel-safe. Reading the type tells you what to expect.
MultiTurnConversation reuses the KV-cache between turns. The model never re-tokenises the prior history. Latency stays low across long sessions.
StatelessConversation guarantees no shared mutable state. Hundreds of concurrent calls do not contaminate each other. The right shape for a server endpoint.
All three primitives expose StreamAsync with token-by-token streaming via async enumerables. Render in real time without changing classes.
CancellationToken on every async call. Cancel a slow generation cleanly; the engine respects it within the next token.
RAG chat, PDF chat, agents, function calling, and skills all sit on top of one of these three. Choosing the right primitive at the bottom keeps the higher layers honest.
No history needed
Each call is independent. Classifying a string, extracting fields from one document, summarising one passage. Lower memory, simpler call site, no cleanup. Fastest for one-off operations.
Conversation context matters
A user is talking to the model over multiple turns. History is maintained; the KV-cache is reused between turns to keep latency low; tools, skills, agents, and hibernation hook in here.
Throughput & parallelism
A server endpoint receiving many concurrent requests. No history sharing between calls; each call independent and parallel-safe. Maximises throughput on shared GPU.
The three conversation primitives plus a streaming-and-cancellation snippet that applies to all of them. Pick a tab to see the matching pattern in production C#.
SingleTurnConversation is the cheapest path: one prompt in, one reply out, no history, no follow-up. Use it for translation, classification, one-shot extraction, and any RPC-style call.
using LMKit.TextGeneration;

// One-shot. No history, no follow-up. Cheapest path.
var chat = new SingleTurnConversation(model)
{
    SystemMessage = "You translate technical English to French."
};

string reply = await chat.SubmitAsync("The server hit a kernel panic and rebooted.");
Console.WriteLine(reply);
MultiTurnConversation keeps history and the KV-cache across calls. Follow-up turns reuse the cache for near-instant first tokens. Hibernate when the user goes idle; rehydrate on the next message.
// Stateful chat with token-by-token streaming.
var chat = new MultiTurnConversation(model)
{
    SystemMessage = "You are a helpful assistant for the GIS team."
};

await foreach (var token in chat.StreamAsync("What does WGS84 stand for?"))
{
    Console.Write(token.Text);
}

// Follow-up reuses the KV-cache.
var reply = await chat.SubmitAsync("And the difference vs ED50?");

// Hibernate when the user goes idle. Resume hours later transparently.
if (chat is IKVCache cache)
{
    _ = cache.HibernateAsync();
}
StatelessConversation is built for server endpoints handling many parallel calls. No history, no shared state, no cross-request contamination. Pair with Task.WhenAll for batch throughput.
// Server endpoint. Many parallel calls. No state shared between them.
var chat = new StatelessConversation(model)
{
    SystemMessage = "Classify each input as Spam, Ham, or Promotional."
};

// Each submission is independent: no contamination, no shared history.
var classified = await Task.WhenAll(
    inputs.Select(input => chat.SubmitAsync(input, ct)));
Cancellation is honoured at the next token boundary. Same shape on all three primitives: pass a CancellationToken, catch OperationCanceledException, the partial reply is yours.
// User stops generation mid-response. Cancellation is honoured at the next token.
var cts = new CancellationTokenSource();
cancelButton.Click += (_, _) => cts.Cancel();

try
{
    await foreach (var token in chat.StreamAsync(prompt, cts.Token))
    {
        outputBox.Append(token.Text);
    }
}
catch (OperationCanceledException)
{
    outputBox.Append("\n[stopped]");
}
Six capabilities show up on all three classes. Switching primitives does not mean re-learning the API.
Set SystemMessage before any submission. The conversation honours it for every turn.
Submit for synchronous code, SubmitAsync with cancellation token for async paths.
StreamAsync yields tokens via IAsyncEnumerable. Render in real time, abort cleanly, log per-token.
Per-instance SamplingMode, LogitBias, TokenPenaltyPolicy, SpeculativeDecoding. Tune output without changing the call site.
Pair with grammar-constrained generation to force JSON, schema, or whitelist output. The conversation type does not change.
CancellationToken on every async method. Honoured at the next-token boundary.
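As a sketch of how the per-instance control surface composes: the class and property names below come from this page, while the RandomSampling type and its Temperature option are assumptions to be checked against the API reference.

```csharp
// Hypothetical sketch: tune sampling on one conversation instance
// without touching the call site. SamplingMode is named on this page;
// RandomSampling and its options are assumed, not confirmed.
var chat = new MultiTurnConversation(model)
{
    SystemMessage = "You write terse release notes.",
    SamplingMode = new RandomSampling { Temperature = 0.2f }
};

// The call site is unchanged: same SubmitAsync, same cancellation token.
string notes = await chat.SubmitAsync(diffSummary, ct);
```

Because tuning lives on the instance, two conversations over the same model can run with different temperatures or logit biases side by side.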
Sentiment, intent, language detection. Each input independent; lower memory than a stateful session.
Pull fields from one document. The conversation does not need to remember prior documents; each call is a clean slate.
Customer support, internal assistants, code copilots. KV-cache reuse keeps follow-up turns fast.
Agent loops require iteration with shared state. MultiTurnConversation backs the agent runtime.
Hundreds of parallel calls per second through one model. Stateless guarantees no contamination between concurrent requests.
Process a queue of independent inputs. Maximise throughput; the model is shared, the conversation state is not.
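One way to keep a batch from saturating the shared GPU is to throttle concurrency around the stateless calls. A sketch: StatelessConversation and SubmitAsync are from this page; the SemaphoreSlim throttling, and the chat, inputs, ct, and maxParallel names, are plain .NET assumptions.

```csharp
// Cap the shared model at maxParallel concurrent requests while the
// batch itself stays fully parallel. Results align with inputs.
var gate = new SemaphoreSlim(maxParallel);

var results = await Task.WhenAll(inputs.Select(async input =>
{
    await gate.WaitAsync(ct);
    try
    {
        // Stateless: each submission is independent and parallel-safe.
        return await chat.SubmitAsync(input, ct);
    }
    finally
    {
        gate.Release();
    }
}));
```

Task.WhenAll preserves input order, so results[i] is the reply for inputs[i] even though completion order varies.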
Compose dynamic system prompts and turn content through the templating engine before submitting.
Per-conversation sampling parameters: temperature, logit bias, repetition penalty, speculative decoding.
Pause an idle MultiTurnConversation; free the GPU; resume hours later transparently.
The chatbot story end-to-end: memory, RAG, tools, skills, all on top of MultiTurnConversation.
Working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Console demo: one-shot prompt with a local LLM. Open on GitHub →
Console demo: history-aware chat with streaming. Open on GitHub →
Console demo: save and restore conversation state across runs. Open on GitHub →
How-to guide: hibernate and rehydrate KV-cache + history. Read the guide →