OnNewSegment
Fires per audio segment, not per file.
Subscribe to OnNewSegment and the transcription engine pushes
each segment to your UI as soon as it lands, complete with timestamps,
detected language, and a per-segment confidence score. No buffering until
end-of-file. No polling. Sub-second latency on a single CPU core for the
smaller Whisper variants.
Segment payload: text, start/end times, confidence, language.
Input formats: WAV, MP3, FLAC, OGG, M4A, WMA.
Cancellation: a CancellationToken stops transcription mid-stream (see the microphone example below).
Batch transcription is fine for archival workflows. But for live captions, dictation
apps, real-time translation, meeting assistants, accessibility tools, and anything
else a person is actually waiting on, you need streaming output. SpeechToText
exposes streaming as a first-class event, not as a chunking workaround.
Subscribe to OnNewSegment, call TranscribeAsync, and segments arrive at your handler as the model produces them.
using LMKit.Model;
using LMKit.Speech;

// Load a Whisper model and create the transcription engine.
var model = LM.LoadFromModelID("whisper-large-turbo3");
var stt = new SpeechToText(model);

// Each segment is pushed here as soon as the model emits it.
stt.OnNewSegment += (s, seg) =>
    Console.WriteLine($"[{seg.StartTime:hh\\:mm\\:ss}] {seg.Text}");

await stt.TranscribeAsync("meeting.mp3");
Each segment carries everything you need to render a caption, fix subtitle timing, route by language, or flag low-confidence regions for human review.
Text
The decoded text for this segment. Already passed through dictation formatting if enabled (punctuation, capitalisation, sentence boundaries).
Time
StartTime and EndTime as TimeSpan. Plug straight into an SRT/VTT writer (see the sketch after this list) or sync with a video timeline.
Score
Per-segment confidence in [0, 1]. Threshold it to flag uncertain regions or to trigger re-transcription with a larger model (sketched after the caption example below).
Lang
ISO language code identified for this segment. Useful for multilingual audio: route each segment to the right downstream pipeline.
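These fields map straight onto subtitle formats. Here is a minimal SRT-writer sketch continuing from the quickstart above; seg.Text, seg.StartTime, and seg.EndTime are the fields described in this list, while the cue counter, output file name, and Format helper are illustrative:

using System.Globalization;

var srt = new StreamWriter("meeting.srt");
int cue = 0;

stt.OnNewSegment += (s, seg) =>
{
    // One SRT cue: index, "HH:MM:SS,mmm --> HH:MM:SS,mmm", text, blank line.
    cue++;
    srt.WriteLine(cue);
    srt.WriteLine($"{Format(seg.StartTime)} --> {Format(seg.EndTime)}");
    srt.WriteLine(seg.Text.Trim());
    srt.WriteLine();
};

static string Format(TimeSpan t) =>
    t.ToString(@"hh\:mm\:ss\,fff", CultureInfo.InvariantCulture);

Dispose the writer after TranscribeAsync returns so the final cue is flushed to disk.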
The same event hook drives live captions, dictation UIs, and meeting transcripts. Pick the shape that matches your app.
Push each segment to a caption overlay. Confidence threshold marks uncertain segments in grey so a human editor can quickly spot review candidates.
stt.OnNewSegment += (s, seg) =>
{
    // Style segments under the 0.6 confidence threshold for reviewer attention.
    var css = seg.Confidence < 0.6 ? "caption low" : "caption";
    captionView.Append(new CaptionLine(seg.Text, seg.StartTime, css));
};
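The field list above also suggests using the score to trigger re-transcription with a larger model. A coarse sketch of that pattern, assuming the first pass ran a smaller Whisper variant: count uncertain segments during transcription, then re-run the file with the whisper-large-turbo3 ID from the quickstart. The 0.6 threshold and 20% cut-off are illustrative:

int total = 0, low = 0;
stt.OnNewSegment += (s, seg) =>
{
    total++;
    if (seg.Confidence < 0.6) low++; // same threshold as the caption styling
};

await stt.TranscribeAsync("meeting.mp3");

// Retry the whole file with a larger model if too many segments were uncertain.
if (total > 0 && low > 0.2 * total)
{
    var larger = new SpeechToText(LM.LoadFromModelID("whisper-large-turbo3"));
    larger.OnNewSegment += (s, seg) => Console.WriteLine(seg.Text);
    await larger.TranscribeAsync("meeting.mp3");
}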
Append each segment to a text editor. Enable dictation formatting so the editor sees properly punctuated, capitalised sentences rather than raw token streams.
stt.DictationFormatting = true;
stt.OnNewSegment += (s, seg) =>
    editor.Document.Insert(editor.CaretIndex, seg.Text + " ");
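Depending on your UI framework, the handler may fire on a worker thread; check the API reference for the threading guarantees. If it does, marshal the insert onto the UI thread. A WPF-flavoured sketch, where Dispatcher.Invoke and the editor API are illustrative:

stt.OnNewSegment += (s, seg) =>
    editor.Dispatcher.Invoke(() =>
        editor.Document.Insert(editor.CaretIndex, seg.Text + " "));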
Stream microphone audio chunks. The engine handles segment boundaries internally, so the handler only fires when there is something to render.
// Capture from the default microphone and stream chunks into the engine.
var wav = WaveFile.FromMicrophone(deviceIndex: 0);
var cts = new CancellationTokenSource();
await stt.TranscribeAsync(wav, cts.Token);
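The token source is what stops the stream mid-utterance, as promised above. Wire it up before the await; the stop button and timeout below are illustrative:

// Cancel from a UI action...
stopButton.Clicked += (_, _) => cts.Cancel();

// ...or cap the capture length up front.
cts.CancelAfter(TimeSpan.FromMinutes(5));

If cancellation is part of normal flow, wrap the await in a try/catch for OperationCanceledException, assuming TranscribeAsync follows the standard .NET cancellation pattern; segments already delivered to the handler stay rendered.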
You'll find working console demos on GitHub, step-by-step how-to guides on the docs site, and the API reference for the classes used on this page.
Console demo: streaming STT with VAD and hallucination suppression.
Open on GitHub →
App: cross-platform desktop transcription app built on LM-Kit.NET and .NET MAUI. Open →
API reference: documentation for the streaming transcription engine. Open the reference →
Whisper-family models, segment-by-segment streaming, fully on-device. Pair with Voice Activity Detection for cleaner output on noisy audio.