🎙️ Introducing Speech-to-Text Support in LM-Kit

Introduction

LM-Kit already lets .NET developers integrate advanced text and vision processing into their applications. Today, we’re expanding these capabilities with on-device speech recognition and audio analysis.

Whether you’re building transcription tools for customer support, multilingual accessibility features, or local voice-controlled interfaces, LM-Kit.NET’s new audio capabilities provide the flexibility and performance to get started quickly, without compromising on privacy.

A New Dimension: Audio Processing Made Easy

LM-Kit.NET now enables .NET developers to leverage cutting-edge audio processing features without relying on cloud providers. Your data stays local, secure, and under your complete control.

Key audio capabilities include:

  • On-Device AI Transcription: Fast, secure, and accurate speech-to-text directly on your device.

  • Voice Activity Detection (VAD): Precisely detect speech segments with SileroVAD 5, customizable for specialized use cases.

  • Automatic Language Detection: Quickly identify the language from your audio input.

  • Real-Time Speech Translation: Instantly translate speech into English from over 100 supported languages.

  • Universal WAV Compatibility: Effortlessly process audio with standard WAV file support.

  • Batch Processing: Efficiently handle multiple audio streams simultaneously with built-in multithreading.

Powering Speech Recognition with Whisper

LM-Kit integrates Whisper v3 models, offering a balance of accuracy and speed optimized for local execution. Our catalog also includes quantized versions of the Whisper models, so you can trade model size against accuracy to suit your hardware.

Quick Start: Transcribe Audio with Ease

With LM-Kit, converting audio to text is straightforward. Here are some quick examples:

Language Detection Example:

// Load a Whisper model from the LM-Kit catalog by its model ID.
var model = LM.LoadFromModelID("whisper-large-turbo3");
var wavFile = new WaveFile(@"d:\discussion.wav");
var engine = new SpeechToText(model);
// Identify the spoken language of the recording.
var language = engine.DetectLanguage(wavFile);
Console.WriteLine($"Detected language: {language}");
			
Speech-to-Text Example:

var model = LM.LoadFromModelID("whisper-large-turbo3");
var wavFile = new WaveFile(@"d:\discussion.wav");
var engine = new SpeechToText(model);
// Print each new segment reported by the engine.
engine.OnNewSegment += (sender, e) => Console.WriteLine(e.Segment);
var transcription = engine.Transcribe(wavFile);
			
Speech Translation (Audio to English):

var model = LM.LoadFromModelID("whisper-large3");
var wavFile = new WaveFile(@"d:\discussion.wav");
// Switch the engine from plain transcription to translation into English.
SpeechToText engine = new(model)
{
    Mode = SpeechToText.SpeechToTextMode.Translation
};
engine.OnNewSegment += (sender, e) => Console.WriteLine(e.Segment);
var transcription = engine.Transcribe(wavFile);
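
The examples above each process a single file. For the batch scenario listed among the key capabilities, a minimal sketch might fan the same calls out over a folder of recordings. It uses only the LM-Kit calls already shown in this post. The `using` namespace names, the `d:\recordings` folder, sharing one loaded model across worker threads, and the parallelism cap of 2 are all assumptions made for illustration, not documented LM-Kit behavior.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;
// Assumed namespace names; adjust them to match your LM-Kit.NET version.
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;

// Load the model once. Sharing it across threads is an assumption here.
var model = LM.LoadFromModelID("whisper-large-turbo3");

// Hypothetical input folder containing WAV recordings.
string[] files = Directory.GetFiles(@"d:\recordings", "*.wav");

// Segments collected per file; the collections are safe for concurrent writes.
var segmentsByFile = new ConcurrentDictionary<string, ConcurrentQueue<string>>();

// Keep the degree of parallelism low: each engine may already use
// several threads internally.
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

Parallel.ForEach(files, options, path =>
{
    var queue = segmentsByFile.GetOrAdd(path, _ => new ConcurrentQueue<string>());
    var wavFile = new WaveFile(path);

    // One engine per file, so handlers never mix segments from different files.
    var engine = new SpeechToText(model);
    engine.OnNewSegment += (sender, e) => queue.Enqueue($"{e.Segment}");
    engine.Transcribe(wavFile);
});

// Print the collected segments, grouped by source file.
foreach (var entry in segmentsByFile)
{
    Console.WriteLine($"== {Path.GetFileName(entry.Key)} ==");
    foreach (var segment in entry.Value)
    {
        Console.WriteLine(segment);
    }
}
```

Creating a separate engine per file keeps each `OnNewSegment` handler tied to a single recording. If memory or CPU becomes a bottleneck, lowering `MaxDegreeOfParallelism` to 1 falls back to sequential processing without other changes.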
				
			

Try It Yourself – No Registration Required

Curious to see how it works? Check out our demo repository:

👉 LM-Kit.NET Speech-to-Text Demo Repository

Download, run the example, and experience powerful speech-to-text locally, completely registration-free.

LM-Kit Speech to Text Demo

What's Next?

We’re continuously enhancing our speech processing capabilities. Upcoming features include:

  • Real-Time Recognition: Continuous streaming with minimal delay.

  • Speaker Diarization: Distinguish between multiple speakers in a single recording.

  • Expanded VAD Logic: Improved speech detection in noisy environments.

  • More Models: Broader support for open and commercial STT models.

Unleash the Power of Local Speech AI

LM-Kit.NET empowers your applications with state-of-the-art audio processing while preserving your data privacy and control. Upgrade your apps today, and transform audio data into actionable insights effortlessly.
