Voxtral transcribes at the speed of sound.

Today, we’re releasing Voxtral Transcribe 2, two next-generation speech-to-text models with state-of-the-art transcription quality, diarization, and ultra-low latency. The family includes Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Voxtral Realtime is open-weights under the Apache 2.0 license.

We’re also launching an audio playground in Mistral Studio to test transcription instantly, powered by Voxtral Transcribe 2, with diarization and timestamps.

Highlights.

Voxtral Mini Transcribe V2: State-of-the-art transcription with speaker diarization, context biasing, and word-level timestamps in 13 languages.
Voxtral Realtime: Purpose-built for live transcription with latency configurable down to sub-200ms, enabling voice agents and real-time applications.
Best-in-class efficiency: Industry-leading accuracy at a fraction of the cost, with Voxtral Mini Transcribe V2 achieving the lowest word error rate at the lowest price point.
Open weights: Voxtral Realtime ships under Apache 2.0, deployable on edge for privacy-first applications.

Voxtral Realtime.

Voxtral Realtime is purpose-built for applications where latency matters. Unlike approaches that adapt offline models by processing audio in chunks, Realtime uses a novel streaming architecture that transcribes audio as it arrives. The model delivers transcriptions with delay configurable down to sub-200ms, unlocking a new class of voice-first applications.

Word error rate (lower is better) across languages in the FLEURS transcription benchmark.

At 2.4 seconds delay, ideal for subtitling, Realtime matches Voxtral Mini Transcribe V2, our latest batch model. At 480ms delay, it stays within 1-2% word error rate, enabling voice agents with near-offline accuracy.
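For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that definition. The function name and the simple lowercase-and-split normalization are assumptions made for illustration; benchmark evaluations typically apply more elaborate text normalization, so this will not reproduce published numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase + whitespace split stands in for real normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.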

The model is natively multilingual, achieving strong transcription performance in 13 languages, including English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. With a 4B parameter footprint, it runs efficiently on edge devices, ensuring privacy and security for sensitive deployments.

We’re releasing the model weights under Apache 2.0 on the Hugging Face Hub.

Voxtral Mini Transcribe V2.

Average diarization error rate (lower is better) across 5 English benchmarks (Switchboard, CallHome, AMI-IHM, AMI-SDM, SBCSAE) and the TalkBank multilingual benchmark (German, Spanish, English, Chinese, Japanese).

Average word error rate (lower is better) across the top-10 languages in the FLEURS transcription benchmark.

Voxtral Mini Transcribe V2 delivers significant improvements in transcription and diarization quality across languages and domains. At approximately 4% word error rate on FLEURS and $0.003/min, Voxtral offers the best price-performance of any transcription API. It outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, and Deepgram Nova on accuracy, and processes audio approximately 3x faster than ElevenLabs’ Scribe v2 while matching on quality at one-fifth the cost.

Enterprise-ready options.

Voxtral Mini Transcribe V2 introduces key capabilities for enterprise deployments.

Speaker diarization.

Generate transcriptions with speaker labels and precise start/end times. Ideal for meeting transcription, interview analysis, and multi-party call processing. Note: with overlapping speech, the model typically transcribes one speaker.
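Diarized output is usually easier to read once consecutive segments from the same speaker are merged into turns. The sketch below shows one way to do that. The `Segment` shape (speaker label, start, end, text), the `merge_speaker_turns` helper, and the one-second gap threshold are all hypothetical and chosen for illustration; they are not the actual response format of the API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape for one diarized segment; times are in seconds.
    speaker: str
    start: float
    end: float
    text: str


def merge_speaker_turns(segments, max_gap=1.0):
    """Merge adjacent segments by the same speaker into a single turn.

    Segments are merged when the speaker is unchanged and the silence
    between them is at most max_gap seconds (an illustrative default).
    """
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = turns[-1] if turns else None
        if last and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            last.end = max(last.end, seg.end)
            last.text = f"{last.text} {seg.text}"
        else:
            # Copy so the caller's segments are never mutated.
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


if __name__ == "__main__":
    demo = [
        Segment("speaker_1", 0.0, 1.0, "Good morning,"),
        Segment("speaker_1", 1.2, 2.0, "everyone."),
        Segment("speaker_2", 2.5, 3.0, "Morning."),
    ]
    for turn in merge_speaker_turns(demo):
        print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
```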

Context biasing.

Provide up to 100 words or phrases to guide the model toward correct spellings of names, technical terms, or domain-specific vocabulary. Particularly useful for proper nouns or industry terminology that standard models often miss. Context biasing is optimized for English; support for other languages is experimental.
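Since the list is capped at 100 entries, it can help to clean a vocabulary list before sending it. The following is a small sketch of such a pre-check. The `prepare_bias_terms` helper and its deduplication rules are assumptions made for illustration; only the limit of 100 comes from the text above, and how the terms are actually passed to the API is not shown here.

```python
MAX_BIAS_TERMS = 100  # limit stated in the article


def prepare_bias_terms(terms):
    """Normalize whitespace, drop blanks and case-insensitive duplicates.

    Raises ValueError if more than MAX_BIAS_TERMS distinct terms remain,
    rather than silently truncating the list.
    """
    seen = set()
    cleaned = []
    for term in terms:
        normalized = " ".join(term.split())
        key = normalized.casefold()
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    if len(cleaned) > MAX_BIAS_TERMS:
        raise ValueError(
            f"{len(cleaned)} terms provided, but at most {MAX_BIAS_TERMS} are allowed"
        )
    return cleaned


if __name__ == "__main__":
    print(prepare_bias_terms(["  Voxtral ", "voxtral", "Le  Chat", ""]))
```

Raising an error instead of truncating keeps the choice of which terms to drop with the caller, who knows which vocabulary matters most.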

Word-level timestamps.

Generate precise start and end timestamps for each word, enabling applications like subtitle generation, audio search, and content alignment.
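As an illustration of the subtitle use case, the sketch below groups word-level timestamps into cues and renders them in the SubRip (.srt) layout. The `Word` shape, the helper names, and the cue limits (seven words or three seconds) are assumptions made for illustration and do not reflect the actual API response format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape for one timestamped word; times are in seconds.
    text: str
    start: float
    end: float


def format_timestamp(seconds: float) -> str:
    """Render seconds as the HH:MM:SS,mmm form used in .srt files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7, max_duration=3.0):
    """Group words into cues, then emit numbered SRT blocks."""
    cues, current = [], []
    for word in words:
        # Start a new cue when the current one is full or would run too long.
        too_long = current and word.end - current[0].start > max_duration
        if current and (len(current) >= max_words or too_long):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        span = f"{format_timestamp(cue[0].start)} --> {format_timestamp(cue[-1].end)}"
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{span}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(words_to_srt([Word("Hello", 0.0, 0.4), Word("world", 0.5, 0.9)]))
```

The same word list could drive audio search or alignment by indexing each word's start time instead of rendering cues.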

Expanded language support.

Like Realtime, this model now supports 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. Non-English performance significantly outpaces competitors.

Noise robustness.

Maintains transcription accuracy in challenging acoustic environments, such as factory floors, busy call centers, and field recordings.

Longer audio support.

Process recordings up to 3 hours in a single request.

Word error rate (lower is better) across languages in the FLEURS transcription benchmark.

Audio playground.

Test Voxtral Transcribe 2 directly in Mistral Studio. Upload up to 10 audio files, toggle diarization, choose timestamp granularity, and add context bias terms for domain-specific vocabulary. Supports .mp3, .wav, .m4a, .flac, .ogg up to 1GB each.
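The playground limits above can be checked locally before uploading. The sketch below is one way to do that. The `check_upload_batch` helper is hypothetical, and treating "1GB" as 10**9 bytes is an assumption, since the text does not say whether a decimal or binary gigabyte is meant.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
MAX_FILES = 10
MAX_BYTES = 1_000_000_000  # assumption: "1GB" read as 10**9 bytes


def check_upload_batch(files):
    """Return a list of problems for a batch of (filename, size_in_bytes).

    An empty list means the batch fits the limits listed in the article.
    """
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    for name, size in files:
        ext = Path(name).suffix.lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_BYTES:
            problems.append(f"{name}: {size} bytes exceeds {MAX_BYTES}")
    return problems


if __name__ == "__main__":
    batch = [("standup.mp3", 48_000_000), ("notes.txt", 2_000)]
    for problem in check_upload_batch(batch):
        print(problem)
```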

Transforming voice applications.

Voxtral powers voice workflows in diverse applications and industries.

Meeting intelligence.

Transcribe multilingual recordings with speaker diarization that clearly attributes who said what and when. At Voxtral’s price point, annotate large volumes of meeting content at industry-leading cost efficiency.

Voice agents and virtual assistants.

Build conversational AI with sub-200ms transcription latency. Connect Voxtral Realtime to your LLM and TTS pipeline for responsive voice interfaces that feel natural.

Contact center automation.

Transcribe calls in real time, enabling AI systems to analyze sentiment, suggest responses, and populate CRM fields while conversations are still happening. Speaker diarization ensures clear attribution between agents and customers.

Media and broadcast.

Generate live multilingual subtitles with minimal latency. Context biasing handles proper nouns and technical terminology that trip up generic transcription services.

Compliance and documentation.

Monitor and transcribe interactions for regulatory compliance, with diarization providing clear speaker attribution and timestamps enabling precise audit trails.

Both models support GDPR and HIPAA-compliant deployments through secure on-premise or private cloud setups.

Get began.

Voxtral Mini Transcribe V2 is available now via API at $0.003 per minute. Try it now in the new Mistral Studio audio playground or in Le Chat.

Voxtral Realtime is available via API at $0.006 per minute and as open weights on Hugging Face.

Explore documentation on Mistral’s audio and transcription capabilities.
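To put the per-minute prices in concrete terms, the sketch below estimates the cost of a recording from its duration. The dictionary keys are illustrative labels rather than API model identifiers, and the calculation assumes usage is prorated by the second; the actual billing granularity is not stated here.

```python
from decimal import Decimal

# Prices per minute of audio, as listed in the article.
# Keys are illustrative labels, not official model identifiers.
PRICE_PER_MINUTE = {
    "voxtral-mini-transcribe-v2": Decimal("0.003"),
    "voxtral-realtime": Decimal("0.006"),
}


def estimate_cost(model: str, audio_seconds: float) -> Decimal:
    """Estimate the API cost in US dollars for a given audio duration.

    Assumption: usage is prorated by the second with no minimum charge.
    """
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    return (minutes * PRICE_PER_MINUTE[model]).quantize(Decimal("0.0001"))


if __name__ == "__main__":
    three_hours = 3 * 60 * 60  # the longest single request mentioned above
    print(estimate_cost("voxtral-mini-transcribe-v2", three_hours))
    print(estimate_cost("voxtral-realtime", 60 * 60))
```

Under these assumptions, a full 3-hour batch request costs about $0.54, and an hour of real-time audio about $0.36.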

We’re hiring.

If you’re excited about building world-class speech AI and putting frontier models into the hands of developers everywhere, we’d love to hear from you. Apply to join our team.
