AiNews.com
ElevenLabs Launches Scribe, a Cutting-Edge Speech-to-Text Model

Alicia Shapiro
February 28, 2025

Image: An AI-powered speech-to-text interface transcribing multilingual audio in real time. The display shows waveforms of spoken words converting into text with timestamps and speaker identification, with a microphone on the desk in front of the screen.

Image Source: ChatGPT-4o

ElevenLabs has launched Scribe, an automatic speech recognition (ASR) model that the company says delivers the world’s most accurate transcriptions. Designed for real-world audio, Scribe supports 99 languages and provides word-level timestamps, speaker diarization, and audio-event tagging, so transcripts arrive as structured data that is easier to integrate into other tools.

Industry-Leading Accuracy

In benchmark tests on the FLEURS and Common Voice datasets, Scribe consistently outperforms leading ASR models, including Gemini 2.0 Flash, Whisper Large V3, and Deepgram Nova-3. It delivers:

98.7% accuracy in Italian
96.7% accuracy in English
Improved performance in 97 other languages

Scribe also makes ASR more accessible by significantly reducing errors in traditionally underserved languages such as Serbian, Cantonese, and Malayalam, where competing models often have word error rates above 40%.
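To make the relationship between the accuracy figures and word error rate (WER) concrete, here is a minimal Python sketch of how WER is commonly computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The article does not say how ElevenLabs derives its accuracy percentages, so treating accuracy as 1 minus WER is an assumption made here for illustration, and the function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    # Assumption: accuracy is reported as 1 - WER.
    print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")
```

Under that assumption, the 98.7% accuracy quoted for Italian would correspond to a WER of about 1.3%, while a 40% WER means roughly two in every five reference words are substituted, dropped, or offset by an inserted word.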

A bar chart illustrating the performance of various AI automatic speech recognition (ASR) models in the FLEURS Benchmark, measuring word error rates (WER) across multiple languages. The models include Scribe V1 (white bars), Gemini 2.0 Flash, Whisper Large V3, and Deepgram Nova 2. Scribe V1 demonstrates superior accuracy, achieving 98.7% in Italian and 96.7% in English, while also improving transcription accuracy in 97 other languages. The title "Scribe V1 FLEURS Benchmark" is prominently displayed at the top against a black background.

Scribe Excels in FLEURS Benchmark, Outperforming Top ASR Models. Image Source: ElevenLabs

A bar chart comparing word error rates (WER) for different AI speech recognition models in the Common Voice Benchmark. The models tested include Scribe V1 (white bars), Gemini 2.0 Flash, Whisper Large V3, and Deepgram Nova 2. The chart displays WER percentages across multiple languages, showing Scribe V1 achieving lower error rates than competitors, particularly in traditionally underserved languages like Serbian, Cantonese, and Malayalam. The title "Scribe V1 Common Voice Benchmark" is displayed at the top, with a black background for contrast.

Scribe Leads in Common Voice Benchmark for Speech Recognition Accuracy. Image Source: ElevenLabs

How to Access Scribe

Scribe is available through:

API Integration – Developers can access structured JSON transcripts with speaker diarization, timestamps, and non-speech event markers (e.g., laughter).
ElevenLabs Dashboard – Users can upload audio or video files to generate formatted transcripts instantly.
Real-Time Version Coming Soon – A low-latency version for live transcription is in development.
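
As a rough illustration of what a structured transcript with timestamps, speaker labels, and non-speech event markers might look like to a developer, the Python sketch below groups word-level entries into speaker turns. The JSON shape and every field name in it (`words`, `text`, `start`, `end`, `type`, `speaker_id`) are assumptions made for this example, not a confirmed description of the Scribe API response, and the sample data is invented. No network call is made.

```python
import json

# Hypothetical payload: the field names are assumptions for illustration,
# not a confirmed description of the Scribe API response.
SAMPLE_JSON = """
{
  "words": [
    {"text": "Hello", "start": 0.00, "end": 0.40, "type": "word", "speaker_id": "speaker_0"},
    {"text": "there", "start": 0.45, "end": 0.80, "type": "word", "speaker_id": "speaker_0"},
    {"text": "laughter", "start": 0.90, "end": 1.50, "type": "audio_event", "speaker_id": "speaker_1"},
    {"text": "Hi", "start": 1.60, "end": 1.80, "type": "word", "speaker_id": "speaker_1"}
  ]
}
"""


def speaker_turns(transcript: dict) -> list[dict]:
    """Merge consecutive entries from the same speaker into one turn."""
    turns: list[dict] = []
    for item in transcript["words"]:
        # Show non-speech events in parentheses, e.g. "(laughter)".
        token = f"({item['text']})" if item["type"] == "audio_event" else item["text"]
        if turns and turns[-1]["speaker"] == item["speaker_id"]:
            turns[-1]["tokens"].append(token)
            turns[-1]["end"] = item["end"]
        else:
            turns.append({
                "speaker": item["speaker_id"],
                "start": item["start"],
                "end": item["end"],
                "tokens": [token],
            })
    return turns


def format_turns(turns: list[dict]) -> list[str]:
    """Render each turn as '[start-end] speaker: text'."""
    return [
        f"[{t['start']:.2f}-{t['end']:.2f}] {t['speaker']}: {' '.join(t['tokens'])}"
        for t in turns
    ]


if __name__ == "__main__":
    for line in format_turns(speaker_turns(json.loads(SAMPLE_JSON))):
        print(line)
```

Keeping the word-level entries as the source of truth and deriving turns from them means the same data can feed subtitles, meeting notes, or search indexes without another transcription pass.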

Looking Ahead

With unmatched accuracy and multilingual capabilities, Scribe is poised to set a new standard in speech-to-text AI. Whether for meeting transcriptions, movie subtitles, or real-time speech applications, this model promises faster, more precise, and universally accessible ASR technology.


Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.