ElevenLabs’ new model transcribes speech in 90 languages faster than you can blink
TOKYO: ElevenLabs has just made real-time transcription scary good—and scary fast. The AI audio firm’s latest model, Scribe v2 Realtime, can transcribe speech in over 90 languages in less than 150 milliseconds, a speed that puts it ahead of most humans’ reaction times. Released on November 11th, it’s designed for the sort of applications where delay means disaster: voice assistants, live customer calls, medical dictation and streaming captions.
The model doesn’t just transcribe quickly; it anticipates. Through what ElevenLabs calls “negative latency prediction”, Scribe v2 Realtime can guess the next word and punctuation before a speaker finishes, keeping pace with natural conversation. It achieves 93.5 per cent accuracy on the FLEURS benchmark across 30 European and Asian languages, a figure that puts it at the top of the multilingual transcription heap.
India is central to ElevenLabs’ pitch. The model supports 11 Indian languages: Hindi, Tamil, Malayalam, Telugu, Gujarati, Kannada, Odia, Bengali, Marathi, Punjabi and Sindhi. More importantly, it offers data-residency options in India, letting firms keep their audio data within national borders, a critical feature given India’s evolving data-protection rules.
The company, founded in 2022, already counts Meesho, Cars24, Apna, 99acres, TVS Motor, Mahindra and PocketFM among its Indian clients. Now it hopes Scribe v2 Realtime will help them, and others, build voice agents that sound and respond like humans rather than like automated phone menus.
For developers, the model offers streaming support, voice activity detection, custom vocabulary for industry-specific jargon, and speaker diarisation. There’s also a zero-retention mode for sensitive work, meaning audio is processed but not stored on ElevenLabs’ servers. The model integrates with ElevenLabs Agents, the firm’s conversational AI platform, and is available now through its API.
If voice is the next interface, ElevenLabs has just made it a lot more fluent. And a lot faster.