We are thrilled to announce that OpenAI's Whisper Large-V3 is now live in preview on SambaNova Cloud. Developers can now harness one of the best open-source speech recognition models, whether they're building real-time voicebots, transcribing podcasts, or making their applications more accessible through speech. With SambaNova Cloud, you can expect lightning-fast inference speeds that outperform other platforms, enabling you to build applications that are not only more efficient but also more responsive. As measured by Artificial Analysis, our implementation of Whisper Large-V3 boasts a remarkable Speed Factor of 245X, powered by our RDU chips.
What is Whisper?
Whisper Large-V3 is OpenAI’s advanced automatic speech recognition (ASR) and speech translation model, designed to deliver high accuracy across a wide range of languages and audio conditions. It builds on the architecture of previous Whisper versions with enhancements such as a larger spectrogram input (128 Mel frequency bins, up from 80). Trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio, Whisper excels at zero-shot transcription and translation, generalizing well to diverse real-world scenarios.
In terms of accuracy, Whisper Large-V3 reduces transcription errors by 10% to 20% compared to its predecessor, Large-V2. This is reflected in its word error rate (WER), which averages around 10.3% across multiple benchmarks, demonstrating robust performance in multilingual and noisy environments. On datasets like LibriSpeech and Common Voice, for example, Whisper Large-V3 maintains competitive WERs, typically ranging from about 2% on clean speech to around 12% on more challenging audio, making it a reliable choice for both research and production use.
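To put these percentages in context, WER is simply the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the model's output, divided by the number of reference words. Here is a minimal, self-contained sketch of that calculation; the example sentences are made up purely for illustration:

```python
# Minimal sketch of word error rate (WER): word-level edit distance divided by
# the number of reference words. Example sentences are illustrative only.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word ("for" -> "four") in a 6-word reference => WER = 1/6 ≈ 16.7%.
print(wer("transcribe this call for me please",
          "transcribe this call four me please"))
```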
Beyond accuracy, Whisper also benefits from optimized deployment on platforms like SambaNova Cloud, which leverage specialized AI hardware to deliver lightning-fast inference speeds with minimal latency. This combination of low WER and high throughput enables developers to build real-time, responsive voice applications such as live transcription, multilingual translation, and voice-controlled devices, ensuring seamless user experiences even in demanding environments.
The Importance of Speed and Latency in Voice-Powered Applications
Speed and low latency are critical for delivering seamless, responsive experiences in voice-powered applications. With Whisper on SambaNova Cloud, these performance gains unlock a new level of capability across real-world use cases:
- Real-Time Customer Support: Instant transcription enables voicebots and call center agents to respond to customer queries without delay, improving satisfaction and enabling real-time guidance during ongoing calls.
- Voice-Controlled Devices: Smart home systems, automotive assistants, and IoT devices rely on rapid speech recognition to execute commands instantly, creating a natural, conversational user experience.
- Media and Content Creation: Journalists, podcasters, and video producers benefit from fast, accurate transcription of interviews, podcasts, and lectures, streamlining editing, analysis, and content repurposing.
- Multilingual and Global Applications: Whisper Large-V3’s multilingual capabilities, combined with fast inference, enable real-time translation and transcription for international meetings, cross-border collaboration, and global customer engagement.
Experience the Power of Whisper in Action: Demos with LiveKit
Curious about the capabilities of Whisper on SambaNova Cloud? We built a demo with LiveKit to showcase real-time voice intelligence. Simply spin up the demo, start chatting, and experience the low latency for yourself.
Get Started with Whisper on SambaNova Cloud Today
Ready to unlock the full potential of Whisper Large-V3? Here's what you can expect when you sign up for SambaNova Cloud:
- OpenAI-compatible endpoints for Whisper-Large-v3, ensuring seamless integration with your existing infrastructure (see the example after this list)
- Real-time transcription and translation capabilities in over 50 languages, empowering you to reach a global audience
- Support for uploading audio files (up to 25MB) in various formats, providing flexibility and convenience
- Customizable prompts and flexible output options (JSON or plain text), allowing you to tailor the output to your specific needs
- Seamless integration with your favorite tools and frameworks like LiveKit, streamlining your development workflow
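Because the endpoints are OpenAI-compatible, you can point the official OpenAI Python SDK at SambaNova Cloud and transcribe audio in a few lines. The sketch below is illustrative: the base URL, environment variable name, and exact model id are assumptions, so confirm them in the SambaNova Cloud documentation.

```python
# Minimal sketch: transcribing a local audio file through SambaNova Cloud's
# OpenAI-compatible endpoint using the official OpenAI Python SDK.
# The base URL, env var name, and model id below are assumptions; check the docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SAMBANOVA_API_KEY"],  # assumed environment variable
    base_url="https://api.sambanova.ai/v1",   # assumed OpenAI-compatible base URL
)

# Upload an audio file (up to 25 MB) and request JSON output.
with open("interview.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Whisper-Large-v3",        # model id as listed above; may differ
        file=audio_file,
        response_format="json",          # or "text" for plain text
        language="en",                   # optional hint; Whisper can also auto-detect
        prompt="SambaNova, RDU, SN40L",  # optional prompt to bias domain vocabulary
    )

print(transcript.text)
```

For translation into English rather than same-language transcription, the same client exposes `client.audio.translations.create(...)` with the same file-upload pattern.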
Whether you're working on a weekend project or scaling to millions of users, SambaNova Cloud + Whisper provides the perfect combination of power and speed to drive your success.
About SambaNova Cloud
SambaNova Cloud is available as a service for developers to easily integrate the best open-source models with the fastest inference speeds, powered by our state-of-the-art AI chip, the SN40L. Whether you are building AI agents or chatbots, fast inference is a must for delivering seamless real-time experiences to your end users. Get started in minutes with the latest and best open-source models, such as OpenAI Whisper, Llama 4 Maverick, and DeepSeek R1 671B, on SambaNova Cloud for free today.
Join the Future of Voice: Sign Up for SambaNova Cloud Today
Ready to revolutionize the world of voice? Sign up for SambaNova Cloud and start leveraging the power of Whisper Large-V3 today. We can't wait to see what you build!