We are thrilled to announce that OpenAI's Whisper Large-V3 is now live in preview on SambaNova Cloud. Developers can now harness the power of one of the best open-source speech recognition models, whether they're building real-time voicebots, transcribing podcasts, or making their applications more accessible through speech. With SambaNova Cloud, you can expect lightning-fast inference speeds that outperform other platforms, enabling you to build applications that are not only more efficient but also more responsive. As measured by Artificial Analysis, our implementation of Whisper Large-V3 boasts a remarkable Speed Factor of 245X, powered by our RDU chips.
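To put that 245X Speed Factor in practical terms (Speed Factor is the ratio of audio duration to processing time), here is a quick back-of-the-envelope calculation:

```python
# Speed Factor = audio duration / processing time.
# At 245x, a full hour of audio is transcribed in under 15 seconds.
audio_seconds = 60 * 60                     # one hour of audio
speed_factor = 245                          # as measured by Artificial Analysis
processing_seconds = audio_seconds / speed_factor
print(f"{processing_seconds:.1f} seconds")  # ≈ 14.7 seconds
```

That kind of turnaround is what makes real-time and batch transcription workloads practical at scale.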
Whisper Large-V3 is OpenAI’s advanced automatic speech recognition (ASR) and speech translation model, designed to deliver high accuracy across a wide range of languages and audio conditions. It builds on the architecture of previous Whisper versions with enhancements such as a larger spectrogram input (128 Mel frequency bins, up from 80). Trained on a massive dataset of 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio, Whisper excels in zero-shot transcription and translation tasks, generalizing well to diverse real-world scenarios.
In terms of accuracy, Whisper Large-V3 achieves a significant improvement over its predecessor, Large-V2, reducing transcription errors by 10% to 20%. This is reflected in its word error rate (WER), which averages around 10.3% across multiple benchmarks, demonstrating robust performance in multilingual and noisy environments. For example, on datasets like Common Voice and LibriSpeech, Whisper Large-V3 maintains competitive WERs, typically ranging from about 2% on clean speech to around 12% on more challenging audio, making it a reliable choice for both research and production use.
Beyond accuracy, Whisper also benefits from optimized deployment on platforms like SambaNova Cloud, which leverage specialized AI hardware to deliver lightning-fast inference speeds with minimal latency. This combination of low WER and high throughput enables developers to build real-time, responsive voice applications such as live transcription, multilingual translation, and voice-controlled devices, ensuring seamless user experiences even in demanding environments.
Speed and low latency are critical for delivering seamless, responsive experiences in voice-powered applications. With Whisper, these performance gains unlock a new level of capability across real-world use cases:
Multilingual and Global Applications: Whisper Large-V3’s multilingual capabilities, combined with fast inference, enable real-time translation and transcription for international meetings, cross-border collaboration, and global customer engagement.
Curious about the capabilities of Whisper on SambaNova Cloud? We built this demo using LiveKit to provide a real-time voice intelligence demonstration. Simply spin up the demo, start chatting, and experience the low latency for yourself.
Ready to unlock the full potential of Whisper Large-V3? Here's what you can expect when you sign up for SambaNova Cloud:
Whether you're working on a weekend project or scaling to millions of users, SambaNova Cloud + Whisper provides the perfect combination of power and speed to drive your success.
SambaNova Cloud is available as a service for developers to easily integrate the best open-source models with the fastest inference speeds. These speeds are powered by our state-of-the-art AI chip, the SN40L. Whether you are building AI agents or chatbots, fast inference speeds are a must for delivering seamless real-time experiences to your end users. Get started in minutes with the latest and best open-source models, such as OpenAI Whisper, Llama 4 Maverick, and DeepSeek R1 671B, on SambaNova Cloud for free today.
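To make the quick-start concrete, here is a minimal sketch of how a transcription request to Whisper Large-V3 on SambaNova Cloud might be assembled. The base URL and model identifier below are assumptions based on SambaNova Cloud's OpenAI-compatible API; check the official documentation for the exact values:

```python
# Hypothetical sketch: assembling a transcription request for Whisper
# Large-V3 on SambaNova Cloud. The base URL and model id are assumptions;
# consult the SambaNova Cloud docs for the exact values.
API_BASE = "https://api.sambanova.ai/v1"  # assumed OpenAI-compatible base URL
MODEL = "Whisper-Large-v3"                # assumed model identifier

def build_transcription_request(audio_path: str, api_key: str) -> dict:
    """Return the URL, headers, and form fields for a transcription call."""
    return {
        "url": f"{API_BASE}/audio/transcriptions",
        "headers": {"Authorization": f"Bearer {api_key}"},
        # In a real call these become multipart/form-data fields:
        # the audio file itself plus the model name.
        "data": {"model": MODEL, "file": audio_path},
    }
```

With the `requests` library, the actual call would be roughly `requests.post(req["url"], headers=req["headers"], files={"file": open(audio_path, "rb")}, data={"model": MODEL})`, and the JSON response would carry the transcribed text.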
Ready to revolutionize the world of voice? Sign up for SambaNova Cloud and start leveraging the power of Whisper Large-V3 today. We can't wait to see what you build!