Hume AI delivers speech models on SambaCloud
Customers get realistic voice AI in real-time

Challenge:
Hume specializes in building the most realistic voice AI models for developers and enterprises. These models are built on LLMs, so they understand both language and a person's voice at the same time. Hume's mission is to bring empathy to AI and to align AI with human well-being; to that end, the speech-LLMs it develops understand both the tone and the meaning of the spoken word. Applications include audio chatbots, customer service agents, and more.
Hume recently launched the highest-quality speech-LLMs for text-to-speech (Octave) and speech-to-speech (EVI 3). Much of that quality comes from the models' ability to understand language and adjust their tone of voice naturally in response to the input, which makes conversations feel more natural and improves how users perceive them.
Most voice systems today chain together separate models for transcription (speech-to-text), text-to-speech, and other tasks, because specialized models were once better at each individual task; with the latest advances in speech-language models, that is no longer the case. Moreover, each stage in the chain adds latency. Human conversational latency is about 200 ms, and anything longer than one second sounds noticeably less human. Hume AI and SambaNova have worked together to develop a solution that delivers the highest performance at the lowest possible latency.
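To make the latency argument concrete, here is a small sketch of how per-stage delays in a cascaded pipeline add up. The per-stage figures below are illustrative assumptions, not measured values from any real system:

```python
# Hypothetical latency budget for a traditional cascaded voice pipeline.
# Per-stage numbers are illustrative only -- real systems vary widely.
stage_latency_ms = {
    "speech_to_text": 300,   # transcribe the user's audio
    "language_model": 400,   # generate a text response
    "text_to_speech": 250,   # synthesize the reply audio
    "network_overhead": 100, # hops between the separate services
}

# Each stage runs in sequence, so the delays simply accumulate.
total_ms = sum(stage_latency_ms.values())

CONVERSATIONAL_TARGET_MS = 200  # roughly human turn-taking latency
print(f"Cascaded pipeline: {total_ms} ms "
      f"({total_ms - CONVERSATIONAL_TARGET_MS} ms over the conversational target)")
```

Even with optimistic per-stage numbers, the sum lands well past the ~200 ms conversational target, which is why collapsing the stages into a single speech-language model matters.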
Solution:
Hume and SambaNova have worked together to deploy Hume’s speech-language models on SambaCloud, enabling the best speech-to-speech and text-to-speech models in the world to run at conversational latency without any reduction in quality. Together, Hume AI and SambaNova provide enterprises with access to text-to-speech and speech-to-speech APIs with response times on the order of 100 ms to 300 ms, marrying hyperrealistic quality with human-like conversation latency.
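As a rough sketch of what calling such a hosted text-to-speech API and measuring its round-trip time might look like: the endpoint URL, auth header name, and payload field names below are assumptions modeled on Hume's public API documentation, so check the current API reference before relying on them.

```python
import json
import time
import urllib.request

# Assumed endpoint and auth header -- verify against Hume's API reference.
API_URL = "https://api.hume.ai/v0/tts"


def build_payload(text: str, voice_description: str) -> dict:
    """Assemble a TTS request body; field names here are illustrative."""
    return {"utterances": [{"text": text, "description": voice_description}]}


def synthesize(api_key: str, text: str) -> float:
    """Send one TTS request and return the round-trip time in milliseconds."""
    body = json.dumps(build_payload(text, "calm, friendly narrator")).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Hume-Api-Key": api_key,  # assumed header name
        },
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()  # drain the audio response
    return (time.perf_counter() - start) * 1000.0
```

With the models served on SambaCloud, a call like `synthesize(key, "Hello!")` would be expected to come back in the 100 ms to 300 ms range cited above, though actual timings depend on network distance and payload size.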
For many enterprises, it is critical to deploy in private environments. Hume and SambaNova are providing Hume’s text-to-speech and speech-to-speech models through private deployments to meet these needs.
“In terms of scalability, cost, latency – without sacrificing on the quality of the voice so it actually sounds human – I think this is going to be the desired voice AI solution for enterprises.”
— Alan Cowen, CEO, Hume AI
