Hume AI delivers speech models on SambaCloud

Customers get realistic voice AI in real time


Challenge:

Hume specializes in building the most realistic voice AI models for developers and enterprises. Because these models are built on LLMs, they understand both language and a person’s voice at the same time. Hume’s mission is to bring empathy to AI and to align AI with human well-being. To that end, the speech-LLMs it develops understand both the tone and the meaning of the spoken word, with applications in audio chatbots, customer service, and more.

Hume recently launched the highest-quality speech-LLMs for text-to-speech (Octave) and speech-to-speech (EVI 3). Much of that quality comes from the models’ ability to understand language and adjust their tone of voice naturally in response to the input, enabling more natural conversations and a better user experience.

Most voice systems today chain together separate speech-to-text (transcription), language, and text-to-speech models, because specialized models used to be better at each individual task. With the latest advances in speech-language models, that is no longer true, and every extra step in the chain adds latency. Human conversational latency is around 200 ms, and anything longer than one second sounds noticeably less human. Hume AI and SambaNova have worked together to develop a solution that delivers the highest performance at the lowest possible latency.
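The latency argument above can be made concrete with a quick back-of-the-envelope sketch. The per-stage numbers below are illustrative assumptions, not measured figures from Hume or SambaNova; the point is only that sequential stages add up, while a single speech-LLM pays one model's latency.

```python
# Rough latency budget: cascaded voice pipeline vs. a single speech-LLM.
# All per-stage figures are illustrative assumptions, not benchmarks.

CASCADED_STAGES_MS = {
    "speech-to-text": 300,
    "LLM response": 400,
    "text-to-speech": 250,
}

# Single end-to-end speech-LLM (assumed mid-range of the 100-300 ms figure).
SPEECH_LLM_MS = 250

# Typical gap between turns in human conversation, per the text above.
HUMAN_TURN_GAP_MS = 200


def total_latency(stages_ms):
    """Stages run sequentially, so their latencies simply add up."""
    return sum(stages_ms.values())


cascaded = total_latency(CASCADED_STAGES_MS)
print(f"Cascaded pipeline: {cascaded} ms "
      f"({cascaded / HUMAN_TURN_GAP_MS:.1f}x the human turn gap)")
print(f"Single speech-LLM: {SPEECH_LLM_MS} ms")
```

With these assumed numbers, the cascaded pipeline lands well past the human turn gap, while the single-model path stays within conversational range.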

Solution:

Hume and SambaNova partnered to deploy Hume’s speech-language models on SambaCloud, enabling the best speech-to-speech and text-to-speech models in the world to run at conversational latency with no reduction in quality. Together, they give enterprises access to text-to-speech and speech-to-speech APIs with response times on the order of 100 ms to 300 ms, marrying hyperrealistic quality with human-like conversational latency.

For many enterprises, deploying in private environments is critical. To meet this need, Hume and SambaNova also offer Hume’s text-to-speech and speech-to-speech models through private deployments.

15%
Most models get the right outcome the first time

100-300 ms
Response time

Highest quality speech LLMs


“In terms of scalability, cost, latency – without sacrificing on the quality of the voice so it actually sounds human – I think this is going to be the desired voice AI solution for enterprises.”

 

— Alan Cowen, CEO, Hume AI

 

Related resources

SambaNova Expands Deployment with SoftBank Corp. to Offer Fast AI Inference Across APAC

March 5, 2025
Qwen3 Is Here - Now Live on SambaNova Cloud

May 2, 2025
SambaNova Partners with Meta to Deliver Lightning Fast Inference on Llama 4

April 7, 2025