Starting today, Hugging Face developers can take advantage of the lightning-fast inference speeds made possible by SambaNova RDU hardware through Hugging Face's newly launched Inference Providers API. All SambaNova Cloud models are available for developers to consume, including the full Llama 3 series (with Llama Guard) as well as the latest Qwen models, such as the Qwen 2.5 series, the QwQ reasoning model, and an audio model.
For the millions of Hugging Face developers using the Inference API today, this integration makes it easy to switch to a faster provider for these popular open-source models. Developers can log in to SambaNova Cloud, get an API key, and add it on the new Inference Providers page as part of their billing settings. Alternatively, they can experience SambaNova's fast inference speeds directly through Hugging Face's Inference Client.
Once integrated, developers can keep building in Gradio or with the Hugging Face Inference Client just as they do today, with minimal changes to their code base. The only required change is to set `sambanova` as the provider in the Inference Client, as shown in the sketch below. Read more about the integration in the documentation.
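For illustration, here is a minimal sketch of that one-line switch, assuming a recent version of `huggingface_hub` with Inference Providers support; the model ID, environment variable name, and prompt are placeholders, not prescribed values:

```python
import os
from huggingface_hub import InferenceClient

# Selecting SambaNova is the only change to existing Inference Client code.
# A Hugging Face token routes (and bills) the request through Hugging Face;
# a SambaNova Cloud API key can be passed instead to call SambaNova directly.
client = InferenceClient(
    provider="sambanova",
    api_key=os.environ["HF_TOKEN"],  # placeholder env var
)

# The model ID is illustrative -- any SambaNova-supported model card works.
response = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the client's interface is otherwise unchanged, existing Gradio apps built on it pick up the new provider with that same one-line switch.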
Fast and accurate AI inference is crucial across a wide range of applications, especially as test-time compute and agentic AI drive demand for more tokens at inference time. Open-source models make it possible for SambaNova to optimize them for the RDU and deliver fast, accurate inference to developers at 10x the speed.
“We are excited to partner with SambaNova and bring faster inference on open source models directly to our developer community.” - Julien Chaumond, CTO, Hugging Face
To try it out, head over to any model card on Hugging Face that SambaNova Cloud already supports, for example Llama 3.3 70B. Choose SambaNova as the provider and experience the fast inference directly through Hugging Face.
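As a rough sketch of what that looks like in code, the same client also supports streaming, which makes the generation speed easy to see; again, the model ID and environment variable are only examples:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(provider="sambanova", api_key=os.environ["HF_TOKEN"])

# Stream tokens as they are generated -- a simple way to watch the speed.
for chunk in client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",  # illustrative model ID
    messages=[{"role": "user", "content": "Write a haiku about fast inference."}],
    max_tokens=100,
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```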