SambaNova Cloud Launches the Fastest DeepSeek-R1 671B: Sign Up for Early Access

Posted by Vasanth Mohan on February 13, 2025

DeepSeek-R1 671B, the best open-source reasoning model on the market, is now available on SambaNova Cloud running at speeds of 198 tokens/second per prompt. DeepSeek showed the world how to reduce the cost of training reasoning models, but inference on GPUs has remained a challenge. Today, SambaNova showed how a new hardware architecture built on RDUs can achieve better inference performance. These speeds have been independently verified by Artificial Analysis, and you can sign up for SambaNova Cloud today to try the model in our playground.
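Per-prompt decode speed figures like the one above are typically computed as output tokens divided by wall-clock generation time. A minimal sketch (the numbers below are illustrative, not a benchmark):

```python
def tokens_per_second(n_output_tokens: int, elapsed_s: float) -> float:
    """Decode throughput for a single prompt: output tokens / generation time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_output_tokens / elapsed_s

# Illustrative numbers only: 1,980 output tokens in 10 s of generation.
speed = tokens_per_second(1980, 10.0)
print(f"{speed:.0f} tokens/second")  # -> 198 tokens/second
```

Benchmarks such as Artificial Analysis measure this across many prompts and also report time to first token, which this single-number sketch ignores.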

"Being able to run the full DeepSeek-R1 671B model -- not a distilled version -- at SambaNova's blazingly fast speed is a game changer for developers. Reasoning models like R1 need to generate many reasoning tokens to come up with a superior output, which makes them take longer than traditional LLMs. This makes speeding them up especially important." -- Andrew Ng, Founder and CEO, Landing AI

Developers who are looking to use this model via the API on the SambaNova Cloud Developer Tier can sign up today for our waitlist. We will be slowly rolling out access over the coming weeks as we rapidly scale out capacity for this model.
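Once access is granted, requests can be made with a standard OpenAI-style chat completions call. The sketch below uses only the Python standard library; the endpoint URL and model identifier are assumptions here, so check the SambaNova Cloud documentation for the exact values:

```python
import json
import os
import urllib.request

# Assumed values -- verify against the SambaNova Cloud documentation.
API_URL = "https://api.sambanova.ai/v1/chat/completions"  # assumed endpoint
MODEL = "DeepSeek-R1"                                     # assumed model id

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(prompt: str, api_key: str) -> str:
    """Send the request and return the model's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    key = os.environ.get("SAMBANOVA_API_KEY")
    if key:
        print(ask("Why is the sky blue?", key))
```

Because the payload shape is OpenAI-compatible, existing client libraries should also work by pointing their base URL at the SambaNova endpoint.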


[Chart: Artificial Analysis output speed benchmark for DeepSeek-R1]

About DeepSeek-R1 (the real deal, not distilled)

DeepSeek-R1 took the world by storm, offering stronger reasoning capabilities at a fraction of the cost of its competitors while being completely open source. This groundbreaking model, built on a Mixture of Experts (MoE) architecture with 671 billion parameters, delivers superior performance on math and reasoning tasks, even outperforming OpenAI's o1 on certain benchmarks.

SambaNova is a US-based company that runs the model on our RDU hardware in US data centers. Companies can also choose to work with SambaNova to deploy our hardware and the DeepSeek model on-premises in their own data centers for maximum data privacy and security. This is unlike the service run by the company DeepSeek (as opposed to the model), which runs its cloud service on GPUs without providing any controls for data privacy.

"More than 10 million users and engineering teams at Fortune 500 companies rely on Blackbox AI to transform how they write code and build products. Our partnership with SambaNova plays a critical role in accelerating our autonomous coding agent workflows. SambaNova's chip capabilities are unmatched for serving the full R1 671B model, which provides significantly better accuracy than any of the distilled versions. We couldn't ask for a better partner to work with to serve millions of users." -- Robert Rizk, Co-founder and CEO, BlackBox

Unlike the 70B distilled version of the model (also available today on the SambaNova Cloud Developer Tier), the full DeepSeek-R1 completely outclasses the distilled versions in accuracy. As a reasoning model, R1 spends more tokens thinking before generating an answer, which allows it to produce far more accurate and thoughtful responses. For example, it was able to reason about how to improve the efficiency of running itself (Reddit), which is not possible without reasoning capabilities.
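In many serving setups, R1's visible reasoning trace is wrapped in `<think>...</think>` tags ahead of the final answer (this framing is an assumption about the deployment; some APIs strip or relabel it). A small helper can separate the trace from the answer:

```python
import re

# R1-style outputs often wrap chain-of-thought in <think> tags before the answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw R1-style completion.

    If no <think> block is present, reasoning is empty and the whole
    text is treated as the answer.
    """
    m = THINK_RE.search(raw)
    if not m:
        return "", raw.strip()
    reasoning = m.group(1).strip()
    answer = raw[m.end():].strip()
    return reasoning, answer

sample = "<think>2 + 2 is basic arithmetic; the sum is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(sample)
print(answer)  # -> The answer is 4.
```

Separating the two is useful when billing or latency depends on total tokens generated, since the reasoning trace can be many times longer than the answer shown to the user.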

100X the Global Inference Compute of DeepSeek-R1

There is no shortage of demand for R1 given its performance and cost. But because DeepSeek-R1 is a reasoning model that generates more tokens at run time, developers today are compute constrained and cannot get enough access to R1 due to the inefficiencies of GPUs. GPU inefficiency is one of the main reasons DeepSeek had to disable its own inference API service.

SambaNova RDU chips are ideally suited to large Mixture of Experts models like DeepSeek-R1, thanks to the dataflow architecture and three-tier memory design of the SN40L RDU. This design allows us to deploy these models optimally, using just one rack to deliver large performance gains instead of the 40 racks of 320 GPUs that powered DeepSeek's own inference. To learn more about the RDU and our unique architectural advantage, read our blog.

Thanks to the efficiency of our RDU chips, SambaNova expects to be serving 100X the global demand for the DeepSeek-R1 model by the end of the year. This makes SambaNova RDU chips the most efficient inference platform for running reasoning models like DeepSeek-R1. 

Improve Software Development with R1

Check out demos from our friends at Hugging Face and BlackBox showing how R1 significantly improves coding. In CyberCoder, BlackBox uses R1 to substantially improve the performance of coding agents, one of the primary use cases for developers adopting R1.


AK from the Gradio team at Hugging Face has developed Anychat, a simple way to demo the abilities of various models using Gradio components. Using Anychat integrated with R1 on SambaNova, he was able to quickly build an application that recreates ChatGPT's ad from the Super Bowl!

Show Us What You Are Building

To expedite access to the model, show us your cool use cases in the SambaNova Developer Community that would benefit from R1, just like those from BlackBox and Hugging Face. We also recently launched our Developer Tier, and participating in the community is a great way to earn additional credits.

Our teams look forward to seeing the amazing AI applications built by the community and are excited to get R1 and other models into your hands at the fastest speeds so you can continue to innovate and push the boundaries of what is possible.

Topics: technology, Blog