DeepSeek recently dropped yet another update to their V3 model architecture: DeepSeek-V3.1-Terminus! According to Artificial Analysis, this model is now one of the best open-source reasoning models, and SambaNova is running it faster than anyone else in the world, at over 200 tokens per second with the fastest time to first token. Just like the previous DeepSeek-V3.1 update, the model supports hybrid thinking, enabling developers to switch between reasoning and non-reasoning modes.
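To make that switch concrete, here is a minimal sketch of calling the model through SambaCloud's OpenAI-compatible endpoint. The model id and the mechanism for toggling reasoning mode shown here are assumptions, not confirmed names; check the SambaCloud documentation for the exact values.

```python
# Minimal sketch: calling DeepSeek-V3.1-Terminus via SambaCloud's
# OpenAI-compatible API. Model id and mode toggle are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # SambaCloud OpenAI-compatible endpoint
    api_key=os.environ["SAMBANOVA_API_KEY"],
)

response = client.chat.completions.create(
    model="DeepSeek-V3.1-Terminus",  # assumed model id on SambaCloud
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
    # Hypothetical: how hybrid thinking is toggled is deployment-specific --
    # some providers expose separate model ids for reasoning vs. non-reasoning
    # mode, others a request-level flag. Consult the provider docs.
)
print(response.choices[0].message.content)
```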
Developers who sign up on SambaCloud and join the Developer Tier by adding a payment method to their account will earn an additional $50 in credits to use on DeepSeek-V3.1-Terminus as well as all other models on SambaCloud! That translates to over 30 million FREE tokens of DeepSeek-V3.1-Terminus, more than enough to finish vibe coding a few different applications. This offer is limited to the first 50 new users.
DeepSeek-V3.1-Terminus has improved in two key areas and across several benchmarks:
| Benchmark | DeepSeek-V3.1 | DeepSeek-V3.1-Terminus |
|---|---|---|
| **Reasoning mode w/o tool use** | | |
| MMLU-Pro | 84.8 | 85.0 |
| GPQA-Diamond | 80.1 | 80.7 |
| Humanity's Last Exam | 15.9 | 21.7 |
| LiveCodeBench | 74.8 | 74.9 |
| Codeforces | 2091 | 2046 |
| Aider-Polyglot | 76.3 | 76.1 |
| **Agentic tool use** | | |
| BrowseComp | 30.0 | 38.5 |
| BrowseComp-zh | 49.2 | 45.0 |
| SimpleQA | 93.4 | 96.8 |
| SWE-bench Verified | 66.0 | 68.4 |
| SWE-bench Multilingual | 54.5 | 57.8 |
| Terminal-bench | 31.3 | 36.7 |
Just like the previous V3.1 release, DeepSeek-V3.1-Terminus only supports function calling in non-reasoning mode, and its function calling has measurably improved over the previous V3.1 model. It performs strongly on coding tasks, making it well suited for coding agents such as Blackbox, and the improved non-thinking function calling also makes it an even better fit for agentic frameworks like CrewAI.
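As a hedged sketch of what function calling looks like in non-reasoning mode, the snippet below uses the standard OpenAI tools schema against the same assumed SambaCloud endpoint and model id as above; the `get_weather` tool is a hypothetical example, not part of any real API.

```python
# Hedged sketch: function calling with DeepSeek-V3.1-Terminus in
# non-reasoning mode. Endpoint and model id are assumptions; the tools
# schema is the standard OpenAI chat-completions format.
import json
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key=os.environ["SAMBANOVA_API_KEY"],
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="DeepSeek-V3.1-Terminus",  # assumed model id
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# In non-reasoning mode the model can return a tool call instead of text;
# the arguments arrive as a JSON string to be parsed and dispatched.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```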
Because the model is stored in on-chip memory instead of host memory, we have measured an average hotswap time of 650 milliseconds, which no other system in the world can achieve today. Model bundling and hotswapping at this level of efficiency allow enterprises and data centers to maximize the inference utilization of every rack with our SambaStack and SambaManaged products. This level of performance is essential for many applications, especially for cloud service providers dynamically managing AI inference workloads in real time across many racks.