Blog

Start Building with Lightning-Fast GPT-OSS 120B on SambaCloud

Written by SambaNova | September 8, 2025

OpenAI’s GPT-OSS 120B is now generally available to all developers on SambaCloud. We are running the model at its full 131K context length and at speeds of over 700 tokens per second per user, powered by the SambaNova RDU. Moreover, SambaNova hardware runs this model more efficiently than any other provider, allowing enterprises and data centers to maximize their revenue potential when running GPT-OSS on SambaStack dedicated instances hosted in our cloud or on-premises.

Why enterprises and governments love OpenAI’s GPT-OSS 120B

OpenAI’s GPT-OSS 120B is a 120 billion parameter mixture-of-experts (MoE) model designed for reasoning and agentic tasks. It delivers performance on par with OpenAI's o4-mini on core reasoning benchmarks, excelling at chain-of-thought tasks such as coding, mathematical reasoning, and health-related queries with high accuracy and efficiency. Prior to release, OpenAI evaluated the model's safety and security so that enterprises and governments deploying it would not face major risks.

As a small MoE model, it runs fast and affordably. According to Artificial Analysis, this model provides the best price-to-intelligence ratio of any model currently available. On SambaCloud, developers can start using it at $0.22 per million input tokens and $0.59 per million output tokens.
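At those rates, it is easy to estimate what a given workload will cost. The sketch below uses the per-million-token prices quoted above; the example token counts are illustrative, not drawn from any real workload.

```python
# Rough cost estimator for GPT-OSS 120B on SambaCloud.
# Prices from this post: $0.22 per million input tokens,
# $0.59 per million output tokens.

INPUT_PRICE_PER_M = 0.22
OUTPUT_PRICE_PER_M = 0.59

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in US dollars."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 10K-token prompt that produces a 2K-token answer
print(f"${estimate_cost(10_000, 2_000):.5f}")  # → $0.00338
```

A full 131K-token context with a long reasoning trace still costs only a few cents per request, which is where the price-to-intelligence argument comes from.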

And as an open-source U.S. model licensed under Apache 2.0, enterprises can use this model however they like. Whether they want to deploy it directly on-premises with RAG or fine-tune it further on their own data, enterprises have full flexibility.

In summary, GPT-OSS is the ideal model for enterprises and governments looking for:
  • A trusted open-source U.S. model with Apache 2.0 license for complete ownership
  • Exceptional value with the best price-to-intelligence offering across all models available to date, including against closed-source alternatives
  • Flexibility to deploy wherever they need and to fine-tune the model with their data

Advanced capabilities for enterprise workloads

GPT-OSS 120B delivers two advanced features essential for enterprise deployments:

  1. Reasoning Effort Control: Developers can optimize performance and cost by specifying reasoning intensity (low/medium/high) per query. This parameter dynamically adjusts computational effort, from rapid responses for simple tasks to maximum accuracy for complex chain-of-thought prompts. Our API defaults to the balanced medium setting, which can easily be changed per request.

  2. Chain of Thought Tool Calling: Unlike most open-source models, GPT-OSS supports real-time tool invocation during reasoning cycles. Developers can inject tool responses directly into the reasoning process, significantly enhancing output accuracy for agentic workflows and RAG implementations.
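The reasoning-effort control above maps to a single field in an OpenAI-style chat-completions request. The sketch below shows how such a payload might be built; the model id (`gpt-oss-120b`) and the `reasoning_effort` field name are illustrative assumptions, so check the SambaCloud API reference for the exact identifiers your account exposes.

```python
# Build an OpenAI-style chat-completions payload with a reasoning
# intensity level. Field names here are assumptions for illustration.

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Return a chat-completions payload dict with the chosen effort level."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError("effort must be 'low', 'medium', or 'high'")
    return {
        "model": "gpt-oss-120b",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # medium is the balanced default
    }

# Dial effort up for a hard chain-of-thought prompt
payload = build_request("Prove that sqrt(2) is irrational.", effort="high")
print(payload["reasoning_effort"])  # → high
```

Keeping the effort level a per-request parameter lets one deployment serve both latency-sensitive lookups (low) and accuracy-critical reasoning jobs (high) without separate endpoints.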

Developer quick start

SambaCloud is a powerful platform that enables developers to easily integrate the best open-source models at the fastest inference speeds. Get started today and experience fast inference, maximum accuracy, and an enhanced developer experience, in three easy steps!

  1. Head over to SambaCloud and create your own account.
  2. Get an API Key.
  3. Make your first API call with our OpenAI-compatible API.
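The steps above can be sketched with nothing but the Python standard library. This is a minimal illustration, not official sample code: the base URL (`https://api.sambanova.ai/v1`) and model id (`gpt-oss-120b`) are assumptions you should confirm in your SambaCloud dashboard before running it.

```python
# Prepare a first chat-completions call to an OpenAI-compatible endpoint.
# Base URL and model id below are assumptions; verify them in your dashboard.
import json
import os
import urllib.request

API_KEY = os.environ.get("SAMBANOVA_API_KEY", "YOUR_API_KEY")

def make_request(prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) an OpenAI-style chat-completions request."""
    body = json.dumps({
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        "https://api.sambanova.ai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = make_request("Say hello in one word.")
# To actually send it (requires a valid API key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, the official OpenAI SDKs should also work once pointed at the SambaCloud base URL with your API key.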