Start Building with Lightning-Fast GPT-OSS 120B on SambaCloud
OpenAI’s GPT-OSS 120B is now generally available for all developers on SambaCloud. We are running the model at the full 131K context length and at speeds over 700 tokens/second per user, powered by the SambaNova RDU. Moreover, SambaNova hardware runs this model more efficiently than any other provider, allowing enterprises and data centers to maximize their revenue potential running GPT-OSS with SambaStack dedicated instances hosted in our cloud or on-premises.
Why enterprises and governments love OpenAI’s GPT-OSS 120B
OpenAI’s GPT-OSS 120B is a 120 billion parameter mixture-of-experts (MoE) model, designed for reasoning and agentic tasks. It delivers performance on par with OpenAI's o4-mini on core reasoning benchmarks, excelling at chain-of-thought tasks like coding, mathematical reasoning, and health-related queries with high accuracy and efficiency. Prior to release, OpenAI confirmed the safety and security of this model to ensure enterprises and governments deploying this model would not face any major risks.
As a small MoE model, it can be served with high performance at low cost. According to Artificial Analysis, this model provides the best price-to-intelligence ratio of any model currently available. On SambaCloud, developers can start using this model at $0.22 per million input tokens and $0.59 per million output tokens.
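To make the pricing concrete, here is a minimal sketch of per-request cost estimation at the published rates above; the example token counts are illustrative assumptions, not measurements:

```python
# Estimate per-request cost at SambaCloud's published GPT-OSS 120B rates:
# $0.22 per 1M input tokens, $0.59 per 1M output tokens.
INPUT_RATE = 0.22 / 1_000_000   # USD per input token
OUTPUT_RATE = 0.59 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 2,000-token prompt with a 500-token completion.
print(f"${request_cost(2_000, 500):.6f}")  # → $0.000735
```

At these rates, even a million such requests per day stays under $750, which is the kind of arithmetic behind the price-to-intelligence claim.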
And as an open-source U.S. model licensed under Apache 2.0, enterprises can use this model however they like. Whether they want to deploy the model directly on-premises with RAG or fine-tune it further with their own data, enterprises have full flexibility.
In summary, GPT-OSS is the ideal model for enterprises and governments looking for:
- A trusted open-source U.S. model with an Apache 2.0 license for complete ownership
- Exceptional value with the best price-to-intelligence offering across all models available to date, including against closed-source alternatives
- Flexibility to deploy wherever they need and to fine-tune the model with their data
RDUs deliver the best performance for GPT-OSS
SambaNova’s purpose-built RDU hardware is optimally designed for AI models like the GPT-OSS 120B model, delivering unmatched speed and scalability for enterprise deployments. Our architecture ensures consistently high performance, enabling you to serve more users faster and at a lower operational cost than any GPU-based alternative.
This translates directly into higher throughput, greater ROI, and a superior experience for your end-users, all delivered from a single SambaRack. These racks draw an average of just 10 kW of power and are air-cooled, making them easy to integrate into existing data centers.

Advanced capabilities for enterprise workloads
GPT-OSS 120B delivers two advanced features essential for enterprise deployments:
- Reasoning Effort Control: Developers can optimize performance and cost by specifying reasoning intensity (low/medium/high) per query. This parameter adjusts how much compute the model spends on reasoning, from rapid responses for simple tasks to maximum accuracy for complex chain-of-thought prompts. Our API defaults to the balanced medium setting, which can be changed on any request.
- Chain-of-Thought Tool Calling: Unlike many other open-source models, GPT-OSS supports real-time tool invocation during reasoning cycles. Developers can inject tool responses directly into the reasoning process, significantly enhancing output accuracy for agentic workflows and RAG implementations.
Developer quick start
SambaCloud is a powerful platform that enables developers to easily integrate the best open-source models with the fastest inference speeds. Get started today and experience the benefits of fast inference, high accuracy, and an enhanced developer experience, in three easy steps!
- Head over to SambaCloud and create your own account.
- Get an API Key.
- Make your first API call with our OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR API KEY>",
    base_url="https://api.sambanova.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
    reasoning_effort="high",
    temperature=0.1,
    top_p=0.1,
)

print(response.choices[0].message.content)