Start Building with Lightning-Fast GPT-OSS 120B on SambaCloud
OpenAI’s GPT-OSS 120B is now generally available for all developers on SambaCloud. We are running the model at the full 131K context length and at speeds over 700 tokens/second per user, powered by the SambaNova RDU. Moreover, SambaNova hardware runs this model more efficiently than any other provider, allowing enterprises and data centers to maximize their revenue potential running GPT-OSS with SambaStack dedicated instances hosted in our cloud or on-premises.
Why enterprises and governments love OpenAI’s GPT-OSS 120B
OpenAI’s GPT-OSS 120B is a 120 billion parameter mixture-of-experts (MoE) model, designed for reasoning and agentic tasks. It delivers performance on par with OpenAI's o4-mini on core reasoning benchmarks, excelling at chain-of-thought tasks like coding, mathematical reasoning, and health-related queries with high accuracy and efficiency. Prior to release, OpenAI confirmed the safety and security of this model to ensure enterprises and governments deploying this model would not face any major risks.
As a small MoE model, it can be served with high performance at low cost. According to Artificial Analysis, this model provides the best price-to-intelligence ratio of any model currently available. On SambaCloud, developers can start using this model at $0.22 per million input tokens and $0.59 per million output tokens.
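To make the pricing concrete, here is a minimal sketch of per-request cost estimation at the published rates above; the example token counts are illustrative assumptions, not measurements:

```python
# Estimate per-request cost at SambaCloud's published GPT-OSS 120B rates:
# $0.22 per 1M input tokens, $0.59 per 1M output tokens.
INPUT_RATE = 0.22 / 1_000_000   # USD per input token
OUTPUT_RATE = 0.59 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 2,000-token prompt with a 500-token completion.
print(f"${request_cost(2_000, 500):.6f}")  # → $0.000735
```

At these rates, even a million such requests per day stays under $750, which is the kind of arithmetic behind the price-to-intelligence claim.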
And as an open-source U.S. model licensed under Apache 2.0, enterprises can use this model however they like. Whether they want to deploy the model directly on-premises with RAG or fine-tune it further with their own data, enterprises have full flexibility.
In summary, GPT-OSS is the ideal model for enterprises and governments looking for:
- A trusted open-source U.S. model with an Apache 2.0 license for complete ownership
- Exceptional value with the best price-to-intelligence offering across all models available to date, including against closed-source alternatives
- Flexibility to deploy wherever they need and to fine-tune the model with their data
RDUs deliver the best performance for GPT-OSS
SambaNova’s purpose-built RDU hardware is optimally designed for AI models like the GPT-OSS 120B model, delivering unmatched speed and scalability for enterprise deployments. Our architecture ensures consistently high performance, enabling you to serve more users faster and at a lower operational cost than any GPU-based alternative.
This translates directly into higher throughput, greater ROI, and a superior experience for your end-users, all delivered from a single SambaRack. These racks draw an average of just 10 kW of power and are air-cooled, making them easy to integrate into existing data centers.

Advanced capabilities for enterprise workloads
GPT-OSS 120B delivers two advanced features essential for enterprise deployments:
- Reasoning Effort Control: Developers can optimize performance and cost by specifying reasoning intensity (low/medium/high) per query. This parameter adjusts how much compute the model spends on reasoning, from rapid responses for simple tasks to maximum accuracy for complex chain-of-thought prompts. Our API defaults to the balanced medium setting, which can be changed on any request.
- Chain-of-Thought Tool Calling: Unlike many other open-source models, GPT-OSS supports real-time tool invocation during reasoning cycles. Developers can inject tool responses directly into the reasoning process, significantly enhancing output accuracy for agentic workflows and RAG implementations.
Developer quick start
SambaCloud is a powerful platform that enables developers to easily integrate the best open-source models with the fastest inference speeds. Get started today and experience the benefits of fast inference, high accuracy, and an enhanced developer experience, in three easy steps!
- Head over to SambaCloud and create your own account.
- Get an API Key.
- Make your first API call with our OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR API KEY>",
    base_url="https://api.sambanova.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
    reasoning_effort="high",
    temperature=0.1,
    top_p=0.1,
)

print(response.choices[0].message.content)