OpenAI’s GPT-OSS 120B is now generally available to all developers on SambaCloud. We are running the model at its full 131K context length and at speeds of over 700 tokens per second per user, powered by the SambaNova RDU. Moreover, SambaNova hardware runs this model more efficiently than any other platform, allowing enterprises and data centers to maximize their revenue potential by running GPT-OSS on SambaStack dedicated instances, hosted in our cloud or on-premises.
OpenAI’s GPT-OSS 120B is a 120 billion parameter mixture-of-experts (MoE) model designed for reasoning and agentic tasks. It delivers performance on par with OpenAI's o4-mini on core reasoning benchmarks, excelling at chain-of-thought tasks like coding, mathematical reasoning, and health-related queries with high accuracy and efficiency. Prior to release, OpenAI tested the model's safety and security so that enterprises and governments can deploy it without facing major risks.
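The efficiency of an MoE model comes from activating only a few experts per token instead of the whole network. As a minimal illustrative sketch of that routing idea (the toy experts, gate scores, and top-2 selection here are assumptions for illustration, not GPT-OSS's actual internals):

```python
import math

def softmax(xs):
    """Convert raw gate scores into routing probabilities."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, gate_scores, k=2):
    """Route a token to the top-k experts and mix their outputs.

    `experts` is a list of callables; `gate_scores` would come from a
    learned router (here they are just given numbers). Only k experts
    run per token -- that is where the MoE cost savings come from.
    """
    probs = softmax(gate_scores)
    topk = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in topk)  # renormalize over the chosen experts
    return sum(probs[i] / norm * experts[i](token) for i in topk)

# Toy experts: each just scales its input differently.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
out = moe_forward(10.0, experts, gate_scores=[0.1, 0.9, 2.0, 0.3], k=2)
```

Only two of the four toy experts ever execute for this token; the rest are skipped entirely, which is why a 120B-parameter MoE can be served far more cheaply than a dense model of the same size.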
As a small MoE model, it can be run with high performance at low cost. According to Artificial Analysis, this model provides the best price-to-intelligence ratio of any model available. On SambaCloud, developers can start using this model at $0.22 per million input tokens and $0.59 per million output tokens.
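To make the pricing concrete, here is a small sketch for estimating a monthly bill from those per-million-token rates (the 50M/10M usage figures are hypothetical examples, not benchmarks):

```python
def monthly_cost(input_tokens, output_tokens,
                 in_price=0.22, out_price=0.59):
    """Estimate the USD cost of GPT-OSS 120B usage on SambaCloud.

    Prices are per million tokens, as listed at the time of writing.
    """
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A hypothetical workload: 50M input tokens and 10M output tokens per month.
cost = monthly_cost(50_000_000, 10_000_000)  # 50*0.22 + 10*0.59 = $16.90
```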
And as an open-source U.S. model licensed under Apache 2.0, enterprises can use this model however they like. Whether they want to deploy the model directly on-premises with RAG or fine-tune it further with their own data, enterprises have full flexibility.
In summary, GPT-OSS is the ideal model for enterprises and governments looking for:

- Reasoning accuracy on par with o4-mini for coding, math, and health-related queries
- Inference at over 700 tokens per second per user, at full 131K context
- The best price-to-intelligence ratio of any available model
- A safety-tested, Apache 2.0-licensed U.S. model that can be deployed on-premises and fine-tuned freely
GPT-OSS 120B delivers two advanced features essential for enterprise deployments:
SambaCloud is a powerful platform that enables developers to easily integrate the best open-source models with the fastest inference speeds. Get started today and experience fast inference, maximum accuracy, and an enhanced developer experience in three easy steps!
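As a sketch of what a first API call can look like, the snippet below builds an OpenAI-style chat-completion request. The endpoint URL and the `gpt-oss-120b` model identifier are assumptions for illustration; confirm the exact values in the SambaCloud documentation and dashboard.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint -- verify in the SambaCloud docs.
API_URL = "https://api.sambanova.ai/v1/chat/completions"

def build_request(prompt, model="gpt-oss-120b", max_tokens=256):
    """Construct an OpenAI-style chat-completion payload.

    The model name is an assumption; check your dashboard for the
    exact identifier.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def complete(prompt, api_key=None):
    """Send the request; requires an API key to actually run."""
    key = api_key or os.environ["SAMBANOVA_API_KEY"]
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint follows the OpenAI chat-completions shape, existing OpenAI client code can typically be pointed at it by swapping the base URL and API key.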