We’re excited to announce that Qwen3, the latest generation of large language models from the Qwen team, is now available on SambaNova Cloud — starting with the Qwen3-32B dense model, delivering blazing-fast inference at 281 tokens/second as measured by Artificial Analysis.
Qwen3 introduces a flexible new paradigm for how language models think, reason, and respond. Whether you're building intelligent agents, scalable coding tools, or multi-turn conversational systems, Qwen3 gives you performance control on your terms — all in a single model family.
Qwen3’s dual-mode reasoning engine revolutionizes how developers balance computational efficiency with task complexity, offering unprecedented control over AI behavior. Here’s a deeper look at its mechanics and advantages:
/think Mode (Deliberative Reasoning): activated via /think tags or API parameters, this mode engages the model's full analytical capabilities for complex, multi-step tasks.
/no_think Mode (Instant Response Generation): triggered via /no_think tags, this mode skips extended reasoning and prioritizes speed for simple queries.
This architecture enables granular compute budgeting: activate deep reasoning only when necessary, reducing costs without sacrificing capability.
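As a concrete sketch of the soft switch described above: the helper below appends the /think or /no_think tag to a user prompt before it is sent as an OpenAI-style chat message. The helper name and message layout are our own illustration, not an official SambaNova or Qwen API.

```python
def with_mode(prompt: str, deep: bool) -> str:
    """Append Qwen3's soft-switch tag to a user prompt.

    `/think` engages step-by-step reasoning; `/no_think` skips it
    for a fast direct answer. (Helper name is ours, not an official API.)
    """
    return f"{prompt} {'/think' if deep else '/no_think'}"

# Budget compute per request: deep reasoning for the hard question,
# instant mode for the trivial one.
hard_q = with_mode("Prove that sqrt(2) is irrational.", deep=True)
easy_q = with_mode("What is the capital of Japan?", deep=False)

# OpenAI-style chat payload, ready to send to an inference endpoint:
messages = [{"role": "user", "content": easy_q}]
print(hard_q)  # -> "Prove that sqrt(2) is irrational. /think"
```

The same toggle can be flipped per turn in a multi-turn conversation, so an agent pays for deliberation only on the steps that need it.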
Qwen3 sets a new standard for global AI applications with native support for 119 languages and dialects, spanning the Indo-European, Sino-Tibetan, Afro-Asiatic, and Dravidian language families. These enhanced multilingual capabilities make Qwen3-32B ideal for building multilingual chatbots, localization tools, and real-time translation pipelines, all while maintaining high reasoning performance across languages.
We're launching with Qwen3-32B, a dense model designed for high performance across a wide range of tasks, now running on SambaNova Cloud at unmatched speed.
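To get a feel for calling the model, here is a minimal sketch of a chat completion request. SambaNova Cloud exposes an OpenAI-compatible API; the endpoint URL and model identifier below are assumptions for illustration, so check the SambaNova documentation for the exact values. The snippet only builds and prints the request body; the actual network call is shown in a comment because it requires an API key.

```python
import json

# Assumed OpenAI-compatible endpoint and model id -- verify in the docs.
API_URL = "https://api.sambanova.ai/v1/chat/completions"
payload = {
    "model": "Qwen3-32B",  # assumed model identifier
    "messages": [
        # /no_think requests a fast, direct answer (no extended reasoning)
        {"role": "user", "content": "Summarize the CAP theorem. /no_think"},
    ],
    "temperature": 0.6,
}
body = json.dumps(payload)
print(body)

# To actually send it (needs a real API key):
#   import urllib.request
#   req = urllib.request.Request(
#       API_URL, data=body.encode(),
#       headers={"Authorization": "Bearer <YOUR_API_KEY>",
#                "Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

Because the interface is OpenAI-compatible, existing tooling that speaks that protocol should only need the base URL and model name swapped.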
This is just the beginning for Qwen3. If you're building tools, deploying agents, or scaling next-gen apps, Qwen3-32B gives you the control to think smart and move fast.
👉 Start Using Qwen3-32B on SambaNova Cloud.