We’re excited to announce that Qwen3, the latest generation of large language models from the Qwen team, is now available on SambaNova Cloud — starting with the Qwen3-32B dense model, delivering blazing-fast inference at 281 tokens/second as measured by Artificial Analysis.
Qwen3 introduces a flexible new paradigm for how language models think, reason, and respond. Whether you're building intelligent agents, scalable coding tools, or multi-turn conversational systems, Qwen3 gives you performance control on your terms — all in a single model family.
Hybrid Thinking Architecture: Precision Control
Qwen3’s dual-mode reasoning engine revolutionizes how developers balance computational efficiency with task complexity, offering unprecedented control over AI behavior. Here’s a deeper look at its mechanics and advantages:
/think Mode: Deliberative Reasoning
Activated via the /think tag or API parameters, this mode engages the model’s full analytical capabilities:
- Step-by-step problem-solving: Breaks down tasks like mathematical proofs or code debugging into intermediate reasoning steps
- Latent Chain-of-Thought: Internally generates reasoning paths before delivering final answers, improving accuracy on benchmarks
- Complex task specialization: Optimized for multi-hop analysis, scientific reasoning, and agentic decision-making
/no_think Mode: Instant Response Generation
Triggered via the /no_think tag, this mode prioritizes speed:
- Sub-second latency: Delivers answers in the 300 ms range for real-time applications
- Streamlined processing: Bypasses intermediate reasoning steps for queries like autocomplete or FAQ responses
- Cost-optimized inference: Reduces active parameter usage through architectural optimizations
This architecture enables granular compute budgeting: activate deep reasoning only when necessary, reducing costs without sacrificing capability.
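As a minimal sketch of how this switch can be driven per request, the snippet below appends Qwen3's soft-switch tag to the user turn and builds two example chat payloads. The model id "Qwen3-32B" and the exact tag-handling behavior are assumptions to verify against the SambaNova Cloud and Qwen3 documentation.

```python
# Sketch: toggling Qwen3's hybrid thinking mode per request
# (tag placement and model id are assumptions, not confirmed API behavior).

def build_messages(prompt: str, thinking: bool) -> list[dict]:
    """Append Qwen3's soft-switch tag to the user turn.

    /think requests step-by-step reasoning before the answer;
    /no_think requests a direct answer with no chain-of-thought.
    """
    tag = "/think" if thinking else "/no_think"
    return [{"role": "user", "content": f"{prompt} {tag}"}]

# Example payloads for the two modes:
deep = {
    "model": "Qwen3-32B",  # assumed model id on SambaNova Cloud
    "messages": build_messages("Prove that sqrt(2) is irrational.", thinking=True),
}
fast = {
    "model": "Qwen3-32B",
    "messages": build_messages("What is the capital of France?", thinking=False),
}
print(deep["messages"][0]["content"])  # ends with "/think"
print(fast["messages"][0]["content"])  # ends with "/no_think"
```

Either payload can then be sent to any OpenAI-compatible chat-completions endpoint; only the tag changes between a deliberative and an instant-response call.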
Multilingual Mastery & Translation Excellence
Qwen3 sets a new standard for global AI applications with native support for 119 languages and dialects, spanning Indo-European, Sino-Tibetan, Afro-Asiatic, and Dravidian language families. Its enhanced multilingual capabilities include:
- Advanced translation quality with context-aware localization
- Cross-lingual instruction following for seamless international deployments (e.g., "Explain quantum computing 量子コンピュータとは")
- Low-resource language optimization for underrepresented dialects
This makes Qwen3-32B ideal for building multilingual chatbots, localization tools, and real-time translation pipelines, all while maintaining high reasoning performance across languages.
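A translation pipeline along these lines can be as simple as templating one prompt per target language. The helper below is a hypothetical sketch: the prompt wording is illustrative, and the use of /no_think to keep latency low on routine translation calls is an assumption about how you might apply the mode switch described above.

```python
# Hypothetical prompt builder for a multilingual translation pipeline.
# The /no_think tag keeps straightforward translation calls on the
# fast path (an assumption; verify tag handling in the Qwen3 docs).

def translation_prompt(text: str, target_lang: str) -> str:
    return (
        f"Translate the following into {target_lang}, "
        f"preserving tone and technical terms:\n{text} /no_think"
    )

for lang in ["Japanese", "Swahili", "Tamil"]:
    print(translation_prompt("The server restarts nightly at 02:00 UTC.", lang))
```

Each generated prompt would be sent as its own user message, so one source string fans out into per-language requests.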
The Fastest Inference. Most Competitive Pricing.
We're launching with Qwen3-32B, a dense model designed for high performance across a wide range of tasks, now running on SambaNova Cloud at unmatched speed:
- 280 tokens/sec - Full-model, high-speed inference
- $0.40 / million input tokens
- $0.80 / million output tokens
- No distillation. No quantization. Just the full model, fast and affordable.
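At these rates, per-request cost is simple arithmetic on token counts. A quick back-of-envelope helper:

```python
# Back-of-envelope cost for a Qwen3-32B call at the listed rates:
# $0.40 per million input tokens, $0.80 per million output tokens.

INPUT_RATE = 0.40 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.80 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request given its token counts."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 2,000-token prompt with a 500-token completion:
print(f"${request_cost(2_000, 500):.6f}")  # $0.001200
```

So a typical prompt-plus-completion of this size costs roughly a tenth of a cent, which is what makes always-on agents and high-volume pipelines practical at this tier.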
🚀 Try Qwen3 Today on SambaNova Cloud
This is just the beginning for Qwen3. If you're building tools, deploying agents, or scaling next-gen apps, Qwen3-32B gives you the control to think smart and move fast.
👉 Start Using Qwen3-32B on SambaNova Cloud.