Blog

Qwen3 Is Here - Now Live on SambaNova Cloud

Written by SambaNova Systems | May 2, 2025

We’re excited to announce that Qwen3, the latest generation of large language models from the Qwen team, is now available on SambaNova Cloud — starting with the Qwen3-32B dense model, delivering blazing-fast inference at 281 tokens/second as measured by Artificial Analysis. 

Qwen3 introduces a flexible new paradigm for how language models think, reason, and respond. Whether you're building intelligent agents, scalable coding tools, or multi-turn conversational systems, Qwen3 gives you performance control on your terms — all in a single model family.

Hybrid Thinking Architecture: Precision Control

Qwen3’s dual-mode reasoning engine revolutionizes how developers balance computational efficiency with task complexity, offering unprecedented control over AI behavior. Here’s a deeper look at its mechanics and advantages: 

/think Mode: Deliberative Reasoning

Activated via /think tags or API parameters, this mode engages the model’s full analytical capabilities:

  • Step-by-step problem-solving: Breaks down tasks like mathematical proofs or code debugging into intermediate reasoning steps
  • Latent Chain-of-Thought: Internally generates reasoning paths before delivering final answers, improving accuracy on benchmarks
  • Complex task specialization: Optimized for multi-hop analysis, scientific reasoning, and agentic decision-making

/no_think Mode: Instant Response Generation

Triggered via /no_think tags, this mode prioritizes speed:

  • Sub-second latency: Delivers answers in the 300 ms range for real-time applications
  • Streamlined processing: Bypasses intermediate reasoning steps for queries like autocomplete or FAQ responses
  • Cost-optimized inference: Reduces active parameter usage through architectural optimizations

This architecture enables granular compute budgeting: activate deep reasoning only when necessary, reducing costs without sacrificing capability.
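To make the switch concrete, here is a minimal sketch of toggling the two modes from Python. It assumes SambaNova Cloud exposes an OpenAI-compatible chat completions endpoint at https://api.sambanova.ai/v1 and that the model id is Qwen3-32B; check the SambaNova Cloud docs for the exact endpoint, model name, and any dedicated thinking-mode parameter.

```python
# Minimal sketch: toggling Qwen3's thinking modes with soft switches in the prompt.
# Assumptions (not confirmed by this post): an OpenAI-compatible endpoint at
# https://api.sambanova.ai/v1 and a model id of "Qwen3-32B".
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # assumed endpoint
    api_key=os.environ["SAMBANOVA_API_KEY"],
)

def ask(prompt: str, thinking: bool) -> str:
    # Qwen3 reads a trailing /think or /no_think soft switch in the user turn
    # to either engage its internal reasoning phase or skip straight to the answer.
    switch = "/think" if thinking else "/no_think"
    response = client.chat.completions.create(
        model="Qwen3-32B",  # assumed model id
        messages=[{"role": "user", "content": f"{prompt} {switch}"}],
    )
    return response.choices[0].message.content

# Deep reasoning for a multi-step problem, instant mode for a lookup-style query.
print(ask("Prove that the sum of two even integers is even.", thinking=True))
print(ask("What is the capital of France?", thinking=False))
```

Because the switches ride along in the user message, the same deployment can serve both latency-sensitive and reasoning-heavy traffic without swapping models.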

Multilingual Mastery & Translation Excellence

Qwen3 sets a new standard for global AI applications with native support for 119 languages and dialects, spanning Indo-European, Sino-Tibetan, Afro-Asiatic, and Dravidian language families. Its enhanced multilingual capabilities include:

  • Advanced translation quality with context-aware localization
  • Cross-lingual instruction following for seamless international deployments (e.g., "Explain quantum computing 量子コンピュータとは")
  • Low-resource language optimization for underrepresented dialects

This makes Qwen3-32B ideal for building multilingual chatbots, localization tools, and real-time translation pipelines - all while maintaining high reasoning performance across languages.
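As a quick illustration of cross-lingual instruction following, the sketch below sends a mixed English/Japanese prompt and asks for the answer in a third language. It reuses the same assumed endpoint and Qwen3-32B model id as the earlier example.

```python
# Hedged sketch of a cross-lingual request against the assumed
# OpenAI-compatible SambaNova Cloud endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # assumed endpoint
    api_key=os.environ["SAMBANOVA_API_KEY"],
)

# English instruction, Japanese subject term, answer requested in Spanish.
response = client.chat.completions.create(
    model="Qwen3-32B",  # assumed model id
    messages=[{
        "role": "user",
        "content": "Explain 量子コンピュータ (quantum computers) in two sentences, in Spanish.",
    }],
)
print(response.choices[0].message.content)
```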

The Fastest Inference. Most Competitive Pricing.

We're launching with Qwen3-32B, a dense model designed for high performance across a wide range of tasks, now running on SambaNova Cloud at unmatched speed:

  • 280 tokens/sec - Full-model, high-speed inference
  • $0.40 / million input tokens
  • $0.80 / million output tokens
  • No distillation. No quantization. Just the full model - fast and affordable.
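
For a rough sense of what those rates mean per request, here is a back-of-the-envelope cost sketch. The $0.40 and $0.80 per-million-token prices come from the list above; the token counts in the example are hypothetical.

```python
# Back-of-the-envelope cost estimate using the listed Qwen3-32B rates.
INPUT_PRICE_PER_M = 0.40   # USD per million input tokens (from the pricing list)
OUTPUT_PRICE_PER_M = 0.80  # USD per million output tokens (from the pricing list)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    return (
        (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
        + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    )

# Example: a 2,000-token prompt with a 500-token completion.
print(f"${request_cost(2_000, 500):.6f}")  # ≈ $0.0012
```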

🚀 Try Qwen3 Today on SambaNova Cloud

This is just the beginning for Qwen3. If you're building tools, deploying agents, or scaling next-gen apps, Qwen3-32B gives you the control to think smart and move fast.

👉 Start Using Qwen3-32B on SambaNova Cloud.