BACK TO RESOURCES

Blog

Qwen3 Is Here - Now Live on SambaNova Cloud

by SambaNova

May 2, 2025

We’re excited to announce that Qwen3, the latest generation of large language models from the Qwen team, is now available on SambaNova Cloud — starting with the Qwen3-32B dense model, delivering blazing-fast inference at 281 tokens/second as measured by Artificial Analysis.

Qwen3 introduces a flexible new paradigm for how language models think, reason, and respond. Whether you're building intelligent agents, scalable coding tools, or multi-turn conversational systems, Qwen3 gives you performance control on your terms — all in a single model family.

2025_04_29_SN+Qwen3_1600x900_144dpi_v3.0

Hybrid Thinking Architecture: Precision Control

Qwen3’s dual-mode reasoning engine revolutionizes how developers balance computational efficiency with task complexity, offering unprecedented control over AI behavior. Here’s a deeper look at its mechanics and advantages:

/think Mode: Deliberative Reasoning Activated via /think tags or API parameters, this mode engages the model’s full analytical capabilities:

Step-by-step problem-solving: Breaks down tasks like mathematical proofs or code debugging into intermediate reasoning steps
Latent Chain-of-Thought: Internally generates reasoning paths before delivering final answers, improving accuracy on benchmarks
Complex task specialization: Optimized for multi-hop analysis, scientific reasoning, and agentic decision-making

/nothink Mode: Instant Response Generation Triggered via /no_think tags, this mode prioritizes speed:

Sub-second latency: Delivers answers in 300ms-range for real-time applications
Streamlined processing: Bypasses intermediate reasoning steps for queries like autocomplete or FAQ responses
Cost-optimized inference: Reduces active parameter usage through architectural optimizations

This architecture enables granular compute budgeting - only activate deep reasoning when necessary, reducing costs without sacrificing capability.

Multilingual Mastery & Translation Excellence

Qwen3 sets a new standard for global AI applications with native support for 119 languages and dialects, spanning Indo-European, Sino-Tibetan, Afro-Asiatic, and Dravidian language families. Its enhanced multilingual capabilities include:

Advanced translation quality with context-aware localization
Cross-lingual instruction following for seamless international deployments (e.g., "Explain quantum computing 量子コンピュータとは")
Low-resource language optimization for underrepresented dialects

This makes Qwen3-32B ideal for building multilingual chatbots, localization tools, and real-time translation pipelines - all while maintaining high reasoning performance across languages

The Fastest Inference. Most Competitive Pricing.

We're launching with Qwen3-32B, a dense model designed for high performance across a wide range of tasks. It’s built to deliver strong results in. Now running on SambaNova Cloud at unmatched speed:

280 tokens/sec - Full-model, high-speed inference
$0.40 / million input tokens
$0.80 / million output tokens
No distillation. No quantization. Just the full model - fast and affordable.

The Fastest Inference. Most Competitive Pricing

🚀 Try Qwen3 Today on SambaNova Cloud

This is just the beginning for Qwen3. If you're building tools, deploying agents, or scaling next-gen apps, Qwen3-32B gives you the control to think smart and move fast.

👉 Start Using Qwen3-32B on SambaNova Cloud.

Previous story

← Meta Llama 4 Maverick & Llama 4 Scout on SambaNova Cloud

Next story

Introducing Whisper Large-V3 to SambaNova Cloud →