Blog

Qwen3 Is Here - Now Live on SambaNova Cloud

Written by SambaNova Systems | May 2, 2025

We’re excited to announce that Qwen3, the latest generation of large language models from the Qwen team, is now available on SambaNova Cloud — starting with the Qwen3-32B dense model, delivering blazing-fast inference at 281 tokens/second as measured by Artificial Analysis. 

Qwen3 introduces a flexible new paradigm for how language models think, reason, and respond. Whether you're building intelligent agents, scalable coding tools, or multi-turn conversational systems, Qwen3 gives you performance control on your terms — all in a single model family.

Hybrid Thinking Architecture: Precision Control

Qwen3’s dual-mode reasoning engine revolutionizes how developers balance computational efficiency with task complexity, offering unprecedented control over AI behavior. Here’s a deeper look at its mechanics and advantages: 

/think Mode: Deliberative Reasoning

Activated via /think tags or API parameters, this mode engages the model’s full analytical capabilities:

  • Step-by-step problem-solving: Breaks down tasks like mathematical proofs or code debugging into intermediate reasoning steps
  • Latent Chain-of-Thought: Internally generates reasoning paths before delivering final answers, improving accuracy on benchmarks
  • Complex task specialization: Optimized for multi-hop analysis, scientific reasoning, and agentic decision-making

/no_think Mode: Instant Response Generation

Triggered via /no_think tags, this mode prioritizes speed:

  • Sub-second latency: Delivers answers in the 300 ms range for real-time applications
  • Streamlined processing: Bypasses intermediate reasoning steps for queries like autocomplete or FAQ responses
  • Cost-optimized inference: Reduces active parameter usage through architectural optimizations

This architecture enables granular compute budgeting: activate deep reasoning only when necessary, reducing costs without sacrificing capability.
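To make the switch concrete, here is a minimal sketch of toggling the two modes from Python. It assumes SambaNova Cloud exposes an OpenAI-compatible chat completions endpoint at https://api.sambanova.ai/v1 and that the model id is Qwen3-32B; check the SambaNova Cloud docs for the exact endpoint, model name, and any dedicated thinking-mode parameter.

```python
# Minimal sketch: toggling Qwen3's thinking modes with soft switches in the prompt.
# Assumptions (not confirmed by this post): an OpenAI-compatible endpoint at
# https://api.sambanova.ai/v1 and a model id of "Qwen3-32B".
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # assumed endpoint
    api_key=os.environ["SAMBANOVA_API_KEY"],
)

def ask(prompt: str, thinking: bool) -> str:
    # Qwen3 reads a trailing /think or /no_think soft switch in the user turn
    # to either engage its internal reasoning phase or skip straight to the answer.
    switch = "/think" if thinking else "/no_think"
    response = client.chat.completions.create(
        model="Qwen3-32B",  # assumed model id
        messages=[{"role": "user", "content": f"{prompt} {switch}"}],
    )
    return response.choices[0].message.content

# Deep reasoning for a multi-step problem, instant mode for a lookup-style query.
print(ask("Prove that the sum of two even integers is even.", thinking=True))
print(ask("What is the capital of France?", thinking=False))
```

Because the switches ride along in the user message, the same deployment can serve both latency-sensitive and reasoning-heavy traffic without swapping models.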

Multilingual Mastery & Translation Excellence

Qwen3 sets a new standard for global AI applications with native support for 119 languages and dialects, spanning Indo-European, Sino-Tibetan, Afro-Asiatic, and Dravidian language families. Its enhanced multilingual capabilities include:

  • Advanced translation quality with context-aware localization
  • Cross-lingual instruction following for seamless international deployments (e.g., "Explain quantum computing 量子コンピュータとは")
  • Low-resource language optimization for underrepresented dialects

This makes Qwen3-32B ideal for building multilingual chatbots, localization tools, and real-time translation pipelines - all while maintaining high reasoning performance across languages.
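As a quick illustration of cross-lingual instruction following, the sketch below sends a mixed English/Japanese prompt and asks for the answer in a third language. It reuses the same assumed endpoint and Qwen3-32B model id as the earlier example.

```python
# Hedged sketch of a cross-lingual request against the assumed
# OpenAI-compatible SambaNova Cloud endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # assumed endpoint
    api_key=os.environ["SAMBANOVA_API_KEY"],
)

# English instruction, Japanese subject term, answer requested in Spanish.
response = client.chat.completions.create(
    model="Qwen3-32B",  # assumed model id
    messages=[{
        "role": "user",
        "content": "Explain 量子コンピュータ (quantum computers) in two sentences, in Spanish.",
    }],
)
print(response.choices[0].message.content)
```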

The Fastest Inference. Most Competitive Pricing.

We're launching with Qwen3-32B, a dense model designed for high performance across a wide range of tasks, now running on SambaNova Cloud at unmatched speed:

  • 280 tokens/sec - Full-model, high-speed inference
  • $0.40 / million input tokens
  • $0.80 / million output tokens
  • No distillation. No quantization. Just the full model - fast and affordable.
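
For a rough sense of what those rates mean per request, here is a back-of-the-envelope cost sketch. The $0.40 and $0.80 per-million-token prices come from the list above; the token counts in the example are hypothetical.

```python
# Back-of-the-envelope cost estimate using the listed Qwen3-32B rates.
INPUT_PRICE_PER_M = 0.40   # USD per million input tokens (from the pricing list)
OUTPUT_PRICE_PER_M = 0.80  # USD per million output tokens (from the pricing list)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    return (
        (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
        + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    )

# Example: a 2,000-token prompt with a 500-token completion.
print(f"${request_cost(2_000, 500):.6f}")  # ≈ $0.0012
```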

🚀 Try Qwen3 Today on SambaNova Cloud

This is just the beginning for Qwen3. If you're building tools, deploying agents, or scaling next-gen apps, Qwen3-32B gives you the control to think smart and move fast.

👉 Start Using Qwen3-32B on SambaNova Cloud.