Answer
Qwen3 (April 2025) is Alibaba's most capable open-source series to date. Qwen3-30B-A3B (MoE, 3B active parameters) delivers GPT-4o-level results on many benchmarks, while Qwen3-235B-A22B leads open models on AIME and LiveCodeBench. All models support hybrid thinking/non-thinking modes and are Apache 2.0 licensed.
Key points
- Qwen3-30B-A3B (3B active parameters via MoE) achieves GPT-4o parity at a fraction of the compute cost.
- Qwen3-235B-A22B tops AIME 2025 and LiveCodeBench—currently the strongest open-weight coding model.
- Hybrid thinking mode: toggle chain-of-thought reasoning per request, same model.
- Apache 2.0 license on all Qwen3 open-weight models—commercial use allowed.
- Local deployment: Qwen3-30B-A3B runs on ~24GB VRAM with 4-bit quantization.
What changed recently
- Qwen3 series released April 28–29, 2025: 8 models from 0.6B to 235B, including dense and MoE variants.
- Qwen3-235B-A22B sets new open-source SOTA on coding and math benchmarks (April 2025).
- Hybrid thinking/non-thinking mode introduced—first in the Qwen series.
Explanation
The MoE architecture in Qwen3-30B-A3B activates only 3B of 30B parameters per token, enabling near-frontier performance on consumer hardware. This is a meaningful shift for builders who want strong models without cloud API costs.
Hybrid thinking mode lets developers use the same model for both fast responses (non-thinking) and deliberate reasoning (thinking), reducing the need to manage separate model endpoints.
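A minimal sketch of the toggle with Hugging Face Transformers is below; the enable_thinking argument to apply_chat_template follows the Qwen3 model card, while the model name, prompt, and generation settings here are placeholders.

```python
# Sketch: toggling Qwen3's hybrid thinking mode at request time via Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize mixture-of-experts routing in two sentences."}]

# enable_thinking=True: the model emits <think>...</think> reasoning before the answer.
# enable_thinking=False: direct answer from the same weights, lower latency.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```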
Tools / Examples
- Run Qwen3-30B-A3B locally via Ollama or vLLM on a single A100 (80GB) or two RTX 3090s with quantization; see the client sketch after this list.
- Use Qwen3-235B-A22B via Alibaba Cloud Bailian API for production coding agents requiring SOTA accuracy.
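For the local route, here is a hedged sketch of querying a vLLM-served Qwen3-30B-A3B through its OpenAI-compatible endpoint. Port 8000 is vLLM's default, the API key is a placeholder, and the server is assumed to have been started separately (e.g. vllm serve Qwen/Qwen3-30B-A3B).

```python
# Sketch: chatting with a locally served Qwen3-30B-A3B via vLLM's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # local server, placeholder key

resp = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
    temperature=0.6,
)
print(resp.choices[0].message.content)
```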
Evidence timeline
- April 2025: Qwen3-235B-A22B and Qwen3-30B-A3B released open-weight; the 235B model leads AIME 2025 and LiveCodeBench among open models.
- April 2025: Official release post confirms hybrid thinking mode, MoE efficiency, and Apache 2.0 licensing.
FAQ
Can I use Qwen3 models commercially?
Yes—all Qwen3 open-weight models are Apache 2.0 licensed, including the 235B variant. Check attribution requirements in the license.
What hardware do I need for local deployment?
Qwen3-30B-A3B needs roughly 24GB of VRAM with 4-bit quantization, so a single 24GB GPU (e.g., an RTX 3090 or 4090) works, with a second GPU adding headroom for longer contexts. The 235B model requires multi-GPU or cloud inference.
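As a rough illustration, a 4-bit load with bitsandbytes is sketched below; actual VRAM use also depends on context length, and prequantized GGUF or AWQ builds are common alternatives.

```python
# Sketch: loading Qwen3-30B-A3B in 4-bit with bitsandbytes to fit roughly 24GB of VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",
    quantization_config=quant_config,
    device_map="auto",  # spreads layers across whatever GPUs are available
)
```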
How does hybrid thinking mode work?
Pass a flag at inference time (enable_thinking: true/false) to switch between chain-of-thought reasoning and direct response—no separate model needed.
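When the model sits behind an OpenAI-compatible server (such as the vLLM setup above), the same switch can usually be passed per request; the chat_template_kwargs passthrough below follows the Qwen3 deployment docs for vLLM and may differ on other serving stacks, so treat it as a sketch.

```python
# Sketch: per-request thinking-mode toggle against an OpenAI-compatible Qwen3 endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[{"role": "user", "content": "One-line answer: what is 17 * 24?"}],
    # Disable chain-of-thought for this request; omit or set True for thinking mode.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)
```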