Qwen updates (what to watch)

Evergreen topic pages updated with new evidence

Last reviewed: 2026-05-12 · Policy: Editorial standards · Methodology

Answer

Qwen3 (released April 2025) is Alibaba's most capable open-weight series to date. Qwen3-30B-A3B (MoE, 3B active parameters) reportedly matches GPT-4o-level benchmark scores, while Qwen3-235B-A22B leads open models on AIME and LiveCodeBench. All models support hybrid thinking/non-thinking modes and are Apache 2.0 licensed.

Key points

  • Qwen3-30B-A3B (3B active parameters via MoE) reportedly reaches GPT-4o-level benchmark scores at a fraction of the inference cost.
  • Qwen3-235B-A22B tops AIME 2025 and LiveCodeBench, placing it among the strongest open-weight coding models at release.
  • Hybrid thinking mode: toggle chain-of-thought reasoning per request, same model.
  • Apache 2.0 license on all Qwen3 open-weight models—commercial use allowed.
  • Local deployment: Qwen3-30B-A3B runs on ~24GB VRAM with 4-bit quantization.

What changed recently

  • Qwen3 series released April 28–29, 2025: 8 models from 0.6B to 235B, including dense and MoE variants.
  • Qwen3-235B-A22B sets a new open-source SOTA on coding and math benchmarks (April 2025).
  • Hybrid thinking/non-thinking mode introduced—first in the Qwen series.

Explanation

The MoE architecture in Qwen3-30B-A3B activates only 3B of 30B parameters per token, enabling near-frontier performance on consumer hardware. This is a meaningful shift for builders who want strong models without cloud API costs.
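The sparse-activation idea can be sketched as top-k gating over a pool of experts. The expert count (128) and top-k (8) below match figures reported for Qwen3's MoE models, but the routing math itself is a generic illustration, not Alibaba's implementation:

```python
import numpy as np

def moe_route(router_logits: np.ndarray, top_k: int = 8) -> np.ndarray:
    """Return per-expert weights: zero everywhere except the top_k experts,
    with a softmax computed over only those selected logits."""
    k_idx = np.argsort(router_logits)[-top_k:]   # indices of the top_k experts
    weights = np.zeros_like(router_logits)
    exp = np.exp(router_logits[k_idx] - router_logits[k_idx].max())
    weights[k_idx] = exp / exp.sum()             # normalize over selected experts only
    return weights

np.random.seed(0)
logits = np.random.randn(128)                    # one token's router scores
w = moe_route(logits)
print(np.count_nonzero(w))                       # 8 experts carry weight
print(round(float(w.sum()), 6))                  # 1.0
```

Only the 8 selected experts run their feed-forward pass for this token, which is why active compute tracks 3B parameters rather than 30B.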

Hybrid thinking mode lets developers use the same model for both fast responses (non-thinking) and deliberate reasoning (thinking), reducing the need to manage separate model endpoints.
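As a sketch of how that toggle might be wired in practice: Qwen3's chat template exposes an `enable_thinking` flag, and OpenAI-compatible servers such as vLLM can forward it through `chat_template_kwargs`. The helper below only assembles the request body; the model name is a placeholder and the serving conventions are assumptions to verify against your stack's docs:

```python
def build_request(prompt: str, thinking: bool) -> dict:
    """Assemble a chat-completions payload carrying the thinking toggle."""
    return {
        "model": "Qwen/Qwen3-30B-A3B",
        "messages": [{"role": "user", "content": prompt}],
        # An OpenAI-compatible server can pass chat_template_kwargs into the
        # model's chat template, where Qwen3 reads enable_thinking.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

fast = build_request("Summarize MoE in one line.", thinking=False)
deliberate = build_request("Prove there are infinitely many primes.", thinking=True)
print(fast["chat_template_kwargs"])   # {'enable_thinking': False}
```

Both requests hit the same deployed model; only the template flag changes.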

Tools / Examples

  • Run Qwen3-30B-A3B locally via Ollama or vLLM: a single A100 (80GB) handles it unquantized; a single 24GB card (e.g., RTX 3090) works with 4-bit quantization.
  • Use Qwen3-235B-A22B via Alibaba Cloud Bailian API for production coding agents requiring SOTA accuracy.
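For the local route, a minimal Ollama session might look like the following; the model tag is an assumption, so confirm the exact name in the Ollama library before pulling:

```shell
# Pull and chat with the 30B MoE model; tag name assumed, verify before use.
ollama pull qwen3:30b-a3b
ollama run qwen3:30b-a3b "Summarize Qwen3's hybrid thinking mode in one sentence."
```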

FAQ

Can I use Qwen3 models commercially?

Yes—all Qwen3 open-weight models are Apache 2.0 licensed, including the 235B variant. Check attribution requirements in the license.

What hardware do I need for local deployment?

Qwen3-30B-A3B fits in ~24GB VRAM with 4-bit quantization (e.g., a single RTX 3090 or 4090). The 235B model requires multi-GPU or cloud inference.
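The ~24GB figure is easy to sanity-check with back-of-the-envelope arithmetic. The parameter count (~30.5B total) and the overhead fraction for KV cache and runtime buffers are rough assumptions here, not measured values:

```python
def vram_estimate_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead_frac: float = 0.35) -> float:
    """Weights-only footprint at the given bit width, scaled up by a fixed
    fraction to roughly account for KV cache and runtime buffers."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb * (1 + overhead_frac)

print(round(vram_estimate_gb(30.5), 1))   # 20.6 (GB), comfortably under 24
```

At 4 bits the weights alone take ~15.25GB, which is why a single 24GB card is workable but 8-bit or fp16 is not.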

How does hybrid thinking mode work?

Pass a flag at inference time (`enable_thinking: true/false`) to switch between chain-of-thought reasoning and direct response; no separate model needed.
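Qwen3 also documents a per-turn soft switch: appending `/think` or `/no_think` to a user message overrides the session default for that turn. A minimal helper (the function name is ours, not part of any API) might look like:

```python
def with_soft_switch(content: str, thinking: bool) -> dict:
    """Build a user message carrying Qwen3's per-turn thinking directive."""
    directive = "/think" if thinking else "/no_think"
    return {"role": "user", "content": f"{content} {directive}"}

msg = with_soft_switch("How many r's are in 'strawberry'?", thinking=False)
print(msg["content"].endswith("/no_think"))   # True
```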

Last updated: 2026-05-12 · Policy: Editorial standards · Methodology