Answer
Qwen3 (April 2025) is Alibaba's most capable open-source series to date. Qwen3-30B-A3B (MoE, 3B active parameters) delivers GPT-4o-level results on many benchmarks, while Qwen3-235B-A22B leads open models on AIME and LiveCodeBench. All models support hybrid thinking/non-thinking modes and are Apache 2.0 licensed.
Key points
- Qwen3-30B-A3B (3B active parameters via MoE) achieves GPT-4o parity at a fraction of the compute cost.
- Qwen3-235B-A22B tops AIME 2025 and LiveCodeBench—currently the strongest open-weight coding model.
- Hybrid thinking mode: toggle chain-of-thought reasoning per request, same model.
- Apache 2.0 license on all Qwen3 open-weight models—commercial use allowed.
- Local deployment: Qwen3-30B-A3B runs on ~24GB VRAM with 4-bit quantization.
What changed recently
- Qwen3 series released April 28–29, 2025: 8 models from 0.6B to 235B, including dense and MoE variants.
- Qwen3-235B-A22B sets new open-source SOTA on coding and math benchmarks (April 2025).
- Hybrid thinking/non-thinking mode introduced—first in the Qwen series.
Explanation
The MoE architecture in Qwen3-30B-A3B activates only 3B of 30B parameters per token, enabling near-frontier performance on consumer hardware. This is a meaningful shift for builders who want strong models without cloud API costs.
Hybrid thinking mode lets developers use the same model for both fast responses (non-thinking) and deliberate reasoning (thinking), reducing the need to manage separate model endpoints.
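A minimal sketch of the toggle with Hugging Face Transformers is below; the enable_thinking argument to apply_chat_template follows the Qwen3 model card, while the model name, prompt, and generation settings here are placeholders.

```python
# Sketch: toggling Qwen3's hybrid thinking mode at request time via Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize mixture-of-experts routing in two sentences."}]

# enable_thinking=True: the model emits <think>...</think> reasoning before the answer.
# enable_thinking=False: direct answer from the same weights, lower latency.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```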
Tools / Examples
- Run Qwen3-30B-A3B locally via Ollama or vLLM on a single A100 (80GB) or two RTX 3090s with quantization; see the client sketch after this list.
- Use Qwen3-235B-A22B via Alibaba Cloud Bailian API for production coding agents requiring SOTA accuracy.
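For the local route, here is a hedged sketch of querying a vLLM-served Qwen3-30B-A3B through its OpenAI-compatible endpoint. Port 8000 is vLLM's default, the API key is a placeholder, and the server is assumed to have been started separately (e.g. vllm serve Qwen/Qwen3-30B-A3B).

```python
# Sketch: chatting with a locally served Qwen3-30B-A3B via vLLM's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # local server, placeholder key

resp = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
    temperature=0.6,
)
print(resp.choices[0].message.content)
```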
Evidence timeline
- April 2025: Qwen3-235B-A22B and Qwen3-30B-A3B released open-weight; the 235B model leads AIME 2025 and LiveCodeBench among open models.
- April 2025: Official release post confirms hybrid thinking mode, MoE efficiency, and Apache 2.0 licensing.
FAQ
Can I use Qwen3 models commercially?
Yes—all Qwen3 open-weight models are Apache 2.0 licensed, including the 235B variant. Check attribution requirements in the license.
What hardware do I need for local deployment?
Qwen3-30B-A3B needs roughly 24GB of VRAM with 4-bit quantization, so a single 24GB GPU (e.g., an RTX 3090 or 4090) works, with a second GPU adding headroom for longer contexts. The 235B model requires multi-GPU or cloud inference.
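As a rough illustration, a 4-bit load with bitsandbytes is sketched below; actual VRAM use also depends on context length, and prequantized GGUF or AWQ builds are common alternatives.

```python
# Sketch: loading Qwen3-30B-A3B in 4-bit with bitsandbytes to fit roughly 24GB of VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",
    quantization_config=quant_config,
    device_map="auto",  # spreads layers across whatever GPUs are available
)
```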
How does hybrid thinking mode work?
Pass a flag at inference time (enable_thinking: true/false) to switch between chain-of-thought reasoning and direct response—no separate model needed.
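When the model sits behind an OpenAI-compatible server (such as the vLLM setup above), the same switch can usually be passed per request; the chat_template_kwargs passthrough below follows the Qwen3 deployment docs for vLLM and may differ on other serving stacks, so treat it as a sketch.

```python
# Sketch: per-request thinking-mode toggle against an OpenAI-compatible Qwen3 endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[{"role": "user", "content": "One-line answer: what is 17 * 24?"}],
    # Disable chain-of-thought for this request; omit or set True for thinking mode.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)
```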