Feb 18 AI Briefing · Issue #41
The Qwen 3.5 series—including the 397B-A17B and Plus variants—is triggering explosive, full-stack ecosystem adoption across leading hardware platforms and developer toolchains—from NVIDIA NeMo and AMD Instinct GPUs to Ollama Cloud, ZenMux, and mlx-vlm—with first-day support now live. Meanwhile, LlamaIndex is accelerating its evolution toward a token economy, restructuring API access around the $LLAMA token.
Editorial standards and source policy: content links to primary sources; see Methodology.
## 🔍 Key Insights
**The Qwen 3.5 series**, including the **397B-A17B** and **Plus** variants, is driving explosive, full-stack ecosystem adoption: mainstream hardware platforms and developer toolchains, from **NVIDIA NeMo** and **AMD Instinct GPUs** to **Ollama Cloud**, **ZenMux**, and **mlx-vlm**, have shipped **day-one support**. Concurrently, **LlamaIndex** is advancing toward a token-based economy, redefining API access via the **$LLAMA token**.
## 🚀 Top Updates
- **LlamaIndex launches $LLAMA Token as a universal API key**: Officially retiring its monthly subscription model, it now unifies agent invocation permissions through tokenized access.
- **Qwen 3.5-397B-A17B debuts on LMSYS Chatbot Arena**: As a natively multimodal open-weight model, it's now benchmarked across three dedicated leaderboards—text, vision, and code.
- **NVIDIA delivers immediate development support for Qwen 3.5**: The company offers free APIs plus deep integration with the **NeMo framework**, significantly lowering enterprise deployment barriers.
- **AMD announces day-one support for Qwen 3.5 on Instinct GPUs**: Support ships on the **SGLang/vLLM software stack**, enabling high-performance inference optimization.
- **LlamaCloud rolls out enhanced PDF parsing**: Now supports one-click conversion of complex documents—including tables and charts—into structured **Markdown/JSON**.
- **Qwen 3.5 Plus launches on ZenMux**: Built on a **Gated DeltaNet + Sparse MoE** architecture, claiming performance approaching that of GPT-5.2.
- **mlx-vlm v0.3.12 adds local support for Qwen 3.5**: This is the first release to run Qwen 3.5's vision-language models locally on **Mac devices**.
- **Simon Willison unveils two new Showboat tools, Chartroom and datasette-showboat**: **Chartroom** adds CLI-based chart visualization, and **datasette-showboat** provides real-time streaming monitoring of AI agent execution progress.
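The LlamaCloud item above mentions converting parsed tables into structured Markdown/JSON. As a minimal sketch of what consuming that output could look like, the snippet below flattens one table from a hypothetical JSON payload into a Markdown table. The payload shape (`tables`, `header`, `rows`) is an assumption for illustration, not LlamaCloud's actual schema.

```python
# Sketch: rendering one parsed table from a hypothetical LlamaCloud-style
# JSON payload as Markdown. The JSON shape here is assumed, not official.
import json

def table_to_markdown(table: dict) -> str:
    """Render a {'header': [...], 'rows': [[...], ...]} table as Markdown."""
    header = table["header"]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in table["rows"]:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)

payload = json.loads("""
{
  "tables": [
    {"header": ["Model", "Params"],
     "rows": [["Qwen 3.5-397B-A17B", "397B total / 17B active"],
              ["Qwen 3.5 Plus", "undisclosed"]]}
  ]
}
""")

markdown = "\n\n".join(table_to_markdown(t) for t in payload["tables"])
print(markdown)
```

The same row data could just as easily be emitted as CSV or loaded into a dataframe; Markdown is shown because it is one of the two output formats the announcement names.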
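The Qwen 3.5 Plus item above cites a Sparse MoE architecture, which is also how a model like the 397B-A17B can activate only ~17B of its 397B parameters per token. As a toy illustration of the top-k routing idea only (it does not model Gated DeltaNet, and the expert count and gate scores are made up), a gate scores all experts, keeps the k best, and renormalizes their weights:

```python
# Toy top-k router for a sparse Mixture-of-Experts layer.
# All numbers are illustrative; this is not Qwen's implementation.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# One token's gate scores over 8 experts (made-up values).
logits = [0.1, 2.3, -1.0, 0.7, 1.9, -0.4, 0.0, 0.5]
for expert, weight in route_top_k(logits, k=2):
    print(f"expert {expert}: weight {weight:.3f}")
```

Only the selected experts' feed-forward blocks would run for this token, which is why per-token compute tracks the active parameter count rather than the total.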