AI Briefing, May 25 — Issue #326

2026-05-25 16:00

Author: RadarAI Editorial
Editor: RadarAI Editorial
Last updated: 2026-07-09
Review status: Editorial review pending
Tags: Brief, Official, AI Updates, Open Source

Bianque Intelligence, Tsinghua University, and OpenBMB achieved end-to-end training of a 60B-parameter LLM on Huawei Ascend using 1.58-bit ternary quantization, cutting memory use by roughly 6× while retaining 97% of model capability. Meanwhile, continuous-space language modeling is emerging as a proposed paradigm shift beyond token-based autoregression and is described as a key step toward AGI.

🔍 Key Insights

Chinese AI company Bianque Intelligence, in collaboration with Tsinghua University and OpenBMB, reports clearing a major bottleneck in deploying large language models (LLMs) on edge devices: end-to-end training of a 60-billion-parameter model on Huawei’s Ascend platform using 1.58-bit ternary quantization. The approach cuts memory usage by roughly 6× while preserving 97% of model capability [1]. Meanwhile, continuous-space language modeling, a newly proposed paradigm, challenges the structural limits of traditional token-based autoregressive architectures and is increasingly described as a critical path toward AGI [6].
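For readers unfamiliar with the term, the sketch below illustrates what ternary quantization means in general. It assumes an absmean scheme (one shared scale per weight group, values rounded and clipped to -1, 0, or +1); the source does not say which scheme BitCPM-CANN uses, so the function names and the sample weights here are purely illustrative. The "1.58-bit" label follows from the information content of a three-valued weight, log2(3) ≈ 1.585 bits.

```python
import math


def ternary_quantize(weights):
    """Map floats to codes in {-1, 0, +1} plus one shared scale.

    Assumption: absmean scaling (scale = mean absolute weight).
    This is an illustrative scheme, not the method from the source.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Approximate the original weights from ternary codes."""
    return [c * scale for c in codes]


# A three-valued symbol carries log2(3) bits of information.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

# Hypothetical sample weights, chosen only for illustration.
weights = [0.9, -0.05, -1.2, 0.4, 0.0, -0.7]
codes, scale = ternary_quantize(weights)
approx = dequantize(codes, scale)
```

With these sample weights the codes come out as `[1, 0, -1, 1, 0, -1]`: small weights collapse to zero and the rest keep only their sign, which is where the memory saving comes from.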

🚀 Key Updates

- BitCPM-CANN Ternary LLM Series Launched [1]: Bianque Intelligence and partners completed end-to-end training of a 60B-parameter model on Ascend hardware, combining high memory efficiency with strong capability retention under 1.58-bit quantization.
- Reasonix Boosts DeepSeek V4 Inference Efficiency [4]: A purpose-built, append-only caching mechanism for V4 achieves a 99.82% cache hit rate, cutting API costs by 80%.
- 2026 Beijing Academy of Artificial Intelligence (BAAI) Conference Lineup Announced [5]: Turing Award winners headline the event, and China’s top-tier LLM teams gather to explore three frontier areas: agents, world models, and embodied intelligence.
- Kimi Releases TypeScript Version of kimi-code [2]: A full rewrite of the original Python CLI tool that prioritizes engineering robustness and ecosystem compatibility, sparking broad discussion among developers.
- “Tokens Must Die?” Sparks Paradigm Debate [6]: Teams led by Prof. Kaiming He and ByteDance’s Seed Lab propose continuous-space language modeling to address fundamental limitations of token-based autoregression.
- Redefining the Core Tension in AI Coding [3]: Industry consensus is shifting toward execution over ideation; the ability to ship fast and reliably has become the decisive factor in product competitiveness.
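The Reasonix item above turns on why an append-only prompt history helps caching. The sketch below is a toy model, assuming a provider-side cache that matches each request against the previous one by shared token prefix. It is not Reasonix's implementation or any real DeepSeek API; the token lists and function names are hypothetical. It compares a history that only grows at the end with one whose first token changes on every request.

```python
def shared_prefix_len(a, b):
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cache_hit_rate(requests):
    """Fraction of prompt tokens matching the previous request's prefix.

    Assumption: the cache only reuses the longest common prefix with
    the immediately preceding request (a simplified prefix cache).
    """
    hit = 0
    total = 0
    previous = []
    for prompt in requests:
        hit += shared_prefix_len(previous, prompt)
        total += len(prompt)
        previous = prompt
    return hit / total if total else 0.0


# Hypothetical conversation: 8 system tokens, then one token per turn.
system = ["sys"] * 8
turns = [["u1"], ["a1"], ["u2"], ["a2"]]

# Append-only: every request is the previous request plus new tokens.
append_only = []
history = list(system)
for turn in turns:
    history = history + turn
    append_only.append(history)

# Rewriting: a changing token at the front breaks the prefix every time.
rewriting = [[f"ts{i}"] + prompt for i, prompt in enumerate(append_only)]

append_only_rate = cache_hit_rate(append_only)  # 30 of 42 tokens reused
rewriting_rate = cache_hit_rate(rewriting)      # 0 tokens reused
```

In this toy setup the append-only history reuses 30 of 42 prompt tokens, while the rewritten one reuses none. Longer conversations push the append-only rate closer to 100%, which is consistent in spirit with the very high hit rate reported in [4].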

🔗 Sources

[1] Chinese AI Company Breaks Bottleneck of Fitting 60-Billion-Parameter LLMs onto Smartphones. https://www.bestblogs.dev/article/1ac2cf11?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
[2] Kimi Launches TypeScript Version of kimi-code, Playfully Addressing Past Controversy Around the Python Version. https://www.bestblogs.dev/status/2058782251886817432?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
[3] The AI Coding Era: Execution Matters More Than Ideas. https://www.bestblogs.dev/status/2058782129564340464?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
[4] DeepSeek V4 Just Got Even More Efficient! New Tool Achieves 99.82% Cache Hit Rate: Stable Inference at 20% Cost. https://www.bestblogs.dev/article/b3629108?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
[5] Turing Award Winners Lead the Way; China’s Top LLM Teams Unite! The 2026 BAAI Conference Reveals What’s Next for AI. https://www.bestblogs.dev/article/00d8987b?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
[6] “Tokens” Must Die? https://www.bestblogs.dev/article/3bb425e2?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
