## This Week in AI

- **OpenAI launches ChatGPT’s largest-ever overhaul**, transforming it from a chat interface into a unified intelligent agent platform that integrates coding, autonomous agents, image generation, and third-party app orchestration. This marks the end of the era where “AI is just for chatting.”
- **Anthropic faces a reliability crisis**: Performance of Claude Opus 4.7/4.8 has sharply declined, prompting Notion to disable *all* Anthropic models. Meanwhile, Claude Fable 5 was jailbroken within two days by multi-agent collusion, exposing critical system-level security flaws.
- **WeChat officially releases its Skill documentation**, enabling millions of mini-programs to become atomic, AI-callable services via MCP. WeChat is rapidly evolving into an AI-native service hub, with its first large-scale deployment already live: DiDi ride-hailing now supports full, no-switch, end-to-end interaction inside WeChat.
- **Tencent Hunyuan achieves dual breakthroughs**: Its Stem sparse attention algorithm cuts first-token latency for 128K-context inputs by 3.7×, and in collaboration with Renmin University it open-sourced *PlanningBench*, the first evaluation and training framework focused exclusively on real-world planning capabilities.
- **Intel pushes AI compute to new frontiers**: The Xeon 6 CPU and Arc G3 handheld chip jointly deliver major leaps in CPU-based AI compute density and edge-side LLM inference. Meanwhile, NVIDIA’s RTX Spark N1X becomes the world’s first consumer-grade heterogeneous processor designed specifically for local agent workloads.
- **XPeng abandons its legacy autonomous driving strategy**, scrapping a multi-billion-dollar effort to go all-in on humanoid robots and AI-native physical-world technologies. Leadership estimates its odds of success at just ~20% but believes this is the *only* viable path forward.

## Hot Topics

1.
   **ChatGPT’s biggest update yet: From chatbot to super-app**
   https://www.bestblogs.dev/status/2063686036895478162?utm_source=rss&utm_medium=feed&utm_campaign=resources&
   - **What’s happening**: OpenAI is undertaking its deepest architectural overhaul since its 2022 launch, unifying Codex-powered coding, native image generation, third-party app integration, and autonomous agent execution. ChatGPT is being redefined as a general-purpose intelligent agent platform rather than just a conversational UI, fundamentally reshaping how users interact with AI.
   - **What to do**: Developers should immediately fork the official `chatgpt-api` SDK and integrate the new beta API (already rolling out in limited release), focusing on testing state persistence for `run_tool` and `create_agent_session` in automated workflows. Product teams can prototype “login-free task cards”: for example, a user typing *“Book a meeting room for next Wednesday + sync to calendar + draft and send minutes”* triggers cross-service agent coordination end-to-end.

2. **Notion disables all Anthropic models**
   https://www.bestblogs.dev/status/2063607956017643949?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_artic
   - **What’s happening**: Due to severe degradation in stability, output consistency, and instruction-following fidelity, especially in Opus 4.7/4.8, Notion terminated its commercial partnership with Anthropic. This signals a decisive shift: the industry is moving beyond “parameter wars” into an era where *reliability is non-negotiable*.
   - **What to do**: Enterprise product teams must urgently audit model service level agreements (SLAs) and run regression tests using PlanningBench or a custom set of 50 high-frequency business instructions.
     Individual developers can reuse Baoyu’s open-source HAR parser (https://www.bestblogs.dev/status/2063475943402872982?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item) to capture and analyze real-world response decay curves for Anthropic calls in their own apps.

3. **WeChat Releases Skill Documentation, Fully Integrating Mini-Programs with AI Services**
   https://www.bestblogs.dev/article/baefbe32?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
   - **Core idea**: Using the standardized Model Context Protocol (MCP), WeChat transforms its millions of existing mini-programs into atomic services that AI agents can call, effectively turning “services into APIs” and “mini-programs into plugins.” This marks WeChat’s first step toward becoming a central service hub in the AI era.
   - **Actionable insight**: Small and mid-sized developers should immediately download WeChat’s official Skill SDK (GitHub link embedded in the documentation) and refactor their mini-program’s `onLaunch` and `onShareAppMessage` logic into JSON Schema compatible with `invokeSkill`. To verify, use Claude Design’s `call_skill` tool to invoke your mini-program’s weather query interface and confirm that it returns structured `weather_data`, not an HTML-rendered page.

4. **Tencent Hunyuan Releases Stem Sparse Attention Algorithm: First-Token Latency for 128K Context Reduced by 3.7×**
   https://www.bestblogs.dev/article/b85d1a7a?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
   - **Core idea**: Stem introduces token-position decay and output-aware scoring, achieving near-dense-attention accuracy while using only 25% of the compute. It delivers plug-and-play inference acceleration for latency-sensitive long-context tasks like document summarization and legal contract review.
   - **Actionable insight**: LLM application developers should enable `--attn_implementation "stem"` in Hugging Face Transformers (integrated in v4.44.0) and run A/B tests on 128K-context PDF parsing tasks. Key metrics to monitor are `prefill_time_ms` and `decode_latency_p95`. Confirm whether first-token latency drops by ≥3.5× versus native LLaMA-3-70B.

5. **NVIDIA Launches RTX Spark N1X Processor**
   https://www.bestblogs.dev/article/2f366f79?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
   - **Core idea**: The world’s first consumer-grade PC processor purpose-built for running local AI agents. It integrates dedicated inference accelerators and a low-latency memory subsystem, enabling real-time, multi-tool agent workflows (e.g., Claude Design + Browser + Code Interpreter) on a single laptop.
   - **Actionable insight**: Hardware startups should immediately apply for the NVIDIA Spark DevKit (now open for reservations on NVIDIA’s website) and run the open-source MiniMax Agent Team demo (https://www.bestblogs.dev/article/7db52531?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item). Benchmark average response latency when orchestrating 5 concurrent agents on a system with 32 GB of memory. Success criterion: end-to-end execution of a complex task (e.g., “analyze a financial report PDF → generate a PowerPoint deck → export as video”) must complete in ≤90 seconds.

6. **Claude Design is defined as a full-fledged Agent Harness**
   https://www.bestblogs.dev/status/2064749906800111892?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
   - **Core idea**: Claude Design isn’t a UI tool; it’s a production-grade agent runtime with 45 callable tools and 24 built-in skills. It enables complex task orchestration, such as “create a component via the Figma API → deploy it using Vercel → generate a Loom demo video.” This marks the arrival of the *Harness era* in foundation model infrastructure.
   - **Actionable next step**: Frontend engineers should clone the `claude-design-harness-template` (search GitHub for this term to find community templates), inject their design system’s CSS variables into `design_system.json`, then trigger automated delivery with:

     ```bash
     curl -X POST https://api.anthropic.com/v1/design/run \
       -H "Content-Type: application/json" \
       --data '{"skill": "export_to_figma"}'
     ```

     Verify that the output includes a valid Figma file ID.

7. **XPeng goes all-in on humanoid robots and physical-world AI**
   https://www.bestblogs.dev/article/04f9256a?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
   - **Core idea**: He Xiaopeng announced the end of XPeng’s legacy autonomous driving roadmap, redirecting all resources toward AI-native physical-world interaction. The focus is shifting from lidar point-cloud matching to embodied semantics, for example teaching robots that “a door handle requires 3 N of downward force plus 45° rotation.”
   - **Actionable next step**: Embodied AI founders should immediately integrate the Daimeng RobOmni benchmark (https://www.bestblogs.dev/article/b85d1a7a?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item) and fine-tune their models using its tactile-vision-action alignment dataset. For validation, have the robot unscrew the cap of a water bottle in a lab setting with a ≥85% success rate, with failure root causes clearly attributed by RobOmni standards (e.g., “insufficient tactile torque” or “visual pose drift”).

8. **Anthropic launches Claude Fable 5 and Mythos 5**
   https://www.bestblogs.dev/status/2064397772103528771?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
   - **Core idea**: Fable 5 is a general-purpose flagship model with an integrated safety classifier; Mythos 5 is an unrestricted version available only to vetted partners. Both introduce an “intelligent fallback to Opus 4.8” mechanism, striking a balance between capability release and controllability for software engineering and scientific tasks.
   - **Actionable next step**: Enterprise security teams should deploy Fable 5’s `safety_guardrail` module (available as a Docker image from Anthropic) into their internal Slack bot. Configure a rule: “When an AWS IAM key pattern (e.g., `AKIA...`) is detected, automatically trigger `fallback_to_opus48` and alert the SOC team.” Validation: send a test message containing `AKIA...` to the bot and confirm that it responds with a downgrade notice within 2 seconds rather than executing the request as-is.

9. **Opendoor Cuts Its Offshore Team in India, Builds AI-Native Team in the U.S.**
   https://www.bestblogs.dev/status/2064950294711013807?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item
   - **Core Insight**: The elimination of over 200 roles signals a strategic shift from traditional offshore outsourcing (“large and fragmented”) to lean, AI-native U.S.-based teams (“small and sharp”). For example, a five-person team now ships an average of three production-ready features per day. Engineering work is pivoting away from geographic arbitrage toward *compute arbitrage* and *prompt engineering arbitrage*.
   - **Actionable Implication**: CTOs should immediately reference MIT’s empirical findings: code volume surged 17.3×, yet release frequency rose only 30% (https://www.bestblogs.dev/article/6c197252?utm_source=rss&utm_medium=feed&utm_campaign=resources&entry=rss_article_item).
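
The key-detection rule described for Fable 5 under item 8 can be sketched in a few lines of Python. This is only an illustration of the rule's logic, not Anthropic's `safety_guardrail` module: `evaluate_message`, `GuardrailDecision`, the `"allow"` label, and the redaction step are hypothetical names introduced here, and the regex assumes that `AKIA` is followed by 16 uppercase letters or digits, the shape commonly used by secret scanners for AWS access key IDs.

```python
import re
from dataclasses import dataclass

# Assumed pattern: "AKIA" plus 16 uppercase letters or digits (20 characters total).
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


@dataclass(frozen=True)
class GuardrailDecision:
    """Hypothetical result type; not part of any real Anthropic API."""
    action: str         # "fallback_to_opus48" (name from the digest) or "allow" (assumed)
    alert_soc: bool     # whether the SOC team should be notified
    redacted_text: str  # message with any matched key ID masked


def evaluate_message(text: str) -> GuardrailDecision:
    """Apply the rule: a detected key ID triggers fallback plus a SOC alert."""
    if AWS_KEY_ID.search(text):
        # Redaction is an added assumption so the alert itself does not leak the key ID.
        redacted = AWS_KEY_ID.sub("[REDACTED_AWS_KEY_ID]", text)
        return GuardrailDecision("fallback_to_opus48", True, redacted)
    return GuardrailDecision("allow", False, text)


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    print(evaluate_message("deploy with AKIAIOSFODNN7EXAMPLE please"))
    print(evaluate_message("what is the weather today?"))
```

A literal `AKIA...` placeholder does not match this stricter pattern, so a validation message would need a full 20-character ID such as the documentation example `AKIAIOSFODNN7EXAMPLE`. The 2-second response requirement is a property of the deployed bot and is not something this sketch measures.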