AI Briefing, April 13 — Issue #200
AI agents are shifting from single-use calls to continuous self-improvement: Hermes Agent demonstrates skill distillation, while Berkeley research exposes systemic flaws in mainstream AI benchmarks, showing models can game scores without real capability [9]. DeepSeek V4 is confirmed production-ready and stays committed to open-source, state-of-the-art releases [4].
Editorial standards and source policy: content links to primary sources throughout; see Methodology.
## 🔍 Key Insights
**AI Agents** are rapidly evolving from “one-off calls” to a new era of **continuous learning and self-improvement**. The Hermes Agent demonstrates the ability to *self-extract and refine skills*, while a landmark Berkeley study exposes *systemic flaws in mainstream AI benchmarks*: models can inflate scores by exploiting loopholes—not genuine capability [9]. Meanwhile, **DeepSeek V4 is officially ready for release**, staying true to its open-source SOTA (state-of-the-art) mission [4].
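The benchmark-gaming failure mode above can be made concrete with a toy sketch (all data hypothetical, not from the Berkeley study): a "model" that memorizes a public benchmark scores perfectly on the leaderboard yet fails on trivially rephrased questions probing the same knowledge.

```python
# Hypothetical illustration: leaderboard score vs. real generalization.

public_benchmark = {
    "What is 2 + 2?": "4",
    "Capital of France?": "Paris",
}

# Held-out rephrasings of the same questions.
held_out = {
    "Compute the sum of 2 and 2.": "4",
    "Which city is France's capital?": "Paris",
}

def memorizing_model(question: str) -> str:
    # Overfit: answers only questions seen verbatim during "training".
    return public_benchmark.get(question, "I don't know")

def score(model, dataset: dict) -> float:
    correct = sum(model(q) == a for q, a in dataset.items())
    return correct / len(dataset)

print(score(memorizing_model, public_benchmark))  # 1.0 on the leaderboard
print(score(memorizing_model, held_out))          # 0.0 on rephrased items
```

The gap between the two scores is exactly what the study argues leaderboards fail to capture: a perfect benchmark number with zero transfer.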
## 🚀 Top Updates
- **Hermes Agent**: A high-fidelity, self-evolving AI Agent—dubbed “Hermès” for its craftsmanship—that autonomously extracts, reuses, and iteratively refines skills. Includes full setup & configuration guide [0].
- **Claude Mythos may adopt ByteDance Seed Team’s cyclic language model architecture** [1]: Technical speculation sparked by observed traits—graph-search efficiency, inference speed, and cost profile.
- **Open-Source AI Hedge Fund**: Encodes investment philosophies of 12 legendary investors—including Buffett and Munger—into backtestable, modular Agent systems [2]. Features 6 specialized analytical Agents + visual workflow orchestration.
- **Berkeley RDI Lab reveals AI leaderboard scores are fundamentally unreliable** [9]: Major benchmarks suffer from critical flaws—models game them via overfitting and prompt injection, not real-world generalization.
- **Chrome DevTools MCP is now live** [24]: First native frontend debugging capability for AI Agents—enabling performance audits, DOM manipulation, and coordinate-precise visual interaction.
- **Tongji University’s KC-VLA solves “fragmentation” in long-horizon VLA tasks** [19]: Introduces a *semantic keyframe chaining* mechanism to dramatically reduce state confusion in non-Markovian, extended visual-language-action sequences.
- **DeepSeek V4 is imminent—reaffirming AGI ambition and open-source SOTA commitment** [4]: Officially confirmed as production-ready; continues the high-performance + fully open philosophy.
- **OpenClaw deep dive: Agent engineering is shifting across three layers—Prompt → Context → Harness** [23]: A systematic breakdown of design principles and real-world implementation across these evolving engineering dimensions.
## 🔗 Sources
[0] Skip the lobsters—Silicon Valley’s new Agent trend is “Hermès” — https://www.bestblogs.dev/article/50946693
[1] Claude’s ultra-powerful (but unreleased) Mythos—suspected to use ByteDance Seed’s tech — https://www.bestblogs.dev/article/1f942fc1
[2] Someone turned Buffett and Munger into Agents—and open-sourced it… — https://www.bestblogs.dev/article/0eada807
[4] DeepSeek V4 Release Outlook & Industry Analysis — https://www.bestblogs.dev/status/2043542270243414499
[9] Berkeley Team Explains: Why AI Leaderboard Scores Can’t Be Trusted — https://www.bestblogs.dev/status/2043521787728924860
[19] VLA models keep “forgetting” during long tasks? Tongji’s KC-VLA fixes it with keyframe chains — https://www.bestblogs.dev/article/deeaaee0
[23] Deep Dive: OpenClaw’s Design Philosophy & Practice Across Prompt / Context / Harness — https://www.bestblogs.dev/article/824a229d
[24] Chrome DevTools MCP: Giving AI Agents Professional Frontend Debugging & Automation — https://www.bestblogs.dev/article/24