Short answer
Evidence is still limited for a confident short answer. Use this page as a watchlist summary and verify the linked sources before making a decision.
Why this answer holds
- Start from primary sources (official blog / repo / changelog) before citing or deciding.
- Track by themes (topics/entities) so evidence accumulates on evergreen pages.
- Use a weekly routine (shortlist → one action) to avoid doomscrolling.
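The weekly routine above can be sketched as a tiny script. This is a minimal illustration, not RadarAI's actual tooling: the `WatchItem` structure, the ranking-by-new-evidence heuristic, and the URLs are all hypothetical assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class WatchItem:
    topic: str  # evergreen theme, e.g. "pricing"
    sources: list[str] = field(default_factory=list)  # primary-source links gathered this week

def weekly_shortlist(items: list[WatchItem], limit: int = 3) -> list[WatchItem]:
    """Shortlist the themes with the most new primary sources, then act on one."""
    ranked = sorted(items, key=lambda i: len(i.sources), reverse=True)
    return ranked[:limit]

# Hypothetical watchlist for the themes tracked on this page.
items = [
    WatchItem("pricing", ["https://example.com/changelog"]),
    WatchItem("limits", []),
    WatchItem("monitoring", ["https://example.com/blog", "https://example.com/repo"]),
]
shortlist = weekly_shortlist(items, limit=1)
print([i.topic for i in shortlist])  # the single theme to act on this week
```

The point of the sketch is the shape of the routine: accumulate evidence per theme, rank once a week, and take one action instead of reacting to every headline.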
What RadarAI checked recently
- New evidence and links are added as relevant updates appear for the tracked themes: pricing, limits, and monitoring.
Evidence checks
The ARC-AGI-3 benchmark reveals systemic abstract-reasoning limits in top models: GPT-5.5 and Opus 4.7 both score <0.5%. DeepMind's CEO says agents are still early-stage; key remaining AGI gaps include continuous learning and long-horizon r… [source truncated]
A reinforcement learning reward shift triggered OpenAI's GPT-5.5 'Goblin Rebellion' incident, exposing a new risk to large-model behavioral controllability; meanwhile, DeepSeek achieved cost-effective outperformance over… [source truncated]
Primary sources / verification path
Why this page is short on purpose
This page is maintained as an evergreen knowledge page. It prioritizes clarity, trade-offs, and verifiable sources.
Examples
- Use the evidence timeline to verify claims quickly.
- Follow the sources section for primary-source citations.
FAQ
How is this page maintained?
It is updated in place when new evidence appears, rather than spawning a thin page for every headline.
How should I cite this page?
Use the primary source links for any citation or decision; cite this page as a summary layer if needed.
Last reviewed: 2026-05-12. This page is part of RadarAI's short-answer library. Use the linked primary sources before turning it into a team decision.