How to Track AI Model Releases Systematically
New models ship weekly. Without a system, you end up with scattered browser tabs, outdated comparisons, and no clear picture of how the landscape has shifted since your last decision.
What to capture per release
For each model release worth tracking, record these fields:
| Field | Why it matters |
|---|---|
| Model name + version | Canonical reference |
| Benchmarks | Which evals, scores, and who ran them |
| Context window | Maximum input size in tokens; limits what you can build |
| Cost per 1M tokens | Input and output costs for budget modeling |
| License | Commercial use, fine-tuning rights, redistribution |
| Changelog URL | Primary source for verification |
| Release date | Shows how current your comparison is |
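The fields above can be sketched as a typed record. This is a minimal illustration, not a prescribed schema: the class names, field names, and every value in the example entry are assumptions made up for this sketch.

```python
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class BenchmarkResult:
    eval_name: str  # which eval set was used
    score: float
    run_by: str     # who ran it: the vendor or an independent party


@dataclass
class ModelRelease:
    name: str
    version: str
    context_window_tokens: int
    input_cost_per_1m_usd: float
    output_cost_per_1m_usd: float
    license: str
    changelog_url: str
    release_date: date
    benchmarks: list[BenchmarkResult] = field(default_factory=list)


# Placeholder values for illustration only; this is not a real model.
example = ModelRelease(
    name="example-model",
    version="1.0",
    context_window_tokens=128_000,
    input_cost_per_1m_usd=1.00,
    output_cost_per_1m_usd=4.00,
    license="Apache-2.0",
    changelog_url="https://example.com/changelog",
    release_date=date(2025, 1, 15),
    benchmarks=[BenchmarkResult("example-eval", 71.5, "vendor")],
)

# asdict() flattens the record (including nested benchmarks) into plain dicts,
# which is convenient for exporting to a spreadsheet or database.
record = asdict(example)
print(record["name"], record["benchmarks"][0]["run_by"])
```

Splitting cost into input and output fields mirrors the table, and keeping `run_by` on each benchmark makes the source of every score explicit.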
Where to keep it
A simple spreadsheet or Notion database works well. What matters is that the data is structured and searchable, with one row per release and one column per field, rather than a folder of PDFs and bookmarks.
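If you want a file you can import into either tool, a CSV with one row per release is one option. The sketch below assumes hypothetical column names and a placeholder entry; it uses only Python's standard `csv` module.

```python
import csv
import io

# Hypothetical column names, one per tracked field.
FIELDS = [
    "name",
    "version",
    "context_window_tokens",
    "input_cost_per_1m_usd",
    "output_cost_per_1m_usd",
    "license",
    "changelog_url",
    "release_date",
]

# Placeholder row for illustration only; this is not a real model.
rows = [
    {
        "name": "example-model",
        "version": "1.0",
        "context_window_tokens": 128000,
        "input_cost_per_1m_usd": 1.00,
        "output_cost_per_1m_usd": 4.00,
        "license": "Apache-2.0",
        "changelog_url": "https://example.com/changelog",
        "release_date": "2025-01-15",
    }
]


def to_csv(entries):
    """Serialize tracker entries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dicts (all values become strings)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(rows)
print(csv_text.splitlines()[0])  # the header row
```

Writing dates in ISO format (`YYYY-MM-DD`) keeps them sortable as plain text in most spreadsheet tools.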
Benchmarks: what to watch out for
Benchmarks reported by the releasing company are weak evidence on their own. Look for independent evaluations (e.g. LMSYS Chatbot Arena or third-party reproductions). For every score, note who ran the benchmark and on which eval set.
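One way to make this habit concrete is to store the source alongside each score and filter on it. The following is a rough sketch; the `BenchmarkResult` fields, the name-matching rule, and the sample values are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    eval_name: str  # which eval set was used
    score: float
    run_by: str     # who ran the benchmark
    vendor: str     # company that released the model

    @property
    def is_self_reported(self) -> bool:
        # Simplistic assumption: a result is self-reported when the runner's
        # name matches the vendor's name, ignoring case and whitespace.
        return self.run_by.strip().lower() == self.vendor.strip().lower()


def independent_only(results):
    """Keep only results that were not run by the releasing company."""
    return [r for r in results if not r.is_self_reported]


# Placeholder data for illustration only.
results = [
    BenchmarkResult("example-eval", 88.0, run_by="ExampleCorp", vendor="ExampleCorp"),
    BenchmarkResult("example-eval", 81.5, run_by="Independent Lab", vendor="ExampleCorp"),
]

for r in independent_only(results):
    print(r.eval_name, r.score, r.run_by)
```

Exact name matching is fragile (subsidiaries, partner labs), so in practice you may prefer a manually set flag over an automatic comparison.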
Review cadence
Update your model tracker when you shortlist a new model from your weekly radar scan. Do a quarterly review to archive stale entries and update cost figures (prices drop frequently).
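The quarterly pass can be partly automated by flagging entries that have not been reviewed recently. This sketch assumes each entry carries a `last_reviewed` date, which is a hypothetical extra column not listed in the table above, and treats 90 days as the quarterly threshold.

```python
from datetime import date, timedelta


def stale_entries(entries, today, max_age_days=90):
    """Return names of entries last reviewed more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return [e["name"] for e in entries if e["last_reviewed"] < cutoff]


# Placeholder entries for illustration only.
tracker = [
    {"name": "example-model-a", "last_reviewed": date(2025, 1, 10)},
    {"name": "example-model-b", "last_reviewed": date(2025, 6, 1)},
]

# Passing `today` explicitly keeps the function deterministic and easy to test.
print(stale_entries(tracker, today=date(2025, 7, 1)))  # ['example-model-a']
```

Flagged entries are candidates for either archiving or a refresh of their cost figures.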
Summary
Track AI model releases systematically: capture name/version, benchmarks (with source), context window, cost/1M tokens, license, changelog URL, and release date. Keep it in a structured table, not scattered bookmarks. Update it when you shortlist a model, and review quarterly.
FAQ
Should I track every model? No. Track models that are plausible for your use case given context window, cost, and license constraints. Everything else can stay in your radar history.
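The plausibility check described in this answer can be expressed as a simple filter. This is only a sketch: the thresholds, field names, and candidate models are invented for illustration, and your own constraints will differ.

```python
def is_plausible(model, min_context_tokens, max_output_cost_usd, allowed_licenses):
    """Return True if a model meets context, cost, and license constraints."""
    return (
        model["context_window_tokens"] >= min_context_tokens
        and model["output_cost_per_1m_usd"] <= max_output_cost_usd
        and model["license"] in allowed_licenses
    )


# Placeholder candidates for illustration only; these are not real models.
candidates = [
    {"name": "example-small", "context_window_tokens": 32_000,
     "output_cost_per_1m_usd": 0.50, "license": "Apache-2.0"},
    {"name": "example-large", "context_window_tokens": 200_000,
     "output_cost_per_1m_usd": 12.00, "license": "Proprietary"},
    {"name": "example-mid", "context_window_tokens": 128_000,
     "output_cost_per_1m_usd": 3.00, "license": "MIT"},
]

shortlist = [
    m["name"]
    for m in candidates
    if is_plausible(
        m,
        min_context_tokens=100_000,
        max_output_cost_usd=5.00,
        allowed_licenses={"Apache-2.0", "MIT"},
    )
]
print(shortlist)  # ['example-mid']
```

Models that fail the filter do not need a tracker row; they can stay in your radar history until your constraints change.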
Related reading
- How to Track AI Developments Across GitHub, Blogs, and Launches
- Comparing AI News Aggregators: What to Look For
- How to Create an AI Trends Digest for Your Team
- AI Launches That Matter vs Launches That Don't: How to Tell