Open Models Are Winning Code Arena Rankings by Fitting the Loop
A strange thing happened to code arena rankings. They stopped being just a nerdy scoreboard and started acting like a…
Highlights advances in core systems, technical breakthroughs, experiments, and academic work driving progress.
A theorem-first paper and an ablation-heavy systems paper can now describe the same model class and leave with very different…
A serving engineer watches tokens arrive in that familiar trickle: fast enough to demo, slow enough to feel like the…
The first time you see it, it’s kind of perfect: a tiny folder in your Cursor skills called make-no-mistakes. One…
If you tried to rebuild the Tufts experiment yourself, the first thing you’d notice is boring: the neuro-symbolic AI system…
Everyone on Reddit sees the same thing: a bunch of Chinese labs promising new open‑weight models… and then quietly missing…
YC‑Bench just produced the sort of result that usually launches a thousand hot takes: GLM‑5 vs Claude Opus on a…
If you’ve asked an LLM for a simple command lately and watched it flail through three wrong answers, you’ve already…
Swapping dot‑product attention for RBF attention sounds like an architectural revolution. In Raphael Pisoni’s experiment, it turned out to be…