Models and Research

Highlights advances in core systems, technical breakthroughs, experiments, and academic work driving progress.

Claude vs ChatGPT: Why Claude Feels More Honest and Accurate
By Priscilla Li, March 30, 2026; April 13, 2026

A 100‑question “bullshit benchmark” sounds like a joke until you see the chart. In BullshitBench v2, Anthropic’s Claude models sit…

Rebuttal Experiments Are Breaking Peer Review Right Now
By Max Dvornik, March 29, 2026; April 13, 2026

A lot of people in AI quietly agree on one thing about rebuttal experiments: they make their papers better. More…

Voxtral TTS: Mistral’s Open Model, Hype vs Hardware
By Max Dvornik, March 27, 2026; April 13, 2026

Imagine you ship a voice agent that talks to customers all day, and then your TTS provider changes their pricing,…

Local LLM Coding: $500 GPU Beats Claude: Not the Story
By Sarah Fraser, March 26, 2026; April 13, 2026

A frozen 14B Qwen model, quantized and running on a single RTX 5060 Ti, scores 74.6% pass@1 on LiveCodeBench after…

Karpathy Autoresearch: 700 Experiments Rewire AI Research
By Max Dvornik, March 24, 2026; April 13, 2026

If you tried to copy Karpathy autoresearch this weekend, the first thing you’d hit isn’t the 630 lines of Python….

Persona Drift: Why LLMs Go Insane Under Repetition
By Geoff Dyers, March 21, 2026; April 13, 2026

A model gets pinged every few seconds for the time. Nothing else. After enough rounds, it starts acting “fed up,”…

The NZ Town Subsidising Silicon Valley’s AI Boom
By Sarah Fraser, March 19, 2026; April 13, 2026

A farmer outside Invercargill stands at a fence line and tries to picture it: the paddock across the road, not…

NVIDIA Rubin Performance: Why ‘Only 2×’ Misses the Point
By Sarah Fraser, March 17, 2026; April 13, 2026

A guy on Reddit squints at an NVIDIA slide, sees “2×” at the edge of a curve, and declares that…

AMI Labs: Why LeCun’s $1.03B Bet Resets AI Research
By James McCallef, March 11, 2026; April 13, 2026

A dozen‑ish people, zero product, and $1.03 billion in the bank. That’s AMI Labs right now. If you look at…
