FutureSim Exposes Polymarket AI’s Narrow Wins and Failures
Max Planck Institute researchers recently released FutureSim, a benchmark for Polymarket AI-style forecasting that tests whether agents can predict real-world…
Highlights advances in core systems, technical breakthroughs, experiments, and academic work driving progress.
A BDH seminar summary circulating in recent technical discussion frames LLM memory as a tradeoff between the familiar transformer KV…
DeepSeek has released an open-source visual reasoning framework called Thinking with Visual Primitives. According to 36Kr, the system changes how…
Erdős problem #1196 now has a serious claimed solution, and the evidence ladder is unusually visible. Liam Price posted GPT-5.4…
AI-designed viruses are now a lab result, but not in the way the viral posts made it sound. Researchers affiliated…
A 14-author perspective paper posted to arXiv on April 23 argues that deep learning theory is starting to look less…
LLM failure modes are easiest to understand if you stop treating them as personality flaws: “the model lied,” “the chatbot…
A diffusion language model generates text by starting from masked or otherwise corrupted tokens and iteratively restoring them. In this…
The standard story is that LLMs work in words. They predict the next token, so surely their internal reasoning is…