This week we discuss The Illusion of Thinking, a new paper from researchers at Apple that challenges today’s evaluation methods and introduces a new benchmark: synthetic puzzles with controllable complexity and clean logic.
Their findings? Large Reasoning Models (LRMs) show surprising failure modes, including a complete collapse on high-complexity tasks and a decline in reasoning effort as problems get harder.
Dylan and Parth dive into the paper's findings as well as the debate around it, including a response paper aptly titled "The Illusion of the Illusion of Thinking."
Podden och tillhörande omslagsbild på den här sidan tillhör
Arize AI. Innehållet i podden är skapat av Arize AI och inte av,
eller tillsammans med, Poddtoppen.