Today's clip is from Episode 157 featuring Stefan Radev. In this conversation, Alex and Stefan dig into one of the hardest open problems in simulation-based inference — hierarchical models.
The core idea: when you move from flat to hierarchical models, you're no longer estimating one set of parameters. You have local parameters that vary by location (or subject, or city) and global parameters that capture what's shared across all of them. And you don't just want each separately — you want the full joint posterior, because that's where the Bayesian magic of shrinkage actually lives.
Stefan builds the problem from the ground up. Start with the simplest hierarchical case, a two-level model, then add a level. He uses electoral forecasting in France as the example: cities nested inside departments nested inside the whole country.
Now your simulator has to cover all three levels. If that simulator is slow (think: brain emulators, minutes per sample), scaling to hundreds of groups becomes completely intractable. Memory issues, specialized network requirements, the works.
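To make the cost concrete, here is a minimal sketch of a toy three-level simulator in Python. It is not the model discussed in the episode: the function name `simulate_election`, the normal priors, their scales, and the 1,000 ballots per city are all assumptions chosen for illustration. The point is only that one joint draw already needs one local simulation per city, so a slow simulator multiplies quickly.

```python
import numpy as np

def simulate_election(n_departments, cities_per_department, rng):
    """Toy three-level simulator (hypothetical priors, illustration only)."""
    # Level 1 (global): national baseline on the logit scale.
    national = rng.normal(0.0, 1.0)
    # Level 2: one value per department, centred on the national baseline.
    departments = rng.normal(national, 0.5, size=n_departments)
    # Level 3: one value per city, centred on its own department.
    cities = rng.normal(
        departments[:, None], 0.3, size=(n_departments, cities_per_department)
    )
    # Observed data: votes for one candidate out of 1000 ballots per city.
    vote_prob = 1.0 / (1.0 + np.exp(-cities))
    votes = rng.binomial(1000, vote_prob)
    return national, departments, cities, votes

rng = np.random.default_rng(0)
national, departments, cities, votes = simulate_election(5, 4, rng)
# A single joint draw requires one local simulation per city.
local_simulations_per_draw = votes.size
```

Here the local step is a cheap binomial draw. If each local simulation took minutes instead, every training sample would cost `local_simulations_per_draw` times that.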
The key insight: this problem has structure you can exploit. The joint posterior factorizes in a particularly nice way: given the global parameters, each local parameter depends only on its own local data. That means instead of cramming everything into one giant high-dimensional vector and hoping a neural network figures it out, you can decompose the problem. Estimate the global parameters from all the data, estimate each group's local parameters conditioned on its local data and the globals, then put the two together by composition.
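In a two-level model the factorization reads p(globals, locals | data) = p(globals | all data) × the product over groups j of p(local_j | data_j, globals). The sketch below illustrates sampling by composition from that factorization. It is a stand-in, not the neural approach from the episode: it assumes a normal-normal model with known standard deviations and a flat prior on the global mean, so both factors are available in closed form. The function name `sample_joint_posterior` and the example numbers are hypothetical.

```python
import numpy as np

def sample_joint_posterior(group_means, group_se, tau, n_draws, rng):
    """Composition sampling in a normal-normal model (analytic stand-in).

    Assumes theta_j ~ N(mu, tau^2), group mean y_j ~ N(theta_j, se_j^2),
    a flat prior on mu, and known tau and se_j.
    """
    group_means = np.asarray(group_means, dtype=float)
    group_se = np.asarray(group_se, dtype=float)

    # Step 1: global factor p(mu | all data).
    # Marginally y_j ~ N(mu, tau^2 + se_j^2), so mu has a
    # precision-weighted normal posterior.
    weights = 1.0 / (tau**2 + group_se**2)
    mu_hat = np.sum(weights * group_means) / np.sum(weights)
    mu = rng.normal(mu_hat, np.sqrt(1.0 / np.sum(weights)), size=n_draws)

    # Step 2: local factors p(theta_j | y_j, mu), independent given mu.
    precision = 1.0 / group_se**2 + 1.0 / tau**2
    cond_mean = (group_means / group_se**2 + mu[:, None] / tau**2) / precision
    theta = rng.normal(cond_mean, np.sqrt(1.0 / precision))
    return mu, theta

rng = np.random.default_rng(0)
y_bar = [2.0, 0.5, -1.0]   # observed group means (made-up numbers)
se = [0.5, 0.5, 2.0]       # standard errors; the third group is noisy
mu, theta = sample_joint_posterior(y_bar, se, tau=1.0, n_draws=20000, rng=rng)
```

Each row of `theta` is paired with the `mu` draw it was conditioned on, so together they are draws from the joint posterior. The noisy third group is pulled toward the pooled mean far more than the two precise groups, which is the shrinkage effect. In an amortized setting the two closed-form steps would be replaced by learned conditional estimators, but the order of composition stays the same.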
The takeaway: hierarchical models aren't just "harder flat models"; they have a geometry that demands a different architecture. Respecting that structure is what makes amortized inference scale.
The podcast and the accompanying cover art on this page belong to Alexandre Andorra. The podcast's content is created by Alexandre Andorra and not by, or together with, Poddtoppen.