If you’re Google or Netflix, and a recommendation or search system is part of your bread and butter, what’s the best way to test improvements to your algorithm? A/B testing is the canonical answer for testing how users respond to software changes, but it gets tricky fast when you try to pin down what an A/B test even means for an algorithm that returns a ranked list. That’s why we’re talking about interleaving this week. Instead of showing each group of users one algorithm’s results, interleaving merges both algorithms’ rankings into a single list for each user and credits each click to whichever algorithm contributed the clicked item. It’s a simple modification to A/B testing that makes it much easier to race two algorithms against each other and find the winner, and it lets you do so with much less data than a traditional A/B test.
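To make the idea concrete, here is a minimal sketch of team-draft interleaving, one widely used interleaving variant. The function names, the string item IDs, and the scoring rule are assumptions for illustration, not any company's production method. Under that rule, each click goes to the "team" (algorithm A or B) that placed the item, more credited clicks wins the impression, and the preference across many impressions is the share of decided impressions that B won.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def team_draft_interleave(
    ranking_a: Sequence[str],
    ranking_b: Sequence[str],
    length: int,
    rng: random.Random,
) -> Tuple[List[str], Dict[str, str]]:
    """Merge two rankings with team-draft interleaving (illustrative sketch).

    Returns the list shown to the user and a map from each shown item
    to the team ("A" or "B") that contributed it.
    """
    rankings = {"A": list(ranking_a), "B": list(ranking_b)}
    picks = {"A": 0, "B": 0}
    shown: List[str] = []
    team_of: Dict[str, str] = {}
    # Never ask for more items than the two rankings contain between them.
    target = min(length, len(set(ranking_a) | set(ranking_b)))

    def next_unshown(team: str) -> Optional[str]:
        # Highest-ranked item from this team's list that is not already shown.
        return next((d for d in rankings[team] if d not in team_of), None)

    while len(shown) < target:
        # The team with fewer picks so far goes next; ties go to a coin flip.
        if picks["A"] != picks["B"]:
            team = "A" if picks["A"] < picks["B"] else "B"
        else:
            team = "A" if rng.random() < 0.5 else "B"
        item = next_unshown(team)
        if item is None:
            # This team's list is exhausted, so the other team picks instead.
            # (Both cannot be exhausted here, since target <= union size.)
            team = "B" if team == "A" else "A"
            item = next_unshown(team)
        shown.append(item)
        team_of[item] = team
        picks[team] += 1
    return shown, team_of


def impression_winner(team_of: Dict[str, str], clicked: Sequence[str]) -> str:
    """Credit each click to the team that contributed the clicked item."""
    credit = {"A": 0, "B": 0}
    for item in clicked:
        if item in team_of:  # ignore clicks on items outside this list
            credit[team_of[item]] += 1
    if credit["A"] == credit["B"]:
        return "tie"
    return "A" if credit["A"] > credit["B"] else "B"


def preference_for_b(outcomes: Sequence[str]) -> float:
    """Fraction of non-tied impressions won by B (0.5 means no preference)."""
    decided = [o for o in outcomes if o != "tie"]
    if not decided:
        return 0.5
    return sum(o == "B" for o in decided) / len(decided)


if __name__ == "__main__":
    rng = random.Random(7)
    shown, team_of = team_draft_interleave(
        ["d1", "d2", "d3", "d4"], ["d3", "d1", "d5", "d6"], length=4, rng=rng
    )
    print(shown, team_of)
    # Pretend the user clicked only the top result of this impression.
    print(impression_winner(team_of, clicked=[shown[0]]))
```

Letting the team with fewer picks go next keeps each algorithm's share of slots balanced, so neither can win simply by occupying more of the list. The coin flip on ties keeps either algorithm from systematically getting the top position. Because every user sees both algorithms side by side, each impression is a direct head-to-head comparison, which is the intuition behind needing less data than a split-population A/B test.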

Relevant links:

https://medium.com/netflix-techblog/interleaving-in-online-experiments-at-netflix-a04ee392ec55

https://www.microsoft.com/en-us/research/publication/predicting-search-satisfaction-metrics-with-interleaved-comparisons/

https://www.cs.cornell.edu/people/tj/publications/joachims_02b.pdf

The podcast and the accompanying cover art on this page belong to Ben Jaffe and Katie Malone. The podcast's content is created by Ben Jaffe and Katie Malone, not by or in collaboration with Poddtoppen.