When you are working on a data pipeline for ML, you are never dealing with a single table. You always need several tables, each for its own reason, and they all have to be mashed together before you have something you can learn from. But if that is the case, why do we spend so much time talking about ML pipelines that only work on a single table? Madelon Hulsebos did her PhD on this topic, so we figured we might ask her.

As mentioned in the podcast, here is the link to Madelon's homepage. https://www.madelonhulsebos.com/

Links to some interesting articles by Madelon can be found below.

https://www.madelonhulsebos.com/assets/dataset_search_survey.pdf

https://dl.acm.org/doi/pdf/10.1145/3654975

https://dl.acm.org/doi/pdf/10.1145/3588710

The podcast and the accompanying cover image on this page belong to probabl. The content of the podcast is created by probabl and not by, or together with, Poddtoppen.