In this episode, we are joined by Letitia Parcalabescu, PhD candidate at Heidelberg University's Department of Computational Linguistics, who shares her insights on the fascinating world of Multimodal Learning. As a researcher and science communicator, Letitia has spent several years thinking about the intersection between vision and text, a frontier of machine learning that has seen immense growth in recent years.

We explore her journey from physics to machine learning, unpack the influence of large language models (LLMs) on our understanding of linguistics, and delve into the relevance of the interplay between vision and language in machine learning. We discuss the key developments in multimodal learning, including joint embeddings, diffusion models, and LLMs, and she shares her perspective on how these advancements relate to Artificial General Intelligence (AGI).

We also discuss the value of benchmarks and performance metrics in machine learning, as well as her own research projects. Letitia offers a glimpse into a typical research day in her field and shares her motivations and learnings from her successful YouTube channel, AI Coffee Break: https://youtube.com/@AICoffeeBreak

The podcast and the associated cover image on this page belong to Manuel Brenner. The content of the podcast is created by Manuel Brenner and not by, or together with, Poddtoppen.