Today we’re joined by Joseph Gonzalez, Assistant Professor in the EECS department at UC Berkeley.

In our conversation, we explore Joseph’s paper “Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers,” which looks at compute-efficient training strategies for Transformer models. We discuss the two main problems the paper addresses: 1) How can we rapidly iterate on variations in architecture? and 2) If we make models bigger, does that actually improve efficiency?

The podcast and the accompanying cover image on this page belong to Sam Charrington. The podcast’s content is created by Sam Charrington, not by or in collaboration with Poddtoppen.