This is Part 2 of our conversation with Professor Philipp Koehn of Johns Hopkins University. Professor Koehn is one of the world’s leading experts in machine translation and NLP.

In this episode we delve into commercial applications of machine translation, survey the open source tools available, and look at what to expect from the field in the future.

Episode Summary:

  • Typical datasets used for training models
  • The role of infrastructure and technology in machine translation
  • How academic research in machine translation has made its way into industry applications
  • Overview of the open source tools available for machine translation
  • The future of machine translation and whether it can pass a Turing test

Resources:

Philipp Koehn's latest book, Neural Machine Translation (Amazon link):

https://www.amazon.com/Neural-Machine-Translation-Philipp-Koehn/dp/1108497322

Omniscien Technologies, a leading enterprise provider of machine translation services:

https://omniscien.com/

Open source tools:

- Fairseq https://fairseq.readthedocs.io/en/latest/
- Marian https://marian-nmt.github.io/
- OpenNMT https://opennmt.net/
- Sockeye https://awslabs.github.io/sockeye/

Translated texts (parallel data) for training:

- OPUS http://opus.nlpl.eu/
- ParaCrawl https://paracrawl.eu/

Two papers mentioned in connection with the heavy use of computing power to train NLP models:

- GPT-3 https://arxiv.org/abs/2005.14165
- RoBERTa https://arxiv.org/abs/1907.11692

The podcast and its cover art on this page belong to Damien Deighan and Philipp Diesinger. The podcast's content is created by Damien Deighan and Philipp Diesinger, not by or in collaboration with Poddtoppen.