Machine learning and language technology in minority language contexts


Colloquium talk.

Abstract: Techniques based on machine learning and neural networks have led to major advances in technologies such as machine translation and speech recognition. These techniques generally require very large text and speech corpora or annotated datasets, and smaller language communities face a number of challenges in producing datasets suitable for machine learning. I will discuss several approaches that we have used in the three Gaelic language communities to overcome these challenges, including crowdsourcing, transfer learning from better-resourced languages, and mining historical archives for data.