Neural language technology in an under-resourced setting
Date:
Colloquium talk.
Slides: wustl19.pdf
Abstract: The state of the art for a number of important English-language NLP tasks has improved rapidly with the introduction of neural network methods over the last 5-10 years. While these approaches have been successfully applied to many other languages, progress in the field as a whole has been measured by advances in the state of the art for English. This has led to models that require huge amounts of training data to achieve reasonable performance, and that can present difficulties for languages that are typologically very different from English. This talk will cover recent developments in NLP for the Irish language, which is both endangered (fewer than 70,000 daily speakers) and under-resourced in terms of language technology. Our focus will be on the foundational problem of probabilistic language modeling, although we will also discuss several applications to machine translation, text normalization, and speech technology.