Improving full-text search results on dúchas.ie using language technology
Published in Proceedings of the 3rd Celtic Language Technology Workshop at MT Summit XVII, 2019
Recommended citation: Brian Ó Raghallaigh, Kevin Scannell, and Meghan Dowling. Improving full-text search results on dúchas.ie using language technology. In Proceedings of the Third Celtic Language Technology Workshop, pages 63–69, Dublin, Ireland, 2019. European Association for Machine Translation. https://kevinscannell.com/files/duchas.pdf
Abstract: In this paper, we measure the effectiveness of using language standardisation, lemmatisation, and machine translation to improve full-text search results on dúchas.ie, the web interface to the Irish National Folklore Collection. Our focus is the Schools’ Collection, a scanned manuscript collection which is being transcribed by members of the public via a crowdsourcing initiative. We show that by applying these technologies to the manuscript page transcriptions, we obtain substantial improvements in search engine recall over a test set of actual user queries, with no appreciable drop in precision. Our results motivate the inclusion of this language technology in the search infrastructure of this folklore resource.
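To make the evaluation idea in the abstract concrete, here is a minimal sketch of how normalising transcriptions before indexing can raise recall without lowering precision. It is not the paper's implementation: the `NORMALISE` table is a hypothetical toy stand-in for a real standardiser and lemmatiser, the three "pages" and the relevance judgements are invented for illustration, and machine translation is left out entirely.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lookup standing in for a real standardiser/lemmatiser.
NORMALISE: Dict[str, str] = {
    "bliadhain": "bliain",  # pre-standard spelling -> standard spelling
    "blianta": "bliain",    # plural form -> lemma
}


def tokens(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately naive tokeniser)."""
    return text.lower().split()


def index_terms(text: str, expand: bool) -> Set[str]:
    """Terms indexed for a page; with expand=True, normalised forms are added."""
    terms = set(tokens(text))
    if expand:
        terms |= {NORMALISE.get(t, t) for t in terms}
    return terms


def search(query: str, pages: Dict[str, str], expand: bool) -> Set[str]:
    """Return ids of pages whose indexed terms contain every query term."""
    q = tokens(query)
    if expand:
        q = [NORMALISE.get(t, t) for t in q]
    return {
        pid for pid, text in pages.items()
        if all(t in index_terms(text, expand) for t in q)
    }


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Standard set-based precision and recall."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example data: p2 uses a pre-standard spelling of the query word.
PAGES = {"p1": "bliain mhaith", "p2": "bliadhain mhaith", "p3": "lá maith"}
RELEVANT = {"p1", "p2"}

baseline = precision_recall(search("bliain", PAGES, expand=False), RELEVANT)
expanded = precision_recall(search("bliain", PAGES, expand=True), RELEVANT)
print("baseline (precision, recall):", baseline)
print("expanded (precision, recall):", expanded)
```

In this toy setting the plain index misses the page written in the older spelling, while the expanded index retrieves it. The paper measures the same kind of effect over a test set of actual user queries.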