Improving full-text search results on dúchas.ie using language technology

Published in Proceedings of the 3rd Celtic Language Technology Workshop at MT Summit XVII, 2019

Recommended citation: Brian Ó Raghallaigh, Kevin Scannell, and Meghan Dowling. Improving full-text search results on dúchas.ie using language technology. In Proceedings of the Third Celtic Language Technology Workshop, pages 63–69, Dublin, Ireland, 2019. European Association for Machine Translation. https://kevinscannell.com/files/duchas.pdf

Abstract: In this paper, we measure the effectiveness of using language standardisation, lemmatisation, and machine translation to improve full-text search results on dúchas.ie, the web interface to the Irish National Folklore Collection. Our focus is the Schools’ Collection, a scanned manuscript collection which is being transcribed by members of the public via a crowdsourcing initiative. We show that by applying these technologies to the manuscript page transcriptions, we obtain substantial improvements in search engine recall over a test set of actual user queries, with no appreciable drop in precision. Our results motivate the inclusion of this language technology in the search infrastructure of this folklore resource.

Slides (presented by Brian Ó Raghallaigh)