The Crúbadán Project: Corpus building for under-resourced languages

Published in the Proceedings of the 3rd Web as Corpus Workshop, Louvain-la-Neuve, Belgium, 2007.

Recommended citation: Kevin P. Scannell. The Crúbadán Project: Corpus building for under-resourced languages. In Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop, volume 4, pages 5–15, 2007. https://kevinscannell.com/files/wac3.pdf

Conference proceedings edited by C. Fairon, H. Naets, A. Kilgarriff, and G.-M. de Schryver.

Abstract: We present an overview of the Crúbadán project, the aim of which is the creation of text corpora for a large number of under-resourced languages by crawling the web.