Publications

Diachronic Parsing of Pre-Standard Irish

Published in Proceedings of the 4th Celtic Language Technology Workshop, 2022

This paper introduces a Universal Dependencies treebank covering a range of Irish dialects and time periods since 1600. We also establish baselines for lemmatization, tagging, and dependency parsing on this corpus by experimenting with a variety of machine learning approaches.

Recommended citation: Kevin P. Scannell. Diachronic Parsing of Pre-Standard Irish. In Proceedings of the 4th Celtic Language Technology Workshop at LREC 2022, pages 7–13, Marseille, France. European Language Resources Association. https://kevinscannell.com/files/dppsi.pdf

Managing Data from Social Media: The Indigenous Tweets Project

Published in The Open Handbook of Linguistic Data Management, 2022

This chapter is a case study in collecting and managing linguistic data from Twitter as part of the Indigenous Tweets project, which was founded in 2011 as a way of promoting the use of Indigenous and minority languages in social media. Our principal aim in this chapter is to describe our data management procedures in sufficient detail that linguists, sociolinguists, lexicographers, or community language activists with some programming skills can begin experimenting with Twitter data themselves.

Recommended citation: Kevin P. Scannell. Managing Data from Social Media: The Indigenous Tweets Project. In Andrea Berez-Kroeker, Bradley McDonnell, Eve Koller, and Lauren Collister, editors, The Open Handbook of Linguistic Data Management. MIT Press, 2022. https://direct.mit.edu/books/oa-edited-volume/5244/chapter-standard/3537409/Managing-Data-from-Social-Media-The-Indigenous

Tiomsú Corpais don Taighde Foclóireachta: Corpas Foclóireachta na Gaeilge (CFG2020)

Published in TEANGA, the Journal of the Irish Association for Applied Linguistics, 2021

This paper sets out the steps followed in compiling Corpas Foclóireachta na Gaeilge 2020 (CFG2020), a monolingual corpus of 77.3 million words.

Recommended citation: Mícheál J. Ó Meachair, Brian Ó Raghallaigh, Úna Bhreathnach, Gearóid Ó Cleircín, and Kevin Scannell. Tiomsú Corpais don Taighde Foclóireachta: Corpas Foclóireachta na Gaeilge (CFG2020). TEANGA, the Journal of the Irish Association for Applied Linguistics, 28:278–305, 2021. https://kevinscannell.com/files/dcu2021.pdf

Universal Dependencies for Manx Gaelic

Published in Proceedings of the Universal Dependencies Workshop at COLING 2020, 2020

We present a new Universal Dependencies treebank for Manx Gaelic consisting of 291 sentences and about 6000 tokens, and evaluate several parsing models trained on this corpus.

Recommended citation: Kevin P. Scannell. Universal Dependencies for Manx Gaelic. In Proceedings of the Fourth Workshop on Universal Dependencies at COLING 2020, pages 152–157, 2020. https://kevinscannell.com/files/ud-final.pdf

Neural Models for Predicting Celtic Mutations

Published in Proceedings of the 1st Joint SLTU and CCURL Workshop at LREC 2020, Marseille, France, 11–12 May 2020

In this paper we describe and evaluate neural network models for predicting mutations in Irish and Scottish Gaelic. We also discuss applications of these models to grammatical error detection and language modeling.

Recommended citation: Kevin P. Scannell. Neural Models for Predicting Celtic Mutations. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pages 1–8, 2020. https://kevinscannell.com/files/lrec2020.pdf

An Cúinne Gaeilge

Published in Irish Fulbright Alumni Association Newsletter, 2020

A short piece about my time as a Fulbright scholar in Carna, Co. Galway.

Recommended citation: Kevin Scannell. An Cúinne Gaeilge. Irish Fulbright Alumni Association Newsletter, 47:7, Spring/Summer 2020. https://kevinscannell.com/files/ifaa2020.pdf

Improving full-text search results on dúchas.ie using language technology

Published in Proceedings of the 3rd Celtic Language Technology Workshop at MT Summit XVII, 2019

In this paper, we measure the effectiveness of using language standardisation, lemmatisation, and machine translation to improve full-text search results on dúchas.ie, the web interface to the Irish National Folklore Collection.

Recommended citation: Brian Ó Raghallaigh, Kevin Scannell, and Meghan Dowling. Improving full-text search results on dúchas.ie using language technology. In Proceedings of the Third Celtic Language Technology Workshop, pages 63–69, Dublin, Ireland, 2019. European Association for Machine Translation. https://kevinscannell.com/files/duchas.pdf

Code-switching in Irish tweets: A preliminary analysis

Published in Proceedings of the 3rd Celtic Language Technology Workshop at MT Summit XVII, 2019

This paper reports on the annotation of (English) code-switching in a corpus of 1496 Irish tweets and provides a computational analysis of the nature of code-switching amongst Irish-speaking Twitter users.

Recommended citation: Teresa Lynn and Kevin Scannell. Code-switching in Irish tweets: A preliminary analysis. In Proceedings of the Third Celtic Language Technology Workshop, pages 32–40, Dublin, Ireland, 2019. European Association for Machine Translation. https://kevinscannell.com/files/codeswitch.pdf

Creating CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes – The National Corpus of Contemporary Welsh)

Published in Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing, 2017

CorCenCC is an inter-disciplinary and multi-institutional project that is creating a large-scale, open-source corpus of contemporary Welsh.

Recommended citation: Dawn Knight, Tess Fitzpatrick, Steve Morris, Jeremy Evas, Paul Rayson, Irena Spasić, Mark Stonelake, Enlli Môn Thomas, Steven Neale, Jennifer Needs, Scott Piao, Mair Rees, Gareth Watkins, Laurence Anthony, Thomas Michael Cobb, Margaret Deuchar, Kevin Donnelly, Michael McCarthy, and Kevin Scannell. Creating CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes — The National Corpus of Contemporary Welsh). In Piotr Bański, Marc Kupietz, et al., editors, Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing, pages 13–14, 2017. https://kevinscannell.com/files/knightetal.pdf

lemonGAWN: WordNet Gaeilge as Linked Data

Published in LDL 2016 — 5th Workshop on Linked Data in Linguistics: Managing, Building and Using Linked Language Resources, 2016

We introduce lemonGAWN, a conversion of WordNet Gaeilge, a wordnet for the Irish language, with synset relations projected from EuroWordNet.

Recommended citation: Jim O’Regan, Kevin Scannell, and Elaine Uí Dhonnchadha. lemonGAWN: WordNet Gaeilge as Linked Data. In John P. McCrae, Christian Chiarcos, et al., editors, LDL 2016 – 5th Workshop on Linked Data in Linguistics: Managing, Building and Using Linked Language Resources, pages 36–40, 2016. https://kevinscannell.com/files/lemon.pdf

Teangacha Mionlaigh sa Ré Dhigiteach: Tionchar na Meán Sóisialta

Published in Is ar scáth a chéile a mhaireann na daoine: Éire sa 21ú Aois idir Indibhidiúlacht agus Phobal, 2016

In this paper, I give a statistical description of the Irish-language community on social media, particularly on Twitter.

Recommended citation: Kevin Scannell. Teangacha Mionlaigh sa Ré Dhigiteach: Tionchar na Meán Sóisialta. In Breandán Mac Cormaic, editor, Is ar scáth a chéile a mhaireann na daoine: Éire sa 21ú Aois idir Indibhidiúlacht agus Phobal, pages 112–123. Coiscéim, 2016. https://kevinscannell.com/files/arscathalt.pdf

Teanga dhomhanda

Published in The Irish Times Magazine, 2016

“This is not a pastime for us. We want to take a full part in the life of Irish as a living language.”

Recommended citation: Kevin Scannell. Teanga dhomhanda. The Irish Times Magazine, 14–15, 10 March 2016. https://kevinscannell.com/files/irishtimes.pdf

Minority Language Twitter: Part-of-Speech Tagging and Analysis of Irish Tweets

Published in Proceedings of the Workshop on Noisy User-generated Text (W-NUT) at ACL 2015, 2015

We report on the development of a part-of-speech annotation scheme and annotated corpus for Irish-language tweets. We also report state-of-the-art tagging results obtained by training and testing three existing POS taggers on our new dataset.

Recommended citation: Teresa Lynn, Kevin Scannell, and Eimear Maguire. Minority Language Twitter: Part-of-Speech Tagging and Analysis of Irish Tweets. In Proceedings of the ACL 2015 Workshop on Noisy User-generated Text, pages 1–8, 2015. https://kevinscannell.com/files/wnut.pdf

Statistical models for text normalization and machine translation

Published in Proceedings of the 1st Celtic Language Technology Workshop at COLING 2014, 2014

We present a statistical model that is effective for standardization of Irish texts, as well as translation from Scottish Gaelic to Irish.

Recommended citation: Kevin Scannell. Statistical models for text normalization and machine translation. In Proceedings of the First Celtic Language Technology Workshop, pages 33–40, Dublin, Ireland, 2014. Association for Computational Linguistics and Dublin City University. https://kevinscannell.com/files/coling14.pdf

Corpas na Gaeilge (1882–1926): Integrating Historical and Modern Irish Texts

Published in Proceedings of the Workshop “Language resources and technologies for processing and linking historical documents and archives” at LREC 2014, 2014

This paper describes the processing of a corpus of seven million words of Irish texts from the period 1882–1926.

Recommended citation: Elaine Uí Dhonnchadha, Kevin Scannell, Ruairí Ó hUiginn, Eilís Ní Mhearraí, Máire Nic Mhaoláin, Brian Ó Raghallaigh, Gregory Toner, Séamus Mac Mathúna, Déirdre D’Auria, Eithne Ní Ghallchobhair, and Niall O’Leary. Corpas na Gaeilge (1882–1926): Integrating Historical and Modern Irish Texts. In Kristín Bjarnadóttir, Mathew Driscoll, et al., editors, Language Resources and Technologies for Processing and Linking Historical Documents and Archives – Deploying Linked Open Data in Cultural Heritage, pages 12–18, 2014. https://kevinscannell.com/files/ria.pdf

Translating Facebook into Endangered Languages

Published in Language Endangerment in the 21st Century: Globalisation, Technology and New Media. Proceedings of the 16th Foundation for Endangered Languages Conference, 2012

This paper describes an approach to localizing the Facebook interface into unsupported languages, originally due to Neskie Manuel.

Recommended citation: Kevin Scannell. Translating Facebook into Endangered Languages. In Tania Ka’ai, Muiris Ó Laoire, et al., editors, Language Endangerment in the 21st Century: Globalisation, Technology and New Media. Proceedings of the 16th Foundation for Endangered Languages Conference, pages 106–110, 2012. https://kevinscannell.com/files/fel12.pdf

Statistical Unicodification of African Languages

Published in Language Resources and Evaluation, 2011

This paper describes an open source package that performs automatic unicodification, implementing a variant of an algorithm described in previous work of De Pauw, Wagacha, and de Schryver.

Recommended citation: Kevin P. Scannell. Statistical Unicodification of African Languages. Language Resources and Evaluation, 45(3):375–386, 2011. https://kevinscannell.com/files/lre.pdf

Application of the Inventory of Biodiversity Information and Social Networking Based Collaboration: An Implementation of Software Framework at Web Application Level

Published in Proceedings of the 2010 International Conference on Computational Intelligence and Software Engineering, 2010

We apply the idea of a software framework at the web application level and implement it as a practical web application: Cypriniformes Commons (CypsCom), a solution for storing biodiversity information on the fish order Cypriniformes.

Recommended citation: Hsin-Hui Wu, Richard Mayden, and Kevin Scannell. Application of the Inventory of Biodiversity Information and Social Networking Based Collaboration: An Implementation of Software Framework at Web Application Level. In 2010 International Conference on Computational Intelligence and Software Engineering, pages 1–4, 2010. https://kevinscannell.com/files/wu-mayden.pdf

Implementing NLP Projects for Non-Central Languages: Instructions for Funding Bodies, Strategies for Developers

Published in Machine Translation, 2007

We argue in favor of an open-source approach to the development of language technology for under-resourced languages.

Recommended citation: Oliver Streiter, Kevin P. Scannell, and Mathias Stuflesser. Implementing NLP Projects for Non-Central Languages: Instructions for Funding Bodies, Strategies for Developers. Machine Translation, 20(4):267–289, 2006. https://kevinscannell.com/files/mt.pdf

The Crúbadán Project: Corpus building for under-resourced languages

Published in Proceedings of the 3rd Web as Corpus Workshop in Louvain-la-Neuve, Belgium, 2007

We present an overview of the Crúbadán project, the aim of which is the creation of text corpora for a large number of under-resourced languages by crawling the web.

Recommended citation: Kevin P. Scannell. The Crúbadán Project: Corpus building for under-resourced languages. In Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop, volume 4, pages 5–15, 2007. https://kevinscannell.com/files/wac3.pdf

Notes on a paper of Mess

Published in Geometriae Dedicata, 2007

These notes are a companion to the paper “Lorentz spacetimes of constant curvature” by Geoffrey Mess.

Recommended citation: Lars Andersson, Thierry Barbot, Riccardo Benedetti, Francesco Bonsante, William M. Goldman, François Labourie, Kevin P. Scannell, and Jean-Marc Schlenker. Notes on a paper of Mess. Geometriae Dedicata, 126(1):47–70, 2007. https://kevinscannell.com/files/intro.pdf

The generalized cuspidal cohomology problem

Published in Canadian Journal of Mathematics, 2006

We generalize the famous bending construction to the context of branched totally geodesic surfaces, and apply this to the Bianchi groups and finite-index subgroups, performing calculations in a finite range.

Recommended citation: Anneke Bart and Kevin P. Scannell. The generalized cuspidal cohomology problem. Canadian Journal of Mathematics, 58(4):673–690, 2006. https://kevinscannell.com/files/bianchi.pdf

Machine translation for closely related language pairs

Published in Proceedings of the Workshop “Strategies for developing machine translation for minority languages” at LREC 2006, 2006

We exploit the close linguistic relationship between Irish and Scottish Gaelic to develop a robust machine translation system, despite the lack of full parsing technology or pre-existing bilingual lexical resources.

Recommended citation: Kevin P. Scannell. Machine translation for closely related language pairs. In Proceedings of the Workshop Strategies for developing machine translation for minority languages, pages 103–109, 2006. https://kevinscannell.com/files/ga2gd.pdf

New perspectives on self-linking

Published in Advances in Mathematics, 2005

We initiate the study of classical knots through the homotopy class of the nth evaluation map of the knot, which is the induced map on the compactified n-point configuration space.

Recommended citation: Ryan Budney, James Conant, Kevin P. Scannell, and Dev Sinha. New perspectives on self-linking. Advances in Mathematics, 191(1):78–113, 2005. https://kevinscannell.com/files/selflink.pdf

Applications of parallel corpora to the development of monolingual language technologies

Unpublished manuscript, 2005

We describe the development of an aligned parallel corpus of English and Irish texts, along with a simple application enabling the standardization of documents written in prestandard or dialect forms of Irish.

Recommended citation: Kevin P. Scannell. Applications of parallel corpora to the development of monolingual language technologies. Unpublished manuscript. 2005. https://kevinscannell.com/files/ccgb.pdf

A one-dimensional embedding complex

Published in Journal of Pure and Applied Algebra, 2002

We give the first explicit computations of rational homotopy groups of spaces of “long knots” in Euclidean spaces.

Recommended citation: Kevin P. Scannell and Dev P. Sinha. A one-dimensional embedding complex. Journal of Pure and Applied Algebra, 170(1):93–107, 2002. https://kevinscannell.com/files/jpaa.pdf