Implementing NLP Projects for Non-Central Languages: Instructions for Funding Bodies, Strategies for Developers

Published in Machine Translation, 2007

Recommended citation: Oliver Streiter, Kevin P. Scannell, and Mathias Stuflesser. Implementing NLP Projects for Non-Central Languages: Instructions for Funding Bodies, Strategies for Developers. Machine Translation, 20(4):267–289, 2006. https://kevinscannell.com/files/mt.pdf

DOI: 10.1007/s10590-007-9026-x

Abstract: This research begins by distinguishing a small number of central languages from the non-central languages, where centrality is measured by the extent to which a given language is supported by natural language processing tools and research. We proceed to analyse the conditions under which non-central language projects (NCLPs) and central language projects (CLPs) are conducted. We establish a number of important differences which have far-reaching consequences for NCLPs. In order to overcome the difficulties inherent in NCLPs, traditional research strategies have to be reconsidered. Successful styles of scientific cooperation, such as those found in open-source software development or in the development of Wikipedia, provide alternative views of how NCLPs might be designed. We elaborate the concepts of free software and software pools and argue that NCLPs, in their own interests, should embrace an open-source approach for the resources they develop and pool these resources together with other similar open-source resources. The expected advantages of this approach are so important that we suggest funding organizations make it a condicio sine qua non in project contracts.

Abridged version of the paper (12 pages)