Crowd-sourcing and Language Documentation: September LIP

Ruth Singer recaps some of the interesting points from last week’s Linguistics in the Pub (LIP), an informal gathering of linguists and language activists held monthly in Melbourne.

The most recent LIP included a demonstration of the Ma! Iwaidja phrasebook and dictionary app developed by the Minjilang Endangered Languages Publication project (publishing as Iwaidja Inyman). The app, and its planned stage 2 development to include crowd-sourcing of linguistic data, served as the starting point for a discussion of the possibilities of crowd-sourcing in language documentation. The discussion was led by Bruce Birch of Iwaidja Inyman. Software developer Matthew de Moiser of Pollen Interactive, the company that developed the software, also took part. There was a good mix of people from all over: staff and students from the Centre for Research into Linguistic Diversity (CRLD) at La Trobe and from Monash, RNLD staff, and two participants who came all the way from NSW, Rob Mailhammer (UWS) and Tiger Webb (Macquarie/ABC Radio).

Crowd-sourcing is a way of gathering content for websites from a large number of contributors. For most readers, the most familiar example of crowd-sourcing will probably be Wikipedia. Its content is provided by willing volunteers and is also moderated by a volunteer community. Now that speakers of endangered languages are becoming more connected to the internet, via mobile phones, laptops and other devices, the possibilities for using crowd-sourcing to gather language data are becoming apparent.

A summary of the potential benefits of crowd-sourcing methods for language documentation was given in the initial LIP announcement (taken from an abstract of Bruce Birch’s):

The opportunities for language documentation presented by smartphone apps are that they allow people to record, annotate and upload language data as well as metadata in the form of audio, video, images or text. The process allows users of the apps to take advantage of the spontaneous opportunities for data collection which frequently arise, but which are often missed in the context of traditional fieldwork tools and methods. Crowd-sourcing facilitates the involvement of large numbers of native speakers of all ages in the documentation process without the need for high levels of literacy and computer literacy.

Bruce introduced some of the challenges and benefits of crowd-sourcing through a discussion of stage 2 of the development of the Ma! Iwaidja app, which he is currently working on. The aim is to create an interface that minimises the need for both text literacy and technological literacy. The current app, now available on iTunes, provides a phrasebook and an 850-word dictionary. Both are searchable, and there are sound files for each phrase and word. It is also possible to record new phrases and words and store them on your own device.

The aim for stage 2 is to enable users to upload new words and phrases and share them with one another. It is hoped that speakers will be able to record themselves speaking for a few minutes about the meaning and use of any new word, which is essential information for a new dictionary entry. After being uploaded to a server, the new data would be curated and made available to users as updates. Note that this model differs a little from Wikipedia's, as a committee (the current Iwaidja language team) would decide which new data to incorporate into the app. One challenge will be funding the ongoing work of curating the newly uploaded data.

Everybody was very impressed with the app in its current form, so of course those working on their own dictionaries were keen to find out whether they could use the software to make apps with their own language data. The good news is that the app's data was imported from Toolbox format, which suggests that existing Toolbox dictionaries could be used in the same way. At some stage it should be possible either to buy the software (and put the data in yourself) or to pay to have an app for another language built on the existing architecture. Bruce mentioned that it should be possible to get a phrasebook for under $5000. He and Matthew de Moiser definitely seemed interested in producing similar apps for more languages. A Mawng phrasebook app is already in production, and there are plans to create apps for other languages of north-west Arnhem Land (Australia) as well. The Northern Territory interpreters' centre is also interested in producing a phrasebook with 100 phrases in 17 languages.

It was clear from the discussion that there is a lot of interest in the potential of crowd-sourcing technology and apps for fieldwork and language documentation in general, so no doubt we will be hearing more about them from Bruce and others working on apps at LIPs to come.

The iOS version of the Ma! Iwaidja app is now available as a free download from iTunes (an Android version is to follow).


