Technology and language documentation: LIP discussion

Lauren Gawne recaps last night’s Linguistics in the Pub, a monthly informal gathering of linguists in Melbourne to discuss topical areas in our field.

This week at Linguistics in the Pub it was all about technology, and how it impacts on our practices. The announcement for the session briefly outlined some of the ways technology has shaped expectations for language documentation:

The continual developments in technology that we currently enjoy are inextricably connected to the development of our field. Most would agree that technology has changed language documentation for the better. But while nobody is advocating a return to paper and pen, most would concur that technology has changed the way we work in unexpected ways. The focus is usually on the materials we produce such as video, audio and annotation files as well as particular types of computer-aided analysis. In a recent ELAC post, ‘Hammers and nails’, Peter Austin claims that metadata is not what it was, in the days of good old reel-to-reel tape recorders. The volume of comments suggests that this topic is ripe for discussion. This session of Linguistics in the Pub will give us a chance to reflect on how our practices change with advances in technology.

There are a (very) few linguists who advocate that researchers should go to the field with nothing beyond a spiral-bound notebook and a pen, though no one at the table was quite willing to go that far; all of us, it seems, go to the field with a good-quality audio recorder at the very least. Without the additional recordings (be they audio or video), the only output of the research becomes the final papers written by the linguist, which are in no way verifiable. The recording of verifiable data, and the slowly increasing practice of including audio recordings in the final research output, are allowing us to further stake our claim as an empirical and verifiable field of scientific inquiry. Many of us shared stories of how listening back to a recording we had made enriched the written records we have, or allowed us to focus on something that wasn’t the target of our inquiry at the time of the initial recording. The level of analysis that is now expected for even the lowliest sketch grammar is almost impossible without the aid of recordings, let alone capturing the subtleties present in naturalistic narrative or conversation.

This means that those in the business of field linguistics and language documentation are now also corpus builders. We are creating resources to help us in our work, but these enduring records of a project are taking up more time in their management and are becoming the work. This means that we need to decide how to evaluate them as part of our research output. While there is currently an ALS-driven committee trying to figure out how we measure the research worth of a corpus, this is proving to be a difficult question to answer. Some corpora are large but scant in metadata, and therefore not of great use, while others are small but include time-aligned transcriptions of naturalistic video and audio data, photographs, scanned notes and accompanying lexicon files from Toolbox.

All of this additional material takes time, often considerable time, especially when we also need to enter this information into archives, as is now the basic standard of good practice. However, this time is an investment that we recoup many times over later, in the process of analysing and working with our data. There is a tension between the need for analysis and the need to build the corpus. More than ever we have to make a choice about what we can realistically do (or find money for RAs, which is not feasible in all cases, e.g. a PhD project), and we also need to make or find ways of working more efficiently.

Technology is only useful if we know how to harness it correctly. As one person pointed out, there is often an expectation among less-experienced researchers that a tool will give you the analysis. Praat won’t tell you how many tones there are in a language, and Toolbox doesn’t magically create a lexicon for you to interlinearise with. You can only get out of the tools what you put into them. This means that technology training also needs to be done in a less ad hoc way.

Having spent the first half of the discussion focusing mainly on software, we then turned to the technology we use while in the field. As the range of affordable technologies has grown, the basic work kit of a linguist has grown too – the shrinking size of audio recorders is counteracted by the inclusion of video cameras in the standard kit, and improvements in batteries just mean that people now take solar panels with them as well. We now try to run sessions with multiple video cameras and an audio recorder, and to coordinate all of these things while ensuring that the people we work with are comfortable with the whole process. This means that the field worker must become a jack/jill of an even greater array of trades. Aidan Wilson suggested that perhaps we are finally getting to the point where fieldwork can no longer be a solo endeavour, but must be a team project between people with complementary skill sets.

Florian Hanke asked what the ideal field technology would be like. There was a general consensus that it would be less bulky, less demanding, and would allow us to focus on the main job of interacting with people and learning something about their language. We all agreed that taking notes in the field – using a spiral-bound notebook and a pen, no less – is still absolutely critical to our workflow, and no form of technology will ever make that obsolete.

Florian’s question then opened up into a general discussion about the ideal workflow for a language documentation project. There were the usual grumblings about the lack of integration between various components of our software toolkit. There is no standardised, accepted workflow for language documentation, and very few university programs offer formal training in these practices. While this is slowly changing, there are still many whose work practices lead to poor metadata outcomes, inconsistent file processing and idiosyncratic conventions. In many ways we still think of our jobs as language processing, not data processing.

We also talked about how technology has helped in creating resources beyond the linguist’s usual triptych of grammar, dictionary and interlinearised texts. It is now easier for us to return recordings, add subtitles, create story books and produce dictionaries from already-existing data, which makes these outputs much less time-consuming to create. It also means that communities have more chance to participate in recording their language and extending it into new domains. Florian talked about a mobile phone recording project in PNG that he and Steven Bird are working on, but there are also other interesting initiatives and online communities like Indigenous Tweets. There is also the new Google Endangered Languages website – this is not really a tool for linguists, as all the data available there is accessible through OLAC or Ethnologue anyway – but it will be interesting to see how language communities interact with the site (well, those with computers/internet/English literacy, anyway).

The increase in technology in our fieldwork doesn’t always naturally lead to better ways of interacting with the community. Sometimes the increased use of technology can conflict with the attitudes of the community, something many of us are aware of in our work. Also, while it is great that there are more initiatives in which indigenous language speakers are trained to use new technology, we can’t always expect them to use the technology the same way we do in our practice. For example, I’ve seen groups use video for word-list elicitation, which is probably not the best use of video resources, but at the end of the day we have to decide whether it is better that they are doing anything at all, and encourage that.

While the general consensus was that advances in technology have been good for our work practices, it is still worth taking the time to critically appraise them. We need to start thinking more critically about how we collect linguistic data, how we process that data in the course of our analysis, and how we can derive the best outputs from our work – for ourselves, for the communities we work with, and for archiving these materials for future access.

July’s Linguistics in the Pub will be held on the 17th, and will focus on a recent research paper, ‘Zombie Linguistics’, that discusses terminology used by linguists and the media when covering language documentation efforts. Stay tuned to the RNLD website for more info.
