A report on the Linguistics in the Pub discussion Tuesday 11th March, Prince Alfred Hotel, Grattan St, Melbourne.
This Linguistics in the Pub discussion brought together fieldworkers who do research in Indigenous Australia, Africa, South Asia, Papua New Guinea and Nepal, as well as a computational linguist who has developed software to automate language documentation. The linguists were not all Australian; in fact, we were lucky to have four participants who identify as European and who are living in Australia, temporarily or permanently. The linguists’ experience in language documentation ranged from 6 to 30 years, and between them they had deposited materials in the digital archives DoBeS, Paradisec and ELAR. The timeliness of this discussion is demonstrated by David Nathan’s very recent ELAC post on the same topic.
Some background on why I suggested this topic for Linguistics in the Pub
The early impetus for digital archiving as we know it was to save recordings that might otherwise deteriorate and become unusable, for the benefit of researchers, speakers and their descendants. The image of rotting reel-to-reel tapes in an old suitcase in the garage was often evoked. Thankfully, those documenting endangered languages now have a range of functional digital archives where the material that they collect can be safely stored and accessed. Of course not all fieldworkers do archive their materials, but this discussion aims to put forward the viewpoint of those who do: a group of hard-working linguists who take seriously their responsibilities both to the communities of speakers and to the community of academics that they work with.
The focus of digital archives, however, is now on making collections as accessible as possible. Archives of endangered languages allow depositors to select from a range of access options, which typically include: unrestricted access (often referred to as open access), access after clicking on an agreement that the materials will not be used in certain ways, access using a password, access only by request to the depositor, and closed access. While accessibility is one of the great benefits of digital archiving, field linguists are under increasing pressure to make online access to their collections fairly unrestricted. At least two digital archives have been regularly contacting depositors and requesting that restrictions on accessing their collections be reduced.
While the role of digital archives seems to have shifted, the kinds of material that linguists are recording have also changed. In fact, the contemporary field linguist is now likely to create hundreds of hours of digital recordings during their career, rather than a box of tapes, mini-discs etc. Linguists are also likely to be recording in intimate settings in small communities, where anonymity of participants is not possible. The desire for more naturalistic data means that many linguists want to record more than just tellings of myths, word lists and elicited sentences. Increasingly they are interested in recording informal interactions between intimates: family members and close friends.
Pressure on fieldworkers to make their recordings more accessible is coming from a number of quarters. Unrestricted access to primary research data is seen by some lobby groups as something that is in some obvious way best for everyone. There are also reports that language archives are under pressure to make data more easily accessible, to justify their existence to funding bodies. In addition, many researchers within linguistics would like to have digital access to corpora for smaller languages in the same way that they do for languages such as English, French, German, Italian and Japanese.
There are a number of reasons that linguists and the communities they work with might not want to give unrestricted access to their data online. This may be due to lack of anonymity in small communities and the informal, intimate settings in which data was recorded (Travis and Cacoullos 2013). The research may focus on how people gossip, which means that most recordings are defamatory in nature (Haviland 1977). Negotiating access with communities can also take time – particularly if a collection spans a number of decades and involves a large number of participants.
The Linguistics in the Pub discussion
The discussion at Linguistics in the Pub (LIP) on Tuesday night ranged, as usual, over a large number of topics, from which I have selected a few here. There was generally a surprising amount of agreement among the group. Most of those who had deposited materials with digital archives converged on the conclusion that it is not ideal to mandate that access to entire collections be unrestricted. Most saw the best approach as being for those requiring access to contact the depositor. There are a number of valid reasons for this. The most important reason is so that the depositor can consult with the community where the recordings were made. The second reason is that most researchers are interested in sharing ideas with others who are working on the language data they collected. For this reason, they would like to know who is working on the collection.
An example was given of an honours student who was given access to a collection to research intonation. The honours student did not communicate with the depositor about which part of the collection to use, and chose a session which had very poor sound quality and in which the speakers’ intonation was affected by their drunkenness. The honours student did not complete, possibly because of the difficulty of working with the data. Numerous other examples of this kind of problem, easily prevented by a little communication between researchers, have occurred. Conversational data in particular tends to raise issues. It is very difficult to analyse in the absence of ample contextual information – not all of this can be captured in metadata. Another reason not to give unrestricted access to materials on the internet is that it is not always possible to get informed consent for this from communities and speakers. In many communities, speakers do not have a good grasp of how the internet enables the flow of information. This is particularly the case among older speakers and in areas where there is no internet access.
Those at LIP agreed with David Nathan’s point that if pressure to reduce restrictions on access increases, we will be left with a very poor representation of endangered languages in the digital archives. One participant felt that only edited videos from their collection should be made available in a relatively unrestricted way. The community I work with is happy to make word lists, elicitation sessions and narratives available in an unrestricted way online, but not recordings involving children or conversational data recorded in informal settings. That said, it should be possible for the community to restrict access to any recording, should it be deemed sensitive at some point in the future.
The main issue with people having to contact the depositor to access recordings is what to do when the depositor is unwell or dead. We discussed here the idea of stewardship that is developing among the open source software community. The idea is that the writer of the open source code can pass on to somebody else the responsibility to look after that code. Those present agreed that there are no easy answers to the problem of what to do when the depositor is not available. However, when the depositor is available, they are the best person to conduct negotiations that relate to the data they deposited.
A few more reflections on discussions with linguists at LIP and elsewhere
We have only recently left an era in which it was assumed that researchers’ activities contribute to the advancement of scientific knowledge and thus to the good of all mankind. In that era, not surprisingly, concern for the rights of Indigenous communities was completely absent. Ideas about what knowledge is and who has a right to access what information vary greatly from culture to culture. See Franchetto (2010), for example, for an insightful discussion of how she and the Amazonian community she works with in Brazil gradually came to understand one another’s points of view about matters of knowledge and rights to access it.
It is not at all clear that Western culture has all the answers about questions of knowledge and new digital technology – there is much discussion of how we should manage our digital footprints. Take sexting by teenagers or reality TV, for example. A survey of Australian Big Brother participants found that many regretted having appeared on the show. No matter how old they get, many people will always know them as a drunk 19-year-old. Should we be exposing the communities we work with in the same way, when they have even less chance than Big Brother participants of understanding the consequences of making recordings of themselves?
As fieldworkers we try to build bridges between Western academic culture and communities which inevitably have quite different views of what we are doing when we do fieldwork. The greater the depth of experience a fieldworker has, the more they seem to err on the side of caution – this is not a coincidence. These experienced, principled and responsible fieldworkers have contributed many insights into the nature of language. Pressuring fieldworkers to lower access restrictions is disrespectful, unnecessary and unhealthy for the field of language documentation and research in linguistics in general. Fieldworkers already perform a delicate balancing act between the responsibilities they have to the academic community and those they have to the community they do research in.
In response to Martin Haspelmath’s comment on David Nathan’s ELAC post, I would like to point out that there is no evidence that the concerns raised here are restricted to Australian and North American contexts. It would be worth doing a study of the main digital archives to see whether there is a correlation between the continent in which data was collected and the access levels set for the data. As mentioned earlier, nearly half the Linguistics in the Pub participants identified as European, and the participants’ fieldwork experience included Papua New Guinea, South Asia and Africa as well as Australia. Haspelmath is right that some communities are happy for access to their recordings to be unrestricted, but some are not. There is ample literature written by fieldworkers in many parts of the world to suggest that attitudes to endangered language data differ greatly among endangered language communities (see for example Franchetto 2010, mentioned earlier).
To be completely clear, this discussion is about unrestricted access to raw language documentation data online. All the fieldworkers at LIP agreed that open access to metadata is invaluable, but that unrestricted access should not be mandated by archives for primary fieldwork data. The LIP discussion on Tuesday night and the random survey of fieldworkers that I have carried out over the past month suggest to me that there is a growing divide between fieldworkers and archive managers on this issue. We need to discuss these issues more so that the valuable digital archives we have developed over the last decade continue to grow in a way that best serves endangered language communities and linguists of all kinds.
Franchetto, B. 2010. ‘Bridging Linguistic Research and Linguistic Documentation’. (Ed.) Flores Farfán, J. A. & R. Fernando. New Perspectives on Endangered Languages: Bridging gaps between sociolinguistics, documentation and language revitalization 1: 49.
Haviland, J. 1977. Gossip, Reputation, and Knowledge in Zinacantan. Chicago: University of Chicago Press.
Travis, C. E. and R. T. Cacoullos. 2013. ‘Making Voices Count: Corpus Compilation in Bilingual Communities’. Australian Journal of Linguistics 33.2. http://www.tandfonline.com/eprint/AczMKmB66tiHQanZwHjf/full
Acknowledgements (or… a rather weak attempt at humour)
Thanks to all the linguists who have allowed me to repeat their anecdotes. Your names have been omitted but since the language documentation field is a small community it is likely that others will guess your identity. To enable free and spontaneous discussion, we never record Linguistics in the Pub, although a number of people who do not live in Melbourne have contacted me and requested this kind of access. So since we did not record it, we will not need to discuss setting levels of access to the archived recording session of the 11th March discussion at the next meeting of Linguistics in the Pub.