Open access and intimate fieldwork

A report on the Linguistics in the Pub discussion Tuesday 11th March, Prince Alfred Hotel, Grattan St, Melbourne.

This Linguistics in the Pub discussion brought together fieldworkers who do research in Indigenous Australia, Africa, South Asia, Papua New Guinea and Nepal, as well as a computational linguist who has developed software to automate language documentation. The linguists were not all Australian; in fact, we were lucky to have four participants who identify as European and who are living in Australia, temporarily or permanently. The linguists’ experience in language documentation ranged from 6 to 30 years, and between them they had deposited materials in the digital archives DoBeS, Paradisec and ELAR. The timeliness of this discussion is demonstrated by David Nathan’s very recent ELAC post on the same topic.

Some background on why I suggested this topic for Linguistics in the Pub

The early impetus for digital archiving as we know it was to save, for the benefit of researchers, speakers and their descendants, recordings which might otherwise deteriorate and become unusable. The image of rotting reel-to-reel tapes in an old suitcase in the garage was often evoked. Thankfully, those documenting endangered languages now have a range of functional digital archives where the material that they collect can be safely stored and accessed. Of course not all fieldworkers archive their materials, but this discussion aims to put forward the viewpoint of those who do: a group of hard-working linguists who take seriously their responsibilities both to the communities of speakers that they work with and to the academic community.

The focus of digital archives, however, is now on making collections as accessible as possible. Archives of endangered languages allow depositors to select from a range of access options, which typically include: unrestricted access (often referred to as open access), access after clicking on an agreement that the materials will not be used in certain ways, access using a password, access only by request to the depositor, and closed access. While accessibility is one of the great benefits of digital archiving, field linguists are under increasing pressure to make online access to their collections fairly unrestricted. At least two digital archives have been regularly contacting depositors and requesting that restrictions on accessing their collections be reduced.

While the role of digital archives seems to have shifted, the kinds of material that linguists are recording have also changed. In fact the contemporary field linguist is now likely to create hundreds of hours of digital recordings during their career, rather than a box of tapes, mini-discs etc. Linguists are also likely to be recording in intimate settings in small communities, where anonymity of participants is not possible. The desire for more naturalistic data means that many linguists want to record more than just tellings of myths, word lists and elicited sentences. Increasingly they are interested in recording informal interactions between intimates: family members and close friends.

Pressure on fieldworkers to make their recordings more accessible is coming from a number of quarters. Unrestricted access to primary research data is seen by some lobby groups as something that is in some obvious way best for everyone. There are also reports that language archives are under pressure to make data more easily accessible, to justify their existence to funding bodies. In addition, many researchers within linguistics would like to have digital access to corpora for smaller languages in the same way that they do for languages such as English, French, German, Italian and Japanese.

There are a number of reasons that linguists and the communities they work with might not want to give unrestricted access to their data online. This may be due to lack of anonymity in small communities and the informal, intimate settings in which data was recorded (Travis and Cacoullos 2013). The research may focus on how people gossip, which means that most recordings are defamatory in nature (Haviland 1977). Negotiating access with communities can also take time – particularly if a collection spans a number of decades and involves a large number of participants.

The Linguistics in the Pub discussion

The discussion at Linguistics in the Pub (LIP) on Tuesday night ranged, as usual, around a large number of topics, from which I have selected a few here. There was generally a surprising amount of agreement among the group. Most of those who had deposited materials with digital archives converged on the conclusion that it is not ideal to mandate that access to entire collections be unrestricted. Most saw the best approach as being for those requiring access to contact the depositor. There are a number of valid reasons for this. The most important reason is so that the depositor can consult with the community where the recordings were made. The second reason is that most researchers are interested in sharing ideas with others who are working on the language data they collected. For this reason, they would like to know who is working on the collection.

An example was given of an honours student who was given access to a collection to research intonation. The honours student did not communicate with the depositor about which part of the collection to use and chose a session which had very poor sound quality and in which the speakers’ intonation was affected by their drunkenness. The honours student did not complete, possibly because of the difficulty of working with the data. Numerous other examples of this kind of problem, easily prevented by a little communication between researchers, have occurred. Conversational data in particular tends to raise issues. It is very difficult to analyse in the absence of ample contextual information – not all of this can be captured in metadata. Another reason not to give unrestricted access to materials on the internet is that it is not always possible to get informed consent for this from communities and speakers. In many communities, speakers do not have a good grasp of how the internet enables the flow of information. This is particularly the case among older speakers and in areas where there is no internet access.

Those at LIP agreed with David Nathan’s point that if pressure to reduce restrictions on access increases we will be left with a very poor representation of endangered languages in the digital archives. One participant felt that only edited videos from their collection should be made available in a relatively unrestricted way. The community I work with is happy to make word lists, elicitation sessions and narratives available in an unrestricted way online, but not recordings involving children or conversational data recorded in informal settings. That said, it should be possible for the community to restrict access to any recording, should it be deemed sensitive at some point in the future.

The main issue with people having to contact the depositor to access recordings is what to do when the depositor is unwell or dead. We discussed here the idea of stewardship that is developing among the open source software community. The idea is that the writer of the open source code can pass on to somebody else the responsibility to look after that code. Those present agreed that there are no easy answers to the problem of what to do when the depositor is not available. However, when the depositor is available, they are the best person to conduct negotiations that relate to the data they deposited.

A few more reflections on discussions with linguists at LIP and elsewhere

We have only recently left an era in which it was assumed that researchers’ activities contribute to the advancement of scientific knowledge and thus to the good of all mankind. In that era, not surprisingly, concern for the rights of Indigenous communities was completely absent. Ideas about what knowledge is and who has a right to access what information vary greatly from culture to culture. See Franchetto (2010), for example, for an insightful discussion of how she and the Amazonian community she works with in Brazil gradually came to understand one another’s points of view about matters of knowledge and rights to access it.

It is not at all clear that Western culture has all the answers about questions of knowledge and new digital technology – there is much discussion of how we should manage our digital footprints. Take sexting by teenagers or reality TV, for example. A survey of Australian Big Brother participants found that many regretted having appeared on the show. No matter how old they get, many people will always know them as a drunk 19-year-old. Should we be exposing the communities we work with in the same way, when they have even less chance than Big Brother participants of understanding the consequences of making recordings of themselves?

As fieldworkers we try to build bridges between Western academic culture and communities which inevitably have quite different views of what we are doing when we do fieldwork. The greater the depth of experience a fieldworker has, the more they seem to err on the side of caution – this is not a coincidence. These experienced, principled and responsible fieldworkers have contributed many insights into the nature of language. Pressuring fieldworkers to lower access restrictions is disrespectful, unnecessary and unhealthy for the field of language documentation and research in linguistics in general. Fieldworkers already perform a delicate balancing act with the responsibilities they have to the academic community and the community they do research in.

In response to Martin Haspelmath’s comment on David Nathan’s ELAC post, I would like to point out that there is no evidence that the concerns raised here are restricted to Australian and North American contexts. It would be worth doing a study of the main digital archives to see whether there is a correlation between the continent in which data was collected and the access levels set for the data. As mentioned earlier, nearly half the Linguistics in the Pub participants identified as European, and the participants’ fieldwork experience included Papua New Guinea, South Asia and Africa as well as Australia. Haspelmath is right that some communities are happy for access to their recordings to be unrestricted, but some are not. There is ample literature written by fieldworkers in many parts of the world to suggest that attitudes to endangered language data differ greatly among endangered language communities (see for example Franchetto 2010, mentioned earlier).

To be completely clear, this discussion is about unrestricted access to raw language documentation data online. All the fieldworkers at LIP agreed that open access to metadata is invaluable, but that unrestricted access should not be mandated by archives for primary fieldwork data. The LIP discussion on Tuesday night and the random survey of fieldworkers that I have carried out over the past month suggest to me that there is a growing divide between fieldworkers and archive managers on this issue. We need to discuss these issues more so that the valuable digital archives we have developed over the last decade continue to grow in a way that best serves endangered language communities and linguists of all kinds.

Franchetto, B. 2010. ‘Bridging Linguistic Research and Linguistic Documentation’. (Ed.) Flores Farfán, J. A. & R. Fernando. New Perspectives on Endangered Languages: Bridging gaps between sociolinguistics, documentation and language revitalization 1: 49.

Haviland, J. 1977. Gossip, Reputation, and Knowledge in Zinacantan. Chicago: Chicago University Press.

Travis, C. E. and R. T. Cacoullos. 2013. ‘Making Voices Count: Corpus Compilation in Bilingual Communities’. Australian Journal of Linguistics 33.2.

Vienne, E. de and R. Guirardello-Damian. 2008. ‘Working Together: The Interface between Researchers and Native People: The Trumai Case’. (Ed.) Harrison, K. D., D. S. Rood & Arienne Dwyer. Lessons from Documented Endangered Languages 78: 43.

Acknowledgements (or… a rather weak attempt at humour)

Thanks to all the linguists who have allowed me to repeat their anecdotes. Your names have been omitted but since the language documentation field is a small community it is likely that others will guess your identity. To enable free and spontaneous discussion, we never record Linguistics in the Pub, although a number of people who do not live in Melbourne have contacted me and requested this kind of access. So since we did not record it, we will not need to discuss setting levels of access to the archived recording session of the 11th March discussion at the next meeting of Linguistics in the Pub.




5 thoughts on “Open access and intimate fieldwork”

  1. Thanks for this summary of what must have been an interesting and lively discussion.

    On your point that “there is a growing divide between fieldworkers and archive managers”, it might be interesting to readers of this blog to know that following David Nathan’s retirement last month, management of ELAR has been taken over by the Director of ELDP, the granting arm of the Endangered Languages Project at SOAS. Also, the Digital Content Curator at ELAR will be leaving in April. It was announced to depositors last Friday that the new management is “also rethinking our policies in certain areas”. Whether or not this includes policies on “open access” remains to be seen, although ELDP declared in the 2013 HRELP Annual Report (p11) that it “supports an open access policy”.

  2. There is no question that giving unrestricted access to intimate recordings is unacceptable.

    But was there no discussion of the extent to which it is acceptable to even record intimate situations, and store the recordings? One possible (radical) position would be that if a speech event cannot be shared with everyone, then it should not be recorded in the first place, let alone put in an archive where it could potentially be shared with others.

    As we now know, the NSA and the GCHQ have access to everything that exists in electronic form. “In many communities, speakers do not have a good grasp of how the internet enables the flow of information.” That would seem to include the community of academic fieldworkers. So to “err on the side of caution” may mean that no intimate situations are recorded (except perhaps on paper, in a form that is accessible only to intimates).

    Of course, this would restrict the possibilities for research, but the responsibility for our research collaborators has to take precedence.

  3. A further thought on Martin’s (possible) radical position: “if a speech event cannot be shared with everyone, then it should not be recorded in the first place”. Interpreting “recording” broadly, perhaps to include writing down after the fact, remembering and (possibly but not necessarily) sharing it with others, and so on, does this stance not verge on absurdity by practically outlawing communication itself? Speech events are not usually shared with everyone. Sometimes they are private, and generally they are directed to a single interlocutor. I do not want the world to broadcast what I tell my best friend. Surely I am accountable for what I say, but there is still a difference between on the record and off the record speech.

    It isn’t too difficult to come up with kinds of recordings that need to be made, even though they mustn’t (normally) be shared with others. For example, consider an “audio will”. I take this as a recording that can be legitimately shared with only a few people before the provider’s death, and then with only a few more people after the provider’s death. Another example would be a recorded promise. There would be no reason to share the recording unless the promise is broken. I take it that Martin’s (possible) position would also exclude the archiving of home movies. Again, many people want to make home movies, but not as many want to share them with the whole world.

    The danger, I believe, in adopting Martin’s (possible) position is that if it becomes commonplace to assume that linguists and others enact their ethical responsibility to their research collaborators by assuming that they are not recording intimate situations, and therefore that they are by definition doing work of a non-sensitive nature, then there will be little or no motivation to implement the kinds of safeguards that this and the previous post have tried to highlight.

  4. Thanks for having this discussion. It remains a difficult topic, and I very much agree with your conclusion that more discussion of this kind is needed. Let me add a few remarks from an archive manager’s perspective.

    I don’t think any archive of endangered languages will mandate raw documentation materials to be made publicly accessible across the board; that would simply be unacceptable both ethically and legally (unless you follow Martin’s suggestion and have the speaker’s consent for those materials that you do archive to be made public). Many language archives offer a range of access levels such that a suitable level can be defined on a file-by-file basis. It is true, though, that we ask our depositors from time to time to make their materials as openly available as possible. What this means is that we ask them to make the materials available either completely openly or after (automated) user registration and acceptance of a license agreement – unless there are serious reasons to keep access more restricted. Protecting the privacy or respecting the wishes of the members of the language communities can obviously be such reasons, as can the fact that a PhD student may still be writing up a thesis about the materials.

    Language archives are under pressure from funders to show that they are actually being used by academics other than the depositors themselves, and current usage statistics in this respect are well below par (I can only speak for our archive but I’m guessing this is true for most). Access restrictions are of course only a part of the story, but I do believe that a lot of potential users are scared away even by the need to register alone and more so by the need to contact the depositor to ask for permission. If the usage of our archives does not increase, there may not be very many of them to choose from in the not too distant future. Language documenters might then have no other option than to deposit their materials with general purpose research data archives, with less sophisticated access management options.

  5. The weekend uproar over Nicholas Rothwell’s article on the Ngintaka exhibition at South Australian Museum:

    and the response by the curator Diana James:

    are a good example of why archives should take notice of requested restrictions on materials. The risk is putting entire continents of cultures off side through the perceived disregard for indigenous knowledge systems. The discourse of ‘linguists stealing language’ is already entrenched in many areas in Australia and has prevented potentially rich collaborations between language communities and linguists. It’s quite simple – archives and linguists need to give due respect to the wishes of indigenous people, their language and culture, and stop using science as a defense for propagating the effects of marginalisation.

