“… You cannot sue us for libel, though we have exposed your characters, your secrets, and your private lives. Our protection lies in your unworldliness.”

(Mary and Elizabeth Durack. 1935. All-About: The story of a Black Community on Argyle Station, Kimberley. Sydney: The Bulletin)

The issue of Open Access has recently been brought to the attention of many language documenters. For example, ELDP, funded by Arcadia, recently announced an Open Access policy in its guidelines, and the same policy may eventually be applied to ELAR. But is Open Access relevant and appropriate for language documentation and its archiving? I offer the following initial thoughts:

A conflation of genres. Open Access (OA) is a movement promoting the free availability of publicly funded scholarly research, as a reaction to the commercial control exerted by established academic publishers (see Arcadia and Wikipedia on the topic). However, language documentation materials, as recordings and representations of the range of “the linguistic practices characteristic of a given speech community” (Himmelmann 1998), are neither scholarly research nor fully reducible to the data that informs such research. Documenters know that the most linguistically authentic resources – recordings of spontaneous language events in social contexts – often raise questions of privacy, safety, or controversial or secret/sacred content. Every major archiving system implements access control mechanisms (e.g. DSpace and Fedora, as well as language archives such as ELAR, AILLA, DoBeS, ANLA, the California Language Archive, PARADISEC, MURA, etc.). The literature of our field contains over a decade of discussion of the need to “respect the speakers’ confidentiality, anonymity, and privacy” (Tsunoda 2002). And every substantive language archive publicly recognises, in one way or another, that some language documentation materials require managed access. Arcadia also acknowledges that OA is not applicable across the board: “Open Access is solely concerned with the literature that is intended for publication so research outputs that are classified or commercial-in-confidence are exempt from OA”.

OA aims in the wrong direction. Current research and practice in language documentation and archiving are moving towards more, not less, control by source communities and individuals over the resources they originated. While such community control has only recently been acknowledged by (some) language archives, it has been part of language documentation and fieldwork theory, practice and training for nearly 20 years. As far back as 1997, the Indigenous workshop “The Bush Track Meets the Information Superhighway”, while clearly recognising the benefits of new media, stated that the issue “of greatest concern to all communities … was the lack of moral rights and intellectual property protection for Indigenous Australians”. The most recent thinking supports curation, representation, and access moderation of resources by their originating communities and individuals (see the forthcoming papers in D. Nathan and P. Austin (eds) Language Documentation and Description 12).

OA is but one perspective on access. First, we see the rapid rise of “as permissive as possible” Creative Commons licencing. This system strikes a nuanced balance between sharing on the one hand and recognition of intellectual and distribution rights on the other, and has been innovatively applied to traditional knowledge, including language. See also this legal discussion of the issues. Secondly, and even more broadly, recent revelations about intrusions on individual privacy by the NSA and other agencies have heightened sensitivity to the ordinary person’s right to privacy. OA is not a fresh new idea; while it is a worthy reaction to the commercialised monopoly of big publishers, it is inherent in the long-established idea of the public library and in institutions such as the 19th-century workmen’s institutes (I am grateful to Tony Woodbury for this point). Personal control over one’s own data and its use is today’s new idea.

OA will lead, paradoxically, to less access. I have spoken informally to many documenters and archivists who believe that if speakers’ informed consent had to be obtained for unmoderated access to language recordings, only a very limited and highly skewed range of materials could be collected and archived. This in turn would negate the very goals and methods of documentary linguistics. People will be wary of depositing anything that might contain sensitive (or unknown, or unchecked) content. Language documentations typically consist of hundreds or thousands of files, so some of them are highly likely to contain potentially sensitive or unchecked/uncleared content. It is easy enough to make safe and unambiguous statements about access to three or four files, and the fear is that this is all that would be deposited under an enforced OA regime.

There are already agreements in place. Attempts to impose Open Access retrospectively may breach existing agreements (such as archives’ deposit agreements) or laws. Systems of “graded access” (as used at AILLA) and “negotiated access” (as used at ELAR) are necessary policy and technical infrastructure for the legal deposit of documentation materials under US law unless specific permission has been given for dissemination (Tony Woodbury, p.c.). Renegotiating access agreements properly (ethically and legally) may be prohibitively expensive.

’Tis better to err on the side of safety. British doctor and journalist Ben Goldacre recently apologised in the press for his former support for sharing NHS patients’ medical records. In recanting, he wrote that public approval of OA to private data is not just a matter of “better PR”. Rather, “it’s like nuclear power. Medical data presents huge power to do good, but it also presents huge risks. When leaked, it cannot be unleaked; when lost, public trust will take decades to regain”. (Source: “ is in chaos”, The Guardian, 1st March 2014; online version).

Recording the unrecorded. OA’s goal is the public release of published materials, or of materials created for publication. We often hear claims along the lines that in 2013, more information was produced in 10 minutes than in the whole of human history up to 2002 (Nick Clegg, “Security oversight must be fit for the internet age”, The Guardian 4th March 2014; online version). Presumably this claim relates to information that has been recorded in some form; linguists know that it is unlikely to be literally true, as humans have been speaking, signing (and communicating in other ways) for a couple of hundred thousand years, even if only a tiny fraction of that communication was recorded (including as oral history, song, etc.). In fact, it is precisely such typically unrecorded exchanges that are highly valued in language documentation. Furthermore, social regulation of access to information has always been part and parcel of this vast tide of human communication.

A distraction. As I have written elsewhere, “access” is a complex matter going far beyond whether a file can be downloaded, or whether it is charged for. It also includes the effectiveness of the repository’s discovery, search, browsing and navigation, its means of negotiating access, and the packaging and accessibility of the content. In practice, providing effective access for a variety of audiences requires addressing a wide range of issues, so a push for (or against!) OA is little more than a distraction, especially for archives that are legendarily difficult to use. (In this context, I also note that some archives make alarmingly unscientific claims about the audiences they serve, and assess an archive’s value by how “bespoke” its underlying technology is [1].)

Access is not binary. The idea that a resource is either “open” or “closed” is outdated and unnecessary. Like access to human knowledge over the ages, and so many other parts of contemporary life (including access to personal information through social media), access is something that can be negotiated. One might argue that language documentation is too important, both for the survival of languages and for its contribution to linguistics, for simplistic fundamentalisms to rule the way that archives work. Several archives already implement forms of “graded access”. Consider this example: ELAR uses a form of “negotiated access” through its Subscriber category, which supports communication between the requesting user and the depositor (or a delegate). This means that an initially inaccessible resource can be made accessible once a user provides sufficient context, motivation, or whatever else satisfies the depositor or the people s/he represents. Now, given that on average 95% of such Subscriber requests at ELAR are granted, how closed is that? Does granting access to 95% of those who want to use a resource make it “closed” (or “open”)? And does such a mechanism not also help develop trust, provide additional safety (see above), and open useful channels for information exchange between user and depositor that can further enhance the usefulness of the resource?

A terminological confusion. The term OA may seem (misleadingly) familiar because archives and libraries have long used the phrase “on open access”; but that phrase referred to a simple access setting (unrestricted access), not a policy or publication agenda.

In conclusion, pushing endangered language archives, their depositors and their contributors towards Open Access represents a serious confusion of (i) a worthy challenge to large commercial publishers’ monopoly on distributing publicly funded research with (ii) unmoderated access to recordings that may contain personal, identifying, or sensitive content. There is no moral or scientific equivalence between challenging distribution monopolies and revealing personal or sensitive information provided by language community members. An across-the-board policy of Open Access to endangered language recordings is a new form of pseudo-scientific colonialism.


[1] ELAR – in its present guise, anyway – is based on Drupal, a free and open-source content management system increasingly used by archives and used by over a million websites worldwide. Other significant language archives are actually “bespoke”.


Himmelmann, Nikolaus (1998) “Documentary and descriptive linguistics”. Linguistics 36:166. Berlin: de Gruyter

Tsunoda, Tasaku (2002) “Documentation of Endangered Languages: methods and problems”, in Conference handbook on Endangered Languages, Kyoto


  1. Martin Haspelmath says:

    This post does not consider the possibility that community members might actually be very happy that their language is displayed prominently on the web. They might want to share their texts (or texts produced by their friends or relatives) via e-mail or Facebook, and they might be happy about the thought that somewhere on a distant continent some linguist might find it fascinating to study their language.

    The general approach may be unduly influenced by the special situation in Australia, where indigenous people often seem to be afraid that something is being taken away from them. I’d think that in many other parts of the world (Africa, Asia, Oceania, Europe, much of Latin America), people are far less apprehensive, and are happy to see their language widely appreciated. Maybe the Australian and North American experience should not be generalized too quickly.

  2. Martine Mazaudon says:

    Dear Martin, it seems to me that David Nathan’s post does “consider the possibility that community members might actually be very happy that their language is displayed prominently on the web.” I do not read this post as saying that nothing should be open, but that not everything can be open. And certainly, one of the most interesting types of data for our research is spontaneous conversation, which our friends allow us to record because they trust that we plan to use it for our own personal research, which is language analysis (not gossip analysis, as village neighbours might use it if it were made public to all).
    Our Pangloss site at the Lacito is all open, and we support this idea very strongly, but we are also extremely careful to exclude anything that could be even remotely objectionable to the speaker.
    It would probably be useful to have a separate site for texts that risk being marginally private, which could be shared with other authorized researchers or released under a hundred-year protection rule. A lack of confidence that institutions will keep their word about previously set protections would totally ruin any such site. The idea of “Attempts to impose Open Access retrospectively” should be abhorrent to any responsible researcher: leaving aside cases where actual endangerment is possible, how would any of us feel if our Aunt So-and-so learned that we had made a tasteless joke about her new dress in an unguarded conversation? It seems to me that only the depositor or the speaker should be able to change a level of protection once it is set.

