Back in the old days, when some of us were younger and starting out on our language documentation and description careers (for me in 1972, as described in this blog post), the world was pretty much analogue and we didn’t have digital hardware or software to think about.
Back then recordings were made with reel-to-reel tape recorders, like the Uher Report or, if you had really fancy kit, a Nagra. Those of us working in Australia on Aboriginal languages could archive our tapes at the Australian Institute of Aboriginal Studies (AIAS), as it then was, later the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS). They would copy your tapes onto their archive masters and return the originals to you; all you, as a depositor, had to do was fill in tape deposit sheets. You were supplied with a book of these, alternately white and green, with a sheet of carbon paper to be placed between them. For each tape you had to complete a white sheet listing basic metadata and a summary of the contents of the tape, tear off the white copies (keeping the green carbon copy) and submit them to the AIAS archive. In addition, the Institute encouraged the preparation of tape audition sheets, where the content of the tapes was summarised alongside time codes (in minutes and seconds) counted from the beginning of the tape. Sometimes these were created by the depositor and sometimes by the resident linguist (at that time Peter Sutton).
So, if you wanted to find out where in your stack of tapes you could find Story X by Speaker Y, you simply had to look at the deposit sheets and/or the audition sheets.
Alas, those days are gone and we are in the digital world, where our experience is mediated via software interfaces that can fool us into seeing the world the way the interface presents it. For language documenters, Toolbox is often the software tool of analytical choice (along with ELAN)[1] for the processing, value-adding analysis and annotation of recordings. As I claimed in a previous post, the existence of Toolbox means that for many documenters value-adding annotation only means interlinear glossing, and alternatives such as overview or summary annotation (like the old tape audition sheets) are not part of their tool set. I have two pieces of evidence for this:
- the Endangered Languages Archive (ELAR) at SOAS has so far received around 100 deposits comprising roughly 800,000 files. Many of these deposits are made up entirely of media files (plus basic cataloguing metadata), with no textual representation of the content of the files beyond a short description in the cataloguing metadata. When asked about annotations, depositors typically respond that they “are working on transcription and glossing” but that, because of the time needed, they cannot provide anything now. They do not seem to consider an alternative, namely time-coded overview annotation, which can (and probably should) be done for all the media files, only some of which would then be selected and given priority for interlinear glossing. Why? One reason might be that there is no dedicated software tool designed and set up to do this in an easy and simple manner (interestingly, a tool that can be used this way, and that produces nice time-coded XML output, is Transcriber, though it is generally thought of as a tool for transcription annotation only; it also lacks a “reader mode” that would allow for easy viewing and searching across a set of overview annotations created with it);
- during training courses and public presentations over the past couple of years I have been warning that current approaches to language documentation risk creating “data dumps” (which I have also called “data middens”), because researchers are not well trained in corpus and workflow management and additionally suffer from ILGB, or “interlinear gloss blindness”, which drives them to see textual value-adding annotation in terms of the interlinear glossing paradigm.[2] The most recent example of such a presentation was during last month’s grantee training course at SOAS (the PowerPoint slides from my presentation are available on Slideshare). All but one of the grantees attending the training had never heard of, or considered creating, overview summary annotation before launching (selectively) into transcription and interlinear glossing of their recordings.
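To make the idea of time-coded overview annotation in the first point concrete, here is a minimal sketch in Python of what such records, and a very simple search across them, might look like. It is an illustration only, not a description of Transcriber, Toolbox, ELAN or any archive's format: the `Segment` fields, the file names and the sample entries are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One line of an overview annotation (hypothetical structure)."""
    recording: str  # media file name (invented for this example)
    start: int      # offset from the start of the recording, in seconds
    end: int        # end of the stretch being summarised, in seconds
    speaker: str
    summary: str    # free-text summary, as on a tape audition sheet


def mmss(seconds: int) -> str:
    """Format an offset as minutes:seconds, like the old audition sheets."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def search(segments, term):
    """Return the segments whose speaker or summary mentions the term."""
    needle = term.lower()
    return [
        s for s in segments
        if needle in s.summary.lower() or needle in s.speaker.lower()
    ]


# Invented sample data standing in for a depositor's overview annotations.
OVERVIEW = [
    Segment("tape01.wav", 0, 95, "Speaker Y", "Introduction and greetings"),
    Segment("tape01.wav", 95, 610, "Speaker Y", "Story X, first telling"),
    Segment("tape02.wav", 0, 300, "Speaker Z", "Word list elicitation: body parts"),
]

for seg in search(OVERVIEW, "story x"):
    print(f"{seg.recording} {mmss(seg.start)}-{mmss(seg.end)} {seg.speaker}: {seg.summary}")
# prints: tape01.wav 01:35-10:10 Speaker Y: Story X, first telling
```

Even a flat list like this answers the "where is Story X by Speaker Y?" question for every recording in a deposit, well before any of them has been transcribed or glossed.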
I may be wrong about the source of the current ILGB and perhaps Toolbox is not (solely) to blame, but I do believe that it plays a part in a narrowing of conceptual thinking about annotation in language documentation, and hence the behaviour of language documenters.
NB: Thanks to Andrew Garrett for his comments on my previous post that caused me to think more deeply about these issues and attempt to explicate and exemplify them more clearly here.
1. ELAN is a tool designed for time-aligned transcription and annotation of media files, and is also widely used by language documenters, bringing with it its own kind of window on the world that I do not discuss here.
2. There may be a separate further dimension to be concerned about that results from the shift from analogue to digital hardware, rather than being a software issue. In the old days tapes were expensive, and junior researchers in particular only had access to a rationed supply and therefore had to think seriously about what and how much to record. Today, with digital storage being so cheap and easy to use (especially for copying and file transfer), there is a temptation to “record everything” on multiple machines (one or more video cameras plus one or more audio recorders) and not write much down because “you can always listen to it later”. This can easily and quickly give rise to megabytes of files to be managed and processed. I saw this temptation among the students taking my Fieldmethods course this year: after a few sessions of working with the consultant this way, they learned about the pain that comes from having to search through hours of digital recordings for which they had few fieldnotes or metadata annotations.