I received a message from the ‘Pop Up Archive’ to say that they are closing down and that I should download my data. It was a website that let users upload audio files, which were then meant to be made searchable through automated recognition of features in the audio.

Leaving aside the functionality of the site (I admit I did not get it to work with my files), I want to reiterate my frustration with websites that call themselves archives (OK, in this case the title ‘pop up’ should have been a giveaway), only to disappear at the end of a funding cycle or when the researcher retires.

This frustration is also partly motivated by a recent project in which I looked at languages that are poorly represented in the OLAC listing of holdings in the world’s language archives (see the earlier discussion of this here), but for which a grammar has recently been written. If a linguist has worked on a language in the past thirty or so years, it is reasonable to expect that some primary records were produced and that they should be in an archive. They may be in a repository that is not part of OLAC, in which case we can create a record that points to that collection. If they are not in any archive, the task is to ask the linguist whether they need help getting the records into one. At PARADISEC we have been doing this, partly through our ‘Lost and Found’ survey, which has resulted in a number of collections of analog tapes being digitised and made available.

I sent a message to each of the authors of these grammars asking where their primary records were located. The vast majority did not respond. Those who did fell into three groups: those who have made provision for their records in an archive (which may or may not be in OLAC), those who have put some examples on a website (and apparently consider that to be an archive), and those who do not think they need to do anything at all. The problem seems to be that most people involved in documenting languages do not prioritise archiving their primary records.

The following is a useful guide to archives, produced by Susan Kung at the Digital Endangered Languages and Musics Archives Network (DELAMAN): Finding an Archive for your (Endangered) Language Research Data

The PARADISEC Deposit page also discusses archival formats for files.


Archives curate files by:
– applying standards for data formats both to ensure longevity and to migrate files to new formats over time
– using community-agreed metadata standards that export to the Open Language Archives Community (to increase findability)
– providing backups in several locations
– applying the access conditions for the contents of the collection that the depositor specifies in a deposit agreement
– providing persistent identification of the parts of the collection
– making items available in formats suitable for web delivery (e.g. downsampled versions)
– providing a catalog that uses language identifiers and other searchable terms, such as participant names and their roles, the place associated with the records, and the date they were produced, and that may also allow parts of the catalog to be written in the language in question.
