More searching

In a previous post I discussed ways to search for materials on endangered languages in various archives around the world (see also Nick Thieberger’s post on how much material doesn’t make it into archives). There is now another tool, namely the Virtual Language Observatory, developed by the Max Planck Institute for Psycholinguistics in Nijmegen (MPI-Nijmegen) under the multinational CLARIN initiative.

The Virtual Language Observatory supports searching and visualisation in several ways, including a Virtual Language World that uses Google Earth to locate languages on a globe. It also has what it calls a Faceted Browser, which is essentially a structured catalogue of descriptive metadata harvested from the collections it indexes. These include DoBeS and Paradisec as well as AILLA, though there are only 100 records from AILLA, all with the title “no name” and all with broken hyperlinks. The meta-metadata categories are:

Collection
Continent
Country
Organisation
Data Provider
Language
Genre
Subject
Resource Type
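How the VLO implements its Faceted Browser is not described here, but the basic idea of faceted browsing can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each record is a dictionary keyed by facet names like those above, and the records and the helper names facet_counts and narrow are hypothetical, invented purely for illustration.

```python
from collections import Counter

# Hypothetical records, invented for illustration (NOT real VLO data).
# Each record maps some of the facet names listed above to a single value.
RECORDS = [
    {"Collection": "endangered languages", "Continent": "Oceania",
     "Country": "Australia", "Genre": "narrative"},
    {"Collection": "endangered languages", "Continent": "Oceania",
     "Country": "Australia", "Genre": "narration"},
    {"Collection": "endangered languages", "Continent": "South America",
     "Country": "Brazil", "Genre": "conversation"},
    {"Collection": "other", "Continent": "Europe",
     "Country": "Netherlands", "Genre": "conversation"},
]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet (hypothetical helper)."""
    return Counter(r[facet] for r in records if facet in r)


def narrow(records, selected):
    """Keep only records matching every selected facet value (a logical AND).

    `selected` is a dict such as {"Collection": "endangered languages"};
    a dict is used because facet names like "Data Provider" contain spaces.
    """
    return [r for r in records
            if all(r.get(facet) == value for facet, value in selected.items())]


if __name__ == "__main__":
    # "Clicking" a facet value narrows the result set...
    subset = narrow(RECORDS, {"Collection": "endangered languages"})
    print(len(subset))  # 3
    # ...and the remaining facet values are recounted over the narrower set.
    print(facet_counts(subset, "Genre"))
```

Because facet values are compared as exact strings, near-duplicate labels such as "narrative" and "narration" are counted as separate values, so records that belong together end up split across different entries in the browser.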

Interestingly, the terms listed under Genre and Subject, for example, show the same kind of non-standardised “relativist metadata mush” that Nick Thieberger has complained about in relation to ELAR at SOAS. Thus, under Genre we find:

narration
narration/description
narrative
narrative; discourse

and

conversation
converstion [sic]
discourse; conversation
natural conversation

and so on. Similarly Subject has an eclectic mix, including:

australain languages [sic]
australia–languages
australian aborigines–languages
australian languages

to take just one example (at least these are more or less adjacent in the listing, unlike some other terms).

Clicking on terms within the categories narrows the search over the 117,446 items indexed in the Faceted Browser. For “Collection: endangered languages” there are 17,526 items listed; however, for most of these the hyperlinks given under “Results” are broken, giving rise to messages like:

“No resources found”

“The requested URL /qfs1/version-archive/2011-07/11206/v1384773__.001-ache_kudja.wav was not found on this server.”

or, if you are lucky, a Shibboleth Identity Provider login screen (for use by those registered with the DoBeS archive). Even where materials are publicly available, they cannot be played or viewed directly from the search interface.

Alexander König of MPI-Nijmegen gave a presentation about the Virtual Language Observatory at a workshop on “How to make your language resources discoverable”, held at Oxford University Computing Services on Friday 24th June as part of the JISC-funded Discovering Babel project. Alex’s slides are available here.