More searching

In a previous post I discussed ways of searching for materials on endangered languages in various archives around the world (see also Nick Thieberger’s post on how much material doesn’t make it into archives). There is now another tool, namely the Virtual Language Observatory, developed by the Max Planck Institute for Psycholinguistics in Nijmegen (MPI-Nijmegen) under the multinational CLARIN initiative.

The Virtual Language Observatory enables searching and visualisation in various ways, including via a Virtual Language World that uses Google Earth to locate languages on a globe. It also has what it calls a Faceted Browser, which is essentially a structured catalogue of descriptive metadata harvested from the various collections that it indexes. These include DoBeS, Paradisec and AILLA, though there are only 100 records from AILLA, all with the title “no name”, and all with broken hyperlinks. The meta-metadata categories are:

Collection
Continent
Country
Organisation
Data Provider
Language
Genre
Subject
Resource Type
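
For readers unfamiliar with faceted browsing, here is a minimal sketch of the idea in Python. It is not the Virtual Language Observatory's actual implementation: the records, their values, and the function names `narrow` and `facet_counts` are all invented for illustration, and only the facet names come from the list above.

```python
from collections import Counter

# Hypothetical harvested records. The keys reuse facet names from the list
# above; the values are made up for illustration only.
RECORDS = [
    {"Collection": "endangered languages", "Country": "Paraguay", "Genre": "narrative"},
    {"Collection": "endangered languages", "Country": "Australia", "Genre": "conversation"},
    {"Collection": "other", "Country": "Australia", "Genre": "narrative"},
]


def narrow(records, selected):
    """Keep only the records that match every selected facet value."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(record[facet] for record in records if facet in record)


if __name__ == "__main__":
    hits = narrow(RECORDS, {"Collection": "endangered languages"})
    print(len(hits), "records in the selected collection")
    print(facet_counts(hits, "Genre"))
```

Each click on a term corresponds to adding one more entry to `selected`, and the counts shown beside the remaining terms correspond to calling `facet_counts` on the narrowed list.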

Interestingly, the categories listed under Genre and Subject, for example, show the same kind of non-standardised “relativist metadata mush” that Nick Thieberger has complained about in relation to ELAR at SOAS. Thus, under Genre we find:

narration
narration/description
narrative
narrative; discourse

and

conversation
converstion [sic]
discourse; conversation
natural conversation

and so on. Similarly Subject has an eclectic mix, including:

australain languages [sic]
australia–languages
australian aborigines–languages
australian languages

to take just one example (at least these are more or less adjacent in the listing, unlike some other terms).
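
Variants like these could at least be flagged automatically. The following is a rough sketch, assuming that simple string similarity is good enough to spot misspellings and punctuation differences. It uses Python's standard `difflib` module; the helper names `normalise` and `group_variants` and the cutoff of 0.85 are my own assumptions, not anything the Virtual Language Observatory does.

```python
import difflib
import re


def normalise(term):
    """Lower-case a term and turn punctuation (dashes, slashes, semicolons) into spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", term.lower()).split())


def group_variants(terms, cutoff=0.85):
    """Greedily group terms whose normalised forms are near-identical strings.

    The cutoff is an arbitrary assumption and would need tuning on real data.
    """
    groups = []  # each entry is (normalised representative, [original terms])
    for term in terms:
        key = normalise(term)
        for representative, members in groups:
            similarity = difflib.SequenceMatcher(None, key, representative).ratio()
            if similarity >= cutoff:
                members.append(term)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append((key, [term]))
    return [members for _, members in groups]


if __name__ == "__main__":
    subjects = [
        "australain languages",
        "australia–languages",
        "australian aborigines–languages",
        "australian languages",
    ]
    for group in group_variants(subjects):
        print(group)
```

String similarity only catches surface variation. It will not decide whether “narration” and “narrative” are meant to be the same genre, which still needs a controlled vocabulary or a human curator.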

Clicking on terms within the categories narrows down the search over the 117,446 items indexed on the Faceted Browser. For “Collection: endangered languages” there are 17,526 items listed; however, for most of these the hyperlinks given under “Results” are broken, giving rise to messages like:

“No resources found”

“The requested URL /qfs1/version-archive/2011-07/11206/v1384773__.001-ache_kudja.wav was not found on this server.”

or, if you are lucky, a Shibboleth Identity Provider login screen (for use by those registered with the DoBeS archive). Even if materials are publicly available, they cannot be directly played or viewed from the search interface.

Alexander König of MPI-Nijmegen gave a presentation about the Virtual Language Observatory at a workshop on “How to make your language resources discoverable”, held at Oxford University Computing Services on Friday 24th June as part of the JISC-funded Discovering Babel project. Alex’s slides are available here.
