Finding language material, Web2 or Wikipedia?

[From Nick Thieberger, University of Melbourne]
On the topic of trying to locate material in a small language, I was reading Kaisa Maliniemi’s 2009 article on the discovery of new linguistic material in Kven and Sámi in Norway’s public records archives. She discusses the fact that the records have been publicly available for some time and that a number of researchers must have worked with them in the past, but their work left no trace of the fact that the records included considerable amounts of information in these two minority languages. She argues that archives can make available to ‘the other’ those voices and knowledge marginalized by the western-dominated global mainstream. But the point that the article made strongly for me is that we should be able to provide a means for tagging such collections so that they can be located by others interested in those languages (this was also a topic at the ELIIP conference reported on by Jane Simpson here and here).
The suggestion that we can use Wikipedia [in Peter Austin’s reply to Jane’s blog] is only part of a solution. I have put links to South Efate material into a Wikipedia entry here as a way to make the information available. We can, however, do better than the Wikipedia approach of an unstructured language page made by hand: a page can instead be populated automatically from web-based information, in Web2 style. Using Web2 technologies, the Open Language Archives Community (OLAC) harvests information from participating collections and then establishes a page for every language represented in those collections, like this one, where the three-letter language code (ISO 639-3) designates the language, in this case ‘erk’ = South Efate (Vanuatu). Of course there are languages without ISO standard codes and they need to be brought into the system too.
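To make the idea of an automatically populated language page concrete, here is a minimal sketch in Python. It is not OLAC's actual implementation: the record fields, archive names and titles are hypothetical placeholders, and the only detail taken from the discussion above is that records are grouped by their three-letter code (‘erk’ for South Efate), with uncoded records set aside so that they can be brought into the system later.

```python
from collections import defaultdict

# Hypothetical harvested records; archive names and titles are placeholders.
records = [
    {"archive": "Archive A", "title": "Fieldnotes, item 1", "code": "erk"},
    {"archive": "Archive B", "title": "Recordings, item 7", "code": "ERK"},
    {"archive": "Archive B", "title": "Wordlist, item 2", "code": None},
]


def build_language_pages(records):
    """Group records by language code; collect records that have no code."""
    pages = defaultdict(list)
    uncoded = []
    for record in records:
        code = record["code"]
        if code:
            # Normalise case so 'ERK' and 'erk' land on the same page.
            pages[code.lower()].append(record)
        else:
            uncoded.append(record)
    return dict(pages), uncoded


pages, uncoded = build_language_pages(records)
for code, items in pages.items():
    print(code, [item["title"] for item in items])
print("needs a code:", [item["title"] for item in uncoded])
```

The point of the sketch is that once every collection supplies a code, the page for a language assembles itself from whatever the participating archives hold, with no one editing it by hand.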
A focus of our archive, PARADISEC, is to make previously unlocatable material available, and we have done this in several ways. The first, and most straightforward, is to provide an online catalog of material in our own collection. The catalog, using standard terms like country names, language names and the metadata given by the Open Language Archives Community, allows depositors to enter their own metadata. For many, this is the first time they have actually systematised their collection. Because the catalog is part of the OLAC federation, it is accessible via their search mechanisms, and is also locatable via Google.
Second, we have made material available by taking scans of around 14,000 pages of notes and placing them online, with enough contextual information to allow them to be located [see Arthur Capell’s notes here, or Stephen Wurm’s notes here, or Calvin Roesler’s notes here]. If you look at the OLAC page with South Efate material listed you will also find a number of references and links to Arthur Capell’s notes which we put online.
Third, we can enter a record in our catalog to make an existing resource more widely available, and, as our catalog is harvested by the Open Language Archives Community, it will then be more generally locatable. For example, George Grace is a linguist who has worked in various parts of the western Pacific, and his fieldnotes have been scanned and put online at the University of Hawai’i (UH) library. If you know that they are there and you search for his name, then you can find them in Google. However, UH makes no provision for standardising language names with the three-letter ISO 639-3 code, which reduces ambiguity in searching. The UH library catalog currently does not list these items, nor does their ‘Online resources’ catalog. By entering a record into the PARADISEC catalog (here) the information is then propagated through to OLAC.
A Google search for one of the languages mentioned in this collection, ‘Waropen’, locates our record in OLAC at hit number 3, while the item at UH comes in at hit number 57.

OLAC’s language pages are an excellent source of information, and if we can add to each page by providing a fairly minimal pointer in an OLAC-compliant record then that may also solve the problem for the Kven and Sámi material that Maliniemi discovered.
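As a rough illustration of how small such a pointer could be, the Python sketch below builds a record with just a title, a link to where the material lives, and a language code. The namespace URIs and the `dc:subject` element with `xsi:type="olac:language"` reflect my understanding of the OLAC metadata format and should be checked against OLAC's own documentation. The function name, title and URL are hypothetical, and a real record would need further fields and a repository wrapper before it could be harvested.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as I understand them; verify against the OLAC documentation.
DC = "http://purl.org/dc/elements/1.1/"
OLAC = "http://www.language-archives.org/OLAC/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

for prefix, uri in (("dc", DC), ("olac", OLAC), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def minimal_pointer_record(title, url, iso_code):
    """Return an OLAC-style record pointing at a resource held elsewhere."""
    root = ET.Element(f"{{{OLAC}}}olac")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    ET.SubElement(root, f"{{{DC}}}identifier").text = url
    # The language code is what lets the record land on the right language page.
    subject = ET.SubElement(root, f"{{{DC}}}subject")
    subject.set(f"{{{XSI}}}type", "olac:language")
    subject.set(f"{{{OLAC}}}code", iso_code)
    return ET.tostring(root, encoding="unicode")


# Hypothetical title and URL; 'erk' is the South Efate code mentioned above.
print(minimal_pointer_record("Example fieldnotes", "https://example.org/item/1", "erk"))
```

The record deliberately describes nothing beyond what is needed to find the material: the holding institution keeps the resource, and the pointer only adds the standard language code that its own catalog lacks.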

Maliniemi, Kaisa. 2009. Public records and minorities: problems and possibilities for Sámi and Kven. Archival Science. Vol. 9, Numbers 1-2: 15-27 DOI 10.1007/s10502-009-9104-3
