Finding language material, Web2 or Wikipedia?

[From Nick Thieberger, University of Melbourne]
On the topic of trying to locate material in a small language, I was reading Kaisa Maliniemi’s 2009 article on the discovery of new linguistic material in Kven and Sámi in Norway’s public records archives. She points out that the records have been publicly available for some time and that a number of researchers must have worked with them in the past, yet nothing in that earlier work noted that the records included considerable amounts of information in these two minority languages. She argues that archives can make available to ‘the other’ those voices and knowledge marginalized by the western-dominated global mainstream. But the point that the article made strongly for me is that we should be able to provide a means for tagging such collections so that they can be located by others interested in those languages (this was also a topic at the ELIIP conference reported on by Jane Simpson here and here).
The suggestion that we can use Wikipedia [in Peter Austin’s reply to Jane’s blog] is only part of a solution. I have put links to South Efate material into a Wikipedia entry here as a way to make the information available. We can, however, do better than the Wikipedia approach, in which an unstructured language page is made by hand rather than being automatically populated with web-based information in Web2 style. Using Web2 technologies, the Open Language Archives Community (OLAC) harvests information from participating collections and then establishes a page for every language represented in those collections, like this one, where the three-letter language code (ISO 639-3) designates the language, in this case ‘erk’ = South Efate (Vanuatu). Of course, there are languages without ISO standard codes and they need to be brought into the system too.
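To make the contrast with a hand-made page concrete, here is a rough Python sketch of the idea behind an automatically populated language page: records harvested from several collections are grouped by their three-letter code. This is not OLAC’s actual code; the record fields, archive names and titles are all invented for illustration, and only the code ‘erk’ comes from the discussion above.

```python
from collections import defaultdict

# Hypothetical harvested records. The field names, archive names and titles
# are illustrative assumptions, not real catalog entries.
records = [
    {"archive": "archive-a", "title": "Field notes (illustrative)", "language_code": "erk"},
    {"archive": "archive-b", "title": "Audio recording (illustrative)", "language_code": "ERK"},
    {"archive": "archive-b", "title": "Word list (illustrative)", "language_code": None},
]


def build_language_pages(records):
    """Group records by language code; set aside records that have no code."""
    pages = defaultdict(list)
    uncoded = []
    for record in records:
        code = record.get("language_code")
        if code:
            # Normalise case so 'ERK' and 'erk' land on the same page.
            pages[code.lower()].append(record)
        else:
            uncoded.append(record)
    return dict(pages), uncoded


pages, uncoded = build_language_pages(records)
for code, items in sorted(pages.items()):
    print(code, [item["title"] for item in items])
print("no code:", [item["title"] for item in uncoded])
```

The record without a code ends up in a separate list, which is the situation of languages that still have no ISO code: nothing in the grouping step can place them on a language page until they are given an identifier.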
A focus of our archive, PARADISEC, is to make previously unlocatable material available, and we have done this in several ways. The first, and most straightforward, is to provide an online catalog of material in our own collection. The catalog, using standard terms like country names, language names and the metadata given by the Open Language Archives Community, allows depositors to enter their own metadata. For many, this is the first time they have actually systematised their collection. Because the catalog is part of the OLAC federation, it is accessible via their search mechanisms, and is also locatable via Google.
Second, we have made material available by taking scans of around 14,000 pages of notes and placing them online, with enough contextual information to allow them to be located [see Arthur Capell’s notes here, or Stephen Wurm’s notes here, or Calvin Roesler’s notes here]. If you look at the OLAC page with South Efate material listed you will also find a number of references and links to Arthur Capell’s notes which we put online.
Third, we can enter a record in our catalog to make an existing resource more widely available, and, as our catalog is harvested by the Open Language Archives Community, it will then be more generally locatable. For example, George Grace is a linguist who has worked in various parts of the western Pacific, and his fieldnotes have been scanned and put online at the University of Hawai’i (UH) library. If you know that they are there and you search for his name, then you can find them in Google. However, UH makes no provision for standardising language names with the three-letter ISO 639-3 code, which reduces ambiguity in searching. The UH library catalog currently does not list these items, nor does their ‘Online resources’ catalog. By entering a record into the PARADISEC catalog (here) the information is then propagated through to OLAC.
A Google search for one of the languages mentioned in this collection, ‘Waropen’, locates our record in OLAC as hit number 3. The item at UH comes in at hit number 57.

OLAC’s language pages are an excellent source of information, and if we can add to each page by providing a fairly minimal pointer in an OLAC-compliant record then that may also solve the problem for the Kven and Sámi material that Maliniemi discovered.
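As a rough idea of how small such a pointer could be, the Python sketch below builds a record with just a title, a link to where the material actually lives, and a language code. The namespaces and the `xsi:type="olac:language"` / `olac:code` pattern follow my understanding of the OLAC metadata format and should be checked against the current OLAC documentation; the function name, title and URL are placeholders, and the code ‘erk’ is used only because it appears above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as I understand the OLAC 1.1 metadata format (an assumption
# to verify against the OLAC documentation before use).
OLAC = "http://www.language-archives.org/OLAC/1.1/"
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

for prefix, uri in (("olac", OLAC), ("dc", DC), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def make_pointer_record(title, url, language_code):
    """Build a minimal record pointing at a resource held elsewhere (hypothetical helper)."""
    root = ET.Element(f"{{{OLAC}}}olac")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    ET.SubElement(root, f"{{{DC}}}identifier").text = url
    # The subject language is given as a three-letter ISO 639-3 code.
    ET.SubElement(
        root,
        f"{{{DC}}}subject",
        {f"{{{XSI}}}type": "olac:language", f"{{{OLAC}}}code": language_code},
    )
    return root


# Placeholder title and URL; not a real catalog entry.
record = make_pointer_record(
    "Fieldnotes held in another archive (placeholder)",
    "https://example.org/placeholder",
    "erk",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A record like this says nothing about the content beyond where it is and which language it concerns, but that is enough for a harvester to attach it to the right language page.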

Maliniemi, Kaisa. 2009. Public records and minorities: problems and possibilities for Sámi and Kven. Archival Science 9(1-2): 15-27. DOI: 10.1007/s10502-009-9104-3