SayMore is a piece of software developed by SIL that (among other things) allows you to annotate a primary audio file with audio annotations. This means that speakers can add information by carefully re-speaking an utterance, or giving an oral translation. However, this becomes a problem because each annotation segment is saved as a separate file, which means you have to manage or archive hundreds or even thousands of 1-2 second audio files.
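One way to make those segment files easier to manage is to join them into a single audio file and keep a small index of where each segment starts and ends. The following is a minimal sketch only, assuming the segments are uncompressed WAV files that all share the same channel count, sample width and sample rate. The function name `concatenate_wavs` and the idea of an index are illustrative assumptions, not part of SayMore.

```python
import wave
from pathlib import Path


def concatenate_wavs(segment_paths, output_path):
    """Join WAV segments into one file.

    Returns a list of (segment_name, start_seconds, end_seconds) rows so the
    position of each original segment in the combined file is not lost.
    """
    segment_paths = list(segment_paths)
    if not segment_paths:
        raise ValueError("no segments to concatenate")

    index = []
    params = None          # (channels, sample width, frame rate) of the first segment
    frames_written = 0

    with wave.open(str(output_path), "wb") as out:
        for path in segment_paths:
            with wave.open(str(path), "rb") as seg:
                fmt = (seg.getnchannels(), seg.getsampwidth(), seg.getframerate())
                if params is None:
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    # Mixed formats would produce garbled audio, so stop early.
                    raise ValueError(f"{path} has a different audio format: {fmt}")
                n_frames = seg.getnframes()
                out.writeframes(seg.readframes(n_frames))

            start = frames_written / params[2]
            frames_written += n_frames
            index.append((Path(path).name, start, frames_written / params[2]))

    return index
```

Called with the segment files in playback order, for example `concatenate_wavs(sorted(Path("segments").glob("*.wav")), "combined.wav")` (the folder name here is hypothetical), it writes one file and returns the timing rows, which could be saved alongside it as a text file.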
Time for a 2017 PARADISEC activity update! At our last update in May 2017 we held 25TB of archived material and now just 4 months later we have grown to 31TB of archived material! We have also increased the number of languages represented in the archive to 1116. In the last 4 years PARADISEC has …
As part of a project to improve the metadata of PARADISEC’s Papua New Guinea collections made possible with funding from the Australian National Data Service (ANDS), PARADISEC has welcomed Steven Gagau into the Sydney office. Steven was engaged as a Research Assistant to provide language support for the project. Steven’s key role is listening to PNG collections held in the PARADISEC catalogue to find out more about the recordings and record this information into the catalogue.
Steven’s role involves listening to recordings of people speaking and singing and then documenting details about the content of each item in the collections. He verifies the catalogue details in Title, Description, Dates, Subject and Content languages, Regions and Villages, and locates them on language maps. He further determines the discourse types, such as language play, oratory, report, procedural, formulaic, interactive, narrative or singing.
He then edits the data and updates the PARADISEC catalogue directly to enrich the metadata, thus enhancing the knowledge and information held about these materials.
Steven’s initial work was on the extensive collection recorded by Dr. Thomas (Tom) Dutton in the Kuanua language of the “Tolai” people of the Gazelle Peninsula of East New Britain Province. Dr. Dutton was a linguist with the Australian National University between 1969 and 1997. Prior to taking up linguistics, Dutton was an Education Officer in the Administration of Papua and New Guinea. His many books include studies on Papuan languages, and the collection digitised by PARADISEC includes his fieldwork tape recordings and other recordings developed to accompany his language learning publications.
Lately, Steven has been working on materials in the catalogue from collectors who worked in various regions of PNG with different language and cultural groups, guided by the database of the Summer Institute of Linguistics (SIL) of PNG. Given his local knowledge of Papua New Guinea, he is able to identify the language and cultural groups and so improve the metadata for the collections in the catalogue. He is now reviewing tape collections from Divine Word University (DWU) in Madang, PNG, where a wide variety of items and discourse types are being verified and enhanced in the catalogue.
Steven has extended his language and cultural knowledge to the wider Melanesian region, where he is now involved with Vanuatu and Solomon Islands collections and can enhance the metadata for materials in Bislama (Vanuatu) and Pijin (Solomon Islands). These languages are similar to Tok Pisin; together they are usually referred to as Melanesian Pidgin languages and serve as lingua francas in these countries.
PARADISEC is steaming into 2017, with plenty of activity across our offices in Sydney, Melbourne and Canberra.
It’s been a huge year for our archive: the quantity of archived material grew 79% in the 13 months since April 2016, from 14TB to 25TB, in part due to the contribution of the Centre of Excellence for the Dynamics of Language. The collection now represents 1,085 languages in nearly 153,000 files. The additional 11TB brings us to 7,150 hours of audio recordings (growth of 125% since April 2016!), with 40 new collections and nearly 2,200 new items. An interesting challenge we will face in the coming years is the continued growth in our requirement for digital storage space.
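For readers who want to check the growth figures, here is a short worked calculation. It is only a sketch: the helper names are made up for illustration, and it assumes that "growth of 125%" means the current number of hours is 2.25 times the April 2016 figure, which the post does not state outright.

```python
def percent_growth(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100


def previous_value(current, growth_percent):
    """Starting value implied by a current value and its percentage growth."""
    return current / (1 + growth_percent / 100)


# Storage figures from the post: 14TB in April 2016, 25TB now.
storage_growth = percent_growth(14, 25)

# Audio hours from the post: 7,150 hours now, after 125% growth (assumed meaning above).
hours_before = previous_value(7150, 125)

print(f"Storage growth: {storage_growth:.0f}%")                    # Storage growth: 79%
print(f"Audio hours in April 2016: about {hours_before:.0f}")      # about 3178
```

The first result matches the 79% quoted for the increase of 11TB; the second suggests roughly 3,200 hours of audio were held in April 2016 under the stated assumption.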
This is the story of institutional collaboration at its best.
In 2013 Bill Palmer sent through a list of 78 rpm discs held by the National Library of Australia, summarised in their catalog as follows:
“The collection consists of two albums and 20 single sound discs, word lists, slides and photographs. Records include specimens of native languages of the British Solomon Islands Protectorate; speech of Hagen natives; gospel recordings; and titles in Fijian, Babatana, Owa Raha, Bilua, Marovo, Dobu, Ungarinyin, Hula, Tavara, Motu, Johore Malay, Western Sumatra Malay, Wedau and Police Motu. Brief typescript word lists are included with the Motu, Hula, Tavara, Dobu and Babatana sound discs. There is an English-Owa Rahan vocabulary for the Owa Raha disc.”
We sent a request to the NLA, with whom PARADISEC has always had a close working relationship. They agreed in principle, and we then had periodic contact about the discs. In July 2015 we approached the National Film and Sound Archive, who have the necessary playback equipment. Further to-ing and fro-ing of emails finally resulted in agreement from the NLA in June 2016.
Recently the call came to the Sydney office of PARADISEC that a collection of tapes had arrived in Melbourne that needed some cleaning (see the earlier post here). The tapes were from Madang in Papua New Guinea and had been recorded in the 1960s. They contained valuable and rare records of language and music of PNG.
When the tapes arrived they were visibly covered in a white mould and so the PARADISEC audio preservation team moved into action to remediate the tapes ready for digitisation.
Mould is a common form of contamination of magnetic analogue tape, and it creates problems because the infected tape will not give a clear signal when played back. Even a small speck of dust or mould can cause a gap between the tape and the head, resulting in a dropout in the sound.
A major part of PARADISEC’s effort goes into finding and digitising audio tapes that record performance in the many small languages of the world. As discussed in a number of posts on this blog, it is becoming urgent that these tapes are digitised while they are still playable. Of the tapes described in this earlier post about tapes from Madang in PNG, some are already so badly damaged by mould that they can’t be played anymore.
A more focussed way of finding out what recordings there are is to compare what is published about a language with the primary records listed as being in an archive. Assuming that someone doing fieldwork and writing a grammar of a language in the past fifty years must have made some recordings, the mission (should we choose to accept it) is to find those recordings.
After some discussion between PARADISEC and the Pacific Manuscripts Bureau (PAMBU) we now have access to linguistic records in the PAMBU microfilm collection, either for tagging in the PARADISEC catalog, or as digital versions of the microfilm in the PARADISEC collection.
Kylie Maloney at PAMBU kindly made available a list of items in PAMBU that have linguistic content (about 70 items). I sent this list to linguists interested in this field and got a priority list from them. PAMBU then entered into negotiations with their depositors to allow the microfilms to be digitised and produced as PDF files for distribution via PARADISEC’s repository.
Keeping track of what is recorded in the course of fieldwork is critical, both for your own future work and for long-term archiving. Recordings of dynamic performance (audio or video) are easy to misplace or misidentify and very difficult to locate once you forget what a file was named and what you recorded on a particular day. We ran a survey about how people record their metadata from January 21st to April 25th, 2016 and had 142 responses (see also the earlier blog post here). There were two multiple choice questions, each allowing selection of more than one checkbox and the entry of free text responses. I can send the full results of the survey on request. This information will help inform the development of new tools for metadata entry. The responses are summarised below.