Author Archive

A WEBSITE IS NOT AN ARCHIVE!!!!!!

I had a message from the ‘pop up archive’ to say they are closing down and I should download my data. They were a website that allowed users to upload audio files, which were then meant to be prepared for searching via automated recognition of features in the file.

Leaving aside the functionality of the site (I admit I did not get it to work with my files), I want to reiterate my frustration with websites that call themselves archives (ok, so in this case the title ‘pop up’ should have been a giveaway), only to disappear at the end of a funding cycle or the retirement of the researcher.

In part this frustration is also motivated by a recent project in which I compared languages that have little representation in the OLAC listing of holdings in the world’s language archives (see the earlier discussion of this here) but have had a grammar written recently. If a linguist has worked on a language in the past thirty or so years then it would be reasonable to expect that some primary records were produced, and that they should be in an archive. They may be in a repository that is not part of OLAC, in which case we can create a record to point to that collection. If they are not in any archive, the task is to ask the linguist if they need help to get the records into an archive. At PARADISEC we have been doing this, partly through our ‘Lost and Found’ survey, which has resulted in a number of collections of analog tapes being digitised and made available.
Continue reading ‘A WEBSITE IS NOT AN ARCHIVE!!!!!!’ »

Finding what is not there

A major part of PARADISEC’s effort goes into finding and digitising audio tapes that record performance in the many small languages of the world. As discussed in a number of posts on this blog, it is becoming urgent that these tapes are digitised while they are still playable. Of the tapes described in this earlier post about tapes from Madang in PNG, some are already so badly damaged by mould that they can’t be played anymore.

In order to find more tapes we run a survey (http://www.delaman.org/project-lost-found/) that, unfortunately, has only ever had sixteen responses. We have managed to negotiate with these respondents to digitise five of their collections so far (see also the earlier blogpost ‘Where are the records?’).

A more focussed way of finding out what recordings there are is by comparing what is published about a language with what primary records are listed as being in an archive. Assuming that someone doing fieldwork and writing a grammar of a language in the past fifty years must have made some recordings, then the mission (should we choose to accept it) is to find those recordings.
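The comparison described here can be sketched as a simple lookup. This is only an illustration, not the procedure actually used: the language labels, grammar titles and item counts below are invented placeholders, and the function name is hypothetical.

```python
def languages_missing_from_archives(grammars, archive_holdings):
    """Return languages that have a published grammar but no archived items.

    grammars: dict mapping a language label to a grammar title.
    archive_holdings: dict mapping a language label to a count of archived items.
    """
    missing = []
    for language in sorted(grammars):
        # A language absent from the holdings is treated as having zero items.
        if archive_holdings.get(language, 0) == 0:
            missing.append(language)
    return missing


# Invented placeholder data, for illustration only.
grammars = {
    "lang-a": "A grammar of A",
    "lang-b": "A grammar of B",
    "lang-c": "A grammar of C",
}
archive_holdings = {"lang-a": 12, "lang-b": 0}

to_follow_up = languages_missing_from_archives(grammars, archive_holdings)
```

Each language in `to_follow_up` is one where the author of the grammar could be asked where the recordings are.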

Continue reading ‘Finding what is not there’ »

Pacific Manuscripts now in PARADISEC

After some discussion between PARADISEC and the Pacific Manuscripts Bureau (PAMBU) we now have access to linguistic records in the PAMBU microfilm collection, either for tagging in the PARADISEC catalog, or as digital versions of the microfilm in the PARADISEC collection.
Kylie Maloney at PAMBU kindly made available a list of items in PAMBU that have linguistic content (about 70 items). I sent this list to linguists interested in this field and got a priority list from them. PAMBU then entered into negotiations with their depositors to allow the microfilms to be digitised and produced as PDF files for distribution via PARADISEC’s repository.
Continue reading ‘Pacific Manuscripts now in PARADISEC’ »

Results of the metadata survey

Keeping track of what is recorded in the course of fieldwork is critical, both for your own future work and for long-term archiving. Recordings of dynamic performance (audio or video) are easy to misplace or misidentify and very difficult to locate once you forget what a file was named and what you recorded on a particular day. We ran a survey about how people record their metadata from January 21st to April 25th, 2016 and had 142 responses (see also the earlier blog post here). There were two multiple-choice questions, each allowing selection of more than one checkbox and the entry of free text responses. I can send the full results of the survey on request. This information will help inform the development of new tools for metadata entry. The responses are summarised below.

Continue reading ‘Results of the metadata survey’ »

Chasing John Z’graggen’s records

This week a suitcase of audio tapes will arrive in Melbourne from Madang in PNG. While a lot of the effort of building collections in PARADISEC goes into finding tapes and encouraging people to deposit their recordings, there are some collections that stand out for the amount of work required. This is the story of one of them.

Continue reading ‘Chasing John Z’graggen’s records’ »

Reading HyperCard stacks in 2016

HyperCard (HC) was a brilliant program that came free with every Macintosh computer from 1987 and was in development until around 2004. It made it possible to create multimedia ‘stacks’ (of cards) and was very popular with linguists. For example, Peter Ladefoged produced an IPA HyperCard stack and SIL had stacks for drawing syntactic trees or for exploring the history of Indo-European (see their listing here). Texas and FreeText, created by Mark Zimmerman, allowed you to create quick indexes of very large text files (maybe even into the megabytes! Remember this is the early 1990s). I used FreeText when I wrote Audiamus, a corpus exploration tool that let me link text and media and then cite the text/media in my research.

My favourite HC linguistic application was J. Randolph Valentine’s Rook, which presented a speaker telling an Ojibwe story (with audio), with interlinear text linked to a grammar sketch of the language. I adapted that model for a story in Warnman, told by Waka Taylor, and produced as part of a set of HC stacks called ‘Australia’s languages’ and released in 1994.
Continue reading ‘Reading HyperCard stacks in 2016’ »

Toolbox to Elan

In the spirit of solving small frustrations I offer my weekend experience of getting Toolbox files into Elan. I have over a hundred texts in Nafsan, most of which are time-aligned and interlinearised. I am working with Stefan Schnell on adding GRAID annotation to some of these texts and the preferred way of doing this is in Elan, with the GRAID annotation at the morphemic level. I tried importing Toolbox files using the Elan ‘Import’ menu, having listed all field markers in Toolbox, together with their internal dependencies (which should then map to Elan’s relationship between tiers). These settings are stored in an external file. Unfortunately, the import failed several times, despite changing the settings slightly after each attempt.
Continue reading ‘Toolbox to Elan’ »

Songs of the Empty Place

Jimmy Weiner and Don Niles have published Songs of the Empty Place: The Memorial Poetry of the Foi of the Southern Highlands Province of Papua New Guinea. This new book contains songs recorded by Weiner between 1979 and 1995 and can be downloaded from ANU E-Press here. All audio was digitised by PARADISEC and is available in the collection JW1. The songs are organised under three main categories: 7 Women’s Sago Songs (Obedobora), 44 Men’s Songs (Sorohabora), and 7 Women’s Songs (Sorohabora), and are accompanied by some 40 photographs.
Continue reading ‘Songs of the Empty Place’ »

Generating word forms

Have you ever wanted to create a list of possible words in a language you are working on? Have you started creating a dictionary but now need to find words that are not yet recorded? This could be the app for you. Word Generator is a free web service that lets you upload a list of words that you know, together with a list of consonants and vowels, like this:

Consonants: b, rd, d, k, g, j, rl, l, lh, ly, m, n, nh, ng, ny, rn, yh, r, rr, n, ng, y, th, w
Vowels: a, aa, i, ii, u, uu

alardi
arinji
arlibala
[ … ]

Word Generator will generate a list of possible words based on this information. It has a number of settings you can alter to adjust the degree of probability, the number and the length of words you want to produce. You can then ask speakers to look through the list to help them think of words that are not already in the dictionary, and it could provoke useful discussion about other forms and meanings.
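To give a feel for the general idea, here is a minimal sketch of one way candidate forms could be produced from a word list and an inventory. It is not Word Generator’s actual method, which is not described here, and it ignores probabilities entirely: it splits each known word into units from the inventory (longest match first), notes which unit follows which, and then lists every unseen form that uses only those attested sequences. The function names and the `max_segments` parameter are my own assumptions.

```python
def segment(word, inventory):
    """Split a word into inventory units, trying the longest units first."""
    units = sorted(set(inventory), key=len, reverse=True)
    segments, i = [], 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                segments.append(unit)
                i += len(unit)
                break
        else:
            raise ValueError(f"cannot segment {word!r} at position {i}")
    return segments


def candidate_words(known_words, inventory, max_segments=4):
    """List unseen forms that use only unit-to-unit sequences found in known words."""
    follows = {}  # unit -> set of units seen directly after it ("^" start, "$" end)
    for word in known_words:
        path = ["^"] + segment(word, inventory) + ["$"]
        for left, right in zip(path, path[1:]):
            follows.setdefault(left, set()).add(right)

    known = set(known_words)
    results = []

    def extend(prefix):
        last = prefix[-1] if prefix else "^"
        for nxt in sorted(follows.get(last, ())):
            if nxt == "$":
                word = "".join(prefix)
                if word and word not in known:
                    results.append(word)
            elif len(prefix) < max_segments:
                extend(prefix + [nxt])

    extend([])
    return results


consonants = ["b", "rd", "d", "k", "g", "j", "rl", "l", "lh", "ly", "m", "n",
              "nh", "ng", "ny", "rn", "yh", "r", "rr", "y", "th", "w"]
vowels = ["a", "aa", "i", "ii", "u", "uu"]
known = ["alardi", "arinji", "arlibala"]

candidates = candidate_words(known, consonants + vowels, max_segments=4)
```

With only three known words the list is short and crude; a longer word list gives the sketch more attested sequences to work with.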

Please try Word Generator and post any feedback here or by email to me.

Word Generator is being written by Andreas Scherbakov as part of a project funded by ARC Future Fellowship FT140100214.

Seeking your help with tool development

We are in the process of identifying gaps in tools for fieldwork and data analysis that can be filled as part of the Centre of Excellence for the Dynamics of Language. I’d like to ask for your input into the requirements for a metadata entry tool. In part, this analysis asks for your opinions on the value of existing tools (listed below) and their relative strengths and weaknesses, and asks if it may be worth putting effort into developing any of them further, rather than starting from scratch.

The high-level requirement of this tool is to make it easy to describe files created in fieldwork, to be available both off- and on-line and to deliver the description as a text file for upload to an archive. This includes capturing as much metadata as possible from the files themselves; providing controlled vocabularies of terms to select from (preferably via drag-and-drop rather than keyboard entry); allowing the metadata to be exported in a range of formats to suit whichever archive will host the collection; allowing the metadata to be imported to the tool for use by collaborative team members; allowing controlled vocabularies to be amended to suit the local situation. This tool could also allow users to visualise the state of a collection: which media files have been transcribed, which have been interlinearised, whether text files have been scanned or OCRed, what other processes have been applied, which files have been archived, and what the rights are for each file, while also allowing the user to specify what these criteria are for their own type of collection.
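As a rough illustration of the first requirement (capturing what can be read from the files themselves and delivering it as a text file), here is a minimal sketch using only the Python standard library. It is not a design for the tool: the column names in `FIELDS` are assumptions, not any archive’s schema, and the descriptive columns are simply left blank to be filled in by hand or from a controlled vocabulary.

```python
import csv
import io
import mimetypes
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column names; a real tool would follow the target archive's schema.
FIELDS = ["filename", "size_bytes", "modified_utc", "media_type",
          "title", "language", "rights"]


def describe_files(folder):
    """Collect the metadata that can be read directly from files in a folder."""
    records = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        media_type, _ = mimetypes.guess_type(path.name)
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        records.append({
            "filename": path.relative_to(folder).as_posix(),
            "size_bytes": stat.st_size,
            "modified_utc": modified.isoformat(timespec="seconds"),
            "media_type": media_type or "unknown",
            # Descriptive fields cannot be derived from the file; left blank.
            "title": "",
            "language": "",
            "rights": "",
        })
    return records


def records_to_csv(records):
    """Return the records as CSV text, ready to save and upload to an archive."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Calling `records_to_csv(describe_files("my_fieldtrip"))` on a folder of recordings (the folder name is a placeholder) gives one row per file. Export to other formats would be further functions over the same list of records.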

These are the currently available tools (please let us know of any others, especially those created for fieldwork in other disciplines):
Arbil
SayMore
ExSite9
CMDI Maker

You can either add comments below, or else write to me separately (thien [at] unimelb.edu.au) with your ideas that can contribute to how we develop this tool.