Author Archive

Finding what is not there

A major part of PARADISEC’s effort goes into finding and digitising audio tapes that record performance in the many small languages of the world. As discussed in a number of posts on this blog, it is becoming urgent that these tapes are digitised while they are still playable. Of the tapes described in this earlier post about tapes from Madang in PNG, some are already so badly damaged by mould that they can’t be played anymore.

In order to find more tapes we run a survey that, unfortunately, has only ever had sixteen responses. We have managed to negotiate with these respondents to digitise five of their collections so far (see also the earlier blogpost ‘Where are the records?’).

A more focussed way of finding out what recordings there are is to compare what is published about a language with what primary records are listed as being in an archive. Assuming that someone doing fieldwork and writing a grammar of a language in the past fifty years must have made some recordings, the mission (should we choose to accept it) is to find those recordings.

Continue reading ‘Finding what is not there’ »

Pacific Manuscripts now in PARADISEC

After some discussion between PARADISEC and the Pacific Manuscripts Bureau (PAMBU) we now have access to linguistic records in the PAMBU microfilm collection, either for tagging in the PARADISEC catalog, or as digital versions of the microfilm in the PARADISEC collection.
Kylie Maloney at PAMBU kindly made available a list of items in PAMBU that have linguistic content (about 70 items). I sent this list to linguists interested in this field and got a priority list from them. PAMBU then entered into negotiations with their depositors to allow the microfilms to be digitised and produced as PDF files for distribution via PARADISEC’s repository.

Continue reading ‘Pacific Manuscripts now in PARADISEC’ »

Results of the metadata survey

Keeping track of what is recorded in the course of fieldwork is critical, both for your own future work and for long-term archiving. Recordings of dynamic performance (audio or video) are easy to misplace or misidentify, and very difficult to locate once you forget what a file was named and what you recorded on a particular day. We ran a survey about how people record their metadata from January 21st to April 25th, 2016 and had 142 responses (see also the earlier blog post here). There were two multiple-choice questions, each allowing selection of more than one checkbox and the entry of free-text responses. I can send the full results of the survey on request. This information will help inform the development of new tools for metadata entry. The responses are summarised below.

Continue reading ‘Results of the metadata survey’ »

Chasing John Z’graggen’s records

This week a suitcase of audio tapes will arrive in Melbourne from Madang in PNG. While a lot of the effort of building collections in PARADISEC goes into finding tapes and encouraging people to deposit their recordings, there are some collections that stand out for the amount of work required. This is the story of one of them.

Continue reading ‘Chasing John Z’graggen’s records’ »

Reading HyperCard stacks in 2016

HyperCard (HC) was a brilliant program that came free with every Macintosh computer from 1987 and was in development until around 2004. It made it possible to create multimedia ‘stacks’ (of cards) and was very popular with linguists. For example, Peter Ladefoged produced an IPA HyperCard stack, and SIL had stacks for drawing syntactic trees or for exploring the history of Indo-European (see their listing here). Texas and FreeText, created by Mark Zimmerman, allowed you to create quick indexes of very large text files (maybe even into the megabytes! Remember this is the early 1990s). I used FreeText when I wrote Audiamus, a corpus exploration tool that let me link text and media and then cite the text/media in my research.

My favourite HC linguistic application was J. Randolph Valentine’s Rook, which presented a speaker telling an Ojibwe story (with audio), with interlinear text linked to a grammar sketch of the language. I adapted that model for a story in Warnman, told by Waka Taylor, produced as part of a set of HC stacks called ‘Australia’s languages’ and released in 1994.

Continue reading ‘Reading HyperCard stacks in 2016’ »

Toolbox to Elan

In the spirit of solving small frustrations I offer my weekend experience of getting Toolbox files into Elan. I have over a hundred texts in Nafsan, most of which are time-aligned and interlinearised. I am working with Stefan Schnell on adding GRAID annotation to some of these texts and the preferred way of doing this is in Elan, with the GRAID annotation at the morphemic level. I tried importing Toolbox files using the Elan ‘Import’ menu, and had listed all field markers in Toolbox, together with their internal dependencies (which should then map to Elan’s relationship between tiers). These settings are stored in an external file. Unfortunately, the import failed several times, despite changing the settings slightly after each attempt.

Continue reading ‘Toolbox to Elan’ »

Songs of the Empty Place

Jimmy Weiner and Don Niles have published Songs of the Empty Place: The Memorial Poetry of the Foi of the Southern Highlands Province of Papua New Guinea. This new book contains songs recorded by Weiner between 1979 and 1995 and can be downloaded from ANU E-Press here. All audio was digitised by PARADISEC and is available in the collection JW1. The songs are organised under three main categories: 7 Women’s Sago Songs (Obedobora), 44 Men’s Songs (Sorohabora), and 7 Women’s Songs (Sorohabora) and accompanied by some 40 photographs.
Continue reading ‘Songs of the Empty Place’ »

Generating word forms

Have you ever wanted to create a list of possible words in a language you are working on? Have you started creating a dictionary but now need to find words that are not yet recorded? This could be the app for you. Word Generator is a free web service that lets you upload a list of words that you know, together with a list of consonants and vowels, like this:

Consonants: b, rd, d, k, g, j, rl, l, lh, ly, m, n, nh, ng, ny, rn, yh, r, rr, n, ng, y, th, w
Vowels: a, aa, i, ii, u, uu

[ … ]

Word Generator will generate a list of possible words based on this information. It has a number of settings you can alter to adjust the degree of probability, the number and the length of words you want to produce. You can then ask speakers to look through the list to help them think of words that are not already in the dictionary, and it could provoke useful discussion about other forms and meanings.
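
To make the idea concrete, here is a minimal sketch in Python of one way such a generator could work. It is not Word Generator's actual method, which is not described here. It assumes a simple model: split each known word into segments from the inventory (preferring the longest match, so that 'ng' is read as one segment rather than 'n' plus 'g'), record which segment follows which, and then build new forms by walking those transitions at random. The function names and the sample words are invented for illustration and are not data from any real language.

```python
import random

# Inventory copied from the lists above; duplicates are removed in segment().
CONSONANTS = ["b", "rd", "d", "k", "g", "j", "rl", "l", "lh", "ly", "m", "n",
              "nh", "ng", "ny", "rn", "yh", "r", "rr", "n", "ng", "y", "th", "w"]
VOWELS = ["a", "aa", "i", "ii", "u", "uu"]
INVENTORY = CONSONANTS + VOWELS

# Invented sample forms, for illustration only (not real lexical data).
SAMPLE_WORDS = ["kala", "mirli", "ngurra", "thaku", "wanyi", "karla"]


def segment(word, inventory):
    """Split a word into inventory symbols, preferring the longest match."""
    symbols = sorted(set(inventory), key=len, reverse=True)
    segments, pos = [], 0
    while pos < len(word):
        for sym in symbols:
            if word.startswith(sym, pos):
                segments.append(sym)
                pos += len(sym)
                break
        else:
            raise ValueError(f"cannot segment {word!r} at position {pos}")
    return segments


def learn_transitions(words, inventory):
    """Record which segment follows which; ^ and $ mark word boundaries."""
    transitions = {}
    for word in words:
        segs = ["^"] + segment(word, inventory) + ["$"]
        for prev, nxt in zip(segs, segs[1:]):
            transitions.setdefault(prev, []).append(nxt)
    return transitions


def generate(words, inventory, count=10, max_segments=8, seed=0):
    """Return up to `count` new forms that are not in the known word list."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    transitions = learn_transitions(words, inventory)
    known, candidates = set(words), set()
    for _ in range(count * 200):  # bounded number of attempts
        if len(candidates) >= count:
            break
        segs, current = [], "^"
        while len(segs) <= max_segments:
            current = rng.choice(transitions[current])
            if current == "$":
                break
            segs.append(current)
        else:
            continue  # form grew too long; discard it and try again
        form = "".join(segs)
        if form and form not in known:
            candidates.add(form)
    return sorted(candidates)
```

Calling `generate(SAMPLE_WORDS, INVENTORY, count=5)` returns a short list of forms that follow the segment sequences seen in the sample but are not in it. Because frequent transitions appear more often in the learned lists, they are chosen more often, which is one simple way to favour more probable forms.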

Please try Word Generator and post any feedback here or by email to me.

Word Generator is being written by Andreas Scherbakov as part of a project funded by ARC Future Fellowship FT140100214.

Seeking your help with tool development

We are in the process of identifying gaps in tools for fieldwork and data analysis that can be filled as part of the Centre of Excellence for the Dynamics of Language. I’d like to ask for your input into the requirements for a metadata entry tool. In part, this analysis asks for your opinions on the value of existing tools (listed below) and their relative strengths and weaknesses, and asks if it may be worth putting effort into developing any of them further, rather than starting from scratch.

The high-level requirement of this tool is to make it easy to describe files created in fieldwork, to be available both off- and on-line, and to deliver the description as a text file for upload to an archive. This includes capturing as much metadata as possible from the files themselves; providing controlled vocabularies of terms to select from (preferably via drag-and-drop rather than keyboard entry); allowing the metadata to be exported in a range of formats to suit whichever archive will host the collection; allowing the metadata to be imported to the tool for use by collaborative team members; and allowing controlled vocabularies to be amended to suit the local situation. This tool could also allow users to visualise the state of a collection: which media files have been transcribed, which have been interlinearised, have text files been scanned, OCRed …. what other processes have been applied, which have been archived, what the rights are for each file, also allowing the user to specify what these criteria are for their own type of collection.
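
As a rough illustration of three of these requirements (capturing metadata from the files themselves, selecting terms from a controlled vocabulary, and delivering the description as a text file), here is a minimal Python sketch. It does not describe any existing tool. The field names, the genre vocabulary and the function names are all hypothetical, and a real archive would define its own fields and formats.

```python
import csv
import io
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical controlled vocabulary; a real tool would let users amend it.
GENRES = {"narrative", "song", "elicitation", "conversation"}

# Hypothetical field names, not those of any particular archive.
FIELDS = ["filename", "bytes", "modified", "genre", "transcribed"]


def describe_file(path, genre, transcribed=False):
    """Build one metadata record, reading what we can from the file itself."""
    if genre not in GENRES:
        raise ValueError(f"{genre!r} is not in the controlled vocabulary")
    stat = Path(path).stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "filename": Path(path).name,
        "bytes": stat.st_size,
        "modified": modified.strftime("%Y-%m-%d"),
        "genre": genre,
        "transcribed": "yes" if transcribed else "no",
    }


def export_csv(records):
    """Return the records as CSV text, ready to save or upload."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Each record combines values read from the file (name, size, modification date) with values chosen by the user, and the 'transcribed' column is the kind of flag that could later feed a view of the state of a collection. Exporting to other formats would mean adding further functions alongside `export_csv`.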

These are the currently available tools; please let us know of any others (especially those created for fieldwork in other disciplines):
CMDI Maker

You can either add comments below, or else write to me separately (thien [at] with your ideas that can contribute to how we develop this tool.

Grammar writing: where are we now?

Ruth Singer recaps last week’s Linguistics in the Pub, a monthly informal gathering of linguists in Melbourne to discuss topical areas in our field.

Linguistics in the Pub on Tuesday the 24th of February, 2015 centred around the theme: grammar writing. Harriet Sheppard (Monash University) led the discussion. The announcement and short background reading are here.

The descriptive grammar, although often reported to be dead, is a form of scholarship that is still very much alive. And although e-grammars are said to be the way of the future, most grammars still take the form of a hard copy, whether a PhD thesis or a published book. This session of Linguistics in the Pub was kicked off with a discussion of the article by Ulrike Mosel cited below, part of a special publication of LDC on grammar writing.
Continue reading ‘Grammar writing: where are we now?’ »