Merging SayMore audio snippets into a single wav file

SayMore is a piece of software developed by SIL that (among other things) allows you to annotate a primary audio file with audio annotations. This means that speakers can add information by carefully re-speaking an utterance, or giving an oral translation. The drawback is that each annotation segment is saved as a separate file, so you have to manage or archive hundreds or even thousands of 1-2 second audio files.
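As a rough illustration of the kind of merge the title describes, here is a minimal sketch using only Python's standard `wave` module. It is not necessarily the method used in the full post. The function name `merge_wavs` and the returned offset list are hypothetical, and the sketch assumes that every snippet is an uncompressed WAV file with the same channel count, sample width and sample rate, and that the caller supplies the snippets in the order they should appear.

```python
import wave
from pathlib import Path


def merge_wavs(snippet_paths, out_path):
    """Concatenate WAV snippets into one file.

    Returns a list of (snippet name, start seconds, end seconds) so that
    each original segment can still be located in the merged file.
    Assumes all snippets share the same audio format (hypothetical helper).
    """
    snippet_paths = list(snippet_paths)
    if not snippet_paths:
        raise ValueError("no snippets to merge")

    offsets = []
    fmt = None      # (channels, sample width, frame rate) of the first snippet
    position = 0    # number of frames written so far

    with wave.open(str(out_path), "wb") as out:
        for path in snippet_paths:
            with wave.open(str(path), "rb") as snippet:
                this_fmt = (
                    snippet.getnchannels(),
                    snippet.getsampwidth(),
                    snippet.getframerate(),
                )
                if fmt is None:
                    # The first snippet defines the output format.
                    fmt = this_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif this_fmt != fmt:
                    raise ValueError(f"{path} has a different audio format")
                n_frames = snippet.getnframes()
                out.writeframes(snippet.readframes(n_frames))
            rate = fmt[2]
            offsets.append(
                (Path(path).name, position / rate, (position + n_frames) / rate)
            )
            position += n_frames
    return offsets
```

A call such as `merge_wavs(sorted(Path("annotations").glob("*.wav")), "merged.wav")` would merge a folder of snippets, where the folder name is a placeholder. Note that sorting by file name is an assumption: a plain alphabetical sort will not necessarily match the time order of the segments, so check the ordering before archiving the result.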


Texts and more texts: corpora in the CoEDL

Corpus development is one of the goals of the ARC Centre of Excellence for the Dynamics of Language (see this web page for more details). We have run a number of workshops on corpus-related themes (e.g. the 2017 workshop that included a day on converting early sources).

In addition to creating useable materials for the source communities (which we have a strong commitment to supporting) we are archiving records that include primary media, transcripts and associated annotations. We aim to produce from this material a subset of accessible texts for a number of languages.
Here it is worth noting that we have come up with this terminology (thanks to Jane Simpson for the formulation) to distinguish the objects we have collected:
Assemblage – all material collected, working files, early sources, multiple versions and drafts
Collection – the archived material, a subset of the above, but curated with sufficient metadata to allow the user to know what all items are
Corpus – a crafted set of texts in the language that can be used for further analysis


A WEBSITE IS NOT AN ARCHIVE!!!!!!

I had a message from the ‘pop up archive’ to say they are closing down and I should download my data. It was a website that allowed users to upload audio files, which were then meant to be prepared for searching via automated recognition of features in the file.

Leaving aside the functionality of the site (I admit I did not get it to work with my files), I want to reiterate my frustration with websites that call themselves archives (ok, so in this case the title ‘pop up’ should have been a giveaway), only to disappear at the end of a funding cycle or the retirement of the researcher.

In part this frustration is also motivated by a recent project in which I compared languages that have little representation in the OLAC listing (see the earlier discussion of this here) of holdings in the world’s language archives but have had a grammar written recently. If a linguist has worked on a language in the past thirty or so years then it would be reasonable to expect that some primary records were produced, and that they should be in an archive. They may be in a repository that is not part of OLAC, in which case we can create a record to point to that collection. If they are not in any archive, the task is to ask the linguist if they need help to get the records into an archive. At PARADISEC we have been doing this, partly through our ‘Lost and Found’ survey, which has resulted in a number of collections of analog tapes being digitised and made available.


Are Australia’s Community Languages worth studying? – report on the Melbourne Linguistics in the Pub 13th June 2017

A report on this month’s Melbourne Linguistics in the Pub by Ana Krajinović (University of Melbourne / Humboldt University)

Our discussion this week was led by James Walker who asked us an intriguing question about the linguistic research areas represented in Australia. Coming from the background of studying variation and change in community languages in Toronto, James became interested in these research topics in the Australian context. Melbourne is a multilingual city, and just like in Toronto, community languages brought through immigration by non-English speakers started appearing in Melbourne in the 20th century. We asked ourselves why the linguistic diversity of different communities isn’t equally well represented in the Australian research agenda. Is the study of indigenous languages of Australia seen as inherently more valuable and, if so, why?


Maric dialect recorded by Edmund Kennedy in 1847

Guest post by Peter Sutton.

The published Barcoo River (Queensland) expedition diary of explorer Edmund Kennedy (1852) was augmented with unpublished manuscript sources and republished by Edgar Beale (1983). In the context of the present paper the key augmentation came from a handwritten copy of Kennedy’s journal for the period 01 April 1847 to 24 January 1848 made by Rev W.B. Clarke, and held by the Royal Geographical Society of London (RGS, see Beale 1983:96-97).

In an entry for 01 October 1847, Kennedy reported an encounter with Aboriginal people who were ‘without exception the most friendly and best behaved Natives I met with on the journey’. According to the RGS manuscript, Kennedy recorded:

We obtained from this party some useful words, which are correctly written, according to their sounds, River Victoria,1 “Barcoo”; Water, “Ammoo”; Grass, “Oo-lo-noo”; Fire, “Poordie” &c. (Beale 1983:142)

It is a pity that the rest of the list has not so far surfaced.



Notes

  1. This name was later replaced by ‘Barcoo River’.

Categories in language descriptions and linguistic typology – Melbourne Linguistics in the Pub May 2017

Stefan Schnell (University of Melbourne) recaps last month’s Linguistics in the Pub (Melbourne)

Leading the discussion was Ana Krajinović (University of Melbourne / Humboldt University)

Introduction

The relationship between language-specific descriptive-analytical categories and the categories used in cross-language comparative studies, and in particular the nature of the latter, has been the subject of intense and recurrent debate over the years, most recently in a dedicated discussion at last year’s SLE conference in Naples, and a focused discussion in last October’s issue of Linguistic Typology (Vol 20, issue 2, 2016). In this LiP session, we focused on the research-practical aspects of the issue from a descriptive point of view, asking how researchers go about identifying relevant categories in the languages they describe, and how they capture and describe their functions and label the categories. What criteria and concepts do researchers apply when going about these tasks? A notoriously difficult area is research into systems of tense-mood-aspect (TMA), which illustrated some of the points in our discussion.


Improving the Metadata of Papua New Guinea Collections

Written by Steven Gagau and Jodie Kell

As part of a project to improve the metadata of PARADISEC’s Papua New Guinea collections, made possible with funding from the Australian National Data Service (ANDS), PARADISEC has welcomed Steven Gagau into the Sydney office. Steven was engaged as a Research Assistant to provide language support for the project. His key role is listening to PNG collections held in the PARADISEC catalogue to find out more about the recordings and to record this information in the catalogue.

[Photo: Steven Gagau with Nick Ward from PARADISEC]

Steven’s role involves listening to recordings of people speaking and singing and then documenting details about the content of each item in the collections. He verifies the catalogue details in Title, Description, Dates, Subject and Content languages, Regions and Villages, and locates them on language maps. He also determines the discourse types, such as language play, oratory, report, procedural, formulaic, interactive, narrative or singing.

He then edits the data and updates the PARADISEC catalogue directly, enriching the metadata and adding to what is known about the materials held.

Steven’s initial work was on the extensive collection recorded by Dr. Thomas (Tom) Dutton in the Kuanua language of the “Tolai” people of the Gazelle Peninsula of East New Britain Province. Dr. Dutton was a linguist with the Australian National University between 1969 and 1997. Prior to taking up linguistics, Dutton was an Education Officer in the Administration of Papua and New Guinea. His many books include studies on Papuan languages, and the collection digitised by PARADISEC includes his fieldwork tape recordings and other recordings developed to accompany his language learning publications.

Lately, Steven has been working on materials in the catalogue from collectors who worked with language and cultural groups in various regions of PNG, guided by the database of the Summer Institute of Linguistics (SIL) of PNG. Given his local knowledge of Papua New Guinea, he is able to identify the language and cultural groups and so improve the metadata of the catalogue collections. He is now reviewing tape collections from Divine Word University (DWU) in Madang, PNG, where a wide variety of items and discourse types are being verified and enhanced in the catalogue.

Steven has extended his language and cultural knowledge to the wider Melanesia region, where he is now involved with Vanuatu and Solomon Islands collections and can enhance the metadata in Bislama (Vanuatu) and Pijin (Solomon Islands). Like Tok Pisin, these are usually referred to as Melanesian Pidgin languages and serve as lingua francas in these countries.


PARADISEC Activity Update

PARADISEC is steaming into 2017, with plenty of activity across our offices in Sydney, Melbourne and Canberra.

It’s been a huge year for increasing our quantity of archived material, which grew 79% in the 13 months since April 2016, from 14TB to 25TB, in part due to the contribution of the Centre of Excellence for the Dynamics of Language. The collection now represents 1,085 languages in nearly 153,000 files. The continued growth in our requirement for digital storage space could be an interesting challenge in the coming years. The additional 11TB brings us to 7,150 hours of audio recordings (growth of 125% since April 2016!), with 40 new collections and nearly 2,200 new items.


Why researching languages in the family is complicated and how it can be the most entertaining thing – MLIP blog April 2017

MLIP blog April 2017

Alan Ray recaps the April Melbourne Linguistics in the Pub (MLIP), a monthly discussion group. This month’s MLIP was held in conjunction with the Language practices and language policies in multilingual contexts workshop, University of Melbourne, 6-7 April 2017.

Leading the discussion was Judith Purkarthofer, Multiling: Center for Multilingualism in Society across the Lifespan, University of Oslo

She summarised the discussion in the announcement on the RNLD blog as follows:
This discussion will start with experiences in researching family languages, policies and practices in a Northern European context. National languages, minority languages and languages of migration are considered a public question, but they are also very much a private question for families and family members.
