2000 Hours

Early this morning, a delivery of audio files was quietly sent from Paradisec’s local server at the University of Sydney to permanent near-line tape storage at the Australian Partnership for Advanced Computing in Canberra. This happens on many days, as you might imagine, but what makes today’s delivery special is that somewhere in that bunch of files was our 2000th archived hour of audio.
Moreover, we will soon be celebrating five years of operations. Over that span, 2000 hours might not seem so impressive – it’s just 400 hours per year after all – but we at Paradisec are very proud of our collection, especially given that just about everything here is done on a shoestring budget and there have been some lengthy hiatuses of funding lately.
Speaking of which, this may be an opportune time to mention that we are always amenable to generous donations from people wishing to sponsor the digitisation and preservation of a collection of data. See our website for more details.
So, just which file was the lucky 2000th hour? Well, we can’t really be sure, but we do know that it was among a collection of Mark Durie’s research into the dialects of Aceh, an area that was devastated by the Indian Ocean tsunami of Boxing Day 2004.
To help us celebrate both these milestones, Mark has kindly written a small piece for us about Aceh’s dialects, his research on them and the importance of preserving the collection. He has also allowed a small portion of one of these recordings to be posted with this piece, which you can download here.

Read more

Renovations, Repairs and Repositories

A lot of work has been happening at the University of Sydney over the past six months, and at the end of last year the top floor of the Transient Building, which houses Linguistics, Paradisec and a few other offices, got renovated. Unfortunately, since the entire exterior of the building is composed of fibrous asbestos, it’s unlikely that the University will outlay the mammoth insurance costs to do any exterior work. But anyone who knows the Transient Building knows that the best option would be to demolish the whole thing and start again from scratch.

Read more

Streaming access to transcribed media

After some effort, PARADISEC has finally established a streaming server that can be used in normal web pages. This means that an online dictionary, for example, can have example headwords and sentences spoken, or video clips presented to illustrate a given word. You can see the trial version here (NB: this will only work with the Firefox browser, and you will also need to pre-install the Annodex plugin).
For some time it has been troubling that we have no simple way of presenting media online in association with transcripts, especially when an archived field recording may be the only recording of a particular language. It should have been simple enough to access media on the web. After all, we do it on YouTube and other places. But we have been further constrained by really wanting all of this to be open source (freely available software) so that anyone with the right skills can replicate this setup and not have to pay. And we also wanted the process for getting material into an online presentation to follow on from normal fieldwork outcomes, in line with output from the tools typically used by a professional linguist (one who keeps up to date with the methods of their profession). When the archival form of the media exists in a repository, it should then be an automatable process to put it into a streaming server for access.

Read more

UNESCO’s world day of audio-visual heritage

Yesterday (27 October) was the first celebration of UNESCO’s world day of audio-visual heritage. The trailer on that website, put together from the holdings of various audio-visual archives around the world, gives a flavour of the kind of material that is held in audio and film/video archives worldwide. Australia is fortunate to have many cultural institutions that hold and look after material recorded in Australia: the National Film and Sound Archive (NFSA), the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS), the National Library of Australia (NLA), the National Archives of Australia (NAA) and many others.

Read more

Video in fieldwork

Check out ‘Language Archives Newsletter’ (LAN) No. 10 (edited by David Nathan, Marcus Uneson, Paul Trilsbeek). It features articles on the role of video in language documentation by Patrick McConvell and Peter Wittenburg, as well as reviews of audio recorders including the Zoom H4. LAN 10 Contents: Video – A Linguist’s View (A Reply to … Read more

Go Xena!

So you want to preserve that MSWord novel, those spreadsheets, those AppleWorks fieldnotes forever?
The National Archives of Australia are ahead of you – they’ve developed free and open source software to help in the long-term preservation of digital records: Xena! (XML Electronic Normalising for Archives – and I bet they thought hard to come up with the N).
I saw a demo of Xena a couple of years ago, and was greatly impressed by the potential of streamlining the workflow in digital text archives – by detecting the file formats of digital objects, and then converting them into open formats like XML for preservation. Databases remain the nightmare of course.
Anyway, there’s a new release – and here are the details.

Read more

Digitally barefoot archivists

Digital archives of photos, films and recordings are springing up in Indigenous communities, and some of them are even Getting Funding, hurrah! The Bill and Melinda Gates Foundation is giving a million US dollars to the Northern Territory State Library System:

“a 2007 Access to Learning Award recognizes the Northern Territory Library for providing free computer and Internet access and training to impoverished indigenous communities… The award honours the innovative Libraries and Knowledge Centres (LKC) Program, which provides communities with free access to computers and the Internet, and helps Indigenous Territorians to build digital collections of their culture through the Our Story database.”

They’ve got Knowledge Centres at Milingimbi, Wadeye, Peppimenarti, Umbakumba, Angurugu, Pirlangimpi, Milikapiti, Barunga, Ti Tree, and Ltyentye Apurte.
… As well, “Microsoft, a Global Libraries initiative partner, will donate US $224,000 in software and technology training curriculum to upgrade the organization’s 300 library computers.” [Weep for us Mac users]
The Our Story database is an adaptation of the classic Filemaker Pro Ara Irititja program developed by the artist and historian John Dallwitz for the Anangu Pitjantjatjara.

Ara Irititja, a project of the Pitjantjatjara Council, commenced in 1994 when it was realised that a large amount of archival material about Anangu was not controlled by or accessible to them. This material was held in museums, libraries and private collections. Items held by private individuals were often at risk of being damaged or irretrievably lost. To date, a major focus of Ara Irititja’s work has been retrieving and securing such records for the benefit of Anangu and the broader Australian community.

The great advantage of Filemaker Pro was that it was basically off-the-shelf and fairly easy for people to use. There have been elaborate proposals, but going beyond glamour to making things work in remote communities is a very large step.

Read more

Open access or open slather? Nick Thieberger

[ from Nick Thieberger, PARADISEC, Melbourne University branch ]
I am a firm believer in open access to information, especially research information that has been created with taxpayers’ funds. Thus it came as something of a surprise to find myself likened to the main man of the dark forces of corporate information ownership on a site formerly known as the ‘Stolen Grammars’ site.
Constructed by a linguist in Stockholm, the site offered downloadable versions of many grammars which had been copied from various locations (“Browse my collection of stolen .pdf reference grammars if you’d rather not pay.”)

Read more

‘Baking Tapes’ or Analogue Audio Restoration

Last Friday was a bit of a milestone for me, since, in the 6 or so months that I have been involved in the audio preservation side of things at PARADISEC, I hadn’t yet actually cleaned a damaged audio tape. Unfortunately for me, the process isn’t quite as straightforward as it is for a CD – warm soapy water, a non-abrasive cloth, wipe across the grain – rather, the entire process can take weeks, depending on how badly affected the tapes are.

Read more

For future philologists

On Wednesday last week (25th April) during Endangered Languages Week at SOAS there was a presentation on the “Dawes online” project at SOAS which aims to make an interactive digital facsimile of William Dawes’ notebooks of the Sydney language available on the web. The project has produced high resolution digital images of the notebooks written by Dawes in 1790 and is developing searchable transcriptions of the manuscripts that will include the linguistic analysis made by Jaky Troy (published in 1993) along with topic maps (using the XTM standard for XML topic maps). This will enable users to search by topic, such as “animals” or “names” as well as linguistic topics, such as verb paradigms.
This project brings together knowledge and skills from archive studies, philology, linguistic analysis, and information and multimedia technologies. It is one of the more technically sophisticated of a series of projects that have emerged over the past several years to work on archival materials of Australian and Pacific languages, especially languages that have no or very few speakers. This work has parallels in the richly elaborated studies of Old English manuscripts published by Bernard Muir of Melbourne University as CDs and DVDs. The goal of both Muir’s work and the Dawes project is to present the original materials in an interactive format along with layers of standoff analytical markup.
A related kind of study is what we could call “second generation language documentation” (2GLD), where it is linguists’ fieldnotes and transcriptions which form the basis for documentation rather than speech events or speaker knowledge (usually because it is no longer possible to access such knowledge or events). Paradisec has photographed over 10,000 pages of fieldnotes on a wide range of languages for 2GLD purposes using the system developed at the Australian Science and Technology Heritage Centre. This includes Arthur Capell’s notes on Pacific languages.

Read more