Author Archive

Could DNA be the future of digital preservation?

Overnight, genetic scientists in Britain successfully demonstrated the data-storage potential of DNA, as explained in The Conversation today.

In a proof-of-concept experiment, a string of DNA with a physical size around that of a grain of dust was encoded with an MP3 file of the ‘I have a dream’ speech of Dr Martin Luther King, a photo, a PDF of the 1953 paper that first described the structure of DNA, and all of Shakespeare’s sonnets. Critically, however, the speck of DNA was sent via post to the US, where it was decoded and found to be an exact digital copy of the input.

The benefit of using DNA, besides the mind-blowing data space potential, is that it can be freeze-dried and stored for a very long time without any loss of information. In fact, the encoding procedure utilises the built-in redundancy in DNA: the data are replicated across multiple strings, like a biological RAID array, making it very unlikely that the same information will be lost in all strings.
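For the computer nerds among us, here is a rough sketch of the redundancy idea in Python. To be clear, this is not the scheme the researchers used: the two-bits-per-base mapping, the fragment length and step, and all the function names are my own assumptions, chosen only to show how overlapping fragments let a message survive the loss of any single interior fragment.

```python
# Hypothetical illustration only: NOT the encoding used in the experiment.
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_bases(data: bytes) -> str:
    """Turn each byte into four bases, two bits per base."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases: every four bases become one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def make_fragments(seq: str, length: int = 16, step: int = 4):
    """Cut the sequence into overlapping (offset, bases) fragments.

    With length=16 and step=4, each base away from the two ends
    appears in four different fragments.
    """
    last = max(len(seq) - length, 0)
    offsets = list(range(0, last + 1, step))
    if offsets[-1] != last:
        offsets.append(last)  # make sure the tail is covered
    return [(offset, seq[offset:offset + length]) for offset in offsets]


def reassemble(fragments, total_length: int) -> str:
    """Rebuild the sequence from whichever fragments survived."""
    seq = [None] * total_length
    for offset, bases in fragments:
        for i, base in enumerate(bases):
            seq[offset + i] = base
    if None in seq:
        raise ValueError("some positions were lost in every fragment")
    return "".join(seq)


message = b"I have a dream"
encoded = bytes_to_bases(message)
fragments = make_fragments(encoded)
survivors = fragments[:5] + fragments[6:]  # pretend one fragment was lost
recovered = bases_to_bytes(reassemble(survivors, len(encoded)))
print(recovered == message)
```

In this toy version, the bases at the very start and end of the sequence sit in fewer fragments than those in the middle, so a real scheme would need to treat the ends more carefully.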

Sequencing DNA still takes a couple of weeks at the least, and both encoding and decoding are unsurprisingly expensive, although costs are predicted to come down quite dramatically over the next 20 years or so.

Even so, the potential of this technology for the perpetual and secure storage of our cultural heritage is obvious. The digital preservation of linguistic and ethnological materials, such as those PARADISEC holds, would be a very suitable use of this technology, given that we aim to store and preserve digital objects for a very long time.

Announcing Paradisec’s new catalogue

Over the last year or so, the Paradisec team, in collaboration with software developers Robot Parade, Silvia Pfeiffer and John Ferlito, have been developing a replacement for our ageing catalogue and database systems. A couple of weeks ago, this work culminated in the release of the new catalogue.

There are several features of the software that represent a vast improvement over the previous catalogue, including a much simpler search function, for both collections and individual items, a Google Maps API as a way of exploring data (shown here) and, most usefully for our depositors, the ability to play their own files straight through the browser, or download them directly from the archive.

At the same time, the Paradisec team have been working with the Australian National Data Service (ANDS) to provide collection-level metadata to Research Data Australia (RDA), a federal initiative to aggregate metadata from research around the country using a standard metadata format and make it searchable and discoverable via the RDA website. The new catalogue automates this process, so that when a collection has the metadata required to maximise discoverability, it is harvested by RDA and appears in their database. It is therefore important that depositors, or managers of others’ data, provide metadata that is as rich as possible.

While the software is still undergoing some post-release bug fixes and improvements, we welcome the public to explore the breadth and depth of data in the collection. Access to the data itself, however, is restricted to collectors and managers of those collections, in line with Paradisec policy. Access to other people’s data is subject to access conditions.

We encourage interested readers to explore the collection, and we especially invite collectors of relevant data to get in contact with us to investigate depositing their collections with us for safekeeping.

Participant Observation: A LIP discussion

This post recaps the May meeting of Linguistics in the Pub, whose topic was More than just being there? The place of participant observation in linguistic fieldwork.

Two weeks ago at Linguistics in the Pub, we discussed an issue that many linguists never really consider, but which is central to many anthropologists’ work: the role of participant observation in our fieldwork and research.

We had a bit of difficulty nutting out exactly what we mean by participant observation, given that everything we do is, in some way, participating. We all agreed that there was something a bit different about linguistics from anthropology, something that allows us to be more objective. Perhaps it’s that our subject matter, while deeply and inextricably ingrained in culture, can be observed independently of culture, that is, without the researcher necessarily having to embed themselves deeply in the culture. We’re lucky though, because we can enter a community in order to learn the language and, while doing so, become integrated into the community in a more deeply cultural sense. Anthropologists, by definition, are there to learn about the culture, which immediately sets up a very different dynamic between the researcher and the members of the community.

Integrating more with the community can present rare opportunities for language learning, such as being able to participate in events and so forth that outsiders generally cannot access. The question presents itself though: is it permissible to covertly conduct linguistic research, in the form of note-taking, for instance, at such an event?

This is a tricky one. Some events would be more suited to doing so, such as going fishing or hunting with some community members and taking notes on things such as lexical items. There would of course be other types of events where covert note-taking would be unacceptable, such as at a funeral.

So if we go into these communities with the stated purpose of learning the language, and sometimes, permissibly or not, use that as a foot-in-the-door to access other areas of the culture, then how well do we in fact learn the language? As a general rule, not particularly well. This is not surprising sometimes, when the language being studied isn’t actually used commonly, or when it’s a very difficult language to acquire. Becoming proficient in a language, even if you’re a linguist, takes either the sort of linguistic genius that only a few people in our trade possess, or long-term, deep integration and years of targeted personal investment, which, to be frank, not many of us at the table are all that willing to put in. We all have pressures in our daily lives – teaching duties, other jobs, families and friends – that mean it’s just not possible, or desirable, to go to the field for any more than three weeks, and even for a trip as short as that, we always take our stove-top espresso machines.

This may have been a result of the fact that those linguists who do live in the communities whose languages they study, typically cannot make it to Linguistics in the Pub on a Tuesday night in Melbourne.

Looking more into the idea of covert data collection, we all agree we’re lucky that we’re linguists, and not, for example, anthropologists of religion. I brought up this particular subject as I have friends who do religious studies, and one in particular is gaining entry into, and covertly studying, an otherwise clandestine community by allowing the community members to believe that they are interested in joining.

We also discussed the pragmatic considerations of being taken for a member of the community, and how this can affect our research. Lauren related the experience of a colleague, Amos Teo, who quite coincidentally happens to look like the members of the community whose language he is studying and therefore enjoys easier and faster access than others, such as Lauren herself, although this too comes with its own downsides. Jill Vaughan, who’s working on the identity and sociolinguistics of the Irish diaspora, and whose accent varies from standard Australian right through to Northern Irish, has become aware in listening to her recordings that the degree to which she subconsciously uses her Irish accent can have an effect on the interview. Being taken for a member of the community can mean that her interviewees are less willing to be explicit about certain things, believing her to just know this stuff by virtue of her in-group membership.

We found that questioning the role of participant observation in our data collecting meant that we had to be more honest with each other about what it means to be working ‘in the field.’ We also felt that we have a lot to learn from anthropological practices, but as always, each field situation and each field worker is unique and deserves its own considered approach.

The topic for June’s Linguistics in the Pub will be the use of technology in the field, with its advantages and disadvantages, a topic raised on this very blog by Peter Austin some weeks ago. We will meet at Prince Alfred Hotel on Tuesday June 26th at 6pm. See the LIP page or contact Ruth Singer for more details.

Unveiling the new and improved ELAC

Cross-posted from Transient Languages and Cultures

This blog is now well into its fifth year and in all that time, not much has changed (apart from the new ‘look’ which was imposed on us from above). But a major development has now taken place: we have moved to a new home.

Regular readers will know that many contributors to this blog (such as Peter Austin, Jenny Green and David Nash, among others) do so under Jane Simpson’s user account. This is because the blog’s user accounts are managed as part of the University of Sydney’s wider authentication system, meaning that only staff or students of the university could have an account.

Now, Jane Simpson has moved to the Australian National University, so we decided late last year to migrate the blog out of the confines of the Sydney University user authentication system and host it ourselves, on a server that PARADISEC won in 2008.
Continue reading ‘Unveiling the new and improved ELAC’ »

One missing slash equals an object lesson in keeping backups

This semester, I have been helping Jane out with her wonderful Field Methods class on technical matters such as recording, uploading files onto the server and allowing students to securely and quickly download both .wav and .mp3 files. I took this course myself some years ago, and it was a great experience for me and the whole class. Many members of that class have gone on to do field research of their own, and I’m sure the Field Methods class was as much a help to their research as it was to mine.
But this post is not about when I took the class. Instead, it’s about how I almost buggered up this semester’s class in what can best be described as a lesson in keeping backups of your recordings.
(Warning: Some computer nerd stuff follows after the fold.)

Continue reading ‘One missing slash equals an object lesson in keeping backups’ »

More Good News

Following on from Jane’s announcement during the week of all the great news regarding successful grant applications, I have another bit of good news to share: James McElvenny and I recently applied for, and even more recently received, a grant from a philanthropic foundation to support our current work in compiling dictionaries.

Continue reading ‘More Good News’ »

2000 Hours

Early this morning, a delivery of audio files was quietly sent from Paradisec’s local server at the University of Sydney to permanent near-line tape storage at the Australian Partnership for Advanced Computing in Canberra. This happens on many days, as you might imagine, but what makes today’s delivery special is that somewhere in that bunch of files was our 2000th archived hour of audio.
Moreover, we will soon be celebrating five years of operations, so 2000 hours might not seem so impressive – it’s just 400 hours per year after all – but we at Paradisec are very proud of our collection, especially given that just about everything here is done on a shoestring budget and there have been some lengthy funding hiatuses lately.
Speaking of which, this may be an opportune time to mention that we are always amenable to generous donations from people wishing to sponsor the digitisation and preservation of a collection of data. See our website for more details.
So, just which file was the lucky 2000th hour? Well, we can’t really be sure, but we do know that it was among a collection of Mark Durie’s research into the dialects of Aceh, an area that was devastated by the Indian Ocean tsunami of Boxing Day 2004.
To help us celebrate both these milestones, Mark has kindly written a small piece for us about Aceh’s dialects, his research into them and the importance of preserving the collection. He has also allowed a small portion of one of these recordings to be posted with this piece, which you can download here.

Continue reading ‘2000 Hours’ »

Renovations, Repairs and Repositories

A lot of work has been happening at the University of Sydney over the past six months, and at the end of last year the top floor of the Transient Building, which houses Linguistics, Paradisec and a few other offices, was renovated. Unfortunately, since the entire exterior of the building is composed of fibrous asbestos, it’s unlikely that the University will outlay the mammoth insurance costs to do any exterior work. But anyone who knows the Transient Building knows that the best option would be to demolish the whole thing and start again from scratch.

Continue reading ‘Renovations, Repairs and Repositories’ »

‘Baking Tapes’ or Analogue Audio Restoration

Last Friday was a bit of a milestone for me, since, in the six or so months that I have been involved in the audio preservation side of things at PARADISEC, I hadn’t yet actually cleaned a damaged audio tape. Unfortunately for me, the process isn’t quite as straightforward as it is for a CD – warm soapy water, a non-abrasive cloth, wipe across the grain – rather, the entire process can take weeks, depending on how badly affected the tapes are.

Continue reading ‘‘Baking Tapes’ or Analogue Audio Restoration’ »