Archive for the ‘Technology’ Category.

Books, HTML, audio, images – falling out from fieldwork

I’ll be going to Vanuatu next month, courtesy of Catriona Hyslop’s DoBeS project, to help build an installation of three computer-based interactive dictionaries (Vurës, Tamambo and South Efate) for the Museum there. We will have hyperlinked dictionaries, with sound and images where possible. All of this will be HTML-based, for low maintenance and to allow new dictionaries to be added to the set over time. This post outlines the method used to get these various files into deliverable formats, and follows on from an earlier one in which I talked about using iTunes to get media back to the village.
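As a sketch of what this kind of HTML-based delivery can look like (the words, file names and field names below are invented for illustration, not taken from the actual dictionaries), a short script can render each lexicon entry as a static, hyperlinked HTML fragment, with an audio link only where a recording exists:

```python
import html

# A toy lexicon; in practice these records would come from a Toolbox
# export or a spreadsheet. All values here are illustrative only.
lexicon = [
    {"headword": "natus", "gloss": "child", "audio": "natus.mp3"},
    {"headword": "nawesien", "gloss": "work", "audio": None},
]

def entry_html(entry):
    """Render one dictionary entry as an HTML list item."""
    parts = [f"<b>{html.escape(entry['headword'])}</b> ",
             html.escape(entry["gloss"])]
    if entry.get("audio"):
        # A plain hyperlink to the recording keeps the pages static
        # and low-maintenance: no server-side code is needed.
        parts.append(f' <a href="audio/{entry["audio"]}">[listen]</a>')
    return "<li>" + "".join(parts) + "</li>"

page = "<ul>\n" + "\n".join(entry_html(e) for e in lexicon) + "\n</ul>"
print(page)
```

Because the output is ordinary HTML, adding a new dictionary to the set later means regenerating its pages and copying them alongside the others.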


A new transcription system

Just over a year ago I wrote a blog post about some of the parameters involved in transcribing media files, and how long various sorts of transcription, translation and annotation tasks take. In the commentary on my post, the ELAN transcription software tool, developed at the Max Planck Institute for Psycholinguistics in Nijmegen, came in for some criticism. Ariel Guttman, for example, wrote that it was “highly non-user friendly and non-ergonomic, especially since using the software only through the keyboard is not so easy” and that “the people at the MPI should start designing their software with user-friendliness in mind”. Stuart McGill agreed: “you’re spot on with your comments on ELAN and keyboard use” and “transcription in ELAN is simply slow(er than it could be), no matter how well you know the program”. Stuart had decided that Transcriber, despite not handling special characters, was a better tool for his needs.

Well, as a result of user consultation involving Mark Dingemanse, Jeremy Hammond, and Simeon Floyd, the programmers at MPI-Nijmegen have now released ELAN version 4.1, which has a new “Transcription Mode” that Mark and Jeremy describe in a blog post as “designed to increase the speed and efficiency of transcription work. The interface is keyboard-driven and minimizes U[ser] I[nterface] actions”. Further details about the new mode and how to set it up and use it can be found in the blog post. It will be interesting to hear user reactions to the new facility over the coming months.

Now, if someone would do a user consultation about the metadata browser IMDI, also developed at MPI-Nijmegen …

Online interlinear text, and streaming media without Flash

Playing media on the web is nothing new. But playing the media that results from fieldwork, with transcripts highlighted in time with the video or audio, and having it all work without needing to install Flash or other streaming servers: this is new. So is having a path to upload from common fieldwork tools like ELAN, Transcriber and Toolbox.

In a blog item here in 2007 I reported that our team had built EOPAS, but at that time the web was not yet ready. With the advent of HTML5 things have changed significantly. We rewrote EOPAS, and it is now ready for you to view and explore stories from a number of languages, or to download and install on your own server. EOPAS is described here, along with the requirements for uploading your own material. The source code, should you want to install your own instance, is here.

The working version can be seen here, and details of the schemas used for each of the upload formats can be seen here.
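This is not the actual EOPAS code, but the core of this kind of Flash-free delivery can be sketched in a few lines: generate a page in which each transcript segment carries its start and end times, and let a small script highlight whichever segment matches the current playback time of an HTML5 audio element. The file name, timings and text below are invented:

```python
# Sketch of generating a time-aligned transcript page in the HTML5
# style described above: an <audio> element plus timed text, with no
# Flash or streaming server required.

def transcript_page(audio_file, segments):
    """Build an HTML page from (start, end, text) transcript segments."""
    rows = "\n".join(
        f'<p class="seg" data-start="{s}" data-end="{e}">{text}</p>'
        for s, e, text in segments
    )
    # As the audio plays, toggle a 'now-playing' class on the current
    # segment; its appearance is left to a stylesheet.
    script = """<script>
const audio = document.querySelector('audio');
audio.addEventListener('timeupdate', () => {
  document.querySelectorAll('.seg').forEach(p => {
    const on = audio.currentTime >= p.dataset.start
            && audio.currentTime < p.dataset.end;
    p.classList.toggle('now-playing', on);
  });
});
</script>"""
    return f'<audio controls src="{audio_file}"></audio>\n{rows}\n{script}'

page = transcript_page("story.mp3",
                       [(0.0, 2.5, "First line of the story."),
                        (2.5, 5.0, "Second line of the story.")])
print(page)
```

A converter from ELAN, Transcriber or Toolbox output need only extract the time-aligned tiers into this kind of segment list before generating the page.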

Programming by Silvia Pfeiffer and John Ferlito. Funded by the Institute for a Broadband-Enabled Society (IBES).

Australian Humanities research infrastructure funding

All Australian humanities scholars with an interest in digital scholarship should take this brief opportunity to read and comment on the federal government’s ‘2011 Strategic Roadmap for Australian Research Infrastructure’ discussion paper. Why? Because the two previous ‘Roadmaps’ funded hundreds of millions of dollars’ worth of ‘research infrastructure’, almost exclusively NOT in the Humanities, but including hugely expensive science tools like the $100 million Synchrotron. In the previous Roadmap in 2008 there was a section on the Humanities and Social Sciences that included reference to PARADISEC as an exemplary project building infrastructure for Humanities scholars. But not one cent went to support PARADISEC from that process.

There is a pressing need to develop e-research skills among humanities scholars. While computing tools have long been part of everyday practice in the physical sciences, it is only in the recent past that humanities computing has emerged as a recognised area of interest. It is not unreasonable to estimate that fewer than 20 percent of researchers in the humanities and cultural industries are actively engaged in advanced e-humanities research, with the remaining 80 percent likely, at a bare minimum, to have had contact with digital communication practices (email, discussion lists) and search techniques (online library catalogues and databases).

Computing skills amount to much more than word processing, and most humanities scholars do not realise this. There is great scope for computing in the humanities, especially in recognising the structure of the material we work with, so that representations of that material derive from well-formed data rather than being the primary aim of the researcher. There is a current need, for example, to distinguish multimedia products which are ends in themselves from those which derive from data that has some longevity. We have learned that handcrafting our output often locks it into proprietary formats that will become unreadable within a short period of time. Data needs to be safeguarded for posterity.

In order for these aims to be achieved, we need to establish work practices and appropriate data-sets now. Data-sets are being produced routinely in the course of our research, but usually with no focus on conforming to standards of data structure, or on the larger problems of managing this data and storing it safely for later reuse. Much of this data is stored in analog form and is becoming largely unusable as the machinery on which it was recorded becomes obsolete and the media themselves deteriorate.

The products of our taxpayer-funded research should be reusable and accessible in the future. Data should be held in non-proprietary formats, with good descriptions that conform where possible to international standards (such as Dublin Core).
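For instance, a minimal Dublin Core description of a field recording can be held as plain XML, which stays readable regardless of what software produced it. The values below are invented for illustration (‘erk’ is the ISO 639-3 code for South Efate):

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def dc_record(fields):
    """Serialise a dict of Dublin Core elements as a small XML record."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        el = ET.SubElement(root, f"{{{DC}}}{name}")
        el.text = value
    return ET.tostring(root, encoding="unicode")

record = dc_record({
    "title": "Story about gardening",   # all values invented
    "creator": "Example Speaker",
    "language": "erk",
    "format": "audio/x-wav",
    "date": "2011-04-01",
})
print(record)
```

Because the elements come from a published standard rather than a particular program, an archive can still interpret the description long after the software that wrote it is gone.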

These infrastructure needs can be summarised under three broad headings: advocacy, research/development and data management.

There is a need for advocacy among humanities scholars to promote good practices in working with digital data and to bring to their attention ways of working that will make their work easier, but will also have better outcomes for data sharing or reuse.

There is a need for research into existing and emerging methods and development of tools that could be applied within the local context.

There is a need for data management skills to be developed among humanities scholars. There are a number of projects which have been completed, and for which there are now large datasets that are not being properly maintained. We need good descriptive systems (metadata) and easy to use systems for data entry, as well as longterm data curation for this work.

In practical terms these three aims could be assisted by establishing a national unit with local representation which would:

  • serve as a reference point for scholars engaged in or interested in projects at the intersection of digital technologies and the humanities;
  • provide advice and support;
  • anticipate and identify issues for the development of the e-humanities, both conceptual challenges and technical and resource matters;
  • foster digital resource-building;
  • provide training and skills development in the use of new and existing resources, particularly emphasising the requirements of the next generation of researchers who are currently postgraduates and early career researchers;
  • disseminate information on the latest digital research initiatives being implemented by Australian and overseas institutions;
  • form collaborations and stimulate new kinds of interdisciplinary research;
  • provide targeted training programs; and
  • foster exchanges of personnel and expertise amongst projects and disciplines, nationally and internationally.

Responses to the Discussion Paper are due by COB Wednesday, 4 May 2011.

Following analysis of the responses to the Discussion Paper, an Exposure Draft Roadmap will be developed and released for further consultation in June 2011.

(Note: I am one of the authors of the current discussion paper, but would be delighted to see an informed debate on its contents and positive suggestions for directions for the policy.)

Consortium on Training in Language Documentation and Conservation (CTLDC)

I recently attended a symposium titled Models for capacity development in language documentation and conservation, hosted by ILCAA at the Tokyo University of Foreign Studies. The symposium brought together a group of people involved in supporting language work in the Asia-Pacific region in various ways (see the website for a full list): academic (Institute of Linguistics, Minhsiung, Taiwan; Beijing, China; Goroka, PNG; Batchelor, Australia; Bangkok, Thailand), community-based (Manokwari, West Papua; Tshanglalo, Bhutan; Bhasha Research Centre and Adivasi Academy, Gujarat, India; Miromaa, Australia), using film (Sorosoro, France), and archiving language records (PARADISEC). The aim of the meeting was to build a network that would continue to link training activities that support language work: the Consortium on Training in Language Documentation and Conservation (CTLDC), whose planning group members are listed here.


Wunderkammer Import Package 2 final release

The final release of Wunderkammer Import Package 2 is now available for download. Check out the Wunderkammer website for more info.
Thanks to everyone who pointed out bugs and made suggestions for improvement. In this release several bugs have been squished and a bit of input validation and some friendlier error messages have been added.
Work now begins on version 2.1! Keep the bug reports and other comments coming.

Wunderkammer Import Package 2

The latest version of the Wunderkammer mobile phone dictionary software, Wunderkammer Import Package 2 Beta, is now available for download. The major advance in this distribution is a new, easy-to-use graphical user interface. There is also a new set of documentation to go with it.
This is a beta release. We invite bug reports and suggestions for improvement on the PFED discussion board or by e-mail at james followed by the at sign pfed dot info.
The Wunderkammer website also has a new look and layout.

How can we get the material we have used in our research back to the people we recorded?

Every time I revisited my fieldsite I was asked for copies of photos or recordings, and I wanted some way for these to be accessed without me having to be present. When I started visiting Erakor village in central Vanuatu, electricity was available only intermittently, usually in the evenings, in the house I lived in.


Bypassing written documentation – oral annotation of recorded text

A large corpus of recorded oral tradition can be created using two recording machines, one playing back the spoken texts and the other capturing an oral annotation. Recording speakers as they comment on earlier recordings is a method for providing annotations that bypasses literacy.


Wagiman electronic dictionary

Aidan Wilson went up to Pine Creek and Kybrook Farm in the Northern Territory last week to deliver the various versions of the Wagiman electronic dictionary to the Wagiman community. You can read about it at the Project for Free Electronic Dictionaries blog.