Books, HTML, audio, images – falling out from fieldwork

I’ll be going to Vanuatu next month, courtesy of Catriona Hyslop’s DoBeS project, to help build an installation of three computer-based interactive dictionaries (Vurës, Tamambo and South Efate) for the Museum there. We will have hyperlinked dictionaries, with sound and images where possible. All of this will be HTML-based, both for low maintenance and so that new dictionaries can be added to the set over time. This post outlines the method used to get these various files into deliverable formats, and follows on from an earlier one where I talked about using iTunes to get media back to the village.


Each of the dictionaries is in Toolbox, but the Tamambo dictionary (by Dorothy Jauncey) started out as an MS Word document that had to be converted, using regular expressions, into a lexical database. Each dictionary was then processed through LexiquePro and exported to HTML, as can be seen in the online version here. The audio function needed tweaking to encode HTML5 media calls, but it didn’t take much work to get audio for 2,000 headwords into the Vurës dictionary.

Getting the audio into the right shape started with recording a speaker reading headwords from a script. The recording was then time-aligned to the script using Transcriber, and the resulting text file was exported to a ‘label’ format that can be imported into Audacity. Opening the audio file and the label file in Audacity and then selecting the ‘export multiple’ option produced a collection of short audio files, each named after the headword it contains. These were then linked from the HTML version, either by duplicating the headword in a tag that calls the audio or by using the contents of the \sf field as the source for the media file name.
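For anyone wanting to script the label and linking steps, here is a minimal Python sketch rather than the exact procedure I used. It assumes the time-aligned segments are already available as (start, end, headword) tuples, writes them in Audacity's tab-separated label track format, and builds an HTML5 audio tag for a headword. The function names, the `audio` directory and the `.mp3` extension are illustrative assumptions, and the headwords in the usage note are placeholders, not dictionary entries.

```python
import html
from urllib.parse import quote


def to_audacity_labels(segments):
    """Build the text of an Audacity label track.

    `segments` is an iterable of (start_seconds, end_seconds, headword).
    Audacity reads one label per line: start, end and label text,
    separated by tabs.
    """
    lines = [f"{start:.6f}\t{end:.6f}\t{headword}"
             for start, end, headword in segments]
    return "\n".join(lines) + "\n"


def audio_tag(headword, sound_file=None, media_dir="audio", extension="mp3"):
    """Return an HTML5 audio element for one headword.

    If the entry has an \\sf field, pass its contents as `sound_file`;
    otherwise the file name is derived from the headword, mirroring the
    way 'export multiple' names each clip after its label.
    `media_dir` and `extension` are assumptions for this sketch.
    """
    filename = sound_file if sound_file else f"{headword}.{extension}"
    # Percent-encode the file name so spaces and diacritics are URL-safe.
    src = f"{media_dir}/{quote(filename)}"
    return f'<audio controls preload="none" src="{html.escape(src, quote=True)}"></audio>'
```

Calling `to_audacity_labels([(0.0, 1.25, "aaa"), (1.25, 2.5, "bbb")])` gives text that can be saved to a file and imported as a label track, and `audio_tag("aaa")` gives the tag to place next to that headword in the HTML.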

In preparation for the trip to Vila I have also put together two books to take to Erakor village: a dictionary, and a collection of stories in South Efate and English. These books are printed by the publish-on-demand book machine at the University of Melbourne, with full colour covers and perfect binding, at a cost of around $10 per copy. The PDF versions are in the digital repository, with handles to ensure persistent location and to allow free, open-access download of the content. The handle is also printed in each book, as are an ISBN and a Creative Commons licence.

The data for each of these books came from Toolbox structured files. The dictionary is an MDF-exported dictionary together with a finderlist. The stories are presented in English and South Efate without interlinear information (interlinear versions can be seen in Eopas), so only the language and free gloss lines need to be exported, along with the story’s metadata header (title, speaker, abstract). Inserting a tab before the free gloss line allows it to become the right-hand cell in a table, with South Efate in the left column and English in the right column.
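The export of the two lines can be sketched in Python along these lines. This is an illustration under assumptions, not the export I actually ran: it assumes the language line is marked `\tx` and the free gloss line `\ft`, which may differ in a given Toolbox project, and the function names are made up for the example. Each language line is paired with the free gloss that follows it, and the pair is written as a tab-separated row that converts to a two-column table.

```python
def parallel_rows(toolbox_text, text_marker="\\tx", gloss_marker="\\ft"):
    """Pair each language line with the free gloss line that follows it.

    The marker names are assumptions; pass the ones your project uses.
    Other interlinear lines (morphemes, glosses, etc.) are skipped.
    """
    rows = []
    current = None
    for raw in toolbox_text.splitlines():
        line = raw.strip()
        if not line.startswith("\\"):
            continue  # blank lines and continuation text are ignored here
        marker, _, value = line.partition(" ")
        value = value.strip()
        if marker == text_marker:
            current = value
        elif marker == gloss_marker and current is not None:
            rows.append((current, value))
            current = None
    return rows


def to_tab_separated(rows):
    """Left column: language line. Right column: free gloss."""
    return "\n".join(f"{text}\t{gloss}" for text, gloss in rows) + "\n"
```

The tab-separated output can then be pasted into a word processor and converted to a table, with the tab marking the column break.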

The books will be available for sale here or here and on Amazon (!) as part of the e-press function provided by the publishing centre. Copies will also be distributed by the World Oral Literature Project, with no postage cost, since they can print as many (or as few) copies as needed from the PDF file.

So many possibilities!

One Comment

  1. Tom Honeyman says:

    Wow, sounds great. And I’d been wondering what would be the easiest way to get audio snippets out of ELAN too – thanks for that!
