The AudioText Linking Tool

Brett Baker and Michael Kovacs (UNE, Armidale)

How we came to ask the question.

  • dissatisfaction with current formats for electronic dictionaries.
  • incorporating audio required, firstly, digitising analog recordings and, secondly, linking them somehow to a dictionary database
  • problems with the FMP format: proprietary software, cross-platform sound issues, scripting stupidity
  • suggestion to move to the browser
  • this loses some functionality on the database side, but that is more a problem for the researcher than for the client base
  • the advantage is that we can use open-source, well-understood tools for building dictionaries
  • and we can use currently existing tools for playing sections of audio

The current project.

The linking tool consists of a server-side database and its associated interfaces:

'audio', 'dictionary', 'admin', ('help')

URL: http://turing.une.edu.au/~linking

What's it good for?

  • The linking tool is basically an archive for digitised audio together with transcripts that are time-aligned to the audio. A transcript looks like this:

15.8.00.02.xml
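
The linked file shows the real transcript format. To illustrate what time alignment provides, here is a minimal sketch that reads a transcript and finds the segment playing at a given moment. The element names (transcript, segment), the start/end attributes in seconds, and the file name example.mp3 are all assumptions made for illustration, not the tool's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript layout: element and attribute names are assumptions,
# not the linking tool's actual XML format.
SAMPLE = """\
<transcript audio="example.mp3">
  <segment start="0.00" end="2.35">first utterance here</segment>
  <segment start="2.35" end="5.10">second utterance here</segment>
</transcript>
"""


def read_segments(xml_text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    root = ET.fromstring(xml_text)
    segments = []
    for seg in root.iter("segment"):
        start = float(seg.get("start"))
        end = float(seg.get("end"))
        segments.append((start, end, (seg.text or "").strip()))
    return segments


def segment_at(segments, time_s):
    """Return the segment whose time span contains time_s, or None."""
    for start, end, text in segments:
        if start <= time_s < end:
            return (start, end, text)
    return None
```

Calling segment_at(read_segments(SAMPLE), 3.0) returns the second segment, because 3.0 seconds falls between its start and end times.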

The linking tool has a number of uses:

archive for digital recordings

  • because these are transcribed, information in the recordings can be quickly located and downloaded
  • the tool has an admin interface that lets users upload audio, transcript and dictionary files and (depending on their competence) query the database in SQL
  • files uploaded to the database may be viewed immediately

advantages over other audio archives

  • the tool relies almost entirely on open-source code and on applications that most people already have on their machines. No one needs to download and install new software to use it, and it is completely cross-platform.

the dictionary interface can be used as an audio dictionary

  • words in the dictionary that also occur in the recordings can use the corresponding audio segments as spoken examples of the word

advantages over other online, audio dictionaries

  • no slicing and dicing of audio files. Only the minimal segment containing the word in question is downloaded, which saves bandwidth
  • the audio is in mp3 format, which saves further bandwidth. The quality is perfectly acceptable for most linguistic purposes
  • there are no issues with file names, because there is no direct association between a dictionary entry and a segment of audio. The audio file names do not have to be updated if the dictionary changes
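
One way to get audio examples without tying a dictionary entry to a particular audio file is to look the headword up in the time-aligned transcripts at query time. The sketch below builds a word index from transcript segments. The file names, times and sample sentences are invented for illustration, and the tool's own lookup may well work differently.

```python
from collections import defaultdict

# Hypothetical transcript data: (audio file, start s, end s, segment text).
SEGMENTS = [
    ("tape01.mp3", 0.0, 2.4, "the dog ran away"),
    ("tape01.mp3", 2.4, 5.0, "a big dog barked"),
    ("tape02.mp3", 0.0, 3.1, "she saw the river"),
]


def build_index(segments):
    """Map each word to the (audio file, start, end) spans in which it occurs."""
    index = defaultdict(list)
    for audio, start, end, text in segments:
        # set() records each segment at most once per word.
        for word in set(text.lower().split()):
            index[word].append((audio, start, end))
    return index


def audio_examples(index, headword):
    """Return every time span in which the headword was transcribed."""
    return index.get(headword.lower(), [])
```

Because the index is derived from the transcripts, renaming or editing a dictionary entry never requires touching the audio files. A headword with no match simply returns an empty list.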

language learning tool

  • the dictionary/audio-transcript interface used in reverse acts as a language learning tool
  • students can be presented with texts in any language, play the audio, and look up definitions and other examples of words they don't recognise

advantages over other language learning tools

  • the linking tool is not dedicated software: it can be updated continuously, never loses the links between its various parts, and can be used for purposes other than language teaching. This sets it apart from the majority of CALL (computer-assisted language learning) tools currently available

Compatibility

  • At the moment, the linking tool accepts .mp3 audio files, and transcript and dictionary files in a given XML format.
  • There is a conversion tool on the admin interface that converts transcripts from SoundIndex files, and dictionaries in standard CSV format, into the XML format. We are currently working on allowing transcript files from CLAN and from Transcriber to be converted and uploaded, since these seem to be the most commonly used transcription linking applications, in Australia at least.
  • It should be possible in the future to convert standard 'backslash-coded' dictionary files, which seem to be everywhere. The problem is that everyone uses their own codes.
  • The only compatibility issue we have come across is that older browsers without a current media player that accepts the .mp3 format cannot play the audio files.
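
The CSV-to-XML conversion mentioned above can be sketched roughly as follows. This is not the admin interface's converter: the dictionary and entry element names are assumptions, and each CSV column name is reused as a child element name, so the header row must contain valid XML names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text):
    """Convert CSV text with a header row into a <dictionary> XML string.

    The <dictionary>/<entry> layout is hypothetical; each column header
    becomes the name of a child element of <entry>.
    """
    root = ET.Element("dictionary")
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.SubElement(root, "entry")
        for field, value in row.items():
            child = ET.SubElement(entry, field)
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample data with two columns.
SAMPLE_CSV = "headword,gloss\nalpha,first letter\nbeta,second letter\n"
xml_out = csv_to_xml(SAMPLE_CSV)
```

A converter for backslash-coded files would need one extra input, a user-supplied table mapping each project's own field codes to element names, which is exactly where the "everyone uses their own codes" problem arises.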

For the future

Right now, Michael is working on a few improvements:

  • Ability to upload data from multiple languages, and support for multiple users working on multiple languages (some access issues are yet to be resolved)
  • Addition of CLAN and Transcriber format transcripts to the available types
  • Regex searches on the database
  • Better layout of the dictionary: a sliding sidebar listing all headwords
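
As an illustration of what regex searching on the database could look like, the sketch below uses an in-memory SQLite database and registers a Python function behind SQLite's REGEXP operator. The choice of SQLite, the entries table, its columns and the sample rows are all assumptions for the example. The tool's actual database engine and schema are not specified here.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backs `value REGEXP pattern`; SQLite passes the pattern first."""
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE entries (headword TEXT, gloss TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [("banana", "fruit"), ("bandana", "cloth"), ("cabana", "hut")],
)


def search_headwords(conn, pattern):
    """Return headwords matching the regular expression, sorted."""
    rows = conn.execute(
        "SELECT headword FROM entries WHERE headword REGEXP ? ORDER BY headword",
        (pattern,),
    )
    return [row[0] for row in rows]
```

With this in place, search_headwords(conn, "^ban") finds the headwords beginning with "ban". Other database engines provide their own regular expression operators, so the registration step is specific to SQLite.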