Streaming access to transcribed media

After some effort, PARADISEC has finally established a streaming server that can be used in normal web pages. This means that an online dictionary, for example, can have example headwords and sentences spoken, or video clips presented to illustrate a given word. You can see the trial version here (NB: this will only work with the Firefox browser, and you will also need to pre-install the Annodex plugin).
For some time it has been troubling that we have had no simple way of presenting media online together with transcripts, especially when an archived field recording may be the only recording of a particular language. It should have been simple enough to access media on the web; after all, we do it on YouTube and other places. We have been further constrained by wanting all of this to be open source (freely available software), so that anyone with the right skills can replicate the setup without having to pay. We also wanted the process of getting material into an online presentation to follow on from normal fieldwork outcomes, in line with the output of the tools typically used by a professional linguist (one who keeps up to date with the methods of their profession). Once the archival form of the media exists in a repository, putting it into a streaming server for access should be an automatable process.


With various partners, PARADISEC has been part of an ARC-funded project called EthnoER, which has developed an online presentation of media and time-aligned transcripts. By uploading a media file and its transcript in Toolbox format, we can present interlinear glossed text as seen in EOPAS here (the media needs to be played in Firefox using the plugins available here). The enabling technologies are Annodex (developed by CSIRO) and EOPAS (the EthnoER Online Presentation and Annotation System, developed in collaboration with Ronald Schroeter of the University of Queensland).
Once media is available for streaming, it can be called from fairly simple HTML with a short piece of JavaScript, as seen in the trial version here. From PARADISEC’s perspective, this technology should let us make archival media available (subject to deposit conditions) via a web browser.
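
As an illustration only, here is how a dictionary page might be generated so that each headword links to just the relevant stretch of an archived recording. This is a sketch under stated assumptions, not the trial page's code: it assumes the server accepts an Annodex-style temporal query of the form t=npt:start/end with times in seconds, and it uses an example.org URL. The helper names clip_url and headword_link are hypothetical; the JavaScript in the trial page is the authority on what the PARADISEC server actually expects.

```python
import html


def clip_url(media_url: str, start: float, end: float) -> str:
    """Build a URL that asks the server for only [start, end] seconds of a file.

    ASSUMPTION: the streaming server accepts an Annodex-style temporal
    query "t=npt:<start>/<end>" with times given in seconds.
    """
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    # Append to an existing query string if the URL already has one.
    sep = "&" if "?" in media_url else "?"
    return f"{media_url}{sep}t=npt:{start:.3f}/{end:.3f}"


def headword_link(headword: str, media_url: str, start: float, end: float) -> str:
    """Hypothetical dictionary helper: an HTML link that plays one clip."""
    url = clip_url(media_url, start, end)
    # Escape both the attribute value and the visible text for HTML.
    return f'<a href="{html.escape(url)}">{html.escape(headword)}</a>'


# prints <a href="http://example.org/media/ABC1-001.ogg?t=npt:12.400/15.000">kuru</a>
print(headword_link("kuru", "http://example.org/media/ABC1-001.ogg", 12.4, 15.0))
```

Keeping the time arithmetic in one helper means a dictionary database only has to store a file name and two offsets per example; if the server turns out to want a different temporal syntax, only clip_url changes.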
To get to this point we have had to work through a number of issues. The idea is that there could be several Annodex servers, perhaps each associated with a linguistic archiving project. Selected media files are transcoded to one of the open-source Ogg formats and placed on the server (our thanks to Stuart Hungerford and Jonathan McCabe of APAC for their help in setting up the PARADISEC Annodex server), after which they can be called in the way seen in the trial page. If you look at the HTML source, you will see a script that controls how the call is made (thanks to Shane Stephens, formerly of CSIRO and now of Google, for his assistance here). To get your data into this form, you will need to convert the time formats and other details to match the structure of the demo document. To get you started, I have written an export routine in Audiamus 2.5 that creates a skeleton time-aligned document in the correct format.
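
Much of that conversion is arithmetic on time stamps. The following is a minimal sketch, not the Audiamus export routine: it assumes the transcript export gives segment boundaries in milliseconds (as the time slots in ELAN .eaf files do) and that the target wants hh:mm:ss.mmm clock times. The field names id, start, end and text are placeholders, not the actual structure of the demo document, so rename them to match it.

```python
from typing import Dict, Iterable, List, Tuple


def ms_to_clock(ms: int) -> str:
    """Format a millisecond offset as hh:mm:ss.mmm."""
    if ms < 0:
        raise ValueError("negative time offset")
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def skeleton(segments: Iterable[Tuple[int, int, str]]) -> List[Dict[str, str]]:
    """Turn (start_ms, end_ms, text) tuples into time-aligned entries.

    The keys "id", "start", "end" and "text" are hypothetical placeholders,
    not the demo document's real structure.
    """
    entries = []
    for n, (start, end, text) in enumerate(segments, start=1):
        # Reject empty or reversed segments rather than emitting bad times.
        if end <= start:
            raise ValueError(f"segment {n}: end must come after start")
        entries.append({
            "id": f"s{n}",
            "start": ms_to_clock(start),
            "end": ms_to_clock(end),
            "text": text.strip(),
        })
    return entries


for entry in skeleton([(1500, 4250, "first utterance "), (4250, 9010, "second utterance")]):
    print(entry["id"], entry["start"], entry["end"], entry["text"])
```

Working in integer milliseconds avoids the rounding drift that floating-point seconds can introduce when segment boundaries are converted back and forth.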
More technicalities about this process are discussed on the EthnoER Wiki page.

3 thoughts on “Streaming access to transcribed media”

  1. Also check out Silvia’s blog for more ideas about video on the web and a comment on EthnoER. The only inaccuracy is that she attributes the post to Linda instead of Nick Thieberger.

