Archive for the ‘Software’ Category.

Is Toolbox the linguistic equivalent of Nietzsche’s typewriter?

There is an aphorism (apparently derived from Maslow 1966) that goes “if all you have is a hammer, everything looks like a nail”. For some documentary linguists, reliance on the Toolbox software program means that everything linguistic looks like an interlinear gloss.

Toolbox (developed originally in 1987 as Shoebox by the Summer Institute of Linguistics) is a widely used data management and analysis tool for field linguists. It is designed for researchers to take units of transcribed text (typically sentences) and semi-automatically “gloss” them to create multi-tier interlinearised text. The text is broken into words, which are in turn broken into constituent morphemes, with aligned annotations such as sentence translations, morphemic translations, part of speech designations, and so on (for further discussion of interlinear text models see Bow, Hughes and Bird 2003).

Because Toolbox is free and widely recommended for use in language analysis (it is commonly taught in training courses such as InField or ELDP grantee training), it has had a large and constraining impact on how documentary linguists think they should do their research. I would suggest that it is a tool that has had a narrowing effect, like Nietzsche’s typewriter, as described by Carr 2008:

Sometime in 1882, Friedrich Nietzsche bought a typewriter—a Malling-Hansen Writing Ball, to be precise. His vision was failing, and keeping his eyes focused on a page had become exhausting and painful, often bringing on crushing headaches. He had been forced to curtail his writing, and he feared that he would soon have to give it up. The typewriter rescued him, at least for a time. Once he had mastered touch-typing, he was able to write with his eyes closed, using only the tips of his fingers. Words could once again flow from his mind to the page.

But the machine had a subtler effect on his work. One of Nietzsche’s friends, a composer, noticed a change in the style of his writing. His already terse prose had become even tighter, more telegraphic. “Perhaps you will through this instrument even take to a new idiom,” the friend wrote in a letter, noting that, in his own work, his “‘thoughts’ in music and language often depend on the quality of pen and paper.”

“You are right,” Nietzsche replied, “our writing equipment takes part in the forming of our thoughts.” Under the sway of the machine, writes the German media scholar Friedrich A. Kittler, Nietzsche’s prose “changed from arguments to aphorisms, from thoughts to puns, from rhetoric to telegram style.”

I believe that how annotation is conceptualised in language documentation, and presented in reference works like Schultze-Berndt 2006, reflects the narrowing influence of software tools like Toolbox and the dominance of interlinear glossing as an analytical method.

An alternative, developed originally by David Nathan, that we recommend at SOAS for corpus creation, is summary or overview annotation:

An overview annotation can be considered as a kind of “roadmap” or index of a recording. It could consist of approximately time-aligned information about what is in the recording, who is participating, and other interesting phenomena. For example, you could write:

“from 1 to 3 mins Auntie Freda is singing the song called Fat frog; from 3-7 mins Harry Smith is telling a story about joining the army; from 7-10 mins there is some interesting use of applicative morphology; from 15-18 mins contains rude content that should not be used for teaching children”

This could be written as prose (as above) or, better, structured into a table.

If you are familiar with software such as Transcriber or ELAN, you can do an overview annotation by marking breaks in topics/speakers etc, and typing descriptive text into the segments between breaks. Another strategy is to simply type a number into the time-aligned segment and then create a table which links the numbers with the overview information categories.
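As a rough illustration of the table-based approach, the example above could be held as numbered, approximately time-aligned rows. This is only a sketch: the field names (`number`, `start_min`, `end_min`, `participant`, `content`, `restricted`) and the helper functions are invented for this illustration and do not correspond to any Transcriber or ELAN format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One row of an overview annotation (hypothetical field names)."""
    number: int          # the number typed into the time-aligned segment
    start_min: int       # approximate start, in minutes
    end_min: int         # approximate end, in minutes
    participant: str     # who is speaking or singing, if noted
    content: str         # free-text description of what happens
    restricted: bool = False  # True if the content should not be reused freely


# The prose example from the text, restated as rows.
SEGMENTS = [
    Segment(1, 1, 3, "Auntie Freda", "singing the song called Fat frog"),
    Segment(2, 3, 7, "Harry Smith", "telling a story about joining the army"),
    Segment(3, 7, 10, "", "some interesting use of applicative morphology"),
    Segment(4, 15, 18, "", "rude content, not for teaching children", restricted=True),
]


def to_table(segments):
    """Render the segments as a plain-text table with aligned columns."""
    rows = [("No.", "From", "To", "Participant", "Content", "Restricted")]
    for s in segments:
        rows.append((str(s.number), str(s.start_min), str(s.end_min),
                     s.participant or "-", s.content,
                     "yes" if s.restricted else "no"))
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


def unannotated_gaps(segments, total_min):
    """List (start, end) stretches of the recording with no overview entry."""
    gaps, cursor = [], 0
    for s in sorted(segments, key=lambda seg: seg.start_min):
        if s.start_min > cursor:
            gaps.append((cursor, s.start_min))
        cursor = max(cursor, s.end_min)
    if cursor < total_min:
        gaps.append((cursor, total_min))
    return gaps


if __name__ == "__main__":
    print(to_table(SEGMENTS))
    print("Not yet described:", unannotated_gaps(SEGMENTS, total_min=18))
```

Holding the overview as rows rather than prose makes it easy to filter out restricted material or to see which stretches of a recording still lack any description.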

Interlinearisation of the Toolbox type is very time-consuming (see my blog post on how much time transcription and interlinear annotation takes), while overview annotation can be done rapidly and relatively richly for a whole corpus, rather than for the magical 10% of it too frequently referred to in the literature on linguistic annotation. This makes it a potentially good alternative to the restricted representations that have been shaped, as Nietzsche’s prose was by his typewriter, by the very tool that documenters have come to rely upon.

References

Bow, Cathy, Baden Hughes and Steven Bird. 2003. Towards a general model of interlinear text. EMELD paper. [available online at http://emeld.org/workshop/2003/bowbadenbird-paper.pdf, accessed 2012-04-21]

Carr, Nicholas. 2008. Is Google making us stupid? What the internet is doing to our brains. Atlantic Magazine July/August 2008.

Maslow, Abraham. 1966. The Psychology of Science: A reconnaissance. New York: Harper & Row.

Schultze-Berndt, Eva. 2006. Linguistic annotation. In Jost Gippert, Nikolaus P. Himmelmann and Ulrike Mosel (eds.) Essentials of Language Documentation, 213-251. Berlin: Mouton de Gruyter.

Books, HTML, audio, images – falling out from fieldwork

I’ll be going to Vanuatu next month courtesy of Catriona Hyslop’s DoBeS project, to help build an installation of three computer-based interactive dictionaries (Vurës, Tamambo and South Efate) for the Museum there. We will have hyperlinked dictionaries with sound and images where possible. All of this will be HTML-based for low maintenance and to allow new dictionaries to be added to the set over time. This post outlines the method used to get these various files into deliverable formats and follows on from an earlier one where I talked about using iTunes to get media back to the village.

A new transcription system

Just over a year ago I wrote a blog post about some of the parameters involved in transcribing media files, and how long it takes to do various sorts of transcription, translation and annotation tasks. In the commentary on my post, the ELAN transcription software tool developed at the Max Planck Institute for Psycholinguistics at Nijmegen came in for some criticism. Thus Ariel Guttman wrote that it was: “highly non-user friendly and non-ergonomic, especially since using the software only through the keyboard is not so easy” and “the people at the MPI should start designing their software with user-friendliness in mind”. Stuart McGill agreed: “you’re spot on with your comments on ELAN and keyboard use” and “transcription in ELAN is simply slow(er than it could be), no matter how well you know the program”. Stuart had decided that Transcriber, despite not handling special characters, was a better tool for his needs.

Well, as a result of user consultation involving Mark Dingemanse, Jeremy Hammond, and Simeon Floyd, the programmers at MPI-Nijmegen have now released ELAN version 4.1, which has a new “Transcription Mode” that Mark and Jeremy describe in a blog post as “designed to increase the speed and efficiency of transcription work. The interface is keyboard-driven and minimizes U[ser] I[nterface] actions”. Further details about the new mode and how to set it up and use it can be found in the blog post. It will be interesting to hear user reactions to the new facility over the coming months.

Now, if someone would do a user consultation about the metadata browser IMDI, also developed at MPI-Nijmegen …