Keeping track of what is recorded in the course of fieldwork is critical, both for your own future work and for long-term archiving. Recordings of dynamic performance (audio or video) are easy to misplace or misidentify and very difficult to locate once you forget what a file was named and what you recorded on a particular day. From January 21st to April 25th, 2016 we ran a survey about how people record their metadata and had 142 responses (see also the earlier blog post here). There were two multiple choice questions, each allowing selection of more than one checkbox and the entry of free text responses. I can send the full results of the survey on request. This information will help inform the development of new tools for metadata entry. The responses are summarised below.
HyperCard (HC) was a brilliant program that came free with every Macintosh computer from 1987 and was in development until around 2004. It made it possible to create multimedia ‘stacks’ (of cards) and was very popular with linguists. For example, Peter Ladefoged produced an IPA HyperCard stack and SIL had stacks for drawing syntactic trees or for exploring the history of Indo-European (see their listing here). Texas and FreeText, created by Mark Zimmerman, allowed you to create quick indexes of very large text files (maybe even into the megabytes! Remember this is the early 1990s). I used FreeText when I wrote Audiamus, a corpus exploration tool that let me link text and media and then cite the text/media in my research.
My favourite HC linguistic application was J. Randolph Valentine’s Rook, which presented a speaker telling an Ojibwe story (with audio), with interlinear text linked to a grammar sketch of the language. I adapted that model for a story in Warnman, told by Waka Taylor, produced as part of a set of HC stacks called ‘Australia’s languages’ and released in 1994. Continue reading ‘Reading HyperCard stacks in 2016’ »
In the spirit of solving small frustrations I offer my weekend experience of getting Toolbox files into Elan. I have over a hundred texts in Nafsan, most of which are time-aligned and interlinearised. I am working with Stefan Schnell on adding GRAID annotation to some of these texts and the preferred way of doing this is in Elan, with the GRAID annotation at the morphemic level. I tried importing Toolbox files using the Elan ‘Import’ menu, having listed all field markers in Toolbox, together with their internal dependencies (which should then map to Elan’s relationship between tiers). These settings are stored in an external file. Unfortunately, the import failed several times, despite changing the settings slightly after each attempt. Continue reading ‘Toolbox to Elan’ »
We are in the process of identifying gaps in tools for fieldwork and data analysis that can be filled as part of the Centre of Excellence for the Dynamics of Language. I’d like to ask for your input into the requirements for a metadata entry tool. In part, this analysis asks for your opinions on the value of existing tools (listed below) and their relative strengths and weaknesses, and asks if it may be worth putting effort into developing any of them further, rather than starting from scratch.
The high-level requirement of this tool is to make it easy to describe files created in fieldwork, to be available both off- and on-line, and to deliver the description as a text file for upload to an archive. This includes capturing as much metadata as possible from the files themselves; providing controlled vocabularies of terms to select from (preferably via drag-and-drop rather than keyboard entry); allowing the metadata to be exported in a range of formats to suit whichever archive will host the collection; allowing the metadata to be imported to the tool for use by collaborating team members; and allowing controlled vocabularies to be amended to suit the local situation. This tool could also allow users to visualise the state of a collection: which media files have been transcribed, which have been interlinearised, whether text files have been scanned or OCRed, what other processes have been applied, which files have been archived, and what the rights are for each file. Users should also be able to specify what these criteria are for their own type of collection.
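To make these requirements a little more concrete, here is a minimal Python sketch of the first few steps: harvesting what can be read from the files themselves, checking a descriptive term against a controlled vocabulary, and exporting the result as text. It is only an illustration, not a design for the tool. The field names, the `GENRES` vocabulary and the function names `harvest`, `describe` and `to_csv` are all my own assumptions, and CSV is simply one of many possible export formats.

```python
import csv
import io
import mimetypes
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical field list and controlled vocabulary (assumptions for this
# sketch); a real tool would let users amend both to suit their collection.
FIELDS = ["filename", "bytes", "modified", "mime_type", "genre"]
GENRES = {"narrative", "conversation", "elicitation", "song"}


def harvest(directory):
    """Return one metadata record per file found under `directory`."""
    records = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        mime, _ = mimetypes.guess_type(path.name)  # guessed from the extension
        records.append({
            "filename": path.name,
            "bytes": stat.st_size,
            "modified": datetime.fromtimestamp(
                stat.st_mtime, tz=timezone.utc
            ).isoformat(),
            "mime_type": mime or "unknown",
            "genre": "",  # left blank for the researcher to fill in
        })
    return records


def describe(record, genre):
    """Set the genre of a record, accepting only controlled-vocabulary terms."""
    if genre not in GENRES:
        raise ValueError(f"'{genre}' is not in the controlled vocabulary")
    record["genre"] = genre
    return record


def to_csv(records):
    """Serialise the records as CSV text, ready to save or upload."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Calling `harvest` on a folder of recordings, passing each record through `describe`, and then writing the output of `to_csv` to a file would give a plain-text description of the collection. Other export formats could be added as further functions over the same list of records.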
You can either add comments below, or else write to me separately (thien [at] unimelb.edu.au) with your ideas that can contribute to how we develop this tool.
David Nathan writes
EL Publishing is a new online publisher which was launched on 18th July and which will publish a journal, multimedia, and monographs, focussing on documentation and description of endangered languages. EL Publishing has an international editorial board and operates a fully double-blind peer-review process for all submitted materials.
Lauren Gawne recaps last night’s Linguistics in the Pub, a monthly informal gathering of linguists in Melbourne to discuss topical areas in our field.
Our first Melbourne LIP for the year at our regular venue got off to a rocky start when the function room was usurped by the local Touch Football team. Fortunately, we had such an excellent turnout – especially of local honours and PhD students – that we were able to make do in the general area by breaking up into smaller groups to discuss this month’s topic.
Most of the points discussed below are from either the discussion I participated in or the general summary discussion we had at the end. This means ideas and discussion points may not be attributed to the correct people, but you’re welcome to add clarifying remarks in the comments below.
Continue reading ‘Things you can do with outputs from language documentation projects: A LIP discussion’ »
Announcing the conference “Research, records and responsibility (RRR): Ten years of the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)”
Dates: 2nd-3rd December 2013
Venue: University of Melbourne, Australia
Associate Director General (Academic)
Archives and Research Centre for Ethnomusicology
American Institute of Indian Studies
For details and the call for papers see: http://paradisec.org.au/2013Conf.html
This will coincide with the Workshop on digital tools and methods for language documentation on the 3rd-4th December 2013
Alexandre Arkhipov (Moscow State University) on methods used by his research group to build an integrated documentation and analysis system.
Andreas Witt (Head of the TEI-SIG, Institut für Deutsche Sprache, Mannheim) on the Text Encoding Initiative-Special Interest Group and TEI for linguists.
For details and the call for papers see: http://paradisec.org.au/2013ParadisecToolsMethods.html
ExSite9 is an open-source cross-platform tool for creating descriptions of files created during fieldwork. We have been working on the development of ExSite9 over the past year and it is now ready for download and use:
ExSite9 collects information about files from a directory on your laptop that you have selected, and presents it to you onscreen for your annotation, as can be seen in the following screenshot. The top left window shows the filenames, and the right-hand window shows metadata characteristics that can be clicked once a file or set of files is selected. The manual is here: http://bit.ly/ExSite9Manual
Researchers who undertake fieldwork, or capture research data away from their desks, can use ExSite9 to support the quick application of descriptive metadata to the digital data they capture. This also enables researchers to prepare a package of metadata and data for backup to a data repository or archive for safekeeping and further manipulation.
Scholars in the Humanities, Arts and Social Sciences (HASS) typically need to organise heterogeneous file-based information from a multitude of sources, including digital cameras, video and sound recording equipment, scanned documents, files from transcription and annotation software, spreadsheets and field notes.
The aim of this tool is to facilitate better management and documentation of research data close to the time it is created. An easy to use interface enables researchers to capture metadata that meets their research needs and matches the requirements for repository ingestion.
Ruth Singer recaps some of the interesting points of last week’s Linguistics in the Pub, an informal gathering of linguists and language activists that is held monthly in Melbourne.
A number of linguists in Melbourne have recently begun documenting child language in the field. In the November 2011 LIP we discussed what you need to think about if you want to document child language and why you might document child language as part of a broader language documentation project (see blog at http://www.paradisec.org.au/blog/2011/11/child-language-documentation-a-lip-discussion/). The most recent LIP, led by Lauren Gawne and Birgit Hellwig last week, revisited the topic of child language documentation. This allowed those who have recently returned from the field to discuss some of the problems they faced and how they dealt with them. In particular, we looked at the gap between what is possible in remote fieldsites and some of the assumptions in the field of child language acquisition about what type of data is needed to study child language development. The quantity and frequency of data that can be collected in remote fieldsites are quite different to what can be done in the developed world. The limitations can be quite simple: for example, not being able to get accurate information on children’s ages.
To kick off the discussion we looked at ethics, from a personal point of view. The previous LIP on child language was criticised for focussing too much on the requirements of institutional ethics boards at universities, schools etc. So we discussed what types of decisions researchers had made to satisfy their own ethical concerns. A number of researchers said that they had no plans to make their recordings public. This goes against the current trend to make recordings of endangered languages as open as possible, given community consent.
Just to give an example, I have decided to keep access to my recordings of child language closed until the children are 18. If they are happy for me to open access to their recordings after they are 18, I will do so. However, since I am currently recording children in groups of at least 3 people, it is likely that in many cases I will not be able to contact all participants, so the recording will remain closed. One of the issues we returned to a number of times in the evening is that our recordings are often made in open environments, which means that many people wander through the field of view. This is in contrast to mainstream child language data, which is usually recorded in a room through which only a limited number of people pass. It was mentioned that the CHILDES language database is a great example of an open access archive, but it lacks much data from endangered languages. CHILDES contains data recorded from many different studies of child language acquisition. However, to upload data to CHILDES you must have the consent of every person who appears, even if just walking past. This is not going to be possible for many recordings of endangered languages in remote areas. It is often difficult to find a room to record in and even if one is found, it is likely that many people will pass through it.
Some of the other assumptions about child language acquisition research that can prove difficult in remote settings:
- that a mother and child pair form natural conversational partners (they may rarely engage in idle chit-chat)
- that adults typically play with children (it may be the case that children typically play with other children, not adults)
Since it is often difficult for a mother-child pair to engage in conversation in front of the camera, some suggested structured tasks, such as those used at the Max Planck Institute for Psycholinguistics. Others pointed out, however, that this makes it difficult to study language socialization, because you are asking people to engage in a culturally foreign activity. Others suggested identifying local games that could be used in language acquisition research.
One big problem in applying the standards of child language acquisition research to remote contexts is the difficulty of obtaining recordings of the same child over regular intervals. Many of the linguists attending the LIP session work in Papua New Guinea and Australian Aboriginal communities. They pointed out that children and often their whole families move around much more than they had expected. The set of children living in a community may barely overlap from one fieldtrip to the next. In addition, some child language researchers recommend making recordings every 2 months or so, and it is not possible to do this in remote settings. The limitations are partly financial and partly due to the time needed for the linguist to travel to the remote location from their home.
There was quite a bit of time devoted to the technology used to record children, who are rather more mobile than adults. One researcher recommended the use of teddy-bear shaped backpacks for children. These can carry the heavy transmitter of the radio microphone. Everyone agreed that noise is a big issue. Even if there is no wind, which small radio microphones don’t handle well, children’s motion invariably causes noise. One researcher only recorded in areas without many leaves as the noise of these being crushed beneath children’s running feet was too loud.
Birgit Hellwig discussed some of the data from her recent 2-month fieldtrip to Papua New Guinea, which she undertook with child language acquisition specialist Evan Kidd (ANU). She said that by the end of the 2 months, the community they were visiting had more or less gotten used to the cameras and exactly what it meant to have child language researchers in the community. One thing that Birgit emphasised is that what participants need to do is not as obvious to them as we might think. Birgit gave a lovely example in the use of the frog story task. The frog story, ‘Frog, where are you?’, is a short children’s picture book without any words. Children were asked to tell the story in their own words. It became apparent during the course of Birgit’s 2-month fieldtrip that changes in how children told the story from week to week were related to narrative practices in the community. The story was circulating in the community, just as any story does, and changing slightly over the course of time. Rather than each new child that participated in the task telling the story afresh, ‘in their own words’, each told it as it was in its current form in the community. This resulted in remarkable convergence between tellings that were recorded around the same time.
It became clear from the discussion that we can’t expect to do research on child language in the same way as it is done in more controlled environments. We will not get comparable quantities of data for each child. However, whatever we do record is likely to be really interesting. We only have data on child language for a small number of languages, so anything will help.
Lauren Gawne recaps last night’s Linguistics in the Pub, a monthly informal gathering of linguists in Melbourne to discuss topical areas in our field.
This week at Linguistics in the Pub it was all about technology, and how it impacts on our practices. The announcement for the session briefly outlined some of the ways technology has shaped expectations for language documentation:
The continual developments in technology that we currently enjoy are inextricably connected to the development of our field. Most would agree that technology has changed language documentation for the better. But while nobody is advocating a return to paper and pen, most would concur that technology has changed the way we work in unexpected ways. The focus is usually on the materials we produce such as video, audio and annotation files as well as particular types of computer-aided analysis. In a recent ELAC post, ‘Hammers and nails’, Peter Austin claims that metadata is not what it was in the days of good old reel-to-reel tape recorders. The volume of comments suggests that this topic is ripe for discussion. This session of Linguistics in the Pub will give us a chance to reflect on how our practices change with advances in technology.
There are a (very) few linguists who advocate that researchers should go to the field with nothing beyond a spiral-bound notebook and a pen, though no one at the table was quite willing to go that far; all of us, it seems, go to the field with a good quality audio recorder at the very least. Without the additional recordings (be they audio or video) the only output of the research becomes the final papers written by the linguist, which are in no way verifiable. The recording of verifiable data, and the slowly increasing practice of including audio recordings in the final research output, are allowing us to further stake our claim as an empirical and verifiable field of scientific inquiry. Many of us shared stories of how listening back to a recording that we had made enriched the written records that we have, or allowed us to focus on something that wasn’t the target of our inquiry at the time of the initial recording. The task of trying to do the level of analysis that is now expected for even the lowliest sketch grammar is almost impossible without the aid of recordings, let alone trying to capture the subtleties present in naturalistic narrative or conversation. Continue reading ‘Technology and language documentation: LIP discussion’ »