Archiving Strategies for Multimedia Language Documentation

 

Peter Wittenburg

MPI for Psycholinguistics, DOBES

 

Language documentation is recognised as an urgent measure to preserve part of humanity's cultural heritage for future generations and to provide material that could help revitalise languages that are close to extinction. The documentation evidently has to include traditional linguistic material such as texts, grammars and dictionaries, but audio recordings are also required to illustrate the phonetics (the sounds of a language, its prosodic parameters). Revitalisation efforts are possible with such written material alone. For long-term preservation, however, it was clear in the DOBES programme that language documentation also has to rely heavily on video recordings: decades from now, a language will only be understood if impressions of the living circumstances of its speakers are available. Because young people in the speech community are also much more attracted by video scenes, video recordings are of increasing relevance for revitalisation as well. Language documentation therefore has to cover all media.

 

In DOBES and at the MPI, many researchers, mostly linguists but also ethnologists, musicologists and others, undertake field trips and take care of the documentation aspect. The MPI also has to take care of the archiving aspect. Archiving here means making every effort to ensure that the documentation material is stored and remains accessible indefinitely. Given the continuously decreasing lifetimes of our storage media and our encoding standards, this seems to be an unsolvable problem. Indeed, we cannot give guarantees, but we can optimise the probability that data will survive. It is understood that the survival of data is primarily a matter of societal acceptance, which is influenced by a number of factors: (1) Political attitude, which archivists can hardly influence, so the only remedy is to spread our material around the world. To be able to do this we need powerful networks and data-GRID-like techniques. (2) Attractiveness of the content after many years, which can be increased by including multimedia recordings of many different situations. Here it is also relevant that we carry out our recordings at the highest possible resolution and quality. (3) Involvement of recognised institutions. Digital libraries, however, are new and there are no established business models. We are not in need of big stone buildings, but of smart people. Traditional libraries are in general not yet ready to take over the job. (4) Cost efficiency of operation, which is largely hampered by the development dynamics of (storage) technology. The only viable, although costly, solution is to migrate continuously and automatically to new technology. Cultural heritage material can survive if it is seen as an integral part of the general data that has to be preserved in the same way by automatic procedures. (5) Quality of the data management and technology used, which is basically defined by the degree of accessibility and interpretability and by the state and dynamics of maintenance. We prefer the “transform-now” method, i.e. all data checked into the DOBES archive is transformed immediately to make it compliant with a limited set of standards, allowing us, as well as subsequent generations, to treat all data in the same way. Data remains interpretable if the documentation of the encoding standards used is subject to the same copying and migration mechanisms as the data itself. Therefore, only open technology standards should be applied. The proper organisation of the data has to be part of the archiving strategy. In DOBES it is the IMDI metadata framework that acts as the glue binding together different resources that belong together, such as an annotation and a video fragment, or a lexicon and a set of recordings.
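The idea that metadata binds related resources, and that the documentation of each encoding standard travels through the same copy and migration step as the data, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Resource` and `Bundle` classes, their field names, the in-memory dictionaries standing in for storage systems and the SHA-256 check are hypothetical and do not reproduce the IMDI schema or the actual DOBES tooling.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """One archived file. Field names are hypothetical, not IMDI elements."""
    path: str          # location of the file in the store
    role: str          # e.g. "video", "annotation", "lexicon"
    encoding_doc: str  # location of the documentation of the standard used
    sha256: str = ""   # checksum recorded at check-in


@dataclass
class Bundle:
    """A metadata record binding resources that belong together."""
    name: str
    resources: List[Resource] = field(default_factory=list)


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def migrate(bundle: Bundle,
            old_store: Dict[str, bytes],
            new_store: Dict[str, bytes]) -> List[str]:
    """Copy each resource together with its encoding documentation.

    Returns the paths of resources whose copy does not match the
    checksum recorded in the metadata.
    """
    failed = []
    for res in bundle.resources:
        # Data and standard documentation go through the same copy step.
        for path in (res.path, res.encoding_doc):
            new_store[path] = old_store[path]
        if checksum(new_store[res.path]) != res.sha256:
            failed.append(res.path)
    return failed
```

A bundle might, for example, hold a video recording and its annotation file; calling `migrate` on it with two dictionaries returns an empty list when every copied resource still matches its recorded checksum.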

 

In DOBES we understand that only clear workflow agreements between the participating teams can provide a sound basis for an excellent and easy-to-maintain archive organisation. The talk will explain which choices were made within the DOBES programme with respect to the five factors, and which standards and tools are currently supported to make access to and maintenance of the archive comparatively easy.