Archiving Strategies for Multimedia Language Documentation
Peter Wittenburg
MPI for Psycholinguistics, DOBES
Language Documentation is recognised as an
urgent measure to preserve part of human cultural heritage for future
generations and to provide material that could help to revitalise languages
where they are close to becoming extinct. It is evident that the documentation
has to include traditional linguistic material such as texts, grammars and
dictionaries, but to illustrate the phonetics (the sounds of a language, the
prosodic parameters) audio recordings are also required. Revitalisation efforts
are possible with such written material alone. For long-term preservation,
however, it was clear in the DOBES programme that language documentation must
also rely heavily on video recordings. After some decades, a language will only
be understood if impressions of the living circumstances of its speakers are
available. Video recordings are also increasingly relevant for revitalisation,
since young people in the speech community are much more attracted by video
scenes. Language documentation therefore has to cover all media.
In DOBES and at the MPI, many researchers,
mostly linguists, but also ethnologists, musicologists and others, undertake
field trips and take care of the documentation aspect. The MPI also has to take
care of the archiving aspect. Archiving here means making every effort to
ensure that the documentation material is stored and remains accessible
forever. Given the continuously decreasing lifetimes of our storage media and
encoding standards, this seems to be an unsolvable problem. Indeed, we cannot
give guarantees, but we can optimize the probability that data will survive.
It is
understood that survival of data is primarily a matter of societal acceptance
influenced by a number of factors: (1) Political attitude, which archivists can
hardly influence, so the only remedy is to spread our material around the
world. To be able to do this we need powerful networks and data-GRID-like
techniques. (2) Attractiveness of the content after many years, which can be
increased by including multimedia recordings of many different situations. Here
it is also relevant that we make our recordings at the highest possible
resolution and quality. (3) Involvement of recognised institutions. However,
Digital Libraries are new and there are no established business models. We do
not need big stone buildings, but smart people. Traditional libraries are in
general not yet ready to take over the job. (4) Cost efficiency of operation,
which is largely hampered by the development dynamics of (storage) technology.
The only viable, although costly, solution is to migrate continuously and
automatically to new technology.
Cultural Heritage material can survive if it is seen as an integral part of the
general data that has to be preserved in the same way by automatic procedures.
(5) Quality of the data management and technology used, which is essentially
defined by the degree of accessibility and interpretability and by the state and
dynamics of maintenance. We prefer the “transform-now” method, i.e. all data
checked into the DOBES archive is converted immediately to make it compliant
with a limited set of standards, allowing us, as well as subsequent
generations, to treat all data in the same way. Data remains interpretable if
the documentation of the encoding standards used is subject to the same copying
and migration mechanisms. Therefore, only open technology standards should be
applied. The proper organization of the data has to be part of the archiving
strategy. In
DOBES it is the IMDI metadata framework that acts as glue for binding different
resources that belong together such as an annotation and a video fragment or a
lexicon and a set of recordings.
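
The “transform-now” idea and the bundling of related resources can be
illustrated with a minimal sketch. Everything in it is an assumption made for
illustration only: the format mapping, the Resource and Session classes and the
check_in function are hypothetical and do not reproduce the actual DOBES tools
or the IMDI schema. The sketch only plans the conversion (it records the target
file name) and does not convert any media.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical mapping from delivered formats to a small set of archive
# formats. The abstract does not name the standards actually used in DOBES.
ARCHIVE_FORMAT = {
    ".mp3": ".wav", ".aiff": ".wav", ".wav": ".wav",
    ".avi": ".mpg", ".mov": ".mpg", ".mpg": ".mpg",
    ".doc": ".xml", ".txt": ".xml", ".xml": ".xml",
}


@dataclass
class Resource:
    original: str    # file name as delivered by the documentation team
    archived: str    # file name after normalisation at check-in
    converted: bool  # True if a format conversion is needed


@dataclass
class Session:
    """Illustrative stand-in for a metadata record bundling related resources."""
    name: str
    resources: list = field(default_factory=list)


def check_in(session: Session, filename: str) -> Resource:
    """Normalise a file at check-in time and attach it to a session bundle."""
    path = PurePosixPath(filename)
    suffix = path.suffix.lower()
    if suffix not in ARCHIVE_FORMAT:
        # Reject unknown formats instead of archiving data nobody can maintain.
        raise ValueError(f"no archive format defined for {suffix!r}")
    target = ARCHIVE_FORMAT[suffix]
    resource = Resource(
        original=filename,
        archived=str(path.with_suffix(target)),
        converted=(suffix != target),
    )
    session.resources.append(resource)
    return resource


if __name__ == "__main__":
    session = Session("example-session")
    for name in ["interview01.MP3", "interview01.mov", "interview01.xml"]:
        r = check_in(session, name)
        print(r.original, "->", r.archived, "(converted)" if r.converted else "")
```

Rejecting unknown formats at check-in, rather than storing them as delivered,
is the design choice that keeps the set of formats to be migrated later small.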
In DOBES we understand that only clear workflow
agreements between the participating teams can guarantee a good basis for an
excellent and easy-to-maintain archive organization. The talk will explain
which choices were made within the DOBES programme with respect to these five
factors, and which standards and tools are currently supported to make access
to and maintenance of the archive comparatively easy.