Barking up the same tree: the need for digital archives

The surprise for me from the Sustainable Data from Digital Fieldwork workshop (aka Suzzy Data..) was how much plant taxonomists and field linguists have in common. And how much we need to work together with librarians and archivists. We both have to look after records – the decaying recordings of the languages, and the dried specimens in the herbariums. We both work with the living communities, the trees that will get logged and the communities that live with the trees, and the families and children who will switch to speaking another language.


Botanists and linguists both have overlapping communities of users and communities of creators and managers. Local people create linguistic data in that they are speakers. Local people manage trees and plants in their areas, and are creators of information through their traditional ecological knowledge. Linguists and botanists create records and analyses of data. And all of us are users – since we all want access to each other’s information.
We’re both interested in sustainability and collaboration. I was struck by Murray Henwood’s description of how Dr Carrot visits a herbarium, is immediately taken to the shelves of long flat boxes to admire the specimens belonging to the carrot family, whereupon (s)he writes comments on the information sheets associated with those specimens. That’s a good way of building knowledge and having quality control.
Collaboration is also involved in data collecting – Barry Conn talked about how his and Damas Kipiro’s project on documenting trees of Papua New Guinea is based on collaboration, and the need for:
• an outcome for the people of Papua New Guinea who need to make environmental decisions such as where and how to log trees.
• work to be done by people in Papua New Guinea. You shouldn’t need a PhD in botany to be able to identify a commercially valuable species and present a case for how it should be managed. But a lack of such skills is hampering local people’s abilities to make environmental decisions. And what do you do if you have better access to a bush knife than to a microscope? There are obvious parallels for linguists.
Barry also talked about controlled vocabulary – something which botanists have long realised is essential for taxonomy, and which is a mark of their collaboration. It is also important for building software for learner taxonomists to enter data. Ronald Schroeter and Nick Thieberger raised this later when talking about linguistic software. We really need templates with controlled tier labels in interlinearising software like ELAN (the CHILDES program CLAN already has recommended tier labels) so that material can be translated into other software programmes without too much fuss (and see Bruce’s post on this here). We also need more agreement on glossing conventions (E-MELD GOLD and the Leipzig glossing conventions are a start). Botanists are way, way ahead of linguists on this one.
So what is sustainable digital data? It’s having a good way to keep the records of data (Murray Henwood mentioned the loss that befell botany on 1 March 1943 when the Berlin Herbarium and its 4 million specimens were bombed). Digital objects need cataloguing, ideally with a link from the catalogue entry direct to the digital object. We need to collaborate with librarians and archivists on this, as they have been developing “Digital Assets Management Systems”.
One such system is the open source DSpace, developed at MIT, which provides a kind of permanent URL (‘handle’) for digital material. Interesting pilots of this were shown. Su Hanfling of the University of Sydney showed the pilots of eBot and eFlora. eBot is a “digital repository of botanical objects” – mostly photos and their descriptions – with an illustrated glossary of botanical terms. eFlora is an “electronic compendium of the plants of the Sydney region”. Kim Mackenzie and Murray Garde showed ANU’s Bidwern in which digital photographs, videos and text material related to communities in Western Arnhem Land are stored in DSpace.
Sustainable digital data also requires having good ways of accessing the records of the data and the digital objects themselves. If people (local people and researchers) don’t access the material, then sooner or later someone will decide it’s not worth keeping (as in Barry Conn’s sad story of how three years of plant records disappeared because of such a decision). All the DSpace projects have interesting user interfaces. The Bidwern material will be linked and accessible through an interface involving Google Maps as well as a thesaurus based on Murray Garde’s Bininj Gun-wok dictionary. Bidwern and eFlora both have interfaces designed for browsing users. eBot potentially is also an interface for creators. Very soon Dr Carrot will be able to visit the virtual herbariums and look at images of specimens and add notes online.
These DSpace projects are still in prototype stage. A project which is actually up and working is the Dena’ina Qenaga web site, run as a collaboration between Dena’ina people, Alaska Native Language Center, Alaska Native Heritage Center, The LINGUIST List, and the Arctic Region Supercomputing Center. It’s an elegant introduction to aspects of the Dena’ina language and people. But behind it all is an archive “which provides digital access to more than five hundred documents and recordings relating to the Dena’ina language, including nearly everything written in or about Dena’ina language.” Very nice!
The Dena’ina Qenaga project was funded by the National Science Foundation of the USA. The importance of this kind of e-humanities/e-social sciences work is recognised by the US Government – there’s a program funded through the National Endowment for the Humanities and the Institute of Museum and Library Services to carry out this kind of e-humanities work (thanks Kimberly!). They “encourage projects that explore new ways to share, examine, and interpret humanities collections in a digital environment and to develop new uses and audiences for existing digital resources”. Lucky Americans!
What Australia needs is long-term funding for digital archives to collaborate with users and creators to make archivally stable digital objects, make them accessible, and preserve them. E-Science, E-Humanities, E-Social Sciences, we’re all after the same thing. Please?

3 Comments

  1. Kimberly Christen says:

    Jane is right, we need these sorts of projects and we need to think transnationally about collaborations. DSpace is a good start. There is no reason why digital archives have to be nationally oriented–even if funding bodies like NEH are coming from the US pool of money.
    I have been working with Warumungu people in Tennant Creek and some IT partners (as a U.S. researcher) on creating a digital community archive. My challenge will soon be to build partnerships that extend its usability to other indigenous communities. I think collaboration amongst anthropologists, linguists, local communities, etc. and IT folks can lead to innovative, culturally appropriate digital projects. But these usually need some rather large funding. Money from NSF and NEH can usually cover these costs, but if those aren’t available, getting more creative about pooling resources across disciplines and projects might be the way to go.

  2. Jane Simpson says:

    Actually there was some discussion of how to implement community controlled access to material during the workshop – and Kimberly and Chris’s project was raised as a good example of how we might go – simple and manageable at a community level.

  3. Kimberly Christen says:

    For some reason the link didn’t work in that last post. There is more information on our project on my blog under the projects tab: http://www.kimberlychristen.com
