Barking up the same tree: the need for digital archives

The surprise for me from the Sustainable Data from Digital Fieldwork workshop (aka Suzzy Data) was how much plant taxonomists and field linguists have in common. And how much we need to work together with librarians and archivists. We both have to look after records – the decaying recordings of the languages, and the dried specimens in the herbariums. We both work with the living communities, the trees that will get logged and the communities that live with the trees, and the families and children who will switch to speaking another language.


Botanists and linguists both have overlapping communities of users and communities of creators and managers. Local people create linguistic data in that they are speakers. Local people manage trees and plants in their areas, and are creators of information through their traditional ecological knowledge. Linguists and botanists create records and analyses of data. And all of us are users – since we all want access to each other’s information.
We’re both interested in sustainability and collaboration. I was struck by Murray Henwood’s description of how Dr Carrot visits a herbarium, is immediately taken to the shelves of long flat boxes to admire the specimens belonging to the carrot family, whereupon (s)he writes comments on the information sheets associated with those specimens. That’s a good way of building knowledge and having quality control.
Collaboration is also involved in data collecting – Barry Conn talked about how his and Damas Kipiro’s project on documenting trees of Papua New Guinea is based on collaboration, and on the need for:
• an outcome for the people of Papua New Guinea who need to make environmental decisions such as where and how to log trees.
• work to be done by people in Papua New Guinea. You shouldn’t need a PhD in botany to be able to identify a commercially valuable species and present a case for how it should be managed. But a lack of such skills is hampering local people’s abilities to make environmental decisions. And what do you do if you have better access to a bush knife than to a microscope? There are obvious parallels for linguists.
Barry also talked about controlled vocabulary – something which botanists have long realised is essential for taxonomy, and which is a mark of their collaboration. It is also important for building software for learner taxonomists to enter data. Ronald Schroeter and Nick Thieberger raised this later when talking about linguistic software. We really need templates with controlled tier labels in interlinearising software like ELAN (the CHILDES program CLAN already has recommended tier labels) so that material can be moved into other software programmes without too much fuss (and see Bruce’s post on this here). We also need more agreement on glossing conventions (E-MELD GOLD and the Leipzig glossing conventions are a start). Botanists are way way ahead of linguists on this one.
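To make the idea of controlled tier labels a little more concrete, here is a minimal Python sketch of what checking a file's tier names against an agreed list might look like. Everything in it is an assumption for illustration: the controlled labels, the aliases, and the function names are hypothetical, and are not taken from ELAN, CLAN, or any existing standard.

```python
# Hypothetical controlled vocabulary of tier labels
# (illustrative only; not drawn from ELAN, CLAN, or any standard).
CONTROLLED_TIERS = {"transcription", "morphemes", "gloss", "translation"}

# Hypothetical project-specific labels mapped onto the controlled ones.
ALIASES = {
    "tx": "transcription",
    "text": "transcription",
    "mb": "morphemes",
    "ge": "gloss",
    "ft": "translation",
    "free translation": "translation",
}


def normalise_tier_label(label):
    """Return the controlled label for `label`, or None if it is not recognised."""
    key = label.strip().lower()
    if key in CONTROLLED_TIERS:
        return key
    return ALIASES.get(key)


def check_tiers(labels):
    """Return (recognised, unrecognised): a dict mapping each recognised
    original label to its controlled label, and a list of the rest."""
    recognised = {}
    unrecognised = []
    for label in labels:
        controlled = normalise_tier_label(label)
        if controlled is None:
            unrecognised.append(label)
        else:
            recognised[label] = controlled
    return recognised, unrecognised


if __name__ == "__main__":
    ok, unknown = check_tiers(["TX", "ge", "notes"])
    print(ok)
    print(unknown)
```

With an agreed list like this, a tool could warn the person entering data about an unrecognised tier name at the time of entry, rather than leaving the mismatch to be discovered when the material is moved to another program.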
So what is sustainable digital data? It’s having a good way to keep the records of data (Murray Henwood mentioned the loss that befell botany on 1 March 1943 when the Berlin Herbarium and its 4 million specimens were bombed). Digital objects need cataloguing, and ideally the catalogue entry should link directly to the digital object. We need to collaborate with librarians and archivists on this, as they have been developing “Digital Assets Management Systems”.
One such system is the open source DSpace, developed at MIT, which provides a kind of permanent URL (‘handle’) for digital material. Interesting pilots of this were shown. Su Hanfling of the University of Sydney showed the pilots of eBot and eFlora. eBot is a “digital repository of botanical objects” – mostly photos and their descriptions – with an illustrated glossary of botanical terms. eFlora is an “electronic compendium of the plants of the Sydney region”. Kim Mackenzie and Murray Garde showed ANU’s Bidwern in which digital photographs, videos and text material related to communities in Western Arnhem Land are stored in DSpace.
Sustainable digital data also requires having good ways of accessing the records of the data and the digital objects themselves. If people (local people and researchers) don’t access the material, then sooner or later someone will decide it’s not worth keeping (Barry Conn told the sad story of how three years of plant records disappeared because of such a decision). All the DSpace projects have interesting user interfaces. The Bidwern material will be linked and accessible through an interface involving Google maps as well as a thesaurus based on Murray Garde’s Bininj Gun-wok dictionary. Bidwern and eFlora both have interfaces designed for browsing users. eBot potentially is also an interface for creators. Very soon Dr Carrot will be able to visit the virtual herbariums and look at images of specimens and add notes online.
These DSpace projects are still at the prototype stage. A project which is actually up and working is the Dena’ina Qenaga web site, run as a collaboration between Dena’ina people, Alaska Native Language Center, Alaska Native Heritage Center, The LINGUIST List, and the Arctic Region Supercomputing Center. It’s an elegant introduction to aspects of the Dena’ina language and people. But behind it all is an archive “which provides digital access to more than five hundred documents and recordings relating to the Dena’ina language, including nearly everything written in or about Dena’ina language.” Very nice!
The Dena’ina Qenaga project was funded by the National Science Foundation of the USA. The importance of this kind of e-humanities/e-social sciences work is recognised by the US Government – there’s a program funded through the National Endowment for the Humanities and the Institute of Museum and Library Services to carry out this kind of e-humanities work (thanks Kimberley!). They “encourage projects that explore new ways to share, examine, and interpret humanities collections in a digital environment and to develop new uses and audiences for existing digital resources”. Lucky Americans!
What Australia needs is long-term funding for digital archives to collaborate with users and creators to make archivally stable digital objects, make them accessible, and preserve them. E-Science, E-Humanities, E-Social Sciences, we’re all after the same thing. Please?

3 thoughts on “Barking up the same tree: the need for digital archives”

  1. Jane is right, we need these sorts of projects and we need to think transnationally about collaborations. DSpace is a good start. There is no reason why digital archives have to be nationally oriented–even if funding bodies like NEH are coming from the US pool of money.
    I have been working with Warumungu people in Tennant Creek and some IT partners (as a U.S. researcher) on creating a digital community archive. My challenge will soon be to build such partnerships so as to extend its usability to other indigenous communities. I think collaboration amongst anthropologists, linguists, local communities, etc. and IT folks can lead to innovative, culturally appropriate digital projects. But these usually need fairly substantial funding. Money from NSF and NEH can usually cover these costs, but if those aren’t available, getting more creative about pooling resources across disciplines and projects might be the way to go.

  2. Actually there was some discussion of how to implement community controlled access to material during the workshop – and Kimberly and Chris’s project was raised as a good example of how we might go – simple and manageable at a community level.

Here at Endangered Languages and Cultures, we fully welcome your opinion, questions and comments on any post, and all posts will have an active comments form. However if you have never commented before, your comment may take some time before it is approved. Subsequent comments from you should appear immediately.
