As linguists and anthropologists working on small and often endangered languages, we should consider distributing the materials that we accumulate over time. Institutions such as PARADISEC provide a repository for the data, and this plays an important role in safeguarding raw materials for long-term forward compatibility. But we also have all the other outputs of the research, such as publications, courses, or even transcriptions and edited recordings. Some of these appear in various journals, are presented at conferences or are published as manuscripts. But even though they originate from the same primary evidence or research group, they end up in disparate locations.
So how do we collate all the papers and dissertations, transcriptions and translations, and then link these to the data that has been collected? One really good solution is to create a website that is specific to the research group. So, if you have been given a grant for language documentation or research, then one of the goals should be to create a space on the web that collects all inputs and outputs together: a kind of macro-output.
The stimulus for these thoughts is the newish website created by Daniel Everett and Robert Van Valin here. This is a really good example of what research groups should consider for the presentation of their work on information structure. In its blurb it states, “this site is dedicated to presenting research recently undertaken on the topic of information structure in Amazonian languages”. And it does this very well.
Not only does it provide a .pdf copy of all the papers produced by the group (including unpublished materials such as Honours theses), but it also provides all the semi-sanitized data so that other linguists can do their own analyses or check any of the papers’ claims. There are photos (my favourite is of the tapir-like animal), as well as audio files of various genres and registers with their accompanying transcriptions.
The layout is easy to use, with separate sections for each language that was documented. There is no need for fancy linguistic software, as all the data is presented in cross-platform formats such as .wav, .jpeg and .pdf. The transcriptions are presented with a neat three-line interlinear gloss. The site is definitely user-friendly, even for researchers who are interested in the data for non-linguistic purposes.
It has solved the problem of how to present “unpublishable” data that is still important to the linguistic or research community. The data that journals won’t publish is still valuable, and if research groups make an effort to self-publish it in open and accessible formats, then linguistic theory should benefit from access to new and interesting materials. Anyway, if you have a passing interest in information structure, Amazonian languages, language documentation or web presentation, check out the site. It is an excellent 21st-century contribution to the linguistic field.

