There’s data and there’s research

[Joint post by Peter K. Austin, Endangered Languages Academic Programme, SOAS and Jane Simpson]
There has been a flare-up on the LINGTYP list again (cf. PKA’s post last week) – this time from Gideon Goldenberg, who suggested there is a distinction between research (good) and data collection (bad). He was writing about typological databases, but it looks like the same opinion applies to documentary linguistic corpora – here’s what he said:

“The clear and sharp distinction between research and materials is essential. The latter will be needed to illustrate scholarly discussion, but data themselves are not research even though they require thoughtful preparation. When electronic means became available there was the hope that from then on the mere accumulation of data would no longer be able to give credit of scientific work; it unfortunately turns to go the other way about. To share databases with others is OK and can be beneficial, but do not mistake it for research.”

Ouch! All those digitised sound and video recordings with time-aligned multi-tier annotated corpora with linked metadata that we’ve been creating are fine and dandy, but it ain’t research folks!


The real worry comes in the response from Nigel Vincent. He recently participated in a review of Dutch humanities research and it became clear to him that in the Netherlands basically the same position is taken by the NWO research council – only paper publications count. The UK Research Assessment Exercise (RAE) that will be on in earnest next year does recognise databases and electronic corpora as research outputs but exactly how that will play out when panels get down to bean counting for the RAE remains unclear.
Why should we worry? Because these reviews determine what money goes to research. If they decide our laboriously collected and analysed materials are not research, then we will not be funded to collect and prepare more of them. In other words, you’ll get funded to do an analysis of some linguistic property of UgaBuga, an analysis which will disappear with your theoretical framework, but you won’t get funded to produce the material on UgaBuga which will allow future speakers and future linguists to hear the language and watch speech events with all the associated information about them.
As for how the Research Quality Framework newly introduced in Australia by DEST will treat what Goldenberg calls “mere accumulation of data”, well, linguists have made representations for acknowledging as research output documented and catalogued collections of field recordings of endangered speech varieties deposited in public archives. But have they been accepted? Perhaps readers gearing up for the RQF would care to comment?
It’s all a bit of a worry for those of us documenting endangered languages who need to get recognition and support from our academic colleagues, institutions and funding agencies. Especially for young researchers.

5 thoughts on “There’s data and there’s research”

  1. It *is* clearly research from the point of view of human subjects definitions. That’s the first question they ask. If it’s not research, then it doesn’t have to be evaluated. The folklore people, for example, have successfully argued that what they do isn’t research, the research is what they do on the materials once they’ve collected them. Some ethnographers argue the same. I disagree vehemently, as you can imagine. I’m trying to imagine the parallels in medical research – sorry, you can’t do that thing with the petri dishes and the mice, because that’s just data collection, not research?

  2. Claire – if I understand your comment correctly, I think you are talking about research methods not outputs. Scientists don’t produce collections of mice or petri dishes, rather they use them to make observations (measurements) which then form the basis for analysis and theory. The same parallel would hold in linguistics — data collected from speakers then used to publish the ‘real’ research. The issue we are discussing here is the status of the ‘observational data collection’ part of our work – does it count as ‘research’ too?

  3. But even in hard sciences the distinction between methods and outputs isn’t always that clear. Sequencing the human genome, for example: that’s “observation” in the sense you’re talking about, and it’s also the basis of analysis and theory. Or the case where someone comes up with a way to splice DNA – that’s a “method” or technique, but it’s still publishable and fundable because it allows all sorts of other work to get done.
    Likewise, presumably these text collections and other observational data aren’t ends in themselves, and we spend all that time creating Elan text tiers and so on in order to facilitate analysis, AND presumably when we do this we have something in mind that we want to study.

  4. I agree – you cannot draw the line between data collection and research that easily. It doesn’t work that way.
    For example – I’ve spent the last oh, five years, working on collecting data (and I’m lucky in that I don’t have to go into the field to get this! – I just get it off those who do).
    If I didn’t have that data, I wouldn’t be able to do the research. It’s that simple: you can’t have one without the other.
    –Simon

  5. Simon,
    Yes, but there’s a push in linguistics for our data collection to be (much) greater in volume, for our data management to be much greater in sophistication and our return to the communities we work with to be equally voluminous and sophisticated. All of this is GREAT and there is now funding for these activities (yay ELDP and VW/DOBES), but they take a lot of time (in theory ok b/c of funding) which means less time for *research* output, where research is measured in traditional terms.
    I like Michael Walsh’s article from circa 2000 in ARAL (can’t find a reference online cos can’t get access to journal indexes from home), in which he argues for a paradigm shift in the way the work academic linguists do is ‘measured.’
