Citation, citation

Continuum International Publishing Group has just sent me a complimentary copy of Jim Miller’s new textbook A Critical Introduction to Syntax, which includes a chapter on “Noun Phrases and Non-configurationality” (pages 61-98). Since this is a topic I have published on (Austin and Bresnan 1996, Austin 2001a, 2001b) I figured I’d have a quick look at this chapter first. Interestingly, on page 78 I found example sentence (27), which is “from the Native Australian language Jiwarli” and for which Miller (2011:77) gives “Pensalfini (2004, p. 364)” as the source:

(27) Kutharra-rru ngunha ngurnti-nha jiluru
two-now that lie-pres egg (Nom)
‘Now those two eggs are lying there.’

As readers who have done research on Australian Aboriginal languages will probably recognize, Pensalfini cannot be the original source for the example since only Alan Dench and I ever recorded data on Jiwarli from its last speaker, the late Jack Butler, and only I have published primary material on the language. Pensalfini (2004) indeed gives Austin (1993) as his source, but Miller makes no mention of this (my article was actually published as Austin 2001a, three years before Pensalfini’s article appeared; see note 1). This seems to be what we could call ‘the example sentence variant’ of the “violation of citation etiquette” described so eloquently by Pullum (1988).

However, the story has a further twist to it. The glossing of the Jiwarli example, faithfully copied by Miller, is not the glossing given in Austin (1993, 2001a), but was changed by Pensalfini. Here is the example in its original form:

(27) Kutharra-rru ngunha ngurnti-nha jiluru
two.nom-now that.nom lie-pres egg.nom
‘Now those two eggs are lying there.’ [T51s9]

What I was trying to show in my glossing is that each nominal element in Jiwarli can be understood as being nominative case-marked and that there is no evidence for noun phrases. Hence, each of ‘two’, ‘that’ and ‘egg’ is marked for case, something that Pensalfini’s reglossing does not make clear. Rather more egregious, however, is that a whole category of information, the “[T51s9]” following the English free translation, has been silently eliminated. Let me explain what this is.

In 1981 I returned to Australia from a short-term teaching post at Harvard University to take up a position at La Trobe University Linguistics Department, and recommenced my research on Western Australian languages, including Jiwarli, after a three-year break in the United States. I started the “Gascoyne-Ashburton Languages Project” (GALP) at La Trobe and as part of that established a basic principle of providing metadata giving the source of all the sentence examples (and lexical items) collected in the project. In doing so I was influenced by the same practice I had seen in Jane Simpson’s PhD research (I had been in contact with Jane in Boston in 1980-81); as Simpson (1983:4) says (see note 2): “I have tried to indicate the source of each example sentence where I know it. If the sentence is made up, I have indicated this, unless the sentence is elementary.” For GALP I developed a metadata source indication system that distinguished two categories (usually indicated in publications as material in square brackets following each English free translation):

  1. elicited examples whose metadata reference has the form [AABBCCNDDpEEsFF] where AA is an abbreviation representing the language, BB is an abbreviation representing the speaker, CC is an abbreviation representing the recorder, DD is an integer for a fieldnote book, EE is an integer for the page of the notebook, and FF is an integer for the sequential order of the sentence on the notebook page. Thus [TRCYTKN01p79s07] is the seventh sentence on page 79 of notebook 1 collected by Terry Klokeid from Chubby Yowadji in Tharrkari.
  2. text examples whose metadata reference has the form [AABBCCTDDsEE] where AA is an abbreviation representing the language, BB is an abbreviation representing the speaker, CC is an abbreviation representing the recorder, DD is an integer for a text in a collection, EE is an integer for the sequential order of the sentence within the text. Thus [WRAEOGT03s01] is the first sentence in text 3 collected by Geoffrey O’Grady from Alec Eagles speaking Warriyangka.

For Jiwarli text examples, a reference like [JIJBPAT51s9] could be reduced to [T51s9] since the texts were just those recorded by myself from the late Jack Butler (and published in Austin 1997). I introduced this system to keep track of the contributions of individual speakers and recorders, the genre of examples, and to ensure that it was always possible to go back to the original fieldnotes and text collections to check materials, if necessary. I have maintained this system in my Toolbox data sets and in publications since.
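
For readers who want to handle such references by machine, the two formats and the reduced Jiwarli form can be sketched as a small Python parser. This is only an illustration under stated assumptions, not the tooling actually used in GALP or Toolbox: the names `parse_ref`, `SourceRef`, `FULL_REF` and `SHORT_TEXT_REF` are hypothetical, and the sketch assumes that each abbreviation is exactly two capital letters and that N, T, p and s are literal separators, as the examples above suggest.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout (inferred from the examples, not an official specification):
# two-letter codes for language, speaker and recorder, followed by either
# N<notebook>p<page>s<sentence> (elicited) or T<text>s<sentence> (text).
FULL_REF = re.compile(
    r"(?P<language>[A-Z]{2})(?P<speaker>[A-Z]{2})(?P<recorder>[A-Z]{2})"
    r"(?:N(?P<notebook>\d+)p(?P<page>\d+)|T(?P<text>\d+))"
    r"s(?P<sentence>\d+)"
)
# Reduced form such as T51s9, where language, speaker and recorder are implied.
SHORT_TEXT_REF = re.compile(r"T\d+s\d+")


class SourceRef(NamedTuple):
    kind: str                 # "elicited" or "text"
    language: str
    speaker: str
    recorder: str
    notebook: Optional[int]   # only set for elicited examples
    page: Optional[int]       # only set for elicited examples
    text: Optional[int]       # only set for text examples
    sentence: int


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a matched digit string to int, passing None through."""
    return int(value) if value is not None else None


def parse_ref(ref: str, default_prefix: str = "JIJBPA") -> SourceRef:
    """Parse a reference like [TRCYTKN01p79s07], [WRAEOGT03s01] or [T51s9].

    default_prefix is the (assumed) language/speaker/recorder prefix used to
    expand a reduced text reference; JIJBPA follows the Jiwarli example.
    """
    body = ref.strip().strip("[]")
    if SHORT_TEXT_REF.fullmatch(body):
        body = default_prefix + body
    match = FULL_REF.fullmatch(body)
    if match is None:
        raise ValueError(f"unrecognised source reference: {ref!r}")
    return SourceRef(
        kind="elicited" if match["notebook"] is not None else "text",
        language=match["language"],
        speaker=match["speaker"],
        recorder=match["recorder"],
        notebook=_to_int(match["notebook"]),
        page=_to_int(match["page"]),
        text=_to_int(match["text"]),
        sentence=int(match["sentence"]),
    )
```

With this sketch, `parse_ref("[T51s9]")` and `parse_ref("[JIJBPAT51s9]")` return the same record, and a string that fits neither format raises a `ValueError` rather than being silently accepted.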

Interestingly, a feature of Miller’s A Critical Introduction to Syntax is that it makes use of “real language examples” taken from spoken and written English corpora. Each such example has relevant source metadata clearly indicated (thus page 39 example (79) is from “Miller-Brown corpus, conversation 58”, and page 133 example (25) is from “The Herald, 17 October 2009, p. 4”) yet no example sentence in a language other than English gets a metadata source reference, not even Russian which is extensively exemplified. Surely what’s good for the (English) goose should be good for the gander?

In their seminal paper on data portability and digital language documentation, Bird and Simons (2003) identify citation as one of the major problems currently faced by those who wish to document and describe languages. They state (see note 3): “[w]e value the ability of users of a resource to give credit to its creators, as well as to learn the provenance of the sources on which it is based. Thus the best practice is one that makes it easy for … language documentation and description to be cited.” Having developed such a system for my own research some thirty years ago, I find it disappointing that Miller, and Pensalfini before him, simply left out the crucial identifying citation metadata.

Let’s hope that practices in linguistic research improve in this area so that the hard work of language speakers and language documenters can be properly recognised, especially as material is passed around, resulting in second- and third-hand publications (see note 4).

References

Austin, Peter. 1993. Word order in a free word order language: the case of Jiwarli. La Trobe University Manuscript.
Austin, Peter. 1997. Texts in the Mantharta Languages, Western Australia. Tokyo: ILCAA, Tokyo University of Foreign Studies.
Austin, Peter K. 2001a. Word order in a free word order language: the case of Jiwarli. In Jane Simpson, David Nash, Mary Laughren, Peter Austin, Barry Alpher (eds.), Forty years on: Ken Hale and Australian languages, 205-323. Canberra: Pacific Linguistics.
Austin, Peter K. 2001b. Zero arguments in Jiwarli, Western Australia. Australian Journal of Linguistics 21(1): 83-98.
Austin, Peter and Joan Bresnan. 1996. Non-configurationality in Australian Aboriginal languages. Natural Language and Linguistic Theory 14: 215-268.
Bird, Steven and Gary Simons. 2003. Seven dimensions of portability. Language 79(3): 557-582.
Bow, Cathy, Baden Hughes and Steven Bird. 2003. Towards a general model of interlinear text. E-MELD workshop paper.
Pensalfini, Robert. 2004. Towards a typology of configurationality. Natural Language and Linguistic Theory 22(2): 359-408.
Pullum, Geoffrey. 1988. Citation etiquette beyond Thunderdome. Natural Language and Linguistic Theory 6(4): 579-588.
Simpson, Jane. 1983. Aspects of Warlpiri Morphology and Syntax. PhD dissertation, MIT.
Simpson, Jane. 1991. Warlpiri Morpho-Syntax. Dordrecht: Kluwer Academic Publishers.


Notes

  1. The same example also appears in Austin 2001b, p. 85, example 3, as well as in Austin and Bresnan 1996, p. 246, example 42.
  2. Simpson 1991 is the revised published version, which continues the same practice.
  3. See also Bow, Hughes and Bird 2003, who propose a four-level model of interlinear glossed text that includes a text level which is “the complete unit of data under examination which functions as a unit in its entirety … The text level includes metadata”.
  4. The example sentence quoted above gets particularly woeful treatment at the hands of ODIN, the Online Database of Interlinear text, which is “a repository of interlinear glossed text extracted mainly from scholarly linguistic papers”. ODIN identifies the language of this example as Mangala, spoken a thousand kilometers north of Jiwarli on the coast of Western Australia, because of misidentification of Jiwarli with Juwarliny, a dialect of Mangala!

2 Comments

  1. Claire says:

    There’s a double misidentification! Juwarliny is actually a dialect of Walmajarri (it’s also called Jiwarliny).

  2. Jane Simpson says:

    When I’ve worked on languages, I’ve felt the same anxious compulsion as scholars of classical languages (or Middle English, or indeed Otto Jespersen) have had in documenting their sources. Without native speaker intuition, it’s the only way to avoid some pretty hideous mistakes. (And even so they creep in..)
