Sustainable data from digital research – presentations available

In December 2011 PARADISEC hosted a conference titled ‘Sustainable data from digital research: Humanities perspectives on digital scholarship’. Presentations from that conference are now available as audio or video downloads from the following repository: http://ses.library.usyd.edu.au/handle/2123/7890. Ten of these presentations also include a peer-reviewed chapter in the conference proceedings.

See below for an RSS feed of all titles and links in the University of Sydney eScholarship Repository.

Get stuffed endangered languages!

I thought a low point of journalistic reporting about endangered languages was reached yesterday when the Australian press ran a story entitled “Cyber zoo to preserve endangered languages” that appeared in the Sydney Morning Herald, the Brisbane Times (and apparently also the Brisbane Sunday Mail, hat tip Felicity Meakins), the Melbourne Age, the Canberra Times and a string of rural papers, including the Wyndham Weekly, the Forbes Advocate and the Whyalla News.

Well, it seems I was wrong: a new low was surely reached by the Spanish daily Publico on Saturday last week with its publication of a feature article on the ‘technology saves dying languages’ story (the one that I blogged about yesterday), highlighting the work of David Harrison and his colleagues at the Living Tongues Institute. In the Publico article the researchers at Living Tongues are described as “los principales taxidermistas de idiomas”, which I translate as: the main taxidermists of (spoken) languages! (Yep, I didn’t quite believe it either, so I checked my translation with a Colombian friend today, and it seems to be more or less correct.)

So, Linguists = Taxidermists. It reminds me of the comment I heard from Michael Silverstein years ago about endangered languages and “tongues in aspic”. But it seems to me that we may well bring at least some of this on ourselves by our own rhetoric. I don’t know what was in the press release from Harrison and National Geographic that led to this article, but I do know we bandy about terms like preservation/conservation/saving more than we should. After all, the responsible committee of the Linguistic Society of America is CELP, the Committee on Endangered Languages and their Preservation. Come on LSA, do we really need the “preservation” bit? (Committee on Endangered Languages has a nice ring to it.) And one of the key journals in the field is Language Documentation and Conservation: jams and conserves to go with the tongues in aspic?

At SOAS many years ago we started using less loaded terms like “language support” (e.g. one of our MA pathways is called “Language Support and Revitalisation”) to cover the theoretical and applied work that we do with language communities.

Maybe a bit of verbal hygiene on the part of linguists might reduce the chances of descriptions like “zoo” and “taxidermists” appearing in press coverage? It’s worth a try.


Note: I couldn’t resist the title — twice a day on my bus to and from work I pass a taxidermy shop in Islington, London, called Get Stuffed.

Endangered languages, technology and social media (again)

There has been a little flurry of media stories about endangered languages in the last couple of days with titles like “Digital tools ‘to save languages’” on the BBC News website and “Cyber zoo to preserve endangered languages” in the Sydney Morning Herald (readers who are on Facebook can find a full listing on David Harrison’s home page). The stories were all triggered by publicity from a session at the annual meeting of the American Association for the Advancement of Science in Vancouver called “Endangered and Minority Languages Crossing the Digital Divide”, co-organized by David Harrison and Claire Bowern (see Mark Liberman’s Language Log post for a report). The abstract for the session says:

“Speakers of endangered languages are leveraging new technologies to sustain and revitalize their mother tongues. The panel explores new uses of new digital tools and the practices and ideologies that underlie these innovations. What new possibilities are gained through social networking, video streaming, twitter, software interfaces, smartphones, machine translation, and digital talking dictionaries?”

It’s good that the mainstream media is focussing attention on endangered languages again, though as usual they find themselves falling back on the old tropes of “technology saves dying tongues” (surely the SMH has to win the booby prize with its use of the word “zoo” in this context!). I suppose I would be told it’s sour grapes if I were to point out that for over three and a half years already some of us have been writing about and making talking dictionaries on mobile phones (see James McElvenny’s 2008 blog post and the Project for Free Electronic Dictionaries), and observing and participating in the use by minority language speakers of social media like Facebook and Twitter. Still, it’s interesting that it takes a news story out of North America and National Geographic to get some publicity for these topics.

Oh well, at least it’s in the news for a day or two.

International Mother Language Day 21 February 2012

Large or small, Indo-European or Inuit, endangered or killer, let’s celebrate our mother tongues on UNESCO’s International Mother Language Day!

We don’t all die for language rights, like the Bangla-speaking students of the University of Dhaka who were killed on 21 February 1952, protesting the then Government of Pakistan’s decision to promote Urdu as the sole national language. But we CAN support this year’s theme, which is (yes!) “Mother tongue instruction and inclusive education”.

UNESCO IMLD 2012 poster

Hopes and dreams

On Thursday I had an interesting time in a sleek-looking conference room at Parliament House with the House of Representatives Inquiry into language learning in Indigenous communities. The terms of inquiry cover learning English and learning Indigenous languages. Lots of people have put lots of time and thought into their submissions and appearances (available online). They are a fascinating snapshot of current concerns, hopes and dreams. (A couple contain not-so-subtle touting – gimme a gazillion and I’ll solve literacy/attendance/savethelanguage, but they’re the exception).

So I was answering questions about my submission [.pdf] on language learning in Indigenous communities. Here goes with points that I wanted to make, and then what I remember of questions asked by the Committee:

Scam alert or how to make a lot of money really quickly

Felicity Meakins writes…

Just recently I was on Amazon, when I came across two potentially interesting books:

At first I berated myself for having never noticed these books before, let alone the authors. Surely these were important volumes that I should have referenced! However a little further investigation revealed a scam that grew bigger (and actually more impressive) as I dug deeper.

I first became suspicious when I recognised some of the wording of the abstract of the first book. Sure enough, the entire abstract was a word-for-word copy of the Wikipedia entry on mixed languages. A loud excited outburst from me drew Myf Turpin into the fray. We had a look at the Alphascript publishing website only to find that ALL of their books were edited by Frederic P. Miller, Agnes F. Vandome and John McBrewster, with topics ranging from Japanese mythology and Franco-Belgian comics to cloud seeding and swine flu! And when I say “ALL of their books”, I mean all 1006 books. Who were these prolific authors?!

When we googled their names, we found a number of scam alerts, so we are certainly not the first to notice them. Unfortunately the University of Queensland library was drawn in for five books’ worth on topics including abalone and Mayan civilisations. Indeed, as Alphascript publishing proudly announce on their webpage, most of the major book distributors, including Amazon, list their books.

One can’t help being secretly impressed with the size of the scam. Most of the books are sold for AU$40.00. UQ Library would have spent around AU$200 on their books, and there is a good chance too that many other university libraries did the same before realising it was all a scam. In a single year, Frederic P. Miller, Agnes F. Vandome and John McBrewster probably had enough in the bank to buy a small island and disappear.

Aside from being impressed (or gobsmacked), it is probably worth checking your university library and alerting them to the scam, and letting other prospective buyers know if you come across their books on book seller pages.

Yan-nhaŋu in the National Year of Reading

What a good decision in today’s Australia Day honours to make Laurie Baymarrwangga Senior Australian of the Year 2012! Read Claire Bowern’s post for an appreciation of her and her work documenting the Yan-nhaŋu language and getting it written down. She sounds a delightful person.

2012 is also National Year of Reading. Everyone with a reading-scheme in their revolver will be lobbying the government for funds to smelt and fire their silver bullets. Will the glitter of silver blind officials to the evidence as to whether they can hit the target?

How about for a change we read Yan-nhaŋu, Warlpiri, Enindilyakwa, Arabic, Vietnamese…? And for an even greater change, fund the production of reading material and decent language enrichment programs in these languages? Which brings me to a quibble about the description of Ms Baymarrwangga’s achievements:

Speaking no English, with no access to funding, resources or expertise, she initiated the Yan-nhangu dictionary project. Her cultural maintenance projects include the Crocodile Islands Rangers, a junior rangers group and an online Yan-nhangu dictionary for school children.

‘initiate’ is a slippery word, which then slithered into the ABC report as ‘establishment’.

Another is her establishment of the Yan-nhangu dictionary project, without any funding, resources, expertise or the ability to speak English.

This is a dangerous inaccuracy. Others were involved in the Yan-nhaŋu dictionary work who had access to resources. Ignoring their contribution lets governments off the hook. They want us to believe that love is all you need to maintain a language and create an online dictionary for it. Not schools, not interpreters or translators, not curricula or interesting stuff to read, not web-hosting or software, not linguists or programmers, nothing that needs paying for. Certainly nothing that would cost as much as some of the silver bullet reading-schemes.

Making old dictionaries new again

Today’s post is something of a recipe for making old dictionaries new again. I’ll explain how a 35-year-old, single-copy typewritten dictionary is living a new life as a digital database.

The language of this dictionary is Kagate – a Tibeto-Burman language of the Central Bodic branch, spoken in Nepal. I met some speakers of this language a number of years ago, as I’m working on a dialect of Yolmo, which is closely related. There was some documentation of Kagate in the mid-1970s, although most of the material output was liturgical rather than linguistic.

As well as the two publications on Kagate mentioned on the Ethnologue site, Monika Höhlig and Anna Maria Hari also created a typewritten Kagate-Nepali-English-German dictionary. A copy of this dictionary has remained with their primary consultant, and although it is well looked after and still usable, it is also the only copy they have access to. It is also only in Latin script instead of the Devanagari script they have developed for their language.

On a previous visit the Kagate speakers were kind enough to allow me and my colleague Amos Teo to scan the pages of the dictionary. At this point we also made them another paper copy of the dictionary, but obviously this is an unsustainable process in the long term. As you can see, the dictionary is already becoming discoloured and faded:


Amos took the scans and used the optical character recognition (OCR) software that comes with Adobe Acrobat 9. Even with such a faded font the OCR was effective at recognising the characters. As is to be expected with this kind of process, though, there was still a fair bit of cleaning up to do at this point. There were some alignment issues and some irregular characters. Also, some entries would copy strangely, with a row of 5-7 lexical items and then the corresponding definitions all in the line below.

From here the data needed to be massaged so that the appropriate headers were present for Toolbox to read. With the data that we had we needed, at a minimum, to create these headers:

\lx – the Kagate word
\ps – part of speech
\de – an English definition
\dn – a Nepali definition
\xv – an example sentence

Using the find and replace function in an .RTF file, Amos was able to create these, using the formatting of the original document to his benefit. For example, all of the Nepali definitions start with “Np:”, so we replaced “Np:” with “\dn”. Also, all of the other colons are at the start of an English definition, so Amos just selected find “:” and replace “\de”. Amos was careful to do this in a set order: doing these two the other way around would have turned every “Np:” into “Np\de” and led to more confusion. Using Regular Expressions is a more efficient way of doing this task, but even if you don’t know how to use RegEx (yet) it won’t stop you from doing this kind of work.
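For readers who do want to try the regular expression route, here is a minimal sketch in Python of the two replacements just described. It is not what Amos actually ran. The sample line is an invented placeholder (not real Kagate data), the function name is made up, and the layout of a line ("WORD : english gloss Np: nepali gloss") is an assumption based on the description above. It uses the \dn marker from the list of headers.

```python
import re

DN = "\n\\dn "  # Toolbox marker for the Nepali definition, on its own line
DE = "\n\\de "  # Toolbox marker for the English definition, on its own line

def to_toolbox(text: str) -> str:
    """Apply the two find-and-replace steps in the safe order."""
    # Step 1: "Np:" first, because it contains a colon that step 2 would otherwise catch.
    text = re.sub(r"\s*Np:\s*", lambda _m: DN, text)
    # Step 2: every remaining colon is assumed to start an English definition.
    text = re.sub(r"\s*:\s*", lambda _m: DE, text)
    return text

# Invented placeholder line, only to show the shape of the result.
sample = "WORD : english gloss Np: nepali gloss"
print(to_toolbox(sample))
```

Running the two steps in the opposite order would leave a stray "Np" at the end of the English definition and label the Nepali definition as \de, which is the confusion mentioned above.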

Once the file was ready to open in Toolbox it still required a little bit of cleaning up. There were a few instances where the letter ‘l’ had been read as the number ‘1’, and some duplicated entries – but going through each entry and cleaning up these kinds of problems is still much more efficient than retyping the whole thing.
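A short script can at least point to where these problems are likely to be, so that the checking by eye is quicker. The sketch below is an illustration, not part of our actual workflow: the function names are hypothetical, and it assumes that a digit 1 sitting directly next to a letter is a probable misread ‘l’ and that each entry starts with a line beginning "\lx ". It only flags candidates and changes nothing.

```python
import re
from collections import Counter

# A "1" immediately before or after a letter is treated as suspicious (an assumption).
SUSPECT_ONE = re.compile(r"(?<=[^\W\d_])1|1(?=[^\W\d_])")

def suspect_lines(lines):
    """Return the lines that contain a digit 1 touching a letter."""
    return [line for line in lines if SUSPECT_ONE.search(line)]

def duplicate_headwords(lines):
    """Return headwords that appear on more than one \\lx line."""
    heads = [line[4:].strip() for line in lines if line.startswith("\\lx ")]
    return sorted(h for h, n in Counter(heads).items() if n > 1)

# Invented placeholder lines, not real dictionary data.
demo = ["\\lx a1pha", "\\de page 12", "\\lx beta", "\\lx beta"]
print(suspect_lines(demo))
print(duplicate_headwords(demo))
```

Flagging rather than auto-correcting is deliberate: a genuine number in a definition should not be silently turned into a letter.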

The great thing about now having a database to work with instead of a photocopy is that it was the work of an hour to create this:

It’s still exactly the same data as above – but it is much easier to manipulate into different forms. For example, I could have just created a list of nouns, or only included the Nepali definitions. This database is also the start of a project to create a new dictionary. While the owner of this dictionary is proud of it, there are many limitations. The first is that it is all written in Latin script, and there is now a fully functional Devanagari script for Kagate, as well as, of course, for Nepali. There are also few example sentences, and some items are missing – such as the number eleven. But of course the most pressing issue with the current dictionary is that there is only one copy. By working in a database we’ll be able to make as many copies as we like at the end, and use the information in other ways too. But that’s all a story for another post.
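To give a flavour of what “manipulating into different forms” can look like outside Toolbox itself, here is a rough Python sketch that reads marker-style text and pulls out a list of nouns or just the Nepali definitions. The function names, the sample entries and the part-of-speech label "n" are all assumptions made up for the illustration, and the parser is deliberately simple: it keeps only the first value for each marker within an entry.

```python
def parse_records(text):
    """Split Toolbox-style text into one dict per entry, keyed by marker."""
    records = []
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # skip blank lines and anything without a field marker
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            records.append({})  # each \lx line starts a new entry
        if records:
            # keep the first value seen for a marker (a simplification)
            records[-1].setdefault(marker, value.strip())
    return records

def nouns(records, noun_label="n"):
    """Headwords whose \\ps field equals the (assumed) noun label."""
    return [r["lx"] for r in records if r.get("ps") == noun_label]

def nepali_only(records):
    """(headword, Nepali definition) pairs for entries that have a \\dn field."""
    return [(r["lx"], r["dn"]) for r in records if "dn" in r]

# Invented placeholder entries, not real Kagate data.
sample = "\\lx WORD1\n\\ps n\n\\de first gloss\n\\dn nepali one\n\n\\lx WORD2\n\\ps v\n\\de second gloss\n"
entries = parse_records(sample)
print(nouns(entries))
print(nepali_only(entries))
```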

Buttering parsnips in the Year of the Dragon

Three things to think about/do..

1. Creeping towards constitutional recognition
Section 127A Recognition of languages
The national language of the Commonwealth of Australia is English.

The Aboriginal and Torres Strait Islander languages are the original Australian languages, a part of our national heritage

This is what was proposed in a report on recognising Aboriginal and Torres Strait Islander peoples in the Constitution (You Me Unity). The report authors seem to think that many people will vote for this because they are worried about the loss of Indigenous languages. The national language bit is supposed to soften the doubters into accepting Indigenous languages.

And as well, the report authors want to add:

Respecting the continuing cultures, languages and heritage of Aboriginal and Torres Strait Islander peoples;

Q. What is respect? A. Respect = Fine Words

Evidence from the report: “However, a separate languages provision would provide an important declaratory statement in relation to the importance of Aboriginal and Torres Strait Islander languages. The Panel understands that a declaratory provision would be ‘technically and legally sound’, and would not give rise to implied rights or obligations that could lead to unintended consequences.”

Q. What are unintended consequences? A. = making Governments pay for decent education, translators, interpreters etc

Evidence from the report: “In relation to the second sentence of the first paragraph of the proposed ‘section 127B’, consultations with lawyers and State government officials indicated that an ‘opportunity’ to learn, speak and write English could give rise to legal proceedings challenging the adequacy of literacy learning. Similarly, the last paragraph in the proposal about recognising a ‘freedom’ to speak, maintain and transmit languages of choice could lead to argument about the right to deal with government in languages other than English. Such expressions would raise potentially contentious issues for all levels of government. The Panel has concluded that the potential unpredictable legal risks associated with these two sentences are such that they would not be appropriate for inclusion as part of a proposed constitutional amendment.”

Intended consequence: the language parsnips are not going to get buttered.

As a side-point, information distributed by the YouMeUnity mob [thanks Bruce!] includes YouTube audios of a whole lot of translations into Indigenous languages and creoles of information attributed to Alison Page, a Panel Member, but read by language speakers:

“15 Aboriginal and Torres Strait Islander languages, namely Guringdji , Murrinh-Patha, Anindiyakwa , Arrernte, Kimberley Kriol, Pitjantjatjara, Wik Mungan , TSI Kriol, Warramangu , Walpirri , Yolngu, Kriol, Tiwi, Alywarra and Kunwinjku”.

The awful spellings of names of Indigenous languages in the report show how little butter the parsnips are getting.

2. New resources
- From Claire Bowern
Claire has posted a call for material for the Australian part of ‘ElCat’, a new catalogue of endangered languages that will be launched in late February. She’s calling for links to sites about language programs [photos, videos, links to YouTube channels too!], “or if you’d like to include something about your language and what it means to you”. Hop over to Anggarrgoon to read the call and add your bit.

3. What I wish I could hop over to

- From Candide Simard
7th European Australianists workshop 2012
3-4 April 2012
School of Oriental and African Studies (SOAS), London

The European Australianists are happy to announce their seventh workshop to be held at SOAS, University of London, on 3-4 April 2012. The purpose of the workshop is to provide a venue for the presentation and discussion of current research on Australian languages. As in previous workshops a theme is suggested: “Contact phenomena in Australian languages”. However, participants are free to present papers not related to this theme; we welcome contributions relating to any aspect of Australian languages, from any perspective.

Where are they now?

Over at the Hans Rausing Endangered Languages Project we have started a new series of web posts called ELAP in Focus where we present stories about our former MA and PhD students, and the interesting lives they are leading since studying at SOAS.

The first story about a former MA student is by Takashi Nakagawa, who has been involved with community radio activism and the development of broadcasting in local languages, especially concerning natural disasters. Since writing his post Takashi has been accepted with a scholarship to undertake a PhD at Nanyang Technological University in Singapore, which is developing a new specialisation in documentation of endangered languages.

The first story about a former PhD student is by Stuart McGill who, after two years as a post-doctoral researcher, is now working in information technology and doing linguistics in his spare time.

We plan to publish new stories around once or twice a month in future so readers may wish to check the website regularly for updates.

Update 28 January 2012: two new stories have now been added, from former MA student Sim Tze Wei, who is an activist for Hokkien, and former PhD student Pete Budd, who is working in English as a Foreign Language.

More stories will appear in coming weeks.