Language documentation index

The map below is built on information produced by a group of linguists working in Vanuatu. It is a sample documentation index that visualises what is known about each language. Note that this is not a language vitality index of the kind outlined in Harmon and Loh (2010). Leaving aside thorny questions of what constitutes a language and a language name (see Good and Cysouw 2013), and choosing to use a given set of language names (not limited to ISO 639-3), this exercise produced a map of the languages of Vanuatu. Each language is assigned an index number on a 21-point scale (0-20), with 1-5 points awarded for each of four categories: Grammar; Lexicon; Texts; Media corpus. The icons are colour-coded (white = 0; red = 1-5; purple = 6-10; yellow = 11-15; green = 16-20). 54 languages in this list have a zero rating, indicating that virtually nothing is known about those languages.
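As a rough illustration of the scoring described above, here is a minimal Python sketch that sums the four category ratings and maps the total to an icon colour. The function and variable names are hypothetical, and the sketch assumes that a category with no known material is rated 0; it is not the code behind the actual spreadsheet.

```python
# Hypothetical sketch of the documentation index; not the spreadsheet's own logic.
CATEGORIES = ("grammar", "lexicon", "texts", "media_corpus")


def documentation_index(ratings):
    """Sum the four category ratings.

    Assumption: each rating is an integer from 0 to 5, where 0 means
    nothing is known for that category. Missing categories count as 0.
    """
    total = 0
    for category in CATEGORIES:
        score = ratings.get(category, 0)
        if not 0 <= score <= 5:
            raise ValueError(f"{category} rating must be 0-5, got {score}")
        total += score
    return total


def icon_colour(index):
    """Map an index (0-20) to the colour bands given in the post."""
    if not 0 <= index <= 20:
        raise ValueError(f"index must be 0-20, got {index}")
    if index == 0:
        return "white"
    if index <= 5:
        return "red"
    if index <= 10:
        return "purple"
    if index <= 15:
        return "yellow"
    return "green"
```

For example, a language rated 3 for grammar, 2 for lexicon, 1 for texts and 0 for media corpus would get an index of 6 and a purple icon under these assumptions.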

Each language icon also links to further information about the language and, in some cases, to media so you can hear samples of the language being spoken (see the Araki or South Efate links, for example). [Let me know if you can’t see the icons in the map below and I will give the system a kick, which it seems to need from time to time.]

The information can be downloaded for use in Google Earth here (you need to change the extension to “.kmz”). As of February 2015 Google no longer supports KML in Google Maps, so the map below no longer works. Sorry!
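If you would rather inspect the download without Google Earth, a KMZ file is a zip archive containing a KML (XML) document, so it can be read with Python's standard library. The sketch below is only an illustration: the function name is hypothetical, and it assumes that each language appears as a Placemark with a name element, which I have not checked against the actual file.

```python
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace, e.g. '{http://...}Placemark' -> 'Placemark'."""
    return tag.rsplit("}", 1)[-1]


def placemark_names(kmz_path):
    """Return the names of all Placemarks in the first .kml file of a KMZ.

    Assumption: language entries are stored as <Placemark><name>...</name>.
    kmz_path may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(kmz_path) as archive:
        kml_files = [n for n in archive.namelist() if n.lower().endswith(".kml")]
        if not kml_files:
            raise ValueError("no .kml document found in the archive")
        root = ET.fromstring(archive.read(kml_files[0]))

    names = []
    for element in root.iter():
        if local_name(element.tag) != "Placemark":
            continue
        # Look only at direct children so nested names are not picked up.
        for child in element:
            if local_name(child.tag) == "name" and child.text:
                names.append(child.text.strip())
                break
    return names
```

Calling `placemark_names("vanuatu.kmz")` (a made-up filename) on the renamed download would then list the placemark names, if the file is laid out as assumed.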

This map draws from a spreadsheet (using Spreadsheet Mapper v3.1) and is updated periodically.

Building an index of what is known about over 7,000 world languages is a big task and one that would benefit from automated processes, ideally drawing on OLAC’s existing aggregated information about each language (as discussed on this blog here). There are existing sites that give such an index for particular regions. An early index described by Wurm (1963:137) set out a scale similar to the one discussed here, but it was only applied to a few Australian languages. McConvell and Thieberger (2001) implemented this index for Australian languages, and Austlang subsequently assigns up to 16 points for each Australian language, depending on the amount and quality of each of four features: Word list; Text Collection; Grammar; Audio-visual. Another example is Lynch & Crowley (2001, pp. 17-19), who provide a five-star system for documentation of languages of Vanuatu.


  • Good, Jeff & Michael Cysouw. 2013. Languoid, Doculect, and Glossonym: Formalizing the Notion ‘Language’. LD&C 7.
  • Harmon, David & Jonathan Loh. 2010. The index of linguistic diversity: A new quantitative measure of trends in the status of the world’s languages. LD&C 4. 97-151.
  • Lynch, John & Terry Crowley. 2001. Languages of Vanuatu: a new survey and bibliography. Canberra: Pacific Linguistics.
  • McConvell, Patrick & Nicholas Thieberger. 2001. State of Indigenous languages in Australia – 2001. Australia State of the Environment Second Technical Paper Series (Natural and Cultural Heritage). Canberra: Department of the Environment and Heritage.
  • Wurm, Stephen A. 1963. Aboriginal languages. In W.E.H. Stanner & H. Sheils (eds.), Australian Aboriginal Studies: A Symposium of Papers presented at the 1961 Research Conference, 125-165. Melbourne: OUP.
