Social Media and Language Documentation – a MLIP recap

Jonathon Lum recaps the June Linguistics in the Pub (LIP), a monthly informal gathering of linguists in Melbourne to discuss topical areas in our field.

Despite the cold Melbourne weather, June’s LIP attracted a good number of linguists who came together to discuss the topic ‘Social media and language documentation’, led by Peter Schuelke of the University of Hawaii. Under discussion was the potential for social media to play a role in language documentation, maintenance and revitalization. While social media is a largely untapped resource in these fields, it also presents certain logistical and ethical issues, many of which were considered throughout the discussion.

Peter began by speaking about his own experience with Roviana, an Oceanic language of the Solomon Islands. Roviana has 6 000 – 10 000 speakers, many of whom participate in a Facebook community called ‘Roviana Language – Communicate, Learn & Teach’. This is an online space for Roviana speakers (including second language speakers) to use Roviana and to ask questions about the language. Similar groups exist for other languages, and are likely becoming more popular as the internet and social media continue to spread around the world.

Participating in these groups can be extremely valuable for linguists documenting the language, since they allow a researcher to get quick responses to linguistic enquiries without having to be with a consultant at a field site. Another advantage is that the researcher can get responses from several native speakers in an efficient manner, rather than just one at a time. This may reveal linguistic variation and disagreements between native speakers in a way that traditional, face-to-face elicitation does not, though the latter is obviously still important and will not be replaced by online elicitation. Social media can also be a source of naturalistic language data, where native speakers post messages for each other on public pages. The use of such data in language documentation and description projects may be highly valuable in that it suffers much less from the observer effect than many other methods of data collection.

The discussion also covered social media’s potential role in language preservation and revitalization. Social media brings together speakers who may now be geographically dispersed, providing a new domain in which use of the language can continue. As for revitalization, social media can bring together people interested in learning their heritage language. An example is Klallam, a Straits Salishan language of British Columbia. A Facebook community, ‘Klallam Word of the Day’, attracts considerable interest from second language speakers and helps to bestow prestige on the Klallam language.

There is also great potential for such online communities to expand from Facebook groups to other online spaces such as YouTube and other video (or Vine) sharing sites, which would potentially allow researchers to access spoken language data instead of just written texts. The discussion briefly turned to the use of apps and websites in various language documentation projects. In particular, one participant spoke about his involvement with Phonemica, a platform for crowd-sourced stories from around China told in local languages and dialects. Anybody can log in and upload recordings or work on transcribing or translating stories. It was agreed that this could only be successful for languages that have reasonably large numbers of literate, technologically capable speakers, but that such a platform is invaluable for documenting such languages.

A number of challenges and limitations were discussed. Aside from being of little use in communities with low levels of literacy, or where electricity and the internet are limited (or absent entirely), there are issues to do with what one is documenting in the first place. As we all know, speech and writing are very different things, and social media mostly involves written language (though as mentioned earlier, there is also potential for more video sharing in the future). And in many languages, some elements of the phonology, such as tone, may not be expressed in the Romanized scripts that tend to be used on social media. One participant in the discussion also raised the point that speakers may be using different varieties of the same language when communicating on social media.

Ethical issues were also raised and given some consideration. One concern was that social media sites such as Facebook may technically own the data that researchers wish to use. Another related to consent, and in particular the point that there is ‘consent’ as a technical/legal notion vs. ‘informed consent’ as required by ethics committees. Many Facebook users post publicly, but probably do not anticipate that their messages may be used by linguists for academic purposes. It was pointed out that consent forms can be sent online too, though there is still an issue that without meeting a person face-to-face, we cannot truly know if they are who they say they are, if their data is suitable to use or if they are capable of giving informed consent (e.g. they may be underage). Despite these issues, one participant pointed out that not using any data from social media is also a decision with consequences: it means overlooking a domain of language use that is increasingly important in many languages, and may cause the researcher to miss out on valuable data ‒ this impinges on the quality of a grammatical description.

1 thought on “Social Media and Language Documentation – a MLIP recap”

  1. Interesting discussion. Probs also worth mentioning the great effort by to highlight the use of smaller languages on Twitter, allowing Tweeters to tap into and grow Twittersubspheres in various under-represented languages. Also has a nifty ‘trending’ function for each language and a lot of the headings and menu items have been translated, creating rare monolingual spaces for small languages (see e.g. the Welsh section:

    If you know of a language or Twitter account that should be on, you can add them yourself!

