Language diversity in the city of London is in the news again thanks to a research project by Ed Manley and James Cheshire of University College London (UCL), which analysed Twitter posts collected over the summer just ended. To identify the languages in their collection of tweets, they used:
“the Chromium Compact Language Detector – a open-source Python library adapted from the Google Chrome algorithm to detect a website’s language – in detecting the predominant language contained within around 3.3 million geolocated tweets, captured in London over the course of this summer”
I have previously blogged about language diversity in London, and minority languages on Twitter, but this new work nicely combines both themes. Unfortunately, it presents only a partial picture of the language diversity of London Twitter users, as it “only include[s] people who have a good location (through GPS) and those who are connected to the internet”. Nevertheless, it does show that at least 66 languages were used in the data collected by our UCL colleagues. This is, of course, just the tip of the iceberg, given the hundreds of languages spoken but not tweeted in the city.
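For readers curious how a count such as "at least 66 languages" comes out of a pile of tweets, here is a minimal Python sketch of the tallying step. It is not the UCL team's code: `toy_detect` is a hypothetical stand-in that guesses from a few marker words, and in practice it would be replaced by a call to a real detector such as the Chromium Compact Language Detector. The sample tweets are invented for illustration.

```python
from collections import Counter


def toy_detect(text):
    """Hypothetical stand-in for a real language detector.

    It looks for a handful of marker words and returns a language
    code, or "unknown" if none match. A real study would call an
    actual detection library here instead.
    """
    markers = {"the": "en", "le": "fr", "el": "es"}
    for word in text.lower().split():
        if word in markers:
            return markers[word]
    return "unknown"


def tally_languages(tweets, detect=toy_detect):
    """Count how many tweets were assigned to each detected language."""
    return Counter(detect(tweet) for tweet in tweets)


# Invented sample data, purely for illustration.
sample = [
    "the tube is late again",
    "le métro est en retard",
    "el metro llega tarde",
    "the weather is grim",
]

counts = tally_languages(sample)
print("distinct languages:", len(counts))
print(counts.most_common())
```

The number of distinct keys in the tally is the "languages used" figure, and the counts give the relative share of each language. Passing the detector in as a parameter keeps the counting logic independent of whichever detection library is used.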
[Hat tip Mark Liberman at Language Log]