Reviving dictionaries

More on the theme of refreshing existing dictionaries (discussed a few times on this blog).

The Kwara’ae language of Malaita, Solomon Islands, has had various dictionaries produced over time, some handwritten (this is an image of one of these in PARADISEC), and some created using computers. While running workshops with the Kulu Languages Institute over the past few years, I met Jeremiah Dauara, a Kwara’ae speaker, and went on to help him use Elan and FLEx to record, transcribe, and analyse his language.

Jeremiah had a printed copy of a recent dictionary produced by Ben Burt that he wants to revise and update. We located a pdf file, but it would not produce useful textual output via optical character recognition (OCR). So Jeremiah got in touch with Ben Burt and was sent an MS Word version that looks like this:

As you can see, the structure of this file is not too bad, but definitions may or may not wrap over a line, with a carriage return followed by a tab inside the definition. Sub-entries are consistently marked by two spaces at the start of a line. Scientific names are given in italics in parentheses, e.g. (Brugiera).

Using coding routines, first in MS Word and then in BBEdit, I was able to convert these elements into a marked-up document in the format needed for import into FLEx. I could then send a FLEx backup file to Jeremiah to load onto his computer in Honiara. Jeremiah had been willing to retype the dictionary entirely, so it was some relief for him to avoid that by having the converted digital file.

Why do this conversion? FLEx is a database designed for making dictionaries, which can then be printed or made into phone apps. Jeremiah is now editing the dictionary and preparing to produce the phone app as the first output of the revised dictionary.
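For readers who want to try something similar, here is a minimal Python sketch of the kind of conversion described above. It is not the set of routines I actually used (those were find-and-replace passes in MS Word and BBEdit), and it rests on several assumptions that are not stated in this post: that the Word file has been saved as plain text with a tab between headword and definition, that italic runs were first tagged as `<i>...</i>` by a find-and-replace in Word, and that the target is a Standard Format file using the common MDF markers `\lx`, `\se`, `\de` and `\sc`. The headwords and glosses in `sample` are invented placeholders, not Kwara’ae data.

```python
import re

# ASSUMPTION: italics were tagged as <i>...</i> before export to plain text,
# so a scientific name appears as "(<i>Brugiera</i>)".
SCI_NAME = re.compile(r"\s*\(<i>([^<]+)</i>\)")


def unwrap(text):
    """Join wrapped definitions: a line break followed by a tab
    is treated as a continuation of the previous line."""
    return re.sub(r"(?:\r\n|\r|\n)\t", " ", text)


def to_sfm(text):
    """Convert the plain-text dictionary to backslash-coded fields.
    ASSUMPTION: headword and definition are separated by a tab."""
    out = []
    for line in unwrap(text).splitlines():
        if not line.strip():
            continue
        # Sub-entries are marked by two spaces at the start of the line.
        is_sub = line.startswith("  ")
        head, _, definition = line.strip().partition("\t")
        sci = SCI_NAME.search(definition)
        if sci:
            definition = SCI_NAME.sub("", definition, count=1)
        if not is_sub:
            out.append("")  # blank line before each main entry
        out.append(("\\se " if is_sub else "\\lx ") + head.strip())
        out.append("\\de " + definition.strip())
        if sci:
            out.append("\\sc " + sci.group(1).strip())
    return "\n".join(out).strip() + "\n"


# Invented placeholder entries, for illustration only.
sample = (
    "abc\ta kind of mangrove\n"
    "\tfound on the coast (<i>Brugiera</i>)\n"
    "  abc def\ta sub-entry gloss\n"
)

if __name__ == "__main__":
    print(to_sfm(sample))
```

The wrapped lines are joined first so that every later step can treat one line as one entry or sub-entry. A real file will have exceptions that need checking by hand, and the choice of markers can be adjusted to whatever mapping is set up when importing into FLEx.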

This all relied on Jeremiah having contact with the dictionary’s creator who graciously sent the file with permission for its re-use. But what if the creator was no longer contactable, and the dictionary only existed as a pdf file? It would have taken some time to work on the scanned document to re-create the editable format it was originally in.

I hope we can encourage more dictionary creators to archive their primary files so that the process of conversion from older formats to the ones needed for current tools can be made easier. PARADISEC holds a number of dictionary files, many of them in the underlying format as well as the pdf of the finished work. If you are creating a dictionary, please consider depositing the FLEx or Toolbox files with a digital language archive to help with projects like Jeremiah’s in future.
