More on the theme of refreshing existing dictionaries (discussed a few times on this blog).

The Kwara’ae language of Malaita, Solomon Islands, has had various dictionaries produced over time, some handwritten (this is an image of one of these in PARADISEC) and some created using computers. While running workshops with the Kulu Languages Institute over the past few years, I met Jeremiah Dauara, a Kwara’ae speaker, and went on to help him use ELAN and FLEx for his work of recording, transcribing, and analysing his language.
Jeremiah had a printed copy of a recent dictionary produced by Ben Burt that he wants to revise and update. We located a pdf file, but optical character recognition (OCR) did not produce usable text from it. So Jeremiah got in touch with Ben Burt and was sent an MS Word version, which looks like this:

As you can see, the structure of this file is not too bad, but definitions may or may not wrap over a line, with a carriage return followed by a tab inside the definition. Sub-entries are consistently marked by two spaces at the start of a line. Scientific names are given in italics in parentheses, e.g. (Brugiera). Using coding routines (in MS Word and then in BBEdit), I was able to convert these elements into a marked-up document in the format needed for import into FLEx. I could then send a FLEx backup file to Jeremiah to load onto his computer in Honiara. Jeremiah had been willing to retype the dictionary entirely, so it was some relief for him to have the converted digital file instead. Why do this conversion? FLEx is a database designed for making dictionaries, which can then be printed or made into phone apps. Jeremiah is now editing the dictionary and preparing to produce a phone app as the first output of the revised dictionary.
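For readers who would rather script this kind of conversion than use search-and-replace routines, here is a minimal Python sketch of the three rules described above. It is not the routine I actually used, and it rests on several assumptions: that the Word file has been saved as plain text, that a tab separates each headword from its definition, that italics were tagged as `<i>…</i>` before export, and that the target is MDF-style standard format markers (`\lx`, `\se`, `\de`, `\sc`). The sample entry and the function names are invented for illustration.

```python
import re

# Assumption: italic runs were tagged as <i>...</i> before the plain-text export,
# so a scientific name appears as "(<i>Name</i>)".
SCI_NAME = re.compile(r"\(<i>([^<]+)</i>\)")


def join_wrapped(text):
    """Rejoin wrapped definitions: a line break followed by a tab continues the previous line."""
    return re.sub(r"\r?\n\t", " ", text)


def to_sfm(text):
    """Convert the plain-text dictionary into MDF-style marked-up lines (assumed markers)."""
    out = []
    for line in join_wrapped(text).splitlines():
        if not line.strip():
            continue
        # Two leading spaces mark a sub-entry; anything else starts a new entry.
        marker = r"\se" if line.startswith("  ") else r"\lx"
        # Assumption: the first tab separates the headword from its definition.
        head, _, definition = line.strip().partition("\t")
        if marker == r"\lx" and out:
            out.append("")  # blank line between main entries
        out.append(marker + " " + head)
        sci = SCI_NAME.search(definition)
        # Remove the scientific name from the definition and tidy the spacing.
        definition = " ".join(SCI_NAME.sub("", definition).split())
        if definition:
            out.append("\\de " + definition)
        if sci:
            out.append("\\sc " + sci.group(1))
    return "\n".join(out)


# Invented sample: one entry whose definition wraps, followed by one sub-entry.
sample = (
    "headword1\ta kind of tree (<i>Brugiera</i>) that grows\n"
    "\tnear the shore\n"
    "  subentry1\tan invented sub-entry\n"
)
result = to_sfm(sample)
print(result)
```

A real file will have exceptions that a sketch like this misses, so the output still needs checking by eye before it is imported.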
This all relied on Jeremiah having contact with the dictionary’s creator, who graciously sent the file with permission for its re-use. But what if the creator were no longer contactable, and the dictionary only existed as a pdf file? It would have taken some time to work on the scanned document to re-create the editable format it was originally in.
I hope we can encourage more dictionary creators to archive their primary files so that the process of conversion from older formats to the ones needed for current tools can be made easier. PARADISEC holds a number of dictionary files, many of them in the underlying format as well as the pdf of the finished work (see examples here: http://catalog.paradisec.org.au/items/advanced_search?data_category_ids=5&description=dictionary). If you are creating a dictionary, please consider depositing the FLEx or Toolbox files with a digital language archive to help with projects like Jeremiah’s in future.