Seeking your help with tool development

We are in the process of identifying gaps in tools for fieldwork and data analysis that can be filled as part of the Centre of Excellence for the Dynamics of Language. I’d like to ask for your input into the requirements for a metadata entry tool. In part, this analysis asks for your opinions on the value of existing tools (listed below) and their relative strengths and weaknesses, and asks if it may be worth putting effort into developing any of them further, rather than starting from scratch.

The high-level requirement for this tool is to make it easy to describe files created in fieldwork, to be usable both offline and online, and to deliver the description as a text file for upload to an archive. This includes capturing as much metadata as possible from the files themselves; providing controlled vocabularies of terms to select from (preferably via drag-and-drop rather than keyboard entry); allowing the metadata to be exported in a range of formats to suit whichever archive will host the collection; allowing the metadata to be imported into the tool for use by collaborating team members; and allowing controlled vocabularies to be amended to suit the local situation. The tool could also allow users to visualise the state of a collection: which media files have been transcribed, which have been interlinearised, whether text files have been scanned or OCRed, what other processes have been applied, which files have been archived, and what the rights are for each file. Users should also be able to specify what these criteria are for their own type of collection.
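To make these requirements a little more concrete, here is a minimal Python sketch of three of them: reading what can be read from the file itself, checking a term against a controlled vocabulary, and exporting the same records in more than one format. Everything named here (the GENRES list, the describe_file, to_csv and to_json functions, and the field names) is a hypothetical illustration, not the design of any existing tool or the schema of any archive.

```python
import csv
import io
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical controlled vocabulary; a real tool would load this from an
# editable file so it can be amended to suit the local situation.
GENRES = {"narrative", "conversation", "song", "elicitation"}


def describe_file(path, genre, rights):
    """Build one metadata record, reading what we can from the file itself."""
    if genre not in GENRES:
        raise ValueError(f"{genre!r} is not in the controlled vocabulary")
    p = Path(path)
    stat = p.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "filename": p.name,
        "extension": p.suffix.lower().lstrip("."),
        "size_bytes": stat.st_size,
        # File-system timestamp only; it may be wrong and should be
        # checked by a person before it is sent to an archive.
        "modified": modified.date().isoformat(),
        "genre": genre,
        "rights": rights,
    }


def to_csv(records):
    """Serialise records as CSV text, one row per file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialise the same records as JSON text."""
    return json.dumps(records, indent=2, ensure_ascii=False)
```

Keeping the records as plain dictionaries means that a new export format is just one more small function over the same data, which is one possible way to serve whichever archive will host the collection.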

These are the currently available tools; please let us know of any others (especially those created for fieldwork in other disciplines):
CMDI Maker

You can either add comments below, or else write to me separately (thien [at] with your ideas that can contribute to how we develop this tool.

5 thoughts on “Seeking your help with tool development”

  1. EASE OF USE / USER-FRIENDLINESS would be at the top of my agenda.

    Of the other tools, I only know Arbil (and IMDI, predecessor to CMDI I guess) which are both forbidding in that respect.

    Also import facilities (and possibly export) to other tools, and spreadsheets for that matter, as it is what many people in fact end up using.

    Thank you in advance if your group is developing a tool that does all that is stated in the post above.



  2. One thing I have noticed is that the metadata from files is not always accurate. That is, some people do not set the date on their DAR correctly, or they do not set the date on the camera. This also happens with Word documents and PDFs, which pull metadata from word-processor workflows. So, while I am a fan of reduced workloads in the digital object submission process, I also recognize that any (well, a lot anyway) automatically extracted metadata needs to be visually verified before it is passed on to the archive as accurate. This means that automated extraction needs to be paired with some kind of human verification prompt; otherwise we just get used to things and click through pop-up screens.

  3. I have been thinking about metadata collection tools quite a lot recently in the context of a team-based project that I am leading. I am not familiar with all the current tools. I’ve spent some time with Arbil, and also played with CMDI Maker, and I hope to get time to work with CMDI Maker more than I have to this point. But for the project I am working on now (looking at multilingual practices in Cameroon), we are planning to build a new metadata collection tool (that will form part of a larger metadata management system) for a variety of reasons:

    1. I was able to get about 30 free, last-generation smartphones from this lab at my own institution. This meant that I suddenly had access to a large number of, in effect, handheld computers that could be distributed across the team.

    2. The project team is intended to be composed of, perhaps, as many as fifteen Cameroonian students, as well as some others, over the course of three years. Metadata management will be a major concern, and we want to develop a tool that captures as much metadata as possible at the point of collection rather than after the recording is made (which, I think, is a more typical workflow). By building a metadata capture app into a smartphone, we hope to facilitate such “real-time” metadata collection both by having a form that needs to be filled out before recording begins and by using all the metadata the smartphone already may “know” (e.g., owner, date, location).

    3. For a project on multilingualism, having good, consistent, updated metadata on speakers is crucial. The main requirements described in the post above are centered on the files that are created, but, for my project, we need a tool that also makes speaker metadata entry as easy as possible. Our hope is to have this take place at the time of session metadata creation, using the camera capability of the smartphone to help us add a picture to the record of a speaker. We can also place into this workflow prompts to help the collectors discuss issues of informed consent with the consultants, as well as issues of data restrictions, etc. (Obviously, a technological solution to ethical concerns has limitations, and this is only one part of a broader ethics training plan.)

    Obviously, a general metadata tool will have different needs, but I thought I’d mention some of the reasons why I’m working on a new tool. Point 1 won’t apply to other projects, but points 2 and 3 raise some more general issues.

  4. Hi Nick. Thanks for this call out and good luck with it!

    I’m a lo-fi linguist and prefer as few bells and whistles as possible, so I guess I relate to Eva’s comment above. For my PhD, I just used an Excel spreadsheet and that seemed to work just fine for that purpose. The benefit of a spreadsheet is I could define all my own fields myself and didn’t have to learn or use a new program.

    One thing I did struggle with, and what I would like a metadata tool to do for me (because I always forget), is to keep track of my processing of a file – how much is transcribed, how much glossing has been done, whether the recording has been adapted into another material (e.g. a community resource), who copies have been given to in the community, etc. So for me, it’d be cool if there was a very straightforward metadata program that would automatically respond when a recording is being worked on, and note down who has worked on it and what they did. If a tool could do this, maybe it could also then provide reminders and flag recordings requiring work – e.g. “you recorded this 30 days ago but have not started annotating it – would you like to do so now?” Not sure if it’s possible – or maybe I could wrangle Excel to do this for me anyway – but that’s what I’d find useful.

  5. The ideal metadata tool for me would accept Excel spreadsheets, because I’m very happy with my current Excel spreadsheets; the trouble is just getting the data into a shape that PARADISEC and other archives, combined corpora, etc. can accept. Like Wamut, I also spend quite a bit of time keeping track of where the processing of a file is at, i.e. whether it has been glossed, transcribed by me or by somebody else, sent to somebody else for checking other languages or phonetic analysis, etc. However, the ideal metadata tool for me would not be infinitely flexible so that everyone can use it as their primary tool. What might be better would be something that can draw on Excel files, or other kinds of database files that fieldworkers often use, to extract the relevant data, and even combine some fields, so as to convert it into a suitable format for catalogues for archives etc.
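Two of the suggestions in the comments above lend themselves to a small sketch: converting a fieldworker's own spreadsheet into the shape an archive asks for (renaming columns and combining some fields), and flagging recordings that have sat untranscribed for more than 30 days. The Python below is only an illustration under stated assumptions: the spreadsheet is assumed to have been saved as CSV, and every column name, target field name and function name (FIELD_MAP, convert_row, convert, needs_attention) is hypothetical and does not reflect the actual schema of PARADISEC or any other archive.

```python
import csv
import io
from datetime import date

# Hypothetical mapping from a fieldworker's own column names to the
# column names an archive might ask for. Not a real archive schema.
FIELD_MAP = {
    "File": "identifier",
    "Recorded": "date_created",
    "Language": "language",
}


def convert_row(row):
    """Rename mapped columns and combine two source fields into one."""
    record = {
        target: row[source].strip() for source, target in FIELD_MAP.items()
    }
    # Combine speaker and topic into a single free-text description.
    record["description"] = f"{row['Speaker'].strip()}: {row['Topic'].strip()}"
    return record


def convert(csv_text):
    """Convert every row of a CSV export of the spreadsheet."""
    return [convert_row(row) for row in csv.DictReader(io.StringIO(csv_text))]


def needs_attention(row, today, max_age_days=30):
    """True if a recording is older than max_age_days and not transcribed."""
    recorded = date.fromisoformat(row["Recorded"].strip())
    untranscribed = row["Transcribed"].strip().lower() != "yes"
    return untranscribed and (today - recorded).days > max_age_days
```

Because the mapping lives in one dictionary, a fieldworker could keep their existing spreadsheet as the primary tool and maintain a separate mapping for each archive, rather than restructuring the spreadsheet itself.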

