Merging SayMore audio snippets into a single wav file

SayMore is a piece of software developed by SIL that (among other things) allows you to annotate a primary audio file with audio annotations. This means that speakers can add information by carefully re-speaking an utterance, or giving an oral translation. The drawback is that each annotation segment is saved as a separate file, so you end up having to manage or archive hundreds or even thousands of 1-2 second audio files.

Simon Allen (Appen) has written a script which concatenates these short snippets according to their timestamps relative to the primary audio file, with silence in between. This file can then be added as a linked media file in ELAN, so you can switch to listen to that channel if you want to hear the oral translation or careful re-speaking at a particular point.
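To make the idea of "snippets placed by timestamp, with silence in between" concrete, here is a minimal sketch in Python 3. It is not Simon Allen's script (which relies on SoX); it only illustrates the placement logic using the standard library. The function names and the (start_seconds, pcm_bytes) input format are invented for this example, and it assumes mono, 16-bit PCM snippets that all share one sample rate.

```python
import io
import wave


def place_snippets(snippets, total_seconds, framerate=44100, sampwidth=2):
    """Return raw mono PCM bytes: silence with each snippet pasted at its start time.

    snippets is a list of (start_seconds, pcm_bytes) pairs. This input format
    is an assumption made for the sketch, not something SayMore produces.
    """
    frame_size = sampwidth  # mono, so one sample per frame
    total_frames = int(round(total_seconds * framerate))
    # Zero bytes are silence for signed 16-bit PCM (8-bit wav would need 128).
    timeline = bytearray(total_frames * frame_size)
    for start_seconds, pcm in snippets:
        offset = int(round(start_seconds * framerate)) * frame_size
        if offset >= len(timeline):
            continue  # snippet starts after the end of the primary recording
        end = min(offset + len(pcm), len(timeline))
        timeline[offset:end] = pcm[:end - offset]  # truncate anything that overruns
    return bytes(timeline)


def write_wav(path_or_file, pcm, framerate=44100, sampwidth=2):
    """Write mono PCM bytes to a wav file (a path or a writable file object)."""
    with wave.open(path_or_file, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        out.writeframes(pcm)
```

Because the output has the same duration as the primary recording, each snippet lines up with the stretch of audio it annotates, which is what makes the merged file usable as a linked media file.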

The script will work on Linux and Mac (it might work on Windows but has not been tested). It has only been tested with Python 2 but could be easily adapted to Python 3.

The script also makes use of the SoX program/command. See ‘Troubleshooting’ if you’re getting errors related to this.

Usage

The script can be downloaded from the CoEDL GitHub repository.

The script takes a single directory as input, and outputs one or two wav files (careful speech or oral translation as applicable). The contents of the input directory might look something like this:

Each file starts with the same session name, in this case LAN1-140311-000. (You can even have multiple sessions in one folder, and it will work across all of them, as long as the filenames within each session share the same ‘stem’.) The recordings of careful speech and oral translations include the start and end times in the filename, and end with a ‘C’ or ‘T’ respectively. The script uses this information to output two wav files (…translation.wav and …careful.wav). This is what it looks like when I open up the folder in Terminal, run the script, and check the contents of that directory:

I navigated to this directory by using the ‘cd’ command:

cd ~/Dropbox/SayMore/data

(Hint: on a Mac, you can just drag the little folder icon at the top of an open Finder window into the terminal, and it will copy out the path.)

The command I used above is:

python ../ -i . -o .

And these are its component parts:
– I saved the script in the folder above the current location (~/Dropbox/SayMore), so I find it by going ‘up’ with ../
– The input folder is specified after a -i option. In this case, I was already working in the directory with my data so I specify it with a full stop ‘.’, but I could also put the full path: -i ~/Dropbox/SayMore/data
– Likewise, the location where I want the script to output the two wav files is specified after a -o option. I wanted the output to appear in the same folder, but I could have put any location.
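
For readers curious how a script might handle options like these, here is a rough Python sketch. It is an illustration only and may differ from how the actual script is written: the function names are invented, and it assumes that the ‘C’ or ‘T’ marker is the last character of the filename before the .wav extension.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the -i (input folder) and -o (output folder) options."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch of a snippet-merging command line")
    parser.add_argument("-i", dest="input_dir", required=True,
                        help="folder containing the annotation snippets")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="folder where the merged wav files are written")
    return parser.parse_args(argv)


def group_by_kind(filenames):
    """Sort wav filenames into careful-speech ('C') and translation ('T') lists.

    Assumption: the marker is the final character of the name before '.wav'.
    A primary recording whose name happened to end in C or T would be
    misclassified by this simple check.
    """
    groups = {"C": [], "T": []}
    for name in sorted(filenames):
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".wav" and stem[-1:] in groups:
            groups[stem[-1:]].append(name)
    return groups
```

With this sketch, parse_args(["-i", ".", "-o", "."]) mirrors the command above, and group_by_kind(os.listdir(args.input_dir)) would give the two lists of files to merge.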


Troubleshooting

If you follow the above instructions and get an error message containing this line:

sh: sox: command not found

then it means you don’t have SoX installed (SoX is a bit of software that converts audio between different formats). It already comes on most Linux machines. If you have Homebrew installed on your Mac, you can install SoX by entering the command:

brew install sox

Otherwise, you can download it from here.

You can check which version of Python you have on your computer with the command:

python --version
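
If you would rather run both checks from Python itself, a small sketch along these lines reports the interpreter version and whether a sox executable is on your PATH. The function name check_environment is made up for this example, and it needs Python 3.3 or later because it uses shutil.which.

```python
import shutil
import sys


def check_environment(tool="sox"):
    """Return (python_version_string, path_to_tool_or_None)."""
    version = "%d.%d.%d" % sys.version_info[:3]
    return version, shutil.which(tool)


version, sox_path = check_environment()
print("Python " + version)
if sox_path:
    print("SoX found at " + sox_path)
else:
    print("SoX not found on PATH")
```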

If you prefer to use Python 3 and this script isn’t working, please email swilmoth at appen dot com.
