One missing slash equals an object lesson in keeping backups

This semester, I have been helping Jane with the technical side of her wonderful Field Methods class: recording, uploading files onto the server, and allowing students to download both .wav and .mp3 files securely and quickly. I took this course myself some years ago, and it was a great experience for me and the whole class. Many members of that class have gone on to do field research of their own, and I’m sure the Field Methods class was as much a help to their research as it was to mine.
But this post is not about when I took the class. Instead, it’s about how I almost buggered up this semester’s class in what can best be described as a lesson in keeping backups of your recordings.
(Warning: Some computer nerd stuff follows after the fold.)

The course is being run in conjunction with Paradisec, which is where my helping hand comes in. We provided the equipment for the class to record their informants (two Karo Batak speakers), and provided space on our server for the recordings to sit. Eventually they will be archived in the larger Paradisec collection.
I always like to find ways of doing things quickly with fairly simple programming. I’m not much of a programmer as such, but I know my way around bash, and have been using it to automate most things, such as moving files around and producing mp3 files of each recording.
This week, the field methods students began their individual sessions with their informant, meaning that there are suddenly many more than just one recording per week; in fact there are closer to five or six per day, for two days per week. With this in mind, Jane suggested we organise the recordings into directories based on what day they were recorded. A very sound suggestion which I was happy to implement.
Of course, it would have been too easy to do it manually, so I tried to do it in a couple of lines of code. The first step was to take the names of the recordings (which are named in line with our specifications at Paradisec), and create directories based on those filenames, so that each directory would contain all the recordings made on a given day. To take an example, we might have a list of recordings such as the following.

  • FM2-20100310-01.wav
  • FM2-20100317-01.wav
  • FM2-20100317-02.wav

The command I wrote would create two directories, called FM2-20100310 and FM2-20100317 (the command also tries to create a directory for the last file, but fails, since that directory already exists after being created for the second). Here was the code:
$ for x in *; do mkdir ${x%-*} ; done
Translation: for all files, make a directory of the same name, but strip off everything from the last dash (-).
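To see what the ${x%-*} expansion does without touching any files, here is a small sketch that just prints the result for the three example filenames above. The helper name day_dir is made up for this illustration and is not part of the original commands.

```shell
# day_dir is a hypothetical helper name used only for this illustration.
# ${1%-*} removes the shortest suffix matching "-*", i.e. the last dash
# and everything after it.
day_dir() {
    printf '%s\n' "${1%-*}"
}

# Print each example filename next to the directory name derived from it.
for x in FM2-20100310-01.wav FM2-20100317-01.wav FM2-20100317-02.wav; do
    echo "$x -> $(day_dir "$x")"
done
```

The last two filenames give the same result, which is why mkdir complains the second time around.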
This worked fine, and despite some redundancy I had a bunch of directories, one for each day. The next step then was to move each file, like those above, into the directories that correspond to that day (which is always predictable from the filename). The code for this should have been:
$ for x in *; do mv $x ${x%-*}/ ; done
Translation: For all files, move them to the directory which has the same name, but with everything from the last dash (-) stripped off.
However, I missed the crucial forward-slash in the section ${x%-*}/, meaning I sent the files not to the directories of the same name, but to files of that name.
Now, when you have many files of the same day, the output filename for this command is the same. So the way the command is run, it takes the first file, say FM2-20100420-01.wav, and moves it (which is synonymous with renaming it) to the file FM2-20100420. If there is a second file, let’s say FM2-20100420-02.wav, then it similarly moves it to FM2-20100420, thus overwriting what was there before.
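The overwriting can be reproduced safely in a scratch directory. This is only a sketch under a few assumptions: the file contents are dummy text rather than audio, the FM2-20100421 name is invented for the second half, and no directory with the target name exists in the working directory (if one did, mv would move the file into it even without the slash).

```shell
# Scratch-directory demonstration; nothing outside the temp dir is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'first\n'  > FM2-20100420-01.wav
printf 'second\n' > FM2-20100420-02.wav

# Without the slash, and with no directory called FM2-20100420 present,
# each mv is a rename onto the same name, so the second overwrites the first.
for x in *.wav; do mv "$x" "${x%-*}"; done

ls                 # a single entry is left: FM2-20100420
cat FM2-20100420   # holds the contents of the second file only

# With the trailing slash, mv requires the target to be a directory and
# refuses when it is not there, so the file stays where it was.
printf 'third\n' > FM2-20100421-01.wav
mv FM2-20100421-01.wav FM2-20100421/ 2>/dev/null || echo "mv refused: target is not a directory"
```

The slash, in other words, turns a silent rename into an error whenever the directory is missing.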
As I pointed out earlier, only this week did the class begin their individual sessions, so only this week was there more than a single recording in a given day. And therefore only recordings from this week were adversely affected (by which I mean deleted). The others were merely renamed.
Luckily, I realised what was going on by the fact that it was taking far too long to perform a mere move, and managed to stop it after only a couple of files got deleted. Even more luckily, especially as this saved my own skin, we have kept backups and the data is safe.
The problem can be boiled down – computationally speaking – to a mere missing slash. But the real culprit here was my trying to be too clever by half.
So let this be a lesson: Always, always keep backups. Especially if you are going to do any work on your recordings, even if you think it’s as mundane as simply moving them from one location to another.
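Alongside backups, a more cautious version of the two loops might look something like the sketch below. It is not what I actually ran: the function name sort_by_day and the DRY_RUN variable are invented for this example, and it assumes the recordings all end in .wav and follow the naming pattern shown earlier. By default it only prints the commands it would run, so they can be checked before anything is moved.

```shell
# sort_by_day and DRY_RUN are hypothetical names made up for this sketch.
# With DRY_RUN unset or set to 1, the commands are only printed.
sort_by_day() {
    for x in *-*.wav; do
        [ -f "$x" ] || continue      # skip directories and unmatched globs
        dir=${x%-*}                  # e.g. FM2-20100317-02.wav -> FM2-20100317
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo mkdir -p "$dir"
            echo mv "$x" "$dir/"
        else
            # Only move the file if the directory exists or could be created;
            # the trailing slash makes mv fail rather than rename.
            mkdir -p "$dir" && mv "$x" "$dir/"
        fi
    done
}
```

Running sort_by_day on its own prints the plan; setting DRY_RUN=0 and running it again performs the moves. Quoting "$x" and "$dir" also keeps filenames containing spaces from being split.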

7 thoughts on “One missing slash equals an object lesson in keeping backups”

  1. I always tell my students there are three basic principles of documentary linguistics: backup, backup, backup. And not just any kind of backup – one that’s not useless (for which have a look at this advice).
    Interesting to see the word “informant” used a couple of times in this post.

  2. And Backup Offsite – a student has just had her laptop AND backups on memory sticks stolen. All that labour coding data…. Terrible.

  3. Jane – that’s one of the ways to create useless backups, as explained in the web page I linked to.
    I recommend using Dropbox or similar facilities (like Files Anywhere or Jungle Disk) which provide a couple of gigabytes of storage for free, or more storage at fairly low cost. Alternatively, open a Gmail account and email files to yourself, or get on Google Docs which provides 1 Gigabyte of free storage (you can purchase more for US 25 cents per Gigabyte, and you can set different access privileges to files stored there). Reportedly it will soon be possible to store any type of file, including audio and video, on Google Docs in its original format (though there is an upload limit of 250 Mbytes per file).

  4. Aidan,
    You might like to add “Always carefully test your regular expressions” to your lesson.
    As an old Unix hand I usually test things like this by replacing the “mv” or “mkdir” with an “echo” and have a close look at the output before committing to something irrevocable.
    Oh, and Peter, I recommend Dropbox to everyone. I swear by it for both backup and convenience – no more leaving files at home or work.

  5. Tony
    Yes, I was introduced to Dropbox by Claire Bowern and have used it extensively, both to back up files and to share files with colleagues on the other side of the world when doing a joint publication project. I have found it simple and easy to use, and, as you say, it means files are accessible from anywhere. We have also used Googledocs to write documents together (so much easier than emailing “track changes” documents back and forth) though there are limitations on what you can do with Googledocs, especially in terms of formatting.

  6. This site has some great information about backup — different types and methods are discussed in detail. It is intended for digital photographers and was an initiative funded by Library of Congress, but has advice with wide applicability. See also here.

  7. Hi Aidan, I enjoyed reading this little anecdote, having tried myself to be too clever with little shell scripts. Apart from the obvious importance of backups which you thankfully had, I always like to try it out on a test file/directory first.
    Nice to hear you are helping out with the Field Methods class – I have such fond memories of our semester with Muna. That course is gold.
