Online Elan file player in the PARADISEC collection

In PARADISEC we store media files with their transcriptions whenever possible, typically in .eaf format, created by the standard transcription tool Elan. Best practice in language documentation includes creating a corpus of media with transcripts so that others can access it in future and locate what is in the files. Untranscribed files remain largely inaccessible, … Read more

Offline static collections

Following the earlier discussion of creating collections for offline delivery (particularly on Raspberry Pi), we now have a simple method that indexes a set of items from the PARADISEC collection and generates an HTML view, which means that files are not disconnected from the catalog in the way they were in the past. To do … Read more

Large language models for small languages

Of the 7,000 languages in the world today, most have little presence on the internet. What records there are in these languages are often religious translations from large languages and so, while being a valuable set of texts in the language, they have little local cultural content. For some of these languages, there are recordings … Read more

Splicing, cracking, rehousing: Preparing cassettes for digitisation

Some cassette collections we receive for digitisation are in good shape and digitisation is a relatively smooth process. Often, though, this is not the case. Some collections we are currently receiving were recorded over 50 years ago and have often been sitting for years in less-than-ideal conditions for maintaining their quality, e.g. high humidity, … Read more

The Insider Archivist: Collecting Music Recordings from East New Britain (ENB)

In 2024, I initiated the “ENB Digitisation and Preservation Project”, aiming to collect old analogue tape recordings from my community in East New Britain in Papua New Guinea. Because I am a staff member at PARADISEC, community members were approaching me to let me know they had recordings of church choir songs, gospel songs, choral music, string … Read more

The Tape Restorator

We wrote about dried-out cassette tapes in an earlier blog post, and the problem they create for playback: they screech as they try to move through the playback machine’s mechanism and ultimately fail to play. You can hear an audio example in that post. To get the tapes into a playable form, they need to … Read more

From Manuscript to Machine: Advancing Access to Cultural Heritage with OCR & HTR Tools

PARADISEC contains, at an informed guess, tens of thousands of pages of handwritten notes relating to the languages and cultures of the Pacific region. Many of those pages pertain directly to audio-visual media also housed in the archive, such as audio or video files, and the pages might include transcriptions, translations, explanations, notes, etc., of … Read more