{"id":5533,"date":"2011-05-27T08:54:52","date_gmt":"2011-05-26T21:54:52","guid":{"rendered":"http:\/\/www.paradisec.org.au\/blog\/?p=5533"},"modified":"2011-05-27T09:06:56","modified_gmt":"2011-05-26T22:06:56","slug":"searching-in-endangered-languages-archives","status":"publish","type":"post","link":"https:\/\/www.paradisec.org.au\/blog\/2011\/05\/searching-in-endangered-languages-archives\/","title":{"rendered":"Searching in Endangered Languages Archives"},"content":{"rendered":"<p>In a <a href=\"http:\/\/www.paradisec.org.au\/blog\/2011\/04\/who-uses-digital-language-archives\/\">previous post<\/a> I looked at who might be using materials deposited in endangered languages digital archives. In this post I will look at searching for information in several of the larger archives.<\/p>\n<p>The Archive of the Indigenous Languages of Latin America (<a href=\"http:\/\/www.ailla.utexas.org\/site\/welcome.html\">AILLA<\/a>) has a simple <a href=\"http:\/\/www.ailla.utexas.org\/search\/search.html\">search interface<\/a> that allows users to find materials by Language, Country or Genre, using drop-down menus that list the available search terms (click on the image to enlarge).<\/p>\n<p><a href=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/Archive-of-the-Indigenous-Languages-of-Latin-America.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/Archive-of-the-Indigenous-Languages-of-Latin-America-300x158.png\" alt=\"\" title=\"Archive of the Indigenous Languages of Latin America\" width=\"300\" height=\"158\" class=\"aligncenter size-medium wp-image-5554\" srcset=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/Archive-of-the-Indigenous-Languages-of-Latin-America-300x158.png 300w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/Archive-of-the-Indigenous-Languages-of-Latin-America-1024x540.png 1024w, 
https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/Archive-of-the-Indigenous-Languages-of-Latin-America.png 1261w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>I tried searching for &#8220;Educational Material&#8221; and got a list of 56 deposits presented in no apparent order (click on the image to open it and again to enlarge).<\/p>\n<p><a href=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/AILLA_Education.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/AILLA_Education-85x300.png\" alt=\"\" title=\"AILLA_Education\" width=\"85\" height=\"300\" class=\"aligncenter size-medium wp-image-5557\" \/><\/a><\/p>\n<p>Details about particular deposits can be found by clicking on the <i>Details<\/i> link on the right-hand side. If the relevant files are publicly available, they can be downloaded or played from this detailed page of information about the individual item.<\/p>\n<p>The Pacific and Regional Archive for Digital Sources in Endangered Cultures (<a href=\"http:\/\/www.paradisec.org.au\/\">Paradisec<\/a>) has a form-based <a href=\"http:\/\/azoulay.arts.usyd.edu.au\/paradisec\/search_item.php\">search interface<\/a> that requires the user to understand the metadata categories used by the archive. While it is possible to search by language name, dialect, or even village, or to find deposits according to who collected them, it seems that nothing about the <b>content<\/b> of the deposited materials is searchable. 
So, for example, I could not work out a way to find &#8220;Educational Material&#8221; in Paradisec.<\/p>\n<p><a href=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/Paradisec-Search-items.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/Paradisec-Search-items-300x158.png\" alt=\"\" title=\"Paradisec - Search items\" width=\"300\" height=\"158\" class=\"aligncenter size-medium wp-image-5569\" srcset=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/Paradisec-Search-items-300x158.png 300w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/Paradisec-Search-items-1024x540.png 1024w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/Paradisec-Search-items.png 1261w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>To download the actual files in Paradisec, one must be a registered user and log in to the system.<\/p>\n<p>Users of the archive of the Dokumentation Bedrohter Sprachen (<a href=\"http:\/\/www.mpi.nl\/dobes\">DOBES<\/a>) project are invited to &#8220;Come in and have a look&#8221; on its home page. There are 15,626 sessions available in the archive, but searching for materials in DOBES requires the &#8220;Metadata browsing&#8221; IMDI browser tool, a Java applet that runs in the user&#8217;s web browser (so you must have Java installed and cookies enabled). This presents the user with a hierarchy tree that is navigated by clicking on nodes in a graphical representation on the left of the web page. If the user right-clicks on a node (I don&#8217;t know how Macintosh users do this, and the site gives no instructions), several options are presented, including keyword search. 
I selected &#8220;Metadata search&#8221;, which opens a simple keyword search interface, and searched for &#8220;Educational Material&#8221;; this returned a list of 1223 session names.<\/p>\n<p><a href=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Educational.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Educational-300x188.png\" alt=\"\" title=\"DOBES_Educational\" width=\"300\" height=\"188\" class=\"aligncenter size-medium wp-image-5572\" srcset=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Educational-300x188.png 300w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Educational-1024x643.png 1024w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Educational.png 1275w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>Clicking on one of the session names in the list brings up individual file information, indicated by a green bag (but be prepared to wait, as the Java applet has to reload, open a new window, move down to the relevant node in the hierarchy tree and display the stored metadata). 
When I tried this with a random line in the list, I ended up at a Marquesan deposit for a &#8220;Farm Animals game&#8221; in which the word &#8220;educational&#8221; appears in the description of the r\u00f4le of French in the Pacific!<\/p>\n<p> <a href=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Educational2.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Educational2-300x208.png\" alt=\"\" title=\"DOBES_Educational2\" width=\"300\" height=\"208\" class=\"aligncenter size-medium wp-image-5574\" srcset=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Educational2-300x208.png 300w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Educational2-1024x712.png 1024w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Educational2.png 1151w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>To access the actual files, one must be a registered DOBES user and log in to the system. Note that DOBES also has a sophisticated search function that enables users to search for annotations within the Toolbox and ELAN files in the archive (probably of interest only to linguists). 
Thus it is possible, for example, to search across a set of DOBES deposits for a given morphemic gloss such as &#8220;ERG&#8221; (ergative).<\/p>\n<p>I also tried a search on &#8220;Teaching Materials&#8221;, which returned files with &#8220;teaching&#8221; anywhere in the metadata description, such as one from the Chintang and Puma project which tells us that Nepali is used in teaching in Nepal.<\/p>\n<p><a href=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Teaching2.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Teaching2-300x174.png\" alt=\"\" title=\"DOBES_Teaching2\" width=\"300\" height=\"174\" class=\"aligncenter size-medium wp-image-5575\" srcset=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Teaching2-300x174.png 300w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Teaching2-1024x594.png 1024w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/DOBES_Teaching2.png 1278w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>The Endangered Languages Archive at SOAS (<a href=\"http:\/\/www.elar-archive.org\/\">ELAR<\/a>) has a new home page and a new <a href=\"http:\/\/elar.soas.ac.uk\/search\/apachesolr_search\">search capability<\/a> designed and implemented by Tom Castle, Ed Garrett and David Nathan. 
This enables users to search the 7,587 resources in the archive in several ways.<\/p>\n<p><a href=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/elar.soas_.ac.uk-search-apachesolr_search.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/elar.soas_.ac.uk-search-apachesolr_search-300x158.png\" alt=\"\" title=\"elar.soas.ac.uk search apachesolr_search\" width=\"300\" height=\"158\" class=\"aligncenter size-medium wp-image-5582\" srcset=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/elar.soas_.ac.uk-search-apachesolr_search-300x158.png 300w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/elar.soas_.ac.uk-search-apachesolr_search-1024x540.png 1024w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/elar.soas_.ac.uk-search-apachesolr_search.png 1261w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>The search interface is built from the metadata provided by depositors, classified into seven types: Country, Language, Type, Tags, Genre, Topic, and Participants. Boxes on the left-hand side list all the terms that depositors have used to describe their materials. 
Thus, the &#8220;Genre&#8221; list has the following (the numbers after each term reflect the number of file bundles with that categorisation):<\/p>\n<p>Bislama version (32)<br \/>\nChronicle (13)<br \/>\nCommentary (2)<br \/>\nCommunity materials (17)<br \/>\nConsonant contrasts (48)<br \/>\nConversation (138)<br \/>\nCulture (13)<br \/>\nCustom description (13)<br \/>\nCustom narrative (16)<br \/>\nCustom story (1)<br \/>\nDescription (53)<br \/>\nDescriptive narrative (3)<br \/>\nDescriptive (3)<br \/>\nDictionary Materials (131)<br \/>\nDirectional story (4)<br \/>\nDiscourse (8)<br \/>\nDoctoral dissertation (1)<br \/>\nElicitation (802)<br \/>\nEncouraging speech (2)<br \/>\nEthnographic (13)<br \/>\nFailed Recording (7)<br \/>\nFolk Definition (15)<br \/>\nFolk Tales (20)<br \/>\nFolk tale (23)<br \/>\nFrog story (11)<br \/>\nGrammar Materials (63)<br \/>\nGrammar Qs (14)<br \/>\nGrammar elicitation (8)<br \/>\nHistorical description (21)<br \/>\nHistory (26)<br \/>\nHumor (7)<br \/>\nInteraction (4)<br \/>\nInterview (14)<br \/>\nKastom story (30)<br \/>\nKastom (6)<br \/>\nKinship terms (3)<br \/>\nLanguage teaching (17)<br \/>\nLetter (4)<br \/>\nLexical items (24)<br \/>\nLexicon (6)<br \/>\nLocal history \/ personal story (2)<br \/>\nLocal history (4)<br \/>\nLove song (70)<br \/>\nMetadata (6)<br \/>\nMiscellaneous (9)<br \/>\nMusic (14)<br \/>\nMyth narrative (16)<br \/>\nNarration (12)<br \/>\nNarrative from visual prompt (4)<br \/>\nNarrative (114)<br \/>\nNon-traditional narrative (1)<br \/>\nNuestras tradiciones (4)<br \/>\nNuestras vidas (4)<br \/>\nNuestros cuentos (7)<br \/>\nOratory (3)<br \/>\nPersonal history (3)<br \/>\nPersonal narrative (3)<br \/>\nPersonal story (51)<br \/>\nPersonal (2)<br \/>\nPicture\/video description (11)<br \/>\nPlanned (translated) myth narrative (5)<br \/>\nPlanned interview (2)<br \/>\nPrayer (2)<br \/>\nPraying (2)<br \/>\nPrimary Data (34)<br \/>\nPrimary Text (1158)<br \/>\nProcedural text (10)<br 
\/>\nProcedure (19)<br \/>\nRitual dance (5)<br \/>\nRitual singing (3)<br \/>\nRoute description (10)<br \/>\nSchool Materials (11)<br \/>\nSecret\/Sacred (4)<br \/>\nSemi-spontaneous interview (2)<br \/>\nSentence trans (52)<br \/>\nSong (108)<br \/>\nSongs (21)<br \/>\nSpeech (2)<br \/>\nStaged event (50)<br \/>\nStories (19)<br \/>\nStory (11)<br \/>\nSurvey (5)<br \/>\nSwahili summary (2)<br \/>\nTalk (16)<br \/>\nTeaching materials (17)<br \/>\nText (57)<br \/>\nText-based elicitation (21)<br \/>\nTexts (62)<br \/>\nTonal contrasts (35)<br \/>\nTraditional games (2)<br \/>\nTranscription (11)<br \/>\nTranscription\/translation (29)<br \/>\nTranscriptions (not yet categorised) (2)<br \/>\nTranslation (19)<br \/>\nTravel (1)<br \/>\nVideo recording of everyday activity (4)<br \/>\nVowel contrasts (34)<br \/>\nWord list (18)<br \/>\nWord\/phrase trans (57)<br \/>\nWordlist (4)<\/p>\n<p>Notice that ELAR does not insist on standardised metadata categories (or require them to be in English) but allows depositors to express whatever information about their materials they consider relevant and important. ((Metadata is generally understood to be data about the data, recorded to ensure that its context, meaning and use can be properly determined. As I noted in a <a href=\"http:\/\/www.paradisec.org.au\/blog\/2011\/01\/lsa-2011-sessions-on-metadata-in-language-documentation-and-description\/\">previous post<\/a>, &#8220;early work in language documentation starting around ten years ago was heavily influenced by library concepts (eg. 
Dublin Core), and &#8230; key metadata notions were interoperability, standardisation, discovery, and access &#8230; Today, however, we see more focus on expressivity and individuality in metadata descriptions that researchers are creating, and increasing emphasis on protocols, meta-documentation (documentation of the documentation itself), greater clarity on stakeholder rights and responsibilities, and more diverse ways in which researchers are creating and manipulating their metadata.&#8221;)) This results in some minor variation in classification (Narration versus Narrative, for example) and some potentially synonymous categories (Personal history versus Personal narrative, perhaps). One nice feature is that users can directly search for particular individuals who contributed to deposits as speakers.<\/p>\n<p>I tried searching for &#8220;Teaching materials&#8221; in the Genre category (&#8220;Educational material&#8221; was not in the list), which returned a listing of 17 bundles of data:<\/p>\n<p><a href=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/ELAR_Teaching.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/ELAR_Teaching-300x246.png\" alt=\"\" title=\"ELAR_Teaching\" width=\"300\" height=\"246\" class=\"aligncenter size-medium wp-image-5583\" srcset=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/ELAR_Teaching-300x246.png 300w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/ELAR_Teaching-1024x842.png 1024w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2011\/05\/ELAR_Teaching.png 1261w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>ELAR tags all of its materials for access and usage status using its URCS system (U = open to all, R = researchers, C = community members, S = subscribers approved by the depositor or their delegate), so any bundles tagged as U contain files 
that can be downloaded or played immediately by any registered user (there are 4,217 such bundles in the archive&#8217;s collection, ie. over half of the total deposited material is immediately accessible). <\/p>\n<p>There is now a wealth of resources available on endangered languages in these archives and others, and I encourage readers to access them and explore the wonderful videos, sound files, pictures and text materials that await them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a previous post I looked at who might be using materials deposited in endangered languages digital archives. In this post I will look at searching for information in several of the larger archives. The Archive of the Indigenous Languages of Latin America (AILLA) has a simple search interface which allows users to find materials &#8230; <a title=\"Searching in Endangered Languages Archives\" class=\"read-more\" href=\"https:\/\/www.paradisec.org.au\/blog\/2011\/05\/searching-in-endangered-languages-archives\/\" aria-label=\"Read more about Searching in Endangered Languages Archives\">Read 
more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[9,33],"tags":[],"class_list":["post-5533","post","type-post","status-publish","format-standard","hentry","category-archiving","category-endangered-languages"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts\/5533","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/comments?post=5533"}],"version-history":[{"count":29,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts\/5533\/revisions"}],"predecessor-version":[{"id":5588,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts\/5533\/revisions\/5588"}],"wp:attachment":[{"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/media?parent=5533"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/categories?post=5533"},{"taxonomy":"post_tag","emb
eddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/tags?post=5533"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}