{"id":8983,"date":"2018-02-28T11:30:20","date_gmt":"2018-02-28T01:30:20","guid":{"rendered":"http:\/\/www.paradisec.org.au\/blog\/?p=8983"},"modified":"2018-02-28T11:30:42","modified_gmt":"2018-02-28T01:30:42","slug":"texts-and-more-texts-corpora-in-the-coedl","status":"publish","type":"post","link":"https:\/\/www.paradisec.org.au\/blog\/2018\/02\/texts-and-more-texts-corpora-in-the-coedl\/","title":{"rendered":"Texts and more texts: corpora in the CoEDL"},"content":{"rendered":"<p>Corpus development is one of the goals of the <a href=\"http:\/\/www.dynamicsoflanguage.edu.au\/\" target=\"_blank\" rel=\"noopener\">ARC Centre of Excellence for the Dynamics of Language<\/a> (see this <a href=\"http:\/\/www.dynamicsoflanguage.edu.au\/research\/language-diversity\/corpus-development\/\" target=\"_blank\" rel=\"noopener\">web page for more details<\/a>). We have run a number of workshops on corpus-related themes (e.g. <a href=\"https:\/\/sites.google.com\/site\/shapecorpus2017\/home\" target=\"_blank\" rel=\"noopener\">the 2017 workshop that included a day on converting early sources<\/a>).<\/p>\n<p>In addition to creating useable materials for the source communities (which we have a strong commitment to supporting) we are archiving records that include primary media, transcripts and associated annotations. 
From this material we aim to produce a subset of accessible texts for a number of languages.<br \/>\nHere it is worth noting that we have come up with this terminology (thanks to Jane Simpson for the formulation) to distinguish the objects we have collected:<br \/>\n<strong>Assemblage<\/strong> &#8211; all material collected, working files, early sources, multiple versions and drafts<br \/>\n<strong>Collection<\/strong> &#8211; the archived material, a subset of the above, but curated with sufficient metadata to allow the user to know what each item is<br \/>\n<strong>Corpus<\/strong> &#8211; a crafted set of texts in the language that can be used for further analysis<\/p>\n<p><!--more-->A corpus is a set of texts in a language and is often built to address a particular research question, typically with parts of the corpus coded to allow analysis of certain features. Some corpora are created with no particular question in mind, for example the <a href=\"http:\/\/clu.uni.no\/icame\/brown\/bcm.html\" target=\"_blank\" rel=\"noopener\">Brown corpus<\/a> or the <a href=\"http:\/\/ice-corpora.net\/ice\/\" target=\"_blank\" rel=\"noopener\">International Corpus of English<\/a>. It is the latter kind of material that we will be producing: texts that can be used for various kinds of analysis.<\/p>\n<p>The aim is to have corpora from texts in as many of the following languages as possible: Abui, Anindilyakwa, Bininj Gun-Wok, Cook Islands M\u0101ori, Dalabon, Gamilaaray\/Yuwal, Gurindji, Gurindji Kriol, Kalam, Kanjimey, Kayardild, Kaytetye, Kriol, Ku Waru, Marind, Mawng, Mudburra, Murrinhpatha, Nafsan, Nen, Ngaanyatjarra, Nungon, Vera&#8217;a, Warlpiri, Warumungu, Wubuy, Wutung, Yolngu (not yet ready to be made public).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Corpus development is one of the goals of the ARC Centre of Excellence for the Dynamics of Language (see this web page for more details). We have run a number of workshops on corpus-related themes (e.g. 
the 2017 workshop that included a day on converting early sources). In addition to creating useable materials for the &#8230; <a title=\"Texts and more texts: corpora in the CoEDL\" class=\"read-more\" href=\"https:\/\/www.paradisec.org.au\/blog\/2018\/02\/texts-and-more-texts-corpora-in-the-coedl\/\" aria-label=\"Read more about Texts and more texts: corpora in the CoEDL\">Read more<\/a><\/p>\n","protected":false},"author":13,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-8983","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts\/8983","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/comments?post=8983"}],"version-history":[{"count":2,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts\/8983\/revisions"}],"predecessor-version":[{"id":8985,"href":"htt
ps:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts\/8983\/revisions\/8985"}],"wp:attachment":[{"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/media?parent=8983"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/categories?post=8983"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/tags?post=8983"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}