{"id":10262,"date":"2026-02-14T05:56:20","date_gmt":"2026-02-13T19:56:20","guid":{"rendered":"https:\/\/www.paradisec.org.au\/blog\/?p=10262"},"modified":"2026-02-14T05:57:11","modified_gmt":"2026-02-13T19:57:11","slug":"speech-recognition-on-your-laptop","status":"publish","type":"post","link":"https:\/\/www.paradisec.org.au\/blog\/2026\/02\/speech-recognition-on-your-laptop\/","title":{"rendered":"Speech recognition on your laptop"},"content":{"rendered":"\n<p>Following on from the work reported <a href=\"https:\/\/www.paradisec.org.au\/blog\/2025\/10\/large-language-models-for-small-languages\/\" target=\"_blank\" rel=\"noreferrer noopener\">two posts ago<\/a>, which described progress on speech recognition for Bislama and Nafsan, Aso Mahmudi has now created a desktop app (called Easper &#8211; <strong>E<\/strong>lan <strong>A<\/strong>utomated <strong>Spe<\/strong>ech <strong>R<\/strong>ecognition) that takes a WAV file as input, segments it, performs speaker diarisation, and transcribes it, delivering an Elan file as output. All of this is done on a laptop and needs no internet connection.<\/p>\n\n\n\n<p>As can be seen in the image below, you can set the number of speakers, and can adjust the silence detection (&#8216;gap between segments&#8217;) and the minimum segment length, depending on the characteristics of the recording and the rate of speech. You select the language model (which has to have been created in advance) and then run the process. For example, with a 44-minute file the first pass of segmentation took less than three minutes. Transcription took seven minutes on my laptop (MacBook Pro 2021, 16 GB RAM, macOS 26.3). 
The transcript always needs checking and correcting, but it is remarkably good (as <a href=\"https:\/\/www.paradisec.org.au\/blog\/2025\/10\/large-language-models-for-small-languages\/\" data-type=\"link\" data-id=\"https:\/\/www.paradisec.org.au\/blog\/2025\/10\/large-language-models-for-small-languages\/\" target=\"_blank\" rel=\"noreferrer noopener\">reported earlier<\/a>, with a character error rate of around 10% for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Nafsan_language\" target=\"_blank\" rel=\"noreferrer noopener\">Nafsan<\/a>).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.10.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"947\" src=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.10-1024x947.png\" alt=\"\" class=\"wp-image-10275\" srcset=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.10-1024x947.png 1024w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.10-300x277.png 300w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.10-768x710.png 768w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.10-1536x1420.png 1536w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.10.png 1802w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>Easper also analyses Elan files, as seen below, and reports the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>the characters used in the file (can be corrected by the user)<\/li>\n\n\n\n<li>the number of words in the file<\/li>\n\n\n\n<li>a frequency list of all words<\/li>\n\n\n\n<li>the number of speaker overlaps<\/li>\n\n\n\n<li>the number of long segments (these 
can be problematic and result in artefacts like repetition of the same word many times in the transcript)<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.54.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"951\" src=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.54-1024x951.png\" alt=\"\" class=\"wp-image-10276\" srcset=\"https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.54-1024x951.png 1024w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.54-300x279.png 300w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.54-768x713.png 768w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.54-1536x1426.png 1536w, https:\/\/www.paradisec.org.au\/blog\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-17.12.54.png 1792w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>Aso Mahmudi is currently working on extending the Nafsan model to neighbouring languages. 
We can then determine how useful it will be to continue extending existing models to other languages with similar phonological systems and orthographies.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Following on from the work reported two posts ago, which described progress on speech recognition for Bislama and Nafsan, Aso Mahmudi has now created a desktop app (called Easper &#8211; Elan Automated Speech Recognition) that takes a WAV file as input, segments it, performs speaker diarisation, and transcribes it, delivering an Elan file as the &#8230; <a title=\"Speech recognition on your laptop\" class=\"read-more\" href=\"https:\/\/www.paradisec.org.au\/blog\/2026\/02\/speech-recognition-on-your-laptop\/\" aria-label=\"Read more about Speech recognition on your laptop\">Read more<\/a><\/p>\n","protected":false},"author":13,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[101,6],"tags":[],"class_list":["post-10262","post","type-post","status-publish","format-standard","hentry","category-llm-and-asr","category-paradisec"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts\/10262","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.p
aradisec.org.au\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/comments?post=10262"}],"version-history":[{"count":10,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts\/10262\/revisions"}],"predecessor-version":[{"id":10278,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/posts\/10262\/revisions\/10278"}],"wp:attachment":[{"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/media?parent=10262"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/categories?post=10262"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.paradisec.org.au\/blog\/wp-json\/wp\/v2\/tags?post=10262"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}