Javanese Wikisource Competition

On 6 March, the Yogyakarta Wikimedia Community (Komunitas Wikimedia Yogyakarta) launched the first of what is planned to be an annual Javanese Wikisource Competition (Kompetisi Wikisumber). Although this is the first time the Yogyakarta Wikimedians have organized this competition, they are no strangers to edit-a-thons and writing challenges, having previously organized WikiTeroka 3.0 (2022) and WikiTeroka 4.0 (2023) for Javanese Wikipedia. In fact, this is not the first competition to involve Javanese Wikisource either. The Indonesian Wikisource Community (Komunitas Wikisource Indonesia) has held an annual Indonesian Wikisource Competition since 2020, and in the 2022 edition it invited the recently hatched Balinese and Javanese Wikisources to participate. There were 10 Latin-script Javanese books and 2 Javanese-script books, totaling almost two thousand pages, that were proofread and validated in just two weeks.

So what’s so special about the Javanese Wikisource Competition? Besides proofreading around 30 Latin-script Javanese books on various subjects, participants can also proofread three special manuscripts dating from the late 18th century to the early 19th century. These manuscripts are Serat Jaya Lengkara Wulang (1803) – BL MSS Jav 24, Serat Sélarasa – BL MSS Jav 28, and Serat Damar Wulan – BL MSS Jav 89. These three handwritten manuscripts, totaling more than a thousand pages, are preserved in pristine condition and were digitized along with the rest of the British Library's collection of Javanese manuscripts. Javanese, spoken by around 100 million people who live primarily on the island of Java in Indonesia, is unusual among the world's languages in having a literary tradition in three writing systems: the European-influenced Latin alphabet, the Arabic-influenced Pegon script, and the native Javanese script (itself a derivative of the 8th-century Kawi script). Thus, it is not unusual for a title to have been written in three different scripts, although nowadays it is hard to find any publication in Javanese at all.

Given the dearth of literature in Javanese, especially older and hard-to-find books (almost all Javanese books are out of print), the Yogyakarta Wikimedia Community hopes that the books provided on Javanese Wikisource, together with the articles on Javanese Wikipedia, can help meet the need for good, freely available reading material in Javanese.

Proofreading Javanese-script manuscripts also poses its own challenge. The Javanese script input method was not available on Javanese Wikipedia until 2013, and most people are still not familiar with typing in their own script because of years of neglect and lack of use in daily life. Video tutorials were provided so that people can type the script using the Universal Language Selector, and an online workshop was held by GLAM Indonesia last October. It is helpful, therefore, that another organization has taken on the monumental task of transcribing the manuscripts, albeit in Latin transliteration. Yayasan Sastra Lestari, in partnership with the British Library, has transcribed all the pages of the three manuscripts on its website. Whenever a Wikisource proofreader is stumped by a reading in the original text, they can consult the transliteration for reference.

The competition coincides with the launch of Wikisource Loves Manuscript in Jakarta on International Mother Language Day, which aims to digitize, preserve, and transcribe over 20,000 pages of Indonesian manuscripts. Ideally, just as proofreading books in Latin script is helped by OCR (Optical Character Recognition), the same would be true for non-Latin scripts, including Javanese script. Unfortunately, it is not yet possible to OCR a page of Javanese script. The Wikisource Loves Manuscript pilot project aims to use the AI-powered Transkribus tool to automate text recognition. But the technology has a catch: to become good at character recognition, it first has to be trained on existing data, and no dataset pairing Javanese manuscripts with their corresponding transcriptions exists yet. This is where the Javanese Wikisource Competition has a role: by providing enough user-proofread transcriptions, it gives Transkribus the material it needs to learn to recognize Javanese script. The more data and training it receives, the more proficient it becomes. Hopefully, in future Javanese Wikisource competitions, the tool will be good enough to provide the base OCR text to be checked by human (Wikisourcian) proofreaders. Then more and more knowledge from literature written in Javanese script, once out of reach of ordinary people, would be unlocked and its wisdom shared with a modern audience.

The Javanese Wikisource Competition 2023 runs from 6 to 20 March 2023. Participation is open to everyone, and knowledge of Javanese is not required (especially for the Latin-script books). However, prizes for winners and souvenirs for participants can only be sent to addresses in Indonesia.
