WikiTutur: Documenting Indonesia’s Regional Languages Through Wiktionary and Lingua Libre

A video introduction to WikiTutur

Indonesia, a nation renowned for its linguistic diversity, is home to over 700 living languages, making it the second most linguistically rich country in the world after Papua New Guinea. Yet, the digital representation of these languages remains scarce, even within the generally diverse Wikimedia movement. Gaps in representation still exist between languages with large and small numbers of speakers, as well as between those with and without institutional support, both locally and nationally.

In the Indonesian Wiktionary, entries for languages spoken in Indonesia are still lacking, and entries with audio pronunciation guides are rarer still. As of January 2024, only 12 languages of Indonesia had audio recordings available on Wikimedia Commons. Regional Wikimedia communities in Indonesia have run several initiatives to collect more audio recordings and lexical entries in the regional languages, but most were narrow in focus, targeting only one language community at a time.

To address these gaps, a group of dedicated volunteers from the Wikimedia Jakarta Community launched WikiTutur, a language documentation initiative and community outreach program supported by Wikimedia Foundation Rapid Fund grants. Combining collaborative platforms like Wiktionary with tools like Lingua Libre, WikiTutur aims to document and preserve Indonesia’s regional languages both textually and orally, so that their voices are heard in the digital world.

What is WikiTutur?

“WikiTutur” combines two words: wiki, referring to a freely editable platform, and tutur, Indonesian for ‘speech’. WikiTutur is a collaborative project that aims to document and preserve Indonesia’s local languages, both orally and textually, through Wiktionary and Lingua Libre. The project involves workshops and meetups held in various cities across Indonesia, engaging local communities and institutions with the support of volunteers.

Lingua Libre is a tool by Wikimedia France, specifically designed to help native speakers record vocabulary in their languages for use in Wikimedia projects. Lingua Libre eliminates several intermediate steps between recording and uploading to Commons, making the process much more streamlined and easy to follow.

Preliminary outcomes

During the first phase of this program (January–June 2024), workshops were conducted across several Indonesian cities and online, where participants from linguistically diverse backgrounds were trained to record audio files for Wikimedia Commons with the help of Lingua Libre, and to input lexical entries on the Indonesian Wiktionary. This initial phase of the program has achieved remarkable results:

  • Expanded Wiktionary content: A total of 238 people actively contributed by adding or editing pages on Indonesian Wiktionary, especially for regional language terms.
  • Audio documentation through Lingua Libre: Despite some technical challenges, the project successfully collected audio recordings from various regional languages, including rare dialects. More than 10,000 audio files were collected in 59 distinctly named lects (regardless of dialect/language status), or 39 languages by ISO 639-3 codes.
  • Local community involvement: Collaborations with local Wikimedia communities in various cities have successfully engaged more people in regional language preservation efforts. A new community was even formed after a workshop.
  • Development of training materials: The project team has developed various training materials, presentations, and guides that can be used by other communities to conduct similar activities.
Map of recorded languages and dialects throughout the course of WikiTutur events
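
The difference between the two figures above (59 named lects versus 39 ISO 639-3 languages) arises because several named dialects can share one language code. The following is a minimal sketch of how such a tally could be made. The sample records and the function name are purely illustrative assumptions, not the project's actual data or tooling.

```python
from collections import defaultdict

# Illustrative sample only (NOT WikiTutur data): each record pairs a
# named lect with the ISO 639-3 code of the language it belongs to.
recordings = [
    ("Javanese (Banyumasan)", "jav"),
    ("Javanese (Surabaya)", "jav"),
    ("Sundanese", "sun"),
    ("Minangkabau", "min"),
]


def count_lects_and_languages(records):
    """Group lect names by ISO 639-3 code and count both levels.

    Returns (number of distinct lects, number of distinct codes,
    mapping of code -> set of lect names).
    """
    lects_by_code = defaultdict(set)
    for lect, iso_code in records:
        lects_by_code[iso_code].add(lect)
    lect_count = sum(len(lects) for lects in lects_by_code.values())
    return lect_count, len(lects_by_code), dict(lects_by_code)


lect_count, language_count, grouped = count_lects_and_languages(recordings)
print(f"{lect_count} lects across {language_count} ISO 639-3 languages")
```

In this toy sample, the two Javanese dialects count as two lects but only one language, which is the same effect that separates the two totals reported for the project.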

Challenges and lessons learned

Despite its successful outcomes, the first phase of WikiTutur also faced several challenges, such as:

  • Device compatibility: Not all participant devices were compatible with Lingua Libre, hindering the recording process.
  • Lack of experience among novice participants: Many participants were new to the Wikimedia platform and required additional guidance.
  • Limited resources for the languages concerned: Minimal cooperation with language communities and language development institutions made the preparation of workshop materials less than optimal.

From these challenges, the project team learned valuable lessons, such as the importance of conducting site surveys before events, ensuring participant readiness, and establishing partnerships with more parties.

Moving forward

The project team has identified several critical strategies to develop WikiTutur, which include:

  • Expanding the network: Establishing partnerships with more language communities and related institutions.
  • Improving content quality: Creating more comprehensive and accurate word lists.
  • Simplifying tool usage: Collaborating with Lingua Libre developers to address technical issues.
  • Sharing knowledge: Sharing project results and learnings with other Wikimedia communities.

As of December 2024, the second phase of WikiTutur is still ongoing. In this iteration, we seek to increase our impact by collaborating with both Wikimedia and non-Wikimedia communities and institutions in Indonesia. In one of the workshops, we also invited volunteers from Wikimedia Community User Group Malaysia to participate, collaborate, and share experiences and resources to enrich our Wiktionaries.

The inaugural WikiTutur workshop in Jakarta

Conclusion

With its collaborative spirit and community support, WikiTutur has proven successful in bringing people together to document Indonesia’s diverse linguistic heritage. By showcasing how Wikimedia projects and tools can support language documentation, WikiTutur also empowers participants and language activists who seek to share their native tongues online, in the hope of bridging the digital linguistic divide.
