Language and Internationalization/Newsletters/4

Welcome to the July 2024 edition of the Language and internationalization newsletter by the Wikimedia Foundation Language and Product Localization team! This newsletter provides you with quarterly updates on new feature developments, improvements in various language-related technical projects and support work, community meetings, and ideas to get involved in contributing to the projects.

Key highlights

New designs for Translatable pages published

Translatable pages support the creation of multilingual content on wikis such as Meta or MediaWiki. Recent user research found that the page creation feature, which involves two steps (setting up translation and marking a page for translation), is very useful but somewhat too complex for both new and regular users. New designs for Translatable pages propose changes to the page creation and editing workflow, including three entry points for translation settings: page alerts, the language selector, and the tools menu. The goal of these entry points is to make clearer which actions editors should take, by providing more context about missing translations and the languages relevant to them. According to the new designs, every new content page will have a designated translation settings page where an editor can indicate whether the content is stable and ready for community translation, regardless of whether manual markup was added to that page. The implementation of these proposals is currently under discussion with the Language and Product Localization team’s engineers. Detailed designs and research can be found here.

Wikimedia Foundation joins Unicode Consortium

The Wikimedia Foundation recently became one of the 30+ organizational members of the Unicode Consortium. Founded in 1991, the Unicode Consortium is a non-profit organization dedicated to developing and maintaining universal and open-source character encoding standards. Its work goes beyond character encoding to include character properties, algorithms, language and data for internationalization, and software libraries to support these standards. This work is carried out by leaders from governments, research and educational institutions, industry groups, and associations in internationalization, fonts, rendering, and all aspects of text processing. Until now, the Wikimedia Foundation had never held membership with the Consortium, though it has participated in Unicode projects, discussion forums, and conferences in the past. This membership, valid for a period of one year, will help us stay better informed about, and advocate for, issues related to language support and scripts relevant to our projects and tools. Through this relationship, we will also be able to act as a connecting voice between our movement and the Consortium. Learn more about this membership.

MinT for Wiki Readers is available on 23 Wikipedias

MinT for Wiki Readers is an on-wiki tool that lets readers search for any article and generate an automatic translation of it, from a language in which it exists into the languages available on Wikipedia. It uses the MinT (Machine in Translation) service. You can access the special page on any language Wikipedia (e.g., English Wikipedia) and search for an article to get a machine-translated version from other languages. The contents for a topic can vary between languages. Igbo Wikipedia was chosen for the initial enablement because its community has been using MinT frequently to create translations. After that, more wikis with different characteristics were targeted: active communities getting machine translation for the first time, medium-sized wikis with the potential to use machine translation to expand sections, and communities where data shows MinT may be providing good-quality translations. So far, 23 Wikipedias have received this feature. See a complete list of wikis [1] [2]. Below is a quote from a user:

Love the new feature that translates any language articles to Santali! […] The ability to seamlessly translate any article into Santali is incredibly impressive. This feature can bridge the gap for readers who want to consume content without actively contributing translations themselves. There are many articles which are only available in Santali, which may be read by others too. Great for readers who just want to enjoy content. – Rocky 734

Kadazandusun language Wikipedia launched

Kent Wiki Club’s first meeting after its inauguration at Wikimania Singapore 2023, under the user group in Malaysia. CC BY SA 4.0 User:Jjurieee

Kadazandusun is an indigenous language spoken by the Kadazan-Dusun people in Sabah, Malaysia, with a total population of 714,000 as of 2024. It is a member of the Austronesian language family and has several dialects, reflecting the diverse cultural heritage of its speakers. Content development in the Kadazandusun language on Wikimedia Incubator began in 2011 but remained inactive for over 10 years due to a lack of active editors. At Wikimania Singapore 2023, Taufik Rosman reintroduced Incubator to the members of the Kent Wiki Club, and development resumed. Kent Wiki Club is a Wikimedia student association at the Institute of Teacher Education Malaysia supported by the Wikimedia Community User Group Malaysia (WCUGM). Its goals are to uplift the Kadazandusun language, introduce the cultures of the Kadazandusun community globally, and digitize the cultural and language knowledge of Kadazandusun. Through the club’s consistent efforts to grow content through community events and on various occasions (WikiGap, Wikipedia Birthday, and a Wikimedia Incubator workshop and edit-a-thon held in conjunction with Wikipedia’s 23rd birthday celebration), 858 Kadazandusun articles were added to the Incubator project, leading to the launch of the Kadazandusun language Wikipedia in May 2024. Learn more in this Diff blogpost [3].

Translation dumps available for download

Regular dumps of translations from translatewiki.net are now available for download. The dumps are provided as a tarball of .po files, grouped into one folder for each project on translatewiki.net. They are licensed under CC BY 3.0, the same license as the translations contributed by translators. The dumps are intended for use in training translation memories or machine translation services, and they are updated every six months.
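For readers who want to explore such a dump, here is a minimal Python sketch of one way to walk a tarball of .po files and pull out source/translation pairs. It is an illustration under stated assumptions, not the tooling used by translatewiki.net: the folder and file names (project-a/fi.po and so on) are hypothetical, the demo archive is built in memory rather than downloaded, and the parser handles only single-line msgid/msgstr entries without escape sequences. Real dumps will need a complete PO parser.

```python
import io
import tarfile
from collections import defaultdict

# Hypothetical layout: one top-level folder per project, one .po file per language.
DEMO_FILES = {
    "project-a/fi.po": 'msgid ""\nmsgstr ""\n\nmsgid "Save"\nmsgstr "Tallenna"\n',
    "project-a/de.po": 'msgid "Save"\nmsgstr "Speichern"\n',
    "project-b/fi.po": 'msgid "Cancel"\nmsgstr "Peruuta"\n',
    "project-b/README.txt": "not a translation file\n",
}


def build_demo_tarball() -> bytes:
    """Build a small gzip-compressed tarball in memory for the demo."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in DEMO_FILES.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def group_po_files(tar: tarfile.TarFile) -> dict:
    """Map each top-level folder (assumed to be a project) to its .po paths."""
    groups = defaultdict(list)
    for member in tar.getmembers():
        if member.isfile() and member.name.endswith(".po"):
            project = member.name.split("/")[0]
            groups[project].append(member.name)
    return {project: sorted(paths) for project, paths in groups.items()}


def read_pairs(po_text: str) -> list:
    """Return (msgid, msgstr) pairs from single-line entries only.

    Multi-line strings, plurals, and escape sequences are not handled.
    Entries with an empty msgid (the PO header) or empty msgstr are skipped.
    """
    pairs = []
    msgid = None
    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith('msgid "') and line.endswith('"'):
            msgid = line[len('msgid "'):-1]
        elif line.startswith('msgstr "') and line.endswith('"') and msgid is not None:
            msgstr = line[len('msgstr "'):-1]
            if msgid and msgstr:
                pairs.append((msgid, msgstr))
            msgid = None
    return pairs


if __name__ == "__main__":
    archive = io.BytesIO(build_demo_tarball())
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for project, paths in group_po_files(tar).items():
            for path in paths:
                text = tar.extractfile(path).read().decode("utf-8")
                print(project, path, read_pairs(text))
```

To try this against a downloaded dump, replace the in-memory archive with tarfile.open on the downloaded file; the grouping step assumes the first path component is the project folder, which should be checked against the actual archive layout.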

Changes coming to MLEB extension release cycle

The MediaWiki Language Extension Bundle (MLEB) is a set of MediaWiki extensions designed to enhance multilingual support. It is a collection of tools and features aimed at improving the multilingual capabilities of MediaWiki sites, making it easier to manage and display content in multiple languages. The key extensions included are: Babel, CLDR (“Common Locale Data Repository”), CleanChanges, Translate and UniversalLanguageSelector. One of the benefits of using MLEB is that the bundle is regularly updated to include the latest translations and localization data, ensuring that the MediaWiki site is always current with the latest language support. MLEB started in 2012 with monthly releases and shifted to quarterly releases in 2015. Since the development of the extensions has stabilized with fewer changes and sufficient language coverage, MLEB will now be released twice a year, aligning with MediaWiki releases. This change aims to address the challenges of maintaining backwards compatibility, which requires managing stability and compatibility across different software versions and adds complexity to extension maintenance and contributions.

New Wikimedia Foundation team “Language and Product Localization” announced

With the start of the new fiscal year, the Wikimedia Foundation’s Product and Technology department continues to focus on addressing content and knowledge gaps. A new Language and Product Localization team has been formed by merging the Language and Inuka teams to consolidate their efforts in supporting language communities. Since both teams shared the goal of closing knowledge gaps and there was an overlap with the communities they work with, the merger aims to enhance work efficiency and use the combined skills and expertise of both teams. The new team will focus on “supporting multilingualism within the movement and providing standards-based tooling for Wikimedia communities” to advance localized technical initiatives, bridge knowledge gaps, and promote language equity [4].

Introducing WikiProject Multilingualism

WikiProject Multilingualism organizes efforts to achieve 100% Wikidata multilingualism for every language with MediaWiki internationalization support. It is initiated, developed, and supported by the Wikimedia Language Diversity community volunteers. You can support the project by localizing content—properties, items, labels, and descriptions—into the languages you know, and through outreach and documentation. This project focuses on providing detailed statistics, collecting best practices, and supporting language-related subprojects to strengthen their presence in Wikidata’s future.

MinT in the press

AI can play a key role in providing access to knowledge, especially during current events such as the recent elections in India, where machine translation tools can be instrumental in enhancing access to Wikipedia’s reliable information. Machine in Translation (MinT), a Wikimedia machine translation service, includes external data models such as IndicTrans2, an IIT Madras initiative focused on building open-source language models for Indian languages. This integration has enabled MinT to support Wikipedia editors in translating content into 22 Indian languages as part of the over 250 languages that MinT supports globally. This collaboration benefits both projects: the IndicTrans2 system helps increase contributions in Indian languages on Wikipedia, and those additional contributions in turn improve the quality of machine translation available in these local languages. One success story is the Santali language Wikipedia, which has experienced a 30% increase in content creation since the deployment of MinT. More on this can be found in User:Pginer-WMF’s interview with the Indian Express [5].

Persian Wikipedia passed 1 million article milestone

The logo of one million Persian Wikipedia articles. CC BY SA 4.0 User:Parsa 2au

Persian Wikipedia (fa.wikipedia.org), which recently celebrated its 20th anniversary, has surpassed the 1,000,000 article milestone. Launched in December 2003, it had 1,008,176 articles as of July 2024 and ranks as the 19th largest Wikipedia language edition by article count. It is the most widely used version of Wikipedia in Iran and Afghanistan. Persian Wikipedia receives around 350 million monthly page views [6] and is supported by approximately 1,500 active editors [7] and around 34 administrators who maintain its operations.

Language converter for the Manipuri language developed at Wikimedia Hackathon

The Manipuri language can be written in the Meitei script (also referred to as Meetei) and in the Bengali script. Until now, Wikipedia in this language has been written in the Meitei script, but there has been a long-standing request from the editor community to provide a way to display it in Bengali for those who prefer reading in that script. Script conversion technology has been available in MediaWiki for many years. However, for more complex languages, developers require input from volunteers who are well-versed in the language. At the Wikimedia Hackathon in Tallinn, User:Nokib_Sarkar, who is familiar with the Bengali script, developed a conversion algorithm and collaborated with User:SSastry_(WMF) and User:Aaharoni-WMF to begin implementing it in PHP for MediaWiki. After the Hackathon, the code went through several review cycles and is now deployed and functional. To try it, visit mni.wikipedia.org and find the script conversion button next to the “Talk” tab.

Community meetings and events

  • In case you missed the language community meeting in May, you can catch up by watching the video recording and reading the notes. Sign up here to attend the upcoming meeting in August.
  • Attending the Wikimania Hackathon in Poland? There will be a Language Support track table focusing on addressing small, language-specific technical requests across various Wikimedia projects to support new and existing language communities. Join the track organizers and participants to collaborate on technical tasks, organize technical sessions, run a support help desk, and more.
  • User:Pginer-WMF presented at the esLibre conference in Valencia [8] on the topic of tools for translating Wikipedia.

Get involved

Stay tuned for the next release! You can subscribe to this newsletter.
