Language and Internationalization/Newsletters/2

Welcome to the January 2024 edition of the Language and Internationalization newsletter by the Wikimedia Foundation Language team!

This newsletter provides quarterly updates on new feature development, improvements to language-related technical projects and support work, community meetings, and ways to get involved in the projects.

Subscribe to the newsletter

Key highlights

Fon Wikipedia officially launched after five years of development in the Wikimedia Incubator

Fon Wikipedia, born at Wikimedia Hackathon 2018 in Barcelona, has officially launched after graduating from the Incubator! Fon is spoken by millions in Benin and Togo and is the mother tongue for many. It is also widely used in Benin as a national language. Creating the Fon Wikipedia took five years: many people could not write in Fon, and native African languages get less attention than others, so building a community to support the project was a tough challenge for the members who started it.[1] You can also discover more about the four new Wikimedia language projects that were approved recently (Wikipedia Dagaare, Wikipedia Moroccan Amazigh, Wikipedia Toba Batak, and Wikiquote Banjar).

Mahuton, a Beninese Wikimedian, on building a keyboard for easy article editing in Fon at Wikimedia Hackathon 2018, Barcelona. CC BY-SA 4.0.

Introducing Sentencex, a tool for enhanced Natural Language Processing (NLP) and multilingual sentence extraction

The Language team has just launched a new tool called Sentencex, now available in both Python and JavaScript. Sentence segmentation, an essential part of natural language processing, involves breaking a text down into individual sentences. This process has various uses and helps improve language functionality and speed, especially in Wikimedia’s new machine translation system (MinT) and the section translation project.[2]

You can find the tool on GitHub and see it in action.
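To make the idea of sentence segmentation concrete, here is a minimal sketch in Python. It is not Sentencex's implementation or API: the function name `split_sentences` and the tiny abbreviation list are hypothetical, chosen only to show why a segmenter needs more than "split on every period".

```python
import re

# Hypothetical, deliberately tiny abbreviation list for illustration only.
# A real multilingual segmenter needs per-language data.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}


def split_sentences(text: str) -> list[str]:
    """Split text after ., ! or ? followed by whitespace, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        candidate = text[start:match.end()].strip()
        last_word = candidate.split()[-1].lower()
        if last_word in ABBREVIATIONS:
            continue  # not a real sentence boundary, keep scanning
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("Dr. Smith arrived. He was late! Why?"):
        print(sentence)
```

Even this toy version shows the core difficulty: the period after "Dr" must not end a sentence, and which abbreviations exist differs from language to language. For real use, refer to the Sentencex repository for its actual interface.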

MinT translation service available to 55 new Wikipedias, doubles content, ranks second in usage

The new machine translation service, MinT, which now offers machine translation for the first time to 55 Wikipedias, has had a positive impact on Wikimedia language communities. This extensive language support has nearly doubled published translations, and articles created using MinT have a low deletion rate (1.72%). MinT is now used in 8% of the translations published with Content Translation, making it the second most used translation service in Wikipedia, after Google Translate, in just a few short months.[3]

Graphical representation of languages supported by MinT for the first time. CC BY-SA 4.0.

Open language identification service now available for 200+ languages

The Language team created an open language identification service that automatically detects the language in which a given text is written, simplifying users’ interaction with Wikimedia platforms. The service can detect 201 languages, and anyone can access its API. Final checks and an evaluation of the service’s ability to withstand high traffic are currently underway.[4]
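As a rough illustration of how a client might call such a service, the Python sketch below builds a JSON POST request using only the standard library. The endpoint URL and the `{"text": ...}` payload shape are assumptions not stated in this newsletter, and the helper names are hypothetical; please confirm the details against the announcement in reference [4] before relying on them.

```python
import json
import urllib.request

# Assumed endpoint, not taken from this newsletter; verify before use.
LANGID_URL = "https://api.wikimedia.org/service/lw/inference/v1/models/langid:predict"


def build_langid_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the text to identify."""
    payload = json.dumps({"text": text}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        LANGID_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_language(text: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response as a dict."""
    request = build_langid_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the network call in one place, which makes the payload easy to inspect or test offline.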

In 2012, fundamentalist Islamists took over the city of Timbuktu, Mali. Fearing for the safety of hundreds of thousands of ancient manuscripts, some dating from the 11th century, a group of librarians and preservationists smuggled between 200,000 and 400,000 manuscripts from Timbuktu to the Malian capital of Bamako.

Since that time, the NGO SAVAMA-DCI (“Sauvegarde et Valorization des Manuscripts pour la defense de la Culture Islamique” in French, translated to English as “Association for the Protection and Promotion of Manuscripts and the Defense of Islamic Culture”), has worked to clean, protect, restore, digitize, and eventually translate hundreds of thousands of manuscripts.

This photo shows a worker gently cleaning the dust and other contaminants off one of the ancient manuscripts.

For more information on the group and their work, see my story at https://fischerfotos.exposure.co/preserving-malis-historic-manuscripts

Wikisource now recognizes handwritten texts with Transkribus

Handwritten text recognition is now active on Wikisource through the Transkribus OCR Engine. Transkribus, an AI-powered platform, simplifies the handling of handwritten or printed manuscripts by offering various models tailored to different writing scripts, historical periods, and other factors. The Transkribus engine is now available as an option alongside Google and Tesseract, and it is currently operational on the Wikisources listed on this page.[5]

Unified section translation dashboard for desktop and mobile users

The Language team is actively working towards the adoption of a unified section translation dashboard for both desktop and mobile users. Originally designed for mobile in Content Translation, it is now being refined to serve as a unified dashboard across platforms, providing an improved translation environment. The dashboard is currently in beta; you can test it on Test Wikipedia or any Section Translation-enabled wiki by adding the URL parameter “unified-dashboard=true” (e.g., ig.wikipedia.org/wiki/Special:ContentTranslation?unified-dashboard=true).

This unified dashboard offers a seamless cross-platform translation experience. Users can start translating on their desktop and continue on a mobile device, or vice versa. It also supports section translations on the desktop, giving users flexibility across devices.
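If you want to try the beta dashboard on several wikis, a small Python helper can build the test link for each one. This is only a convenience sketch: the function name `unified_dashboard_url` is hypothetical, and it assumes the wiki is served over HTTPS with the same `Special:ContentTranslation` path as in the example above.

```python
from urllib.parse import urlencode


def unified_dashboard_url(wiki_host: str) -> str:
    """Return the Content Translation URL with the beta dashboard parameter set."""
    query = urlencode({"unified-dashboard": "true"})
    return f"https://{wiki_host}/wiki/Special:ContentTranslation?{query}"


if __name__ == "__main__":
    print(unified_dashboard_url("ig.wikipedia.org"))
```

The parameter only has an effect on wikis where Section Translation is enabled.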

Community meetings and events

Get involved

Stay tuned for the next release! You can subscribe to this newsletter.

References

  1. https://diff.wikimedia.org/2023/10/04/welcome-to-the-fon-wikipedia/
  2. https://diff.wikimedia.org/2023/10/23/sentencex-empowering-nlp-with-multilingual-sentence-extraction/
  3. https://diff.wikimedia.org/2023/11/20/unlocking-the-worlds-languages-in-wikipedia-a-look-into-mints-impact-so-far/
  4. https://diff.wikimedia.org/2023/10/24/open-language-identification-api-for-200-languages/
  5. https://diff.wikimedia.org/2023/07/13/enabling-handwritten-text-recognition-on-wikisource-using-transkribus-ocr-engine/
