Content Translation helped create 30,000 new Wikipedia articles this year

Content Translation helps volunteers translate articles between different-language Wikipedias. Screenshot from “Content Translation Screencast” video. Screencast by Pau Giner, freely licensed under CC BY-SA 4.0.

Weekly article creation, deletion and in-progress trends for Content Translation for the English Wikipedia. Photo by Runa, freely licensed under CC0 1.0.

The number of articles created with the Content Translation tool recently crossed the 30,000 mark.[1] More than 7,000 editors are using the tool to translate Wikipedia articles into many languages.
Based on our recent observations, more than 1,000 new articles are created with Content Translation each week on average. The share of articles deleted as part of the normal article review process is around 7% per week, which is considerably lower than the figure for new articles created with the usual editing tools. Because the tool is designed to produce a good article by reusing existing content from another language, this is an encouraging outcome, and it supports our assumption that the initial translated version is good enough in content quality to merit retention.
Challenges about content syntax and improvement
Ever since the tool was first deployed as a beta feature in January 2015, the development team has actively monitored the articles being created and examined how well they fit into the respective wikis, primarily in terms of their internal structure and code: categories, links, templates, footnotes, general wiki syntax handling, and so on. By its nature, Content Translation transforms text between diverse Wikipedia projects, and differences between the projects in their use of templates, references and markup can lead to issues.
As the article creation and deletion statistics demonstrate, the general observation is that the new articles appear to fit well. However, keeping the wiki syntax clean is a considerable challenge, and new issues are being uncovered through regular use of the tool. Over the year we fixed numerous bugs in the handling of categories, templates and footnotes, but we know that many still remain. We are thankful to the editors who report bugs in this area and help us understand and fix them.
Improvements to machine translation
Improvements to machine translation (MT) have been a recurring request from many users of Content Translation ever since the tool became available. Until recently, Apertium was the only MT service available in Content Translation. Since 4 November 2015, however, the Yandex machine translation service has been available to users of the Russian Wikipedia, where Content Translation is especially popular, and can be used when translating Wikipedia pages from English to Russian (see the announcement).
The translation service is accessed through a freely available API, and the translated content it returns is freely licensed, in line with Wikipedia policy, for use in Wikipedia articles. Because the interaction between Content Translation and the translation service happens on the server side, no personal information from the user’s device is sent to Yandex. Users can modify the translated content just like any other content on wiki pages. Information about these modifications is also publicly available under a free license through the Content Translation API, so that anyone can use it to develop and improve translation services (university research groups, open source projects, commercial companies, anyone!). More information about this translation service is available on mediawiki.org and in the Content Translation FAQ. For more details about the interactions between Content Translation and the Yandex translation service, please see this image.
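The server-side arrangement described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the function name `build_mt_request`, the field names, and the shape of the request are assumptions made for illustration and are not taken from the Content Translation code or from the Yandex API. The point is only that the server forwards the text and the language pair, and nothing that identifies the user.

```python
# Hypothetical sketch: none of these names come from Content Translation
# or from the Yandex API; they only illustrate the server-side relay idea.

# Fields the (assumed) server forwards to the external translation service.
FORWARDED_FIELDS = ("source_language", "target_language", "text")


def build_mt_request(client_request: dict) -> dict:
    """Return the payload the server would forward to the MT service.

    Only the language pair and the text to translate are copied; anything
    else the browser sent (user name, IP address, cookies) is left out.
    """
    missing = [f for f in FORWARDED_FIELDS if f not in client_request]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return {field: client_request[field] for field in FORWARDED_FIELDS}


if __name__ == "__main__":
    incoming = {
        "source_language": "en",
        "target_language": "ru",
        "text": "Content Translation helps volunteers translate articles.",
        "user_name": "ExampleEditor",   # stays on the server
        "ip_address": "203.0.113.7",    # stays on the server
    }
    print(build_mt_request(incoming))
```

Using an explicit list of forwarded fields, rather than removing known personal fields, means that any new field added to the incoming request is withheld by default.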
Enhancements have also been made to the Apertium machine translation service. As a result of recent changes, Apertium now covers eight new language pairs. The new pairs are listed here; a complete list of supported pairs is also available:

  • Arabic – Maltese (both directions)
  • Breton – French
  • Catalan – Esperanto
  • French – Esperanto
  • Romanian – Spanish
  • Spanish – Esperanto
  • Spanish – Italian (both directions)
  • Swedish – Icelandic (both directions)

Upcoming plans and office hour
In our last update, we told our readers about the article suggestion feature, which gives users a list of articles that can be translated into a given language. Soon, users will also be able to create collections of articles that can be used for translathons or similar shared editing activities. If you have participated in such an event, or organized one that involved article creation through translation, we would like to learn more from you (via this form) about how Content Translation’s article lists can support this activity.
Please join us for the next online office hour on 25 November 2015 at 13:00 GMT. We welcome your comments and feedback on the Content Translation project talk page and Phabricator.
Runa Bhattacharjee, Amir Aharoni, Language Engineering, Wikimedia Foundation

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

6 Comments

I have used the Content Translation tool to translate sixteen articles from the English Wikipedia to the “Wikipedia en español”. It’s a fantastic tool! I encourage you to continue improving it.
Good job! ; )

Thank you Daniel. We really appreciate your kind words. Please do let us know if you have any suggestions on how we can make the tool better for you. Thanks.

Hi, Runa!
My suggestion to make the tool better is the same I wrote in this other post: The new Content Translation tool is now used on 22 Wikipedias.
“I’ve been translating English articles into Spanish. I always manually include a reference to the original English article after the translation process has finished (credits!). An improvement could be for the tool to do this directly.”
Thank you!

“Challenges about content syntax and improvement” : what you describe seems so far from what feels like happening. Sorry, but you’re not actively monitoring (or from far far away, just basic statistics), you’re not really fixing bugs either (usually, reported bugs for CX seem to be simply ignored, even after months)

Many articles exist only in a certain language, so I was using Google Translate. Now it’s easy to find articles in different languages.

very useful blog, thank you for sharing this