Content Translation helps volunteers translate articles between different-language Wikipedias. Screenshot from “Content Translation Screencast” video. Screencast by Pau Giner, freely licensed under CC BY-SA 4.0.
The number of articles created with the Content Translation tool recently crossed 30,000. More than 7,000 editors are using the tool to translate Wikipedia articles into many languages.
According to our recent observations, more than 1,000 new articles are created with Content Translation each week on average. About 7% of them are deleted each week as part of the normal article review process. This is considerably lower than the deletion rate for new articles created with the usual editing tools. Since the tool is designed to produce a good article by reusing existing content from another language, this is an encouraging outcome. It supports the assumption that the initial translated version is generally of good enough quality to be kept.
Challenges with content syntax and improvement
Ever since the tool was first deployed as a beta feature in January 2015, the development team has actively monitored the articles being created and examined how well they fit into their respective wikis, primarily in terms of internal structure and code: categories, links, templates, footnotes, general wiki syntax handling, and so on. By its nature, Content Translation transforms text between diverse Wikipedia projects, and differences in how those projects use templates, references and markup can cause issues.
As the article creation and deletion statistics show, the new articles generally appear to fit well. However, keeping the wiki syntax clean remains a considerable challenge, and regular use of the tool keeps uncovering new issues. Over the past year we have fixed numerous bugs in the handling of categories, templates and footnotes, but we know that many still remain. We are thankful to the editors who report bugs in this area and help us understand and fix them.
Improvements to machine translation
Better machine translation (MT) has been a recurring request from many users of Content Translation ever since the tool was made available. Until recently, Apertium was the only MT service available for Content Translation. Since 4 November 2015, however, the Yandex machine translation service has been available to users of the Russian Wikipedia, where Content Translation is especially popular, and can be used when translating Wikipedia pages from English to Russian with Content Translation (see the announcement).
The translation service is accessed through a freely available API, and the translated content it returns is freely licensed, in line with Wikipedia policy, for use in Wikipedia articles. Because the interaction between Content Translation and the translation service happens on the server side, no personal information from the user’s device is sent to Yandex. Users can modify the translated content just like any other content on wiki pages. Information about these modifications is also publicly available under a free license through the Content Translation API, so that anyone, from university research groups and open source projects to commercial companies, can use it to develop and improve translation services. More information about this translation service is available on MediaWiki.org and in the Content Translation FAQ. For more details about the interactions between Content Translation and the Yandex translation service, please see this image.
Enhancements have also been made to the Apertium machine translation service. As a result of recent changes, Apertium now covers eight new language pairs, listed below; the complete list of supported pairs is also available.
- Arabic – Maltese (both directions)
- Breton – French
- Catalan – Esperanto
- French – Esperanto
- Romanian – Spanish
- Spanish – Esperanto
- Spanish – Italian (both directions)
- Swedish – Icelandic (both directions)
Upcoming plans and office hour
In our last update, we told our readers about article suggestions, a feature that provides users with a list of articles that can be translated for a given language. Soon, users will also be able to create collections of articles for translathons or similar shared editing activities. If you have participated in or organized such an event that involved creating articles through translation, we would like to hear from you (via this form) about how Content Translation’s article lists could support this activity.
Please join us for the next online office hour on 25 November 2015 at 1300 GMT. We welcome your comments and feedback on the Content Translation project talk page and Phabricator.
Runa Bhattacharjee, Amir Aharoni, Language Engineering, Wikimedia Foundation