Try Content Translation: A quick way to create new articles from other languages

File:Content Translation Screencast (English).webm

Video: How to translate a Wikipedia article in 3 minutes with Content Translation. This video can also be viewed on YouTube (4:10). Screencast by Pau Giner, licensed under CC BY-SA 4.0

The Wikimedia Foundation’s Language Engineering team is happy to announce the first version of Content Translation on Wikipedia for 8 languages: Catalan, Danish, Esperanto, Indonesian, Malay, Norwegian (Bokmål), Portuguese and Spanish. Content Translation, available as a beta feature, provides a quick way to create new articles by translating an existing article into another language. It is also well suited for new editors looking to familiarize themselves with the editing workflow. Our aim is to build a tool that leverages the power of our multicultural global community to further Wikimedia’s mission of creating a world where every single human being can share in the sum of all knowledge.

Design

In early 2014, when the design ideas for Content Translation were being conceptualized, we came across an interesting study by Scott A. Hale of the University of Oxford on the influences and editing patterns of multilingual editors on Wikipedia. Combined with feedback from editors we interacted with, the data presented in the study guided our initial choices, both in terms of features and languages. We were fortunate to meet the researcher in person at Wikimania 2014, where we could learn more about his findings and references.

The tool was designed for multilingual editors as our main target users. Several important patterns emerged from a month-long user study, including:

  • Multilingual editors are relatively more active in smaller Wikipedias. Editors from smaller Wikipedias often also edit a relatively large Wikipedia such as English or German;
  • Multilingual editors often edited the same articles in their primary and non-primary languages.

These and other factors listed in the study affect the transfer of content between different language versions of Wikipedia: they increase content parity between versions and decrease ‘self-focus’ bias in individual editions.

Languages

When selecting languages for the tool’s introduction, we were guided by several factors, including signs of relatively high multilingualism amongst the primary editors. The availability of high-quality machine translation was an additional consideration, so that we could fully explore the usability of the core editing workflow designed for the tool. Based on these considerations, Catalan Wikipedia, a very actively edited project of medium size, was a logical choice. Subsequent language selections were made by studying possible overlap trends between language users, and the probability of editors benefiting from those overlaps when creating new articles. Availability of machine translation to speed up the process and community requests were also important considerations.

How it works

The article Abel Martín in the Spanish Wikipedia doesn’t have a version in Portuguese, so a red link to Portuguese is shown.
Content Translation red interlanguage link screenshot by Amire80, licensed under CC BY-SA 4.0

Content Translation combines a rich text translation interface with tools targeted at editing, plus machine translation support for most language pairs. It integrates different tools to automate repetitive steps during translation: it provides an initial automatic translation while keeping the original text format, links, references, and categories. To do so, the tool relies on the inter-language connections from Wikidata, HTML-to-wikitext conversion from Parsoid, and machine translation support from Apertium. This saves time for editors and allows them to focus on creating quality content.

Although basic text formatting is supported, the purpose of the tool is to create an initial version of the content that each community can keep improving with their usual editing tools. Content Translation is not intended to keep the information in sync across multiple language versions, but to provide a quick way to reuse the effort already made by the community when creating an article from scratch in a different language.

The tool can be accessed in different ways. There is a persistent access point on your contributions page, but access to the tool is also provided in situations where you may want to translate the content you are currently reading. For instance, it appears as a red link in the interlanguage link area (see image).
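The red-link access point described above can be pictured as a simple lookup: an article has a title in some languages, and any language the reader cares about that lacks one is a candidate for translation. The following is only a minimal sketch of that idea, not the tool's actual implementation. The function name `missing_translations` and the `sitelinks` mapping are hypothetical, and the data is illustrative (only the Spanish article without a Portuguese version comes from the example above); in practice the inter-language connections come from Wikidata.

```python
# Hypothetical sketch: decide which interlanguage links would be shown in red.
# `sitelinks` maps a language code to the article title in that language.
# In the real tool these connections come from Wikidata; here they are
# hard-coded illustrative data.

def missing_translations(sitelinks, candidate_languages):
    """Return the candidate languages that have no article yet, sorted."""
    return sorted(lang for lang in candidate_languages if lang not in sitelinks)


# Example from the post: the article exists in Spanish but not in Portuguese.
sitelinks = {"es": "Abel Martín"}
red_links = missing_translations(sitelinks, ["pt", "es"])
print(red_links)  # ['pt']
```

Each language returned by this lookup corresponds to a place where a red link could invite the reader to start a translation.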

Next steps

Next steps for the tool’s future development include adding support for more – eventually all – languages, managing lists of articles to translate, and adding features for more streamlined translation.

In the coming weeks, we will closely monitor feedback from users and interact with them to guide our future development. Please read the release announcement for more details about the features and instructions on using the tool. Thank you!

Amir Aharoni, Pau Giner, Runa Bhattacharjee, Language Engineering, Wikimedia Foundation

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

12 Comments

Excellent 🙂
I’m really looking forward to using this tool to translate articles INTO English (I realise that’s not your priority, but it’s my personal use-case).
Can I suggest you add a time-coded transcript of that screencast – that makes it possible for people to translate the video, which would be an important thing to do for the languages that the tool already works in.

Regarding the next steps: Did you communicate somewhere which languages could be the next on your list? German maybe? ;o)

We are following what we currently call the ‘Graduation plan’ to add more languages. You can find it here: https://www.mediawiki.org/wiki/Content_translation/Languages . We will come up with the next set of languages in evaluation very shortly. Suggestions are very welcome. Thanks.

@Wittylama, Thanks for that suggestion. Meanwhile we also have an infographic that can be translated. You can find it here: https://www.mediawiki.org/wiki/Content_translation/Infographic

This is SO cool!

This is very cool. I wish someone would take this tool and make it available to a larger community beyond Wikimedia as well.

Great… but I need French and Japanese… Are these next on your list?

Yes, this is very nice… but is this a way to make unique articles for search engines?
