Where did that paragraph go? This software change helps volunteers hold up Wikipedia’s high quality

One key to Wikipedia’s high quality is a system of mutual checks, based on the fact that every version of every page is stored and accessible. Thousands of community members review the latest edits of others in order to find errors or inconsistencies, comparing each new version of a page to older page versions and checking if the new content complies with guidelines for citation, style, orthography and more. Edits are also scrutinized for subjectivity, copyright violations or vandalism. If a problem is found, it usually gets corrected within minutes.

Wikipedia editors find out which articles may need monitoring in a variety of ways. Many logged-in contributors save the pages they’re interested in to their personal watchlists and routinely check them for quick overviews of changes made on those pages. Furthermore, each language version of Wikipedia has a recent changes page, a ticker that shows the latest edits to all of its pages. Users who closely monitor this page can use filters to view the types of edits they’re interested in, such as changes by unregistered users. Editors can also check changes by looking at all edits a particular user has made, or by directly examining the version history of a specific Wikipedia page.

A widely used tool for comparing versions of a Wikipedia page is the wikitext diff, a two-column view that shows an older version of a page on one side and the newer version on the other. The tool displays both versions in wikitext markup and highlights the differences between them with color coding.

Screenshot of a Wikipedia "diff."
A simple wikitext diff: In the newer version, some text was removed (highlighted in yellow) and some other text added (highlighted in blue).

In the past, however, comparing page versions was often hard and time-consuming. Due to a technical limitation, whenever a piece of text was simply moved to another position on the page, the diff displayed it as if it had been removed in one place and some other text had been added elsewhere. Even worse, there was no easy way to see whether someone had also changed the text that had been moved. As a result, Wikipedia editors had to spend time checking whether text had been moved or removed, and then more time identifying the changes between the two versions.
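This limitation is easy to reproduce with any plain line-based diff. The following minimal sketch uses Python’s standard `difflib` module, not the wikidiff code itself; the sample paragraphs and the helper name `line_ops` are made up for illustration.

```python
import difflib

# Hypothetical page versions, one paragraph per list entry.
# In the new version the middle paragraph moved to the end
# and one word inside it changed ("merely" -> "only").
old_page = [
    "Dogs come in many sizes.",
    "The smallest dog is merely a puppy.",
    "Dogs are popular pets.",
]
new_page = [
    "Dogs come in many sizes.",
    "Dogs are popular pets.",
    "The smallest dog is only a puppy.",
]


def line_ops(old, new):
    """Return the non-equal operations of a plain paragraph-level diff."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


for tag, removed, added in line_ops(old_page, new_page):
    print(tag, removed, added)
```

A plain diff like this reports one deletion and one unrelated insertion. Nothing in its output says that the two paragraphs are almost identical, which is the extra checking work described above.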

Screenshot of the Wikipedia "diff" resulting from moving an entire paragraph.

We wanted to create a wikitext diff view that shows both moved text chunks and the changes inside them. What might sound like a simple change was actually a delicate task, for two reasons. First, changes to the diff code can affect the speed of the MediaWiki software. Second, detecting moved pieces of text isn’t trivial: how much can a paragraph change and still qualify as the same, moved piece of text?
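The question of how much a paragraph may change and still count as moved can be made concrete with a small sketch. It pairs each removed paragraph with the most similar added paragraph and accepts the pair as a move only above a similarity threshold. This is an illustration under stated assumptions, not the algorithm used in the wikidiff code: the function names `find_moves` and `changed_words`, the word-level similarity measure from Python’s `difflib`, and the threshold of 0.6 are all hypothetical choices.

```python
import difflib


def similarity(old_par, new_par):
    """Word-level similarity between two paragraphs, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(
        a=old_par.split(), b=new_par.split(), autojunk=False
    ).ratio()


def find_moves(removed, added, threshold=0.6):
    """Pair removed paragraphs with sufficiently similar added ones.

    The threshold is an assumed value for illustration: a higher value
    misses heavily edited moves, a lower one pairs unrelated text.
    """
    moves = []
    unused = list(added)
    for old_par in removed:
        best, best_score = None, 0.0
        for new_par in unused:
            score = similarity(old_par, new_par)
            if score > best_score:
                best, best_score = new_par, score
        if best is not None and best_score >= threshold:
            moves.append((old_par, best))
            unused.remove(best)  # each added paragraph is matched at most once
    return moves


def changed_words(old_par, new_par):
    """List the word-level changes inside a paragraph that was moved."""
    a, b = old_par.split(), new_par.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


removed = ["The smallest dog is merely a puppy."]
added = ["Cats are unrelated to this text.", "The smallest dog is only a puppy."]

for old_par, new_par in find_moves(removed, added):
    print("moved:", old_par)
    print("changes inside:", changed_words(old_par, new_par))
```

Comparing every removed paragraph with every added one also hints at the performance concern: the number of comparisons grows with the product of both lists, so a production diff has to limit this work carefully.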

The Technical Wishes team from Wikimedia Germany (Deutschland), the German Wikimedia chapter based in Berlin, took on this task, supported by software teams from the Wikimedia Foundation. Our project aims to improve the software behind Wikipedia, so our developers dove deep into the wikidiff code and put a lot of effort into improving, fine-tuning and testing it.[1]

After lots of programming, testing and even more testing, the wikitext diff now clearly indicates moved text chunks with the help of little arrows, and highlights changes that were made within them. This change has been active on most Wikipedias for a few weeks now.

Screenshot of the Wikipedia "diff" resulting from moving an entire paragraph. Individual words changed in that paragraph are now highlighted for further inspection.
Now it’s clearly indicated that two paragraphs were moved and which text was changed within them.

The most recent news from the world of diffs is on your phone: As of this week, moved text chunks are shown correctly on mobile devices as well. To make this possible, the Wikimedia Foundation’s Reading Web team took our recent changes to the diff code and developed styles for them in the mobile view.

This is what the diff view now looks like on mobile devices. In this example, the paragraph beginning “The smallest dog” was moved down on the page, and the word “merely” was replaced by “only”.

Last but not least, the Wikimedia Foundation released a similar technical improvement in early 2018: The Visual Diff, a tool for users who prefer a visual view over wikitext, also shows changes in moved text chunks. The code behind it, however, is completely independent of the wikitext diff code.

We hope that all these improvements make life easier for many contributors and support them in the vital quality assurance work they do.

Johanna Strodt, Project Manager Communications
Wikimedia Germany (Deutschland)


1. If you’re interested in the challenges we faced and what we learned, this post is for you.

Archive notice: This is an archived post from the News section on wikimediafoundation.org, which operates under different editorial and content guidelines than Diff.
