How Turnitin helps the Wikimedia Foundation preserve the integrity of Wikipedia content

“Our partnership with Turnitin has been crucial for the credibility and reliability of our content.”

Karolin Siebert, Engineering Manager at Wikimedia Foundation

The Wikimedia Foundation hosts a continuously growing library of content on Wikipedia, the largest and most popular free-use online encyclopedia. iThenticate, provided by Turnitin, allows Wikipedia editors to efficiently ensure content integrity and prevent copyright infringement. The premium similarity detection tool empowers the volunteers who specialize in reviewing Wikipedia content for copyright infringement to process and vet hundreds of thousands of edits daily. This helps to uphold the reputation of the well-respected non-profit and to protect it from copyright infringement issues.

Content integrity as a foundation of Wikipedia’s credibility 

For the Wikimedia Foundation, which seeks to cultivate and spread free and trustworthy educational information, content integrity is crucial. The key guiding principles here are as follows:

  1. Plagiarism undermines the credibility of encyclopedic content
  2. Copyright infringement undermines the integrity of Wikipedia’s free use licensing

As a recognized non-profit with a global mission, the Wikimedia Foundation relies on nearly 50 million registered editors for its articles and content. These volunteers, who upload their contributions under a free-use license, are largely anonymous and come from all over the world. Every passing minute brings another 350 edits on average across the nearly 7 million articles. Each of these edits could introduce plagiarism, copyright infringement, or both.

This is a challenge because the Wikimedia Foundation strives for free knowledge and credibility. Its global mission of broadening educational opportunities and providing free-use content depends on the integrity of its licensing. For this reason, Wikipedia maintains a comprehensive definition of copyright violations. Any text that is copied word-for-word or closely paraphrased from non-free sources, or from free-use sources without proper attribution, is strictly unacceptable.

Simply stated, copyright infringement on Wikimedia projects is the use of material from protected works that has not been freely licensed, or its use in violation of a free license. As copyrights exist to preserve profit opportunities, copyright holders do not typically want their material published on a free-use platform such as Wikipedia.

Wikipedia’s partnership with Turnitin

Fortunately, the Wikimedia Foundation is not alone in championing the transformative power of education. Its goals align with those of Turnitin, the global leader in similarity checking and ensuring the integrity of education and research. The two began an official partnership in 2015 to bring Turnitin’s advanced similarity detection software to Wikipedia editors. Turnitin’s iThenticate offers them massively enhanced breadth and depth of scrutiny in vetting new content and edits. The partnership allows the Wikimedia Foundation to freely pursue and deliver on its goals at scale.

As Karolin Siebert, Engineering Manager at the Wikimedia Foundation, puts it, “Our partnership with Turnitin has been crucial for the credibility and reliability of our content.”

Wikipedia uses iThenticate to detect potential plagiarism and copyright violations through CopyPatrol. CopyPatrol is a tool developed and maintained by the Community Tech team of the Wikimedia Foundation to flag Wikipedia edits that may contain copyright violations. The automated bot uses iThenticate to compare edits against Turnitin’s vast database of web content and scholarly publications. By pinpointing potential issues, it helps the moderating community to quickly ensure that content conforms to integrity norms and remains freely distributable. This speeds up the content creation process and at the same time minimizes vulnerability. Considering the sheer volume of information and the small group of editors responsible, this is a significant component of the editing process.
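To make the idea of flagging an edit concrete, here is a minimal Python sketch of one common way to score text overlap: the share of an edit's five-word sequences that also appear in a known source. This is only an illustration. It is not iThenticate's matching method or CopyPatrol's code, neither of which is described in this article, and the function names, the five-word window, and the 0.5 threshold are all assumptions chosen for the example.

```python
import re


def shingles(text, n=5):
    """Return the set of overlapping n-word sequences in `text`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(edit_text, source_text, n=5):
    """Fraction of the edit's n-word sequences that also occur in the source."""
    edit_shingles = shingles(edit_text, n)
    if not edit_shingles:
        # Edits shorter than n words cannot be scored this way.
        return 0.0
    shared = edit_shingles & shingles(source_text, n)
    return len(shared) / len(edit_shingles)


def flag_edit(edit_text, sources, threshold=0.5):
    """Return (source_id, score) pairs at or above `threshold`, highest first.

    `sources` is a hypothetical dict mapping a source identifier to its text;
    a real system would query a large index instead.
    """
    scored = ((sid, containment(edit_text, text)) for sid, text in sources.items())
    flagged = [(sid, score) for sid, score in scored if score >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

A score near 1 means almost all of the edit also appears in the source, which is the kind of result a human reviewer would then inspect. Simple word-sequence overlap of this kind misses close paraphrasing, which is one reason a dedicated similarity detection tool matters for this work.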

“iThenticate makes this task much more feasible,” says CopyPatrol user and Wikipedia administrator Moneytrees, “distilling this enormous (and important task) into a much more manageable one.” That way, editors can “quickly handle [issues] as they come up.”

How CopyPatrol uses iThenticate

The particular capabilities of Turnitin’s iThenticate allow CopyPatrol to perform at a higher level than alternative solutions. iThenticate detects close paraphrasing better than competing solutions, and the coverage of the Turnitin content database is unparalleled. iThenticate allows CopyPatrol to discover similarities not only with paywalled sources such as journal articles but also with historical versions of websites that may be difficult to find. “You wouldn’t necessarily be able to find these matches from a Google search,” says CopyPatrol user DanCherek. For a team focused on preventing copyright infringement, this coverage is crucial.

The CopyPatrol team can quickly comb through flags and eliminate copyrighted content from Wikipedia articles as it comes up. The detailed reports from iThenticate also make it obvious where an entire article needs checking. The efficiency iThenticate brings to the CopyPatrol bot even allows for the quick inspection of a particular editor’s entire edit history when necessary.

The level of accountability enabled by the system serves to uplift the entire Wikipedia community. It is not only reactive but also proactive. iThenticate, according to Wikipedia administrator Diannaa, helps to prevent problems by providing “the opportunity to educate users as to our expectations.”

Since the launch of CopyPatrol in June of 2016, Wikipedia has logged over 305,000 records of possible copyright violations in its database, for which it largely credits iThenticate.

“iThenticate itself doesn’t prevent copyright violations, but rather gives us the means to help new people who do not understand…before it becomes a problem,” says Sennecaster.

Advancing education together

Wikipedia content, which is in turn indexed by Turnitin, is crucial for academic institutions that use Turnitin solutions to support academic integrity. The free encyclopedia is popular among students as a source of information and research, but at times their awareness of potential plagiarism and proper citation practices may fall short.

By adding Wikipedia content to its database, Turnitin offers robust text similarity detection solutions. Instructors can be confident assessing the originality of student work and using Turnitin data to help teach students how to work with sources properly. At the moment, the Turnitin database counts 53.5 million items from Wikipedia, across 264 languages. This has grown from 32.7 million in 2016.

The Wikimedia Foundation and Turnitin look forward to continuing the partnership, with plans to expand the coverage of CopyPatrol to other Wikipedia languages.
