Privacy in public: a new whitepaper on privacy in Wikipedia research


Wikipedia is a global knowledge resource visited more than 15 billion times every month by humans and machines. Its use does not stop there: Wikipedia is also a vital resource for researchers around the world. On average, every year researchers use or refer to Wikipedia in more than 130,000 research papers and publish approximately 500 papers about Wikipedia itself.[1] However, conducting research using Wikipedia data requires careful navigation of privacy considerations on a project that is global, open, and transparent by design.

In our years of working with Wikipedia researchers and the Wikipedia community of editors, we have identified two recurring points of tension. First, researchers may assume that because a piece of data about a Wikipedia editor is available publicly or through consent, they can use that data as they see fit. Second, Wikipedia editors can have varying expectations of privacy, and those expectations can shift as editors’ circumstances or technology change over the years. In 2023, English Wikipedia’s Arbitration Committee requested that the Wikimedia Foundation develop a whitepaper to guide researchers in navigating the privacy of Wikipedia editors in their research.

In response, the Wikimedia Foundation prioritized a white paper. Its goal is to offer guidance to both researchers and editors: for researchers, on how to study Wikipedia in ways that respect editor privacy and community values; for editors, on understanding the trade-offs that researchers face, the potential privacy risks of participation, steps to protect personal identity, and pathways to engage with researchers and provide feedback.

In a collaboration among the Wikimedia Foundation’s Research Team and Human Rights Team and a research ethicist from Marquette University, we recently published Privacy in Public: Navigating Research, Personal Data, and Safety on Wikipedia, following multiple rounds of review and feedback from Wikipedia researchers, editors, and Wikipedia Arbitration Committee members.

Here are a few highlights.

Key takeaways for researchers

In the paper, we provide recommendations to support researchers, such as: 

Know the norms and guidelines: Review Wikipedia’s Terms of Use and the Wikimedia Foundation’s Universal Code of Conduct before you start collecting or analyzing data.

Share your plans early: Create a public project page on Meta-Wiki.

Respect privacy: Do not publish usernames, diffs, or quotes that could identify contributors—especially without consent. Avoid combining Wikipedia data with external datasets that could lead to re-identification. Anonymity is not just a preference; it’s sometimes a safety measure.

Evaluate the risks: What’s low-risk in one country might be dangerous in another. Consider editors’ geographic, political, and social contexts when designing your research.

Finally, recognize the significance of the trade-offs you may have to make as a researcher when balancing transparent sharing of your work against the privacy and anonymity of readers and editors.
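
As one illustration of the “Respect privacy” recommendation above, the following Python sketch replaces usernames with keyed pseudonyms before a dataset is shared. It is not taken from the white paper; the record fields, the `pseudonymize` helper, and the sample usernames are hypothetical. Keyed hashing alone does not guarantee anonymity, since edit histories, diffs, and quotes can still point back to a contributor.

```python
import hashlib
import hmac
import secrets


def pseudonymize(username: str, key: bytes) -> str:
    """Return a stable, keyed pseudonym for a username (hypothetical helper)."""
    digest = hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()
    return "editor_" + digest[:12]


# Hypothetical records; the field names are illustrative only.
records = [
    {"user": "ExampleEditorA", "edits": 42},
    {"user": "ExampleEditorB", "edits": 7},
    {"user": "ExampleEditorA", "edits": 3},
]

# Assumption: the key is generated once, kept private, and never published
# alongside the data, so pseudonyms cannot be recomputed from known usernames.
secret_key = secrets.token_bytes(32)

# Build the shareable rows without the original "user" field.
published = [
    {"editor": pseudonymize(r["user"], secret_key), "edits": r["edits"]}
    for r in records
]
```

The same username always maps to the same pseudonym under a given key, so per-editor analysis remains possible while the published rows no longer carry the username itself.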

Key takeaways for Wikipedians

The paper also contains tips and guidance for Wikipedians.

Learn about trade-offs researchers need to make: One of the biggest challenges researchers face when studying Wikipedia and communicating their findings is balancing their obligation to share the truth with the world in a transparent way while respecting Wikipedia communities’ privacy expectations. As a result, sometimes the choices researchers make may be in tension with what, as a Wikipedian, you would expect or wish for. In the white paper, we provide some guidance about how to plan for and take action in such cases.

Understand the risks and control what you share: Even your editing history can reveal clues about your identity. Be especially cautious if you edit on politically sensitive topics. Control what you share, and consider using a pseudonymous username and/or avoiding adding identifying details to your user page.

Explore the Wikimedia Foundation’s digital security trainings such as “Assess your digital security risks” and guidance from other contributors, such as “How Not to Get Outed on Wikipedia.”

Most importantly, engage on your terms: Ask questions, learn about projects, and raise concerns proactively with researchers.

Balancing the need for research and privacy

Research helps Wikipedia grow and improve, and so does protecting the people who make it possible. Researchers bring valuable expertise to help Wikipedia evolve, but that work only succeeds when balanced with the privacy and anonymity expectations of editors. By honoring those protections and recognizing the trade-offs involved, researchers and Wikipedians together help keep Wikipedia open, safe, and a global source of reliable encyclopedic knowledge.

How can you help?

Writing the guidance is one step in a journey toward better collaborations between Wikipedia researchers and editors. We need your help to make sure as many people as possible are aware of the guidance. Here are some ways you can help:

  • Are you a researcher? Read the recommendations for researchers (Section 4.1 and the full paper). Use them in your research and share them with other researchers.  
  • Are you a Wikipedia editor? Read the recommendations for Wikipedia editors (Section 4.2).
  • Are you a Wikimedia affiliate or contributor who onboards researchers to the projects? Familiarize yourself with the whitepaper and share it as a resource with researchers.

1. As measured by the number of search results in https://scholar.google.com/ when searching for articles with the word “Wikipedia” in their title.
