Privacy and transparency: Country and Territory Protection List Policy Update

The Wikimedia Foundation recently updated the Country and Territory Protection List Policy, which governs the list of countries and territories about which we limit data publication for privacy reasons. At the Wikimedia Foundation, we are committed to enabling people to participate in the free knowledge movement without having to provide personal information. As such, our projects purposefully collect very little personal information, and any personal information that is collected is retained for the shortest possible time. Paramount to our goals is ensuring a safe online environment for our community of readers, editors, and administrators.

In addition to our commitment to privacy, we also believe that transparency and open access are foundational values of the Wikimedia movement. Nonetheless, there have been very reasonable privacy concerns in the past about releasing certain kinds of data. Any release of aggregated raw data from sensitive countries and territories has the potential to inadvertently reveal someone’s location or activity. The Country and Territory Protection List (CTPL) Policy was originally created in 2019 to help mitigate this risk. The CTPL is a blocklist of countries and territories for which we don’t publish data. The first iteration of this blocklist relied on independent data from panels of subject matter experts: we created a composite score from Freedom House and Reporters Without Borders reports and excluded any country or territory that fell in the lowest category. Since 2019, this mitigation has allowed the monthly public geoeditors data release to be published and subsequently visualized in tools like Wikistats. As we have continued to publish more geolocated data since then, the Country and Territory Protection List has become a de facto standard.

Since 2021, WMF has been using differential privacy on some data releases. This statistical technique allows us to both quantify and strictly bound the privacy risk that a given data release poses to the individuals in the raw dataset. These new capabilities have prompted an update of the Country and Territory Protection List Policy. By using differential privacy, this update to the CTPL allows us to strike a balance between transparency and privacy, simultaneously enhancing WMF’s ability to publish information and preserving everyone’s ability to contribute to our projects online.
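
To give a feel for how differential privacy bounds risk, here is a minimal sketch of the textbook Laplace mechanism applied to a single count. It is an illustration only: this post does not say which mechanism or parameters WMF uses, and the names noisy_count, epsilon, and sensitivity are assumptions introduced for this example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return true_count plus Laplace noise with scale sensitivity / epsilon.

    epsilon is the privacy parameter (smaller means more noise and a
    tighter privacy bound). sensitivity is the most that one person can
    change the count. Both are illustrative assumptions, not WMF's values.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # A hypothetical count of 120 editors, released with epsilon = 0.5.
    print(round(noisy_count(120, epsilon=0.5)))
```

The noise is centered on zero, so published values are right on average, while any one person's presence or absence changes the output distribution by only a bounded amount.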

What has changed?

The biggest change in this iteration of the Country and Territory Protection List Policy is a move away from binary categorizations of countries and territories. In the past, if a country or territory scored in the lowest category in either the Freedom House or the Reporters Without Borders annual report, WMF put it on the CTPL and did not publish data about it. Going forward, WMF will look at each country or territory’s scores in these reports and sort them into a risk framework that is not binary.
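
The earlier binary rule described above can be sketched in a few lines. The Ratings type, the field names, and the placeholder entries are hypothetical and exist only to illustrate the "lowest category in either report" logic; they are not real ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ratings:
    # True when the report places the country or territory in its lowest category.
    freedom_house_lowest: bool
    rwb_lowest: bool  # Reporters Without Borders


def was_blocklisted(ratings):
    """Old binary rule: the lowest category in either report means no data is published."""
    return ratings.freedom_house_lowest or ratings.rwb_lowest


# Placeholder entries, not real countries or real ratings.
example = {
    "Country A": Ratings(False, False),
    "Country B": Ratings(True, False),
    "Country C": Ratings(False, True),
}

published = sorted(name for name, r in example.items() if not was_blocklisted(r))
print(published)
```

Under this rule a single low rating removes a country from every release, which is the all-or-nothing behavior the updated policy moves away from.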

The result of this risk calculation is four categories: lower risk, medium risk, higher risk, and not published. Statistics about countries and territories with lower risk associated with publishing data can continue to be published as before, without differential privacy. Statistics about countries and territories with medium or higher risk, which were previously not published, can now be published using differential privacy. The CTPL Policy sets strict and conservative bounds on data releases about these countries and territories to ensure that privacy risks to users are low. Finally, there is a smaller set of countries and territories for which data is not published for safety reasons, even using differential privacy.
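
The four categories and their consequences for publication can be written down as a small lookup. This is a sketch of the mapping as described in this post; the type and function names are assumptions, and the step that assigns a country or territory to a category from its report scores is not shown because the post does not specify it.

```python
from enum import Enum


class RiskTier(Enum):
    LOWER = "lower risk"
    MEDIUM = "medium risk"
    HIGHER = "higher risk"
    NOT_PUBLISHED = "not published"


class ReleaseMode(Enum):
    PLAIN = "published as before, without differential privacy"
    DIFFERENTIALLY_PRIVATE = "published using differential privacy"
    WITHHELD = "not published, even using differential privacy"


# Mapping taken from the description of the four categories above.
RELEASE_MODE = {
    RiskTier.LOWER: ReleaseMode.PLAIN,
    RiskTier.MEDIUM: ReleaseMode.DIFFERENTIALLY_PRIVATE,
    RiskTier.HIGHER: ReleaseMode.DIFFERENTIALLY_PRIVATE,
    RiskTier.NOT_PUBLISHED: ReleaseMode.WITHHELD,
}


def release_mode(tier):
    """Return how statistics for a country or territory in this tier are handled."""
    return RELEASE_MODE[tier]


if __name__ == "__main__":
    for tier in RiskTier:
        print(f"{tier.value}: {release_mode(tier).value}")
```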

This gradation of risk allows for more nuance in our evaluation of data releases, and should ultimately enable the safe release of more Wikimedia platform data. Over the next few months, we hope to use this policy to enable the release of granular geoeditors data, pageview data, latency data, and more.

In the next section, we’ll dive into some of the findings from the differentially private geoeditors monthly release, the first dataset that will soon be compliant with the new CTPL Policy.

Case study: Russian Wikipedia

Under the previous version of the CTPL Policy, WMF did not release data about several countries where Russian is widely spoken. The new version of the policy allows us to release this data with stringent privacy guarantees. This means that Russian Wikipedia will see a large increase in the amount of data visible.

We’ve visualized some of this data below. For example, here’s a month-by-month GIF of the total number of editors on Russian Wikipedia in every country. Note that the scale on the right-hand side is logarithmic: the high end is on the order of 35,000 editors and the low end is 1 editor.

We’ve also put together a series of line graphs that compare editor activity over time from nine communities of Russian Wikipedia editors. Note that, again, the y-axis scale is logarithmic. We’ve also plotted the release threshold for each country as a dotted gray line. This value is the count below which we will not publish data, and it varies based on whether the country is lower, medium, or higher risk.
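
The release threshold works like a simple suppression rule, sketched below. The threshold values are hypothetical placeholders rather than the policy's actual numbers, and the assumption that riskier categories get higher thresholds is ours; the post only says that the threshold varies by risk category.

```python
from typing import Optional

# Hypothetical thresholds per risk category; the real values are not given here.
HYPOTHETICAL_THRESHOLDS = {"lower": 5, "medium": 20, "higher": 50}


def releasable(count, tier, thresholds=HYPOTHETICAL_THRESHOLDS) -> Optional[float]:
    """Return the count if it meets the tier's threshold, else None (suppressed)."""
    return count if count >= thresholds[tier] else None


# A made-up monthly series for one medium-risk country.
monthly_counts = [12, 25, 19, 40]
released = [releasable(c, "medium") for c in monthly_counts]
print(released)  # [None, 25, None, 40]
```

Months that fall below the threshold appear as gaps, which is why a plotted line can break wherever the counts dip under the dotted threshold line.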

This latest update to the Country and Territory Protection List will allow us to publish more data, aligning with the value of transparency that we hold so central at the Wikimedia Foundation, in a way that minimizes additional safety and privacy risk to our contributors, editors, and admins. Our projects could not operate without the everyday work of our volunteer contributors, and with this policy update, more information will be released to help our volunteers with this global work. We remain committed to enabling people to participate in the free knowledge movement without having to provide personal information and ensuring a safe online environment for our readers, editors and administrators.

Ellen Magallanes is Senior Counsel at the Wikimedia Foundation and Hal Triedman is Senior Privacy Engineer at the Wikimedia Foundation.
