Using privacy engineering to enable modern usability testing and defend against network attacks

The rapidly changing internet surfaces new opportunities and threats for the Wikimedia projects. We’re working with our communities to respond to these changes on many fronts. In this post, we’ll introduce “Edge Uniques”, a technical approach designed to help us understand our readers better, evolve our user experiences for people who read and participate on the projects, and defend against attacks that flood our websites with bad traffic. Now that we have completed the technical design phase, we are beginning implementation, with code development viewable in a public repository. This cookie is not yet active on Wikimedia sites.

Edge Uniques consists of a privacy-preserving first-party cookie that will enable usability testing of features, more accurate counting of site visits, and more precise mitigation of distributed denial-of-service (DDoS) attacks.

The Edge Uniques solution reads, verifies, and discards a copy of the first-party cookie “at the edge” of our computing systems – meaning the first point where visitor traffic enters our network. This process limits the time that uniquely identifying information about a logged-out reader is present in our systems to seconds at most, and typically milliseconds.
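The read, verify, and discard flow can be pictured with a minimal sketch. It assumes the cookie carries an identifier plus an HMAC signature; the post does not say how verification actually works, so the cookie layout, the key handling, and the function names `sign`, `verify_at_edge`, and `handle_request` are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def sign(unique_id: str, key: bytes) -> str:
    """Build a cookie value of the (assumed) form '<id>.<hex mac>'."""
    mac = hmac.new(key, unique_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{unique_id}.{mac}"


def verify_at_edge(cookie: Optional[str], key: bytes) -> Optional[str]:
    """Return the identifier if the cookie is authentic, otherwise None."""
    if not cookie or "." not in cookie:
        return None
    unique_id, _, mac = cookie.rpartition(".")
    expected = hmac.new(key, unique_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time so a tampered cookie cannot raise.
    if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return None
    return unique_id


def handle_request(cookie: Optional[str], key: bytes) -> dict:
    """Reduce the cookie to a non-identifying signal.

    Only the boolean leaves this function; the identifier itself is a local
    variable that is never logged or stored.
    """
    unique_id = verify_at_edge(cookie, key)
    return {"has_valid_cookie": unique_id is not None}
```

In this sketch, anything downstream of `handle_request` sees only the derived signal, which mirrors the idea that the identifier exists just long enough to be checked.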

Because we are not storing this cookie in our traffic logs or databases, we prevent the creation of user profiles that could be used to track the behavior of readers over time. Importantly, readers can block or clear these cookies at any time without negatively affecting their reading or editing experience. This solution will provide a standardized, privacy-preserving framework that developers can use when implementing new features, bots, tools, and gadgets that require continuity or analytics. 

Our challenges and solutions

The Edge Uniques solution helps us in the following ways:

1. Improve user experience through A/B testing 

Wikimedia editors work hard to create what is arguably the most important educational resource of our time. We must design experiences that present information effectively to many different kinds of readers. When we identify promising opportunities to improve the reader experience, we want to use controlled experiments known as A/B tests to evaluate our ideas. These usability tests are designed to expose one group of readers to a modified version of an experience while a control group continues using the unchanged version. This controlled experiment allows us to measure precisely how specific changes affect reader behavior and their overall experience. 

Consider an example feature idea: showing recommendations for more articles to read. We need to verify these recommendations work effectively for different readers – from desktop to mobile users, across languages including right-to-left scripts – while making sure they make the reading experience better rather than worse.

We have considered other approaches for A/B testing, but found them insufficiently private or accurate for our needs. Because of these limitations, we have had to use approaches less informative than controlled experiments: gradual rollouts spanning multiple months, which produced unreliable data due to uncontrollable external factors, or small-scale user studies that failed to represent our global readership.

The Edge Uniques solution will solve these problems by creating separate identification numbers for each test participant. These numbers are temporary and only work for one test at a time, which will allow us to run accurate tests without collecting any personal information or tracking reader browsing habits.
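One way to picture per-test identification numbers is the sketch below. It assumes each per-test number is derived from the browser's identifier and the test name with a keyed hash, so the number for one test reveals nothing about another. That derivation, and the names `per_test_id` and `assign_group`, are illustrative assumptions rather than the design described on Meta-Wiki.

```python
import hashlib
import hmac
from typing import Sequence


def per_test_id(unique_id: bytes, test_name: str) -> str:
    """Derive an identifier that is only meaningful within one test.

    The browser's identifier is used as the HMAC key, so the same browser
    gets unrelated-looking numbers in different tests.
    """
    return hmac.new(unique_id, test_name.encode("utf-8"), hashlib.sha256).hexdigest()


def assign_group(test_id: str, groups: Sequence[str]) -> str:
    """Map a per-test identifier to a group, deterministically."""
    return groups[int(test_id, 16) % len(groups)]


# Hypothetical usage: the same browser always lands in the same group for a
# given test, which is what keeps a controlled experiment consistent.
test_id = per_test_id(b"example-browser-id", "article-recommendations")
group = assign_group(test_id, ["control", "treatment"])
```

Because the assignment is a pure function of the identifier and the test name, no table of participants needs to be stored for a reader to see the same variant on every visit.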

2. Enable protection against DDoS attacks

When we face distributed denial-of-service (DDoS) attacks that threaten to make Wikipedia unavailable, our servers are flooded with requests that attempt to overload the system so much that readers can’t access the website. To combat these attacks, the most readily available identifier we can work with today is the IP address. We often limit or block IP addresses involved in attacks, but this creates problems because the same IP address can be shared by many users: university campus networks, public wifi, mobile networks where many users share the same IP, and similar scenarios. This makes it difficult to limit attacks while avoiding impact on real users.

The Edge Uniques solution will improve this by providing a more reliable way to identify legitimate visitors than an IP address alone. The first-party cookies, which are stored in users’ browsers, will include historical information about how frequently a browser has visited our sites over time, but not what pages were visited. This history is difficult for bot attackers to fake, as it requires consistent interaction with our sites over time. This allows us to distinguish between genuine readers and sudden attack traffic, even when they come from the same IP address or network. During an attack, this additional context will help us maintain site availability while minimizing disruptions to real humans. 
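As a rough sketch of how visit history could inform attack mitigation, suppose the cookie carries a count of distinct weeks in which the browser visited. The field `weeks_seen`, the function `requests_per_minute`, and every threshold below are hypothetical values chosen for illustration; the post does not specify the history format or any limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VisitHistory:
    # Hypothetical field: number of distinct weeks with at least one visit.
    # Note that it records how often, never which pages.
    weeks_seen: int


def requests_per_minute(history: Optional[VisitHistory], under_attack: bool) -> int:
    """Pick a per-browser request budget (all numbers are illustrative)."""
    if not under_attack:
        return 600  # normal operation: history plays no role
    if history is None:
        return 30   # no cookie: fall back to the strict shared limit
    if history.weeks_seen >= 4:
        return 300  # long-standing reader: barely affected by mitigation
    return 100      # some history, but not yet well established
```

Under these assumptions, two browsers behind the same campus IP address can be treated differently during an attack: the one with months of history keeps reading, while traffic with no history is throttled.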

3. Understand our visitor trends

In order to plan how to improve Wikimedia products, we need to accurately count how many visits our wikis receive on different types of devices and in different geographies (among other dimensions).

With Edge Uniques, we will be able to count readership more precisely than we do today, without keeping any information that could be traced back to individual readers. We will use specialized counting methods that combine visitor numbers over different periods of time, allowing us to better understand how people use our sites while protecting their privacy.
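To illustrate how visits can be counted across periods without retaining identifiers, here is a sketch of a last-seen-date technique: the browser holds the date it was last counted, and the server only increments aggregate counters. This is a stand-in chosen for illustration, not necessarily the counting method Edge Uniques will use, and `count_visit` is a hypothetical name.

```python
from collections import Counter
from datetime import date
from typing import Optional


def count_visit(counters: Counter, last_seen: Optional[date], today: date) -> date:
    """Update aggregate counters from a browser-held last-seen date.

    Returns the date to write back to the browser. No identifier is read
    or stored; only the counters change.
    """
    if last_seen is None or last_seen < today:
        # First request from this browser today.
        counters[("daily", today.isoformat())] += 1
    if last_seen is None or (last_seen.year, last_seen.month) < (today.year, today.month):
        # First request from this browser this calendar month.
        counters[("monthly", today.strftime("%Y-%m"))] += 1
    return today
```

A browser that visits on two days in the same month adds two to the daily totals but only one to the monthly total, so the periods can be combined without double counting.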

Our next steps

Now that we have completed the technical design phase, we are beginning implementation, with code development viewable in a public repository. You can follow progress on the work in Phabricator. This cookie is not yet active on Wikimedia sites. 

To learn more about how this solution will process identifiers, please read our full description on the Edge Uniques page on Meta-Wiki. If you have questions, please engage with us on the talk page or attend a community call to join the discussion.

Tajh Taylor, Vice President of Data Science and Engineering at the Wikimedia Foundation, at WikiConference North America 2023.
Image by Jay Dixit, CC BY-SA 4.0, via Wikimedia Commons.
