Algorithms and insults: Scaling up our understanding of harassment on Wikipedia

3D representation of 30 days of Wikipedia talk page revisions, of which 1,092 contained toxic language (shown as red if live, grey if reverted) and 164,102 were non-toxic (shown as dots). Visualization by Hoshi Ludwig, CC BY-SA 4.0.

“What you need to understand as you are doing the ironing is that Wikipedia is no place for a woman.” –An anonymous comment on a user’s talk page, March 2015

Volunteer Wikipedia editors coordinate many of their efforts through online discussions on “talk pages” which are attached to every article and user-page on the platform. But as the above quote demonstrates, these discussions aren’t always good-faith collaboration and exchanges of ideas—they are also an avenue of harassment and other toxic behavior.
Harassment is not unique to Wikipedia; it is a pervasive issue for many online communities. A 2014 Pew survey found that 73% of internet users have witnessed online harassment and 40% have personally experienced it. To better understand how contributors to Wikimedia projects experience harassment, the Wikimedia Foundation ran an opt-in survey in 2015. About 38% of editors surveyed had experienced some form of harassment, and subsequently, over half of those contributors felt a decrease in their motivation to contribute to the Wikimedia sites in the future.
Early last year, the Wikimedia Foundation kicked off a research collaboration with Jigsaw, a technology incubator for Google’s parent company, Alphabet, to better understand the nature and impact of harassment on Wikipedia and explore technical solutions. In particular, we have been using machine learning methods to develop models that automatically detect toxic comments on users’ talk pages. We are using these models to analyze the prevalence and nature of online harassment at scale. This data will help us prototype tools that visually depict harassment, helping administrators respond.
Our initial research has focused on personal attacks, a blatant form of online harassment that usually manifests as insults, slander, obscenity, or other forms of ad-hominem attacks. To amass sufficient data for a supervised machine learning approach, we collected 100,000 comments on English Wikipedia talk pages and had 4,000 crowd-workers judge whether each comment was harassing, which produced 1 million annotations. Each comment was rated by 10 crowd-workers, whose opinions were aggregated and used to train our model.
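As a rough illustration of that aggregation step, here is a minimal Python sketch. It assumes the simplest possible rule: the fraction of crowd-workers who flagged a comment, with a majority threshold to produce a label. The function name, the tuple layout, and the 0.5 threshold are all hypothetical choices made for this example, and the actual aggregation used in the study may differ.

```python
from collections import defaultdict


def aggregate_annotations(annotations, threshold=0.5):
    """Combine per-worker judgments into one label per comment.

    `annotations` is an iterable of (comment_id, worker_id, is_attack)
    tuples. This layout is an assumption made for the sketch.
    """
    votes = defaultdict(list)
    for comment_id, _worker_id, is_attack in annotations:
        votes[comment_id].append(1 if is_attack else 0)

    labels = {}
    for comment_id, comment_votes in votes.items():
        # Fraction of workers who judged the comment to be an attack.
        fraction = sum(comment_votes) / len(comment_votes)
        labels[comment_id] = {
            "attack_fraction": fraction,
            "is_attack": fraction >= threshold,  # hypothetical majority rule
        }
    return labels


# Toy data: three workers per comment instead of ten, to keep it short.
sample = [
    ("c1", "w1", True), ("c1", "w2", True), ("c1", "w3", False),
    ("c2", "w1", False), ("c2", "w2", False), ("c2", "w3", False),
]
labels = aggregate_annotations(sample)
```

Keeping the fraction alongside the thresholded label preserves how strongly the workers agreed, which a model can use as a soft training target instead of a hard yes/no.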
This dataset is the largest public annotated dataset of personal attacks that we know of. In addition to this labeled set of comments, we are releasing a corpus of all 95 million user and article talk comments made between 2001 and 2015. Both data sets are available on FigShare, a research repository where users can share data, to support further research.
The machine learning model we developed was inspired by recent research at Yahoo on detecting abusive language. The idea is to extract fragments of text from Wikipedia edits and feed them into a machine learning algorithm called logistic regression. This produces a probability estimate of whether an edit is a personal attack. In testing, we found that a fully trained model predicts whether an edit is a personal attack better than the combined average of 3 human crowd-workers.
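To make the idea concrete, here is a toy Python sketch of logistic regression over text fragments. It is not the model from the study. It assumes the fragments are character n-grams of length 1 to 3, trains with plain stochastic gradient descent on six made-up comments, and uses hypothetical function names throughout.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return L2-normalised counts of character n-grams (the 'fragments')."""
    text = text.lower()
    counts = Counter(
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    )
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {gram: c / norm for gram, c in counts.items()}


def attack_probability(weights, bias, text):
    """Logistic regression: squash a weighted sum of fragments into (0, 1)."""
    z = bias + sum(weights.get(g, 0.0) * v for g, v in char_ngrams(text).items())
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=300, learning_rate=1.0):
    """Fit weights with stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            error = attack_probability(weights, bias, text) - label
            bias -= learning_rate * error
            for gram, value in char_ngrams(text).items():
                weights[gram] = weights.get(gram, 0.0) - learning_rate * error * value
    return weights, bias


# Made-up training comments: 1 = personal attack, 0 = not an attack.
toy_examples = [
    ("you are a worthless idiot", 1),
    ("shut up you stupid fool", 1),
    ("nobody wants you here, loser", 1),
    ("thanks for fixing the citation", 0),
    ("i added a source to the article", 0),
    ("could you review my latest edit?", 0),
]
weights, bias = train(toy_examples)
p_attack = attack_probability(weights, bias, "you are a worthless idiot")
p_benign = attack_probability(weights, bias, "thanks for fixing the citation")
```

On this toy data the trained model scores the insulting training comment higher than the polite one. A real system would need the full labeled dataset, held-out evaluation, and regularization, none of which this sketch attempts.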
Prior to this work, the primary way to determine whether a comment was an attack was to have it annotated by a human, a costly and time-consuming approach that could only cover a small fraction of the 24,000 edits to discussions that occur on Wikipedia every day. Our model allows us to investigate every edit as it occurs to determine whether it is a personal attack. This also allows us to ask more complex questions around how users experience harassment. Some of the questions we were able to examine include:

  1. How often are attacks moderated? Only 18% of attacks were followed by a warning or a block of the offending user. Even for users who have contributed four or more attacks, moderation only occurs for 60% of those users.
  2. What is the role of anonymity in personal attacks? Registered users make two-thirds (67%) of attacks on English Wikipedia, contradicting a widespread assumption that anonymous comments by unregistered contributors are the primary source of the problem.
  3. How frequent are attacks from regular vs. occasional contributors? Prolific and occasional editors are both responsible for a large proportion of attacks (see figure below). While half of all attacks come from editors who make fewer than 5 edits a year, a third come from registered users with over 100 edits a year.

Chart by Nithum Thain, CC BY-SA 4.0.
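Once every comment has a model score, questions like the ones above reduce to simple aggregations. The Python sketch below shows the general shape of such an analysis. It assumes each comment is a dictionary with the hypothetical fields `attack_probability`, `author_registered`, and `followed_by_warning_or_block`, and that a comment counts as an attack when its score reaches a hypothetical 0.5 threshold. The data is invented and is not drawn from the study.

```python
def summarize_attacks(comments, threshold=0.5):
    """Compute simple shares over the comments scored as attacks."""
    attacks = [c for c in comments if c["attack_probability"] >= threshold]
    if not attacks:
        return {"attack_count": 0, "registered_share": None, "moderated_share": None}

    count = len(attacks)
    return {
        "attack_count": count,
        # Share of attacks written by registered (logged-in) users.
        "registered_share": sum(c["author_registered"] for c in attacks) / count,
        # Share of attacks followed by a warning or block of the author.
        "moderated_share": sum(c["followed_by_warning_or_block"] for c in attacks) / count,
    }


# Invented records for illustration only.
scored_comments = [
    {"attack_probability": 0.91, "author_registered": True, "followed_by_warning_or_block": True},
    {"attack_probability": 0.77, "author_registered": True, "followed_by_warning_or_block": False},
    {"attack_probability": 0.64, "author_registered": False, "followed_by_warning_or_block": False},
    {"attack_probability": 0.08, "author_registered": True, "followed_by_warning_or_block": False},
]
summary = summarize_attacks(scored_comments)
```

The choice of threshold matters here: a lower cutoff counts more borderline comments as attacks and can shift every share computed from them.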

More information on how we performed these analyses and other questions that we investigated can be found in our research paper:

Wulczyn, E., Thain, N., Dixon, L. (2017). Ex Machina: Personal Attacks Seen at Scale (to appear in Proceedings of the 26th International Conference on World Wide Web – WWW 2017).

While we are excited about the contributions of this work, it is just a small step toward a deeper understanding of online harassment and finding ways to mitigate it. The limits of this research include that it only looked at egregious and easily identifiable personal attacks. The data is only in English, so the model we built only understands English. The model does little for other forms of harassment on Wikipedia; for example, it is not very good at identifying threats. There are also important things we do not yet know about our model and data; for example, are there unintended biases that were inadvertently learned from the crowdsourced ratings? We hope to explore these issues by collaborating further on this research.
We also hope that collaborating on these machine-learning methods might help online communities better monitor and address harassment, leading to more inclusive discussions. These methods also enable new ways for researchers to tackle many more questions about harassment at scale—including the impact of harassment on editor retention and whether certain groups are disproportionately silenced by harassers.
Tackling online harassment, like defining it, is a community effort. If you’re interested or want to help, you can get in touch with us and learn more about the project on our wiki page. Help us label more comments via our wikilabels campaign.
Ellery Wulczyn, Data Scientist, Wikimedia Foundation
Dario Taraborelli, Head of Research, Wikimedia Foundation
Nithum Thain, Research Fellow, Jigsaw
Lucas Dixon, Chief Research Scientist, Jigsaw

Editor’s note: This post has been updated to clarify potential misunderstandings in the meaning of “anonymity” under “What is the role of anonymity in personal attacks?”

Archive notice: This is an archived post from blog.wikimedia.org, and as such was written under a different editorial standard than Diff.

21 Comments

In an exchange like “Go to hell!” “No, you go to hell!”, assuming the former counts as harassment, does the latter also?

The notion that a registered user is less anonymous than an IP editor is deeply flawed. The only editors who are not anonymous are those who openly disclose their real world identities. The ease of creating “throw away” Wikipedia accounts means that whether or not an insult is made from a registered account is a triviality unworthy of discussion.
Jim Heaphy
Cullen328

[…] Wikimedia Foundation, the organization behind the free encyclopedia Wikipedia, published yesterday on its blog a study, produced jointly with Jigsaw, a subsidiary of Google parent Alphabet, with the […]

Some examples of “harassment” would be nice. Please cite them out. Thanks.

[…] Algorithms and insults: Scaling up our understanding of harassment on Wikipedia. “Early last year, the Wikimedia Foundation kicked off a research collaboration with Jigsaw, […]

@Andy, depends. If I reply with, “No, I’m going to heaven, you’re not coming…” Would one automatically assume that one stays behind in hell while continuing to edit?

Fascinating work. Just as we have a bot-created list of copyright concerns, a bot-created list of civility concerns would be useful.

Some of the harassment, comments and responses will have been in edit summaries, block logs and by email (some female editors have reported that email harassment via throwaway accounts is one of the biggest parts of the problem). Am I correct in assuming that you only looked at talk page posts? Did you get access to deleted posts? I’m assuming you can’t have had access to oversighted posts, but this is something where deleted posts would be useful. If not, the results will be somewhat skewed by omitting a bunch of the most egregious stuff. It would be interesting to know…

This type of definition of “harassment” reminds me of the time when someone posted a comment “Leviticus 20:13” under a Google Talk video, obviously referring to the host, and YouTube rejected my request to delete it because they did not consider that as inciting violence. Thinking of harassment in the way this article does favours the shy who do not dare to say anything and the well-educated native speakers while excluding the points of view of people who, often for economic or language reasons, either do not have the skills or time to disguise their disdain in ostensibly “acceptable” behaviour.…

Yes I agree 1,092 toxic comments out of 164,102 comments in total fits with my personal experience over the last 10 years. Most interactions by far are friendly or instructional. Would be interesting to look at what percentage of the toxic comments are from users who are not banned or blocked for sockpuppetry / paid editing?

Interesting work, but as others have said, your reasoning in the passage “Registered users make two-thirds (67%) of attacks on English Wikipedia, contradicting a widespread assumption that anonymity is the primary contributor to the problem.” is deeply flawed. Take the example of Qworty, whose exploits were covered by Andrew Leonard in Salon:
– salon.com/2013/05/17/revenge_ego_and_the_corruption_of_wikipedia/
– salon.com/2013/04/29/wikipedias_shame/
or Johann Hari:
– newstatesman.com/blogs/david-allen-green/2011/09/hari-rose-wikipedia-admitted
– independent.co.uk/voices/commentators/johann-hari/johann-hari-a-personal-apology-2354679.html
Both had longstanding, very active registered Wikipedia user accounts. Yet their anonymity – no one knew their identity for years – was a key factor in enabling both their editing behaviour and their abrasive style of interaction…

[…] and Jigsaw have conducted a study of harassment and personal attacks on Wikipedia. It is often said that online anonymity leads to bad behavior. This study shows that […]

>>2. What is the role of anonymity in personal attacks? Registered users make two-thirds (67%) of attacks on English Wikipedia, contradicting a widespread assumption that anonymity is the primary contributor to the problem.
No. Wrong. Sloppy. Anonymous registered accounts are still anonymous accounts. There is no real name registration process, there is no recording of registration information by WMF. What is being demonstrated is that 2/3 of attacks on En-WP are by registered accounts vs. IP accounts, which may or may not differ in a statistically significant manner from the populations of each of these groups editing at WP.…

Ars Technica now tells its readers, “New study of Wikipedia comments reveals most attackers aren’t anonymous.” https://arstechnica.com/information-technology/2017/02/one-third-of-personal-attacks-on-wikipedia-come-from-active-editors/ Nowhere does the piece explain that the actual truth is the exact opposite: most of the attackers ARE anonymous contributors. Just to be clear: writing under a pseudonym, while hiding your actual identity, is a time-honoured form of anonymous speech (see e.g. https://www.eff.org/issues/anonymity ). This is what happens when you use the word “anonymous” to mean something it doesn’t. Note that I’m not claiming that Wikipedia editors editing under their real names are automatically less likely to engage in toxic behaviour than anonymous…

Andreas, there are just different nuances of the term “anonymous”. Here is professor Ed Felten (former chief technologist of the FTC) explaining why “it’s clear that pseudonyms are not ‘anonymous’”, i.e. exactly the interpretation that you are so loudly denouncing as false in multiple venues right now: https://www.ftc.gov/news-events/blogs/techftc/2012/04/are-pseudonyms-anonymous And even the EFF piece you link to makes, at one point, the distinction between anonymous and pseudonymous: “people choose to speak using pseudonyms (assumed names) or anonymously (no name at all)”. That’s pretty close to the interpretation, on Wikipedia, of IP edits as “anonymous” (vs. logged-in edits which would be “pseudonymous”).…

For all those commenting about how “anonymous” is incorrect here, do you seriously expect a different result for real name accounts? I can think of several such identifiable people who will be in that top bracket of offenders. The idea that people knowing your real name stops you from being a jerk has been disproven massively by… Wikipedia! All you need is a jerk who thinks ‘mission’ > ‘civility’. I could name a few admins with real name accounts who feel that way, and have said so, openly, in very public places on Wikipedia, and never faced any real collective rebuke…

Was there any related work done in any other languages? Any corpus or model?