Wikimedia Research Newsletter, June 2018


“On the Self-similarity of Wikipedia Talks: a Combined Discourse-analytical and Quantitative Approach”

Reviewed by Maik Stührenberg

This paper[1] is thoroughly structured and combines the theory of web genres with dialogue theory to examine Wikipedia talk pages. Since Wikipedia is a web genre, “Wikicussions” (as the authors call them) form a subgenre. Within this framework, the authors examine talk pages further, including the quality of cooperation between Wikipedia users, which can be linked to social differentiation in the roles and statuses of Wikipedians (content-related vs. administration-related users). These group-related processes can be seen as a mediating layer between external parameters (the system requirements for Wikipedia’s user community) and the structure and dynamics of Wikipedia’s subgenres.

The authors argue that, unlike face-to-face dialogue, Wikicussions stand out because they rest on a publicly available common ground (a notion derived from dialogue theory), which may explain the structures they found.

The paper includes a number of high-quality figures that support its findings.

Graph covering November 2000 to November 2015, showing that most posts come from registered users

Frequency distribution of talk posts over time within the German Wikipedia (blue: registered users; red: anonymous users; green: bots; black: all users). Unsigned posts (without timestamps) are excluded, as are posts whose timestamps fall outside the valid time frame (before the discussion was created or after it was downloaded). (Figure 7 from the paper, by Alexander Mehler, Rüdiger Gleim, Andy Lücking, Tolga Uslu and Christian Stegbauer, CC BY-SA 4.0)

“How Sudden Censorship Can Increase Access to Information”

Reviewed by Bri and Tilman Bayer

Our intuition might tell us that government censorship reduces access to online information. But recent research indicates that the effect can be exactly the opposite. Using data gathered from Wikipedia page views and other sources, researchers William Hobbs and Margaret Roberts found that:

[…] citizens accustomed to acquiring this [forbidden] information will be incentivized to learn methods of censorship evasion […] millions of Chinese users acquire[d] virtual private networks, and subsequently […] began browsing blocked political pages on Wikipedia, following Chinese political activists on Twitter, and discussing highly politicized topics such as opposition protests in Hong Kong.[2]

Specifically, the authors studied how China’s block of Instagram on September 29, 2014, imposed following protests in Hong Kong, affected views of Chinese Wikipedia pages that were already blocked in the country. (This predates the 2015 total block of the Chinese Wikipedia and the switch of all Wikimedia sites to full encryption with HTTPS around the same time, which made such per-page blocking impossible.) According to the authors, the list of censored Chinese Wikipedia pages with the largest increase in views “shows that new viewers accessed pages that had long been censored including those related to the 1989 Tiananmen Square protests”,[2] i.e. “viewing patterns that would be more typical of new users who had just jumped the firewall, rather than of old VPN users who had presumably consumed this information long ago.”[2] Below is an excerpt from the full list examined in the research: the top 10 pages for the second day of the block, given here by the titles of their English Wikipedia equivalents:

  1. People’s Republic of China blocked websites list
  2. Jiang Zemin
  3. Radio Australia
  4. Hu Jintao
  5. Zeng Qing
  6. Wang Weilin (Tank Man)
  7. Li Peng
  8. Tiananmen Square Incident
  9. Zhou Yongkang
  10. Wu’erkaixi (June 4 leader)

The researchers propose to name this phenomenon the “gateway effect”, a “mechanism through which repression can backfire inadvertently, without political or strategic motivation”,[2] because it incentivizes people to learn how to evade censorship and thus “have more, not less, access to information and begin engaging in conversations, social media sites, and networks that have long been off-limits to them.”[2] They distinguish it from the Streisand effect, where individuals specifically seek out information that is being hidden.

The second author of the study, Margaret Roberts, is also the author of Censored: Distraction and Diversion Inside China’s Great Firewall (Princeton University Press, 2018; print ISBN 978-0-691-17886-8, e-book ISBN 978-1-400-89005-7).

Marketing, social media, and Wikipedia

Reviewed by Barbara Page

This study was able to “characterize” the interests of Wikipedia editors and the editors’ social media activity on Twitter, in order to facilitate “[…] building rich user profiles, which can be conveniently used in order to provide personalized contents and offers.” The authors also write: “[…], i.e., the detection of the user’s core interests and, therefore, allows for product and service recommendations far more tailored than those stemming from other (usually) extemporary actions on the Internet, like flight ticket purchases and hotel reservations. In this light, it is important to notice that such a profiling potential associated to social login remains nowadays largely unused and enabling its exploitation is one of the main goals of the present work.”[3]

A marriage between the topics an editor edits and their Twitter (and possibly Facebook) activity will result in targeted marketing tailored just for you!
(Photo: Harland Quarrington/MOD, OGL)

Conferences and events

See the community-curated research events page on Meta-wiki for other upcoming conferences and events, including submission deadlines.

WMF research showcase

Recent presentations at the monthly Research showcase hosted by the Wikimedia Foundation included the following:

“Conversations Gone Awry: Detecting Early Signs of Conversational Failure”

Presentation slides (video)

Online social systems can be marred by antisocial behavior such as harassment and personal attacks. A new paper[4] by seven researchers from Cornell University, Jigsaw, and the Wikimedia Foundation examines whether such exchanges can be predicted from the very start of a conversation, which could make it possible to intervene before a discussion deteriorates. Two of the authors also gave an interview published on the Wikimedia Foundation’s blog,[supp 1] and the paper was covered in popular media; see In the media § In brief.

Case studies in the appropriation of ORES

From the announcement (by Aaron Halfaker):


Presentation slides about the use of the ORES platform (video)

ORES is an open, transparent, and auditable machine prediction platform that helps Wikipedians do their work. It’s currently used in 33 different Wikimedia projects to measure the quality of content, detect vandalism, recommend changes to articles, and identify good-faith newcomers. The primary way that Wikipedians use ORES’ predictions is through tools developed by volunteers. These JavaScript gadgets, MediaWiki extensions, and web-based tools make up a complex ecosystem of Wikipedian processes, encoded into software.

The presentation covered “three key tools that Wikipedians have developed that make use of ORES”: Wikidata’s damage detection models, exposed through Recent Changes; Spanish Wikipedia’s PatruBOT; and WikiEdu tools from User:Ragesoss that incorporate article quality models.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer
  • “On the Effects of Authority on Peer Motivation: Learning from Wikipedia”[5] – From the abstract: “We show that lateral authority, the legitimacy to resolve task‐specific problems, is welcomed by members of an organization in the resolution of coordination conflicts, the more so (1) the fiercer the conflict to be resolved, (2) the higher the competence‐based status of the authority, (3) the lower the tenure of, and (4) the more focused the organizational members are. Analyzing the discussion behavior of members of Wikipedia between 2002 and 2014, we corroborate our allegations empirically by analyzing 642,916 article–discussion pages.”
  • “A Comparison of the Historical Entries in Wikipedia and Baidu Baike”[6] – From the abstract: “This research purposefully chose 6 entries and developed a framework to evaluate their performance in accuracy, breadth, depth, informativeness, conciseness and objectiveness. The result shows that: Wikipedia is superior in most cases while Baidu Baike is a little better in the entries on Chinese history. The operating mechanism is the main reason for it.”
  • “Sentiments in Wikipedia Articles for Deletion Discussions”[7] – From the abstract: “We performed sentiment analysis on 37,761 AfD discussions with 156,415 top-level comments and explored relationship between outcomes of the discussion and sentiments in the comments. Our preliminary work suggests: discussion that have keep or other outcomes have more than expected positive sentiment, whereas discussions that have delete outcomes have more than expected negative and neutral sentiment. This result shows that there tends to be positive sentiment in the comment when Wikipedia users suggest not to delete the article.”
  • “‘What are these researchers doing in my Wikipedia?’: ethical premises and practical judgment in internet-based ethnography”[8] – From the abstract: “The article reflects on the heuristics that guided the decisions of a 4-year participant observation in the English-language and German-language editions of Wikipedia. […] it interrogates the technological, social, and legal implications of publicness and information sensitivity as core ethical concerns among Wikipedia authors. The first problem area of managing accessibility and anonymity contrasts the handling of the technologically available records of activities, disclosures of personal information, and the legal obligations to credit authorship with the authors’ right to work anonymously and the need to shield their identity. The second area confronts the contingent addressability of editors with the demand to assure and maintain informed consent.” (See also the Wikipedia essay “What are these researchers doing in my Wikipedia?”)
  • “Digging Wikipedia: The Online Encyclopedia As a Digital Cultural Heritage Gateway and Site”[9] – From the abstract: “[…] this article introduces Wikipedia as a digital gateway to and site of an active engagement with cultural heritage. We have developed the open source and freely available analysis architecture Contropedia to examine already existing volunteer user-generated participation around cultural heritage and to promote further engagement with it. Conceptually, we employ the notion of memory work, as it helps to treat Wikipedia’s articles, edit histories, and discussion pages as a rich resource to study how cultural heritage is received and (re)worked in and across languages and cultures. […] The analysis facilitated by Contropedia […] sheds light on the contentious articulation of perspectives on tangible and intangible heritage grounded by conflicting conceptions of events, ideas, places, or persons. Technologically, Contropedia combines techniques based on mining article edit histories and analyzing discussion patterns in talk pages to identify and visualize heritage-related disputes within an article, and to compare these across language versions.” (cf. earlier coverage: “‘Contropedia’ tool identifies controversial issues within articles”; “Towards better visual tools for exploring Wikipedia article development – the use case of ‘Gamergate controversy’”)
  • “Use of Louisiana’s Digital Cultural Heritage by Wikipedians”[10] – From the abstract: “This case study details an analysis of Wikipedia links to online resources from Louisiana cultural heritage institutions [also known among Wikimedians as GLAMs] in order to determine what types of cultural heritage resources users are citing on Wikipedia, what is the content of the Wikipedia articles with Louisiana CHI citations, and how this can influence the work of CHI. The results of the study include findings that digital library items and archival finding aids are the most cited sources from cultural heritage institutions on Wikipedia and are particularly popular for Louisiana-specific Wikipedia articles on society and the social sciences and culture and the arts.”
  • “The Conceptual Correspondence between the Encyclopaedia and Wikipedia”[11] – From the abstract: “This study […] focuses on the roles and attributes of both printed encyclopaedias and Wikipedia. First, we analyse the roles and attributes of an encyclopaedia by conducting a review of research related to them. Then we analyse whether or not Wikipedia fulfills the same roles and has the same attributes as the encyclopaedia by reviewing academic work that investigates and analyses Wikipedia from various perspectives. The results show that Wikipedia does not conceptually correspond to an encyclopaedia, except in cases where people use it for one-time searches. In the world of digital media, Wikipedia does not have the same status that the encyclopaedia holds in the world of print media.”
  • “Structural Differentiation in Social Media: Adhocracy, Entropy, and the ‘1 % Effect’”[12] – From the text: “Over the study period (2001–2010), we observed 235,701,162 edits completed by 22,792,847 unique contributors. Of these, 19,680,637 users were anonymous, identified only by their unique IP addresses. The rest (3,112,210) were registered users who were logged into their respective accounts. […] logged-in users were the clear minority group, yet they contributed far more edits than the anonymous users—all told, those logged-in individuals were responsible for almost two-thirds (68%) of the observed revisions. Even more importantly, the top 1% of all contributors were responsible for 77% of the collaborative effort based upon the extent to which the text of articles was actually changed (i.e., the contribution delta). [… The] simple answer to research question 2 (RQ2), ‘What is the social mobility (or its inverse, elite “stickiness”) of functional leaders on Wikipedia over time?’ is that on average, across the entire 9.5-year period, an individual who was a top contributor at a given point in time had a 40% probability of remaining in the top contributor group 5 weeks later. Twenty weeks later, that individual would have a 32% chance of still being a top contributor, and after 30 weeks, this figure would be at 28%.”

    In a press release by Purdue University, one of the authors commented: “What we saw is that a clear leadership has emerged, but it’s a leadership that cycles. We have a group of individuals who shape the content by working the hardest and clocking the most hours. The agenda is shaped by these people, and they’re driven by a sense of mission, much like political or religious movements.”[supp 2]
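The two measures quoted from the “1 % effect” chapter reduce to simple arithmetic: the share of all changed text produced by the top 1% of contributors, and the probability that a top contributor is still in the top group some weeks later. The following is a minimal Python sketch of that arithmetic, not the chapter’s own code. It assumes one non-negative contribution delta per contributor and a hypothetical mapping from week numbers to the set of top contributors in that week; the function names `top_share` and `retention_rate` and the toy data are illustrative only.

```python
from typing import Dict, Hashable, Sequence, Set


def top_share(deltas: Sequence[float], fraction: float = 0.01) -> float:
    """Share of the total contribution delta produced by the top `fraction` of contributors.

    Hypothetical helper: `deltas` holds one non-negative contribution delta
    (amount of article text changed) per contributor.
    """
    if not deltas:
        raise ValueError("need at least one contributor")
    total = sum(deltas)
    if total == 0:
        return 0.0
    ranked = sorted(deltas, reverse=True)
    # Size of the top group, rounded, but never fewer than one contributor.
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / total


def retention_rate(top_by_week: Dict[int, Set[Hashable]], lag: int) -> float:
    """Probability that a top contributor in week t is still one in week t + lag.

    Hypothetical helper: pools every (week, contributor) pair for which
    week t + lag was also observed.
    """
    stayed = 0
    observed = 0
    for week, members in top_by_week.items():
        later = top_by_week.get(week + lag)
        if later is None:
            continue  # no observation lag weeks later; skip this week
        observed += len(members)
        stayed += len(members & later)
    if observed == 0:
        raise ValueError("no week pairs at the requested lag")
    return stayed / observed


if __name__ == "__main__":
    # Toy data: one heavy contributor among 100 accounts for 300 of 399 units.
    print(round(top_share([300] + [1] * 99), 3))  # 0.752
    weeks = {0: {"a", "b", "c", "d"}, 5: {"a", "b", "e", "f"}, 10: {"e", "g", "h", "i"}}
    print(retention_rate(weeks, 5))  # 0.375
```

Pooling over (week, contributor) pairs is one reasonable reading of “on average … an individual who was a top contributor at a given point in time”; averaging per-week fractions instead would give slightly different numbers when the size of the top group varies.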


  1. Mehler, Alexander; Gleim, Rüdiger; Lücking, Andy; Uslu, Tolga; Stegbauer, Christian (January 30, 2018). “On the Self-similarity of Wikipedia Talks: a Combined Discourse-analytical and Quantitative Approach” (PDF). Glottometrics (RAM-Verlag, published January 2018). 40: 1–45. ISSN 1617-8351. OCLC 7493144471. Archived (PDF) from the original on June 28, 2018. Retrieved June 28, 2018 – via ResearchGate.  Open access
  2. Hobbs, William R.; Roberts, Margaret E. (April 2, 2018). “How Sudden Censorship Can Increase Access to Information”. American Political Science Review (Cambridge University Press): 1–16. ISSN 0003-0554. OCLC 7435466814. doi:10.1017/S0003055418000084.  Closed access
  3. Torrero, Christian; Caprini, Carlo; Miorandi, Daniele (April 9, 2018). “A Wikipedia-based approach to profiling activities on social media”. arXiv:1804.02245v2 [cs.IR].  Free to read
  4. Zhang, Justine; Chang, Jonathan P.; Danescu-Niculescu-Mizil, Cristian; Dixon, Lucas; Yiqing, Hua; Thain, Nithum; Taraborelli, Dario (May 14, 2018). “Conversations Gone Awry: Detecting Early Signs of Conversational Failure”. arXiv:1805.05345v1 [cs.CL].  Free to read
  5. Klapper, Helge; Reitzig, Markus (May 7, 2018). “On the Effects of Authority on Peer Motivation: Learning from Wikipedia” (PDF). Strategic Management Journal (John Wiley & Sons). OCLC 7586436764. doi:10.1002/smj.2909. Retrieved June 28, 2018.  Open access
  6. Shang, Wenyi (March 15, 2018). “A Comparison of the Historical Entries in Wikipedia and Baidu Baike”. In Chowdhury, Gobinda; McLeod, Julie; Gillet, Val; et al. Transforming Digital Worlds. International Conference on Information (iConference 2018; March 25–28 at Sheffield, United Kingdom). Lecture Notes in Computer Science. 10766 (Online ed.). Cham, Switzerland: Springer International Publishing AG. pp. 74–80. ISBN 978-3-319-78105-1. OCLC 7357407865. doi:10.1007/978-3-319-78105-1_9.   Closed access
  7. Xiao, Lu; Sitaula, Niraj (March 15, 2018). “Sentiments in Wikipedia Articles for Deletion Discussions”. In Chowdhury, Gobinda; McLeod, Julie; Gillet, Val; et al. Transforming Digital Worlds. International Conference on Information (iConference 2018; March 25–28 at Sheffield, United Kingdom). Lecture Notes in Computer Science. 10766 (Online ed.). Cham, Switzerland: Springer International Publishing AG. pp. 81–86. ISBN 978-3-319-78105-1. OCLC 7357407963. doi:10.1007/978-3-319-78105-1_10.   Closed access
  8. Pentzold, Christian (May 3, 2017). “‘What are these researchers doing in my Wikipedia?’: ethical premises and practical judgment in internet-based ethnography” (PDF). Ethics and Information Technology (Springer Science+Business Media, published May 5, 2017) 19 (2): 143–155. ISSN 1388-1957. OCLC 7039749181. doi:10.1007/s10676-017-9423-7. Archived (PDF) from the original on June 28, 2018. Retrieved June 28, 2018 – via  Free to read
  9. Pentzold, Christian; Weltevrede, Esther; Mauri, Michele; Laniado, David; Kaltenbrunner, Andreas; Borra, Erik (March 13, 2017). Scopigno, Roberto, ed. “Digging Wikipedia: The Online Encyclopedia as a Digital Cultural Heritage Gateway and Site” (PDF). Journal on Computing and Cultural Heritage. Special Issue on Digital Infrastructure for Cultural Heritage, Part 1 (New York: Association for Computing Machinery, published April 14, 2017) 10 (1): 5:1–5:19. ISSN 1556-4673. OCLC 7006965721. doi:10.1145/3012285. Retrieved June 28, 2018 – via ResearchGate.  Free to read
  10. Kelly, Elizabeth Joan (November 28, 2017). “Use of Louisiana’s Digital Cultural Heritage by Wikipedians”. Practical Communication. Journal of Web Librarianship (Taylor & Francis) 12 (2): 85–106. ISSN 1932-2909. OCLC 7566358637. doi:10.1080/19322909.2017.1391733.  Closed access
  11. Yamada, Shohei (December 29, 2017). “The Conceptual Correspondence between the Encyclopaedia and Wikipedia”. Journal of Japan Society of Library and Information Science (Japan Society of Library and Information Science) 63 (4): 181–195. ISSN 1344-8668. OCLC 7261862873. doi:10.20651/jslis.63.4_181.  Closed access
  12. Matei, Sorin Adam; Britt, Brian C. (September 21, 2017). “Analytic Investigation of a Structural Differentiation Model for Social Media Production Groups”. In Alhajj, Reda; Glässer, Uwe. Structural Differentiation in Social Media: Adhocracy, Entropy, and the ‘1 % Effect’. Lecture Notes in Social Networks (1st ed.). Cham, Switzerland: Springer Nature. pp. 73, 75. ISBN 978-3-319-64424-0. ISSN 2190-5436. LCCN 2017948031. OCLC 7138124671. doi:10.1007/978-3-319-64425-7_5. 
Supplementary references:
  1. Zhang, Justine; Chang, Jonathan (June 13, 2018). “‘Conversations gone awry’—the researchers figuring out when online conversations get out of hand”. Wikimedia Blog (Interview). Interviewed by Melody Kramer; Dario Taraborelli. Wikimedia Foundation. Archived from the original on June 28, 2018. Retrieved June 28, 2018. 
  2. Bush, Jim (November 6, 2017). “Results of Wikipedia study may surprise”. Purdue News Service and Agricultural Communications (Press release). West Lafayette, Indiana: Purdue University. OCLC 7177119166. Archived from the original on June 28, 2018. Retrieved June 28, 2018. 

Wikimedia Research Newsletter
Vol: 8 • Issue: 06 • June 2018
This newsletter is brought to you by the Wikimedia Research Committee and The Signpost

Archive notice: This is an archived post from a publication that operated under different editorial and content guidelines than Diff.
