Wikimedia Research Newsletter, September 2013


Vol: 3 • Issue: 9 • September 2013

Automatic detection of “infiltrating” Wikipedia admins; Wiki, or ‘pedia?

With contributions by: Brian Keegan, Piotr Konieczny, Aaron Halfaker, Jonathan Morgan and Tilman Bayer

Wiki, or ‘pedia? The genre and values of Wikipedia compared with other encyclopedias

Wikipedia and Encyclopaedism: A Genre Analysis of Epistemological Values[1] is a new master’s thesis that analyzes the values that have influenced how knowledge is presented on Wikipedia, in comparison with other encyclopedias created throughout history. The author uses genre analysis to compare the epistemological values reflected in the kind of knowledge that different encyclopedias present and in the way they present it. He first conducts a literature review to compare the epistemology of two genres: wikis and encyclopedias. The wiki epistemology is composed of six values: self-identification, collaboration, co-construction, cooperation, trust in the community, and constructionism. By contrast, major current and historical encyclopedias (such as Diderot’s Encyclopedia, Pliny’s Natural History, and the Encyclopædia Britannica) prioritize trust in experts, authority, and consistency.

Despite being based on different, and even somewhat contradictory, value systems, the purpose of Wikipedia and the way it presents knowledge are shown to be similar to other works in the encyclopedia genre. The author analyzes the frequency of common words in section headings of 25 heavily edited English Wikipedia articles that had a corresponding article in Britannica. He compares the evolution of section headings within these Wikipedia articles and multiple editions of Britannica, and shows that the gradual process by which a Wikipedia article becomes more structured through the addition and alteration of headings is similar to the process for Britannica articles, which also tend to become longer and more formally structured over subsequent editions. This thesis presents some interesting parallels between the way articles are developed within Wikipedia and other encyclopedias, despite vastly different timescales and some differing underlying values. It also offers an engaging, in-depth discussion of the concept of genre, the purpose of the encyclopedia genre, and the history of several major historical encyclopedias.
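The heading-frequency comparison described above can be illustrated with a small sketch. This is not the thesis author's actual procedure: the function name, the stopword list, and the sample headings are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list; the thesis does not specify one here.
DEFAULT_STOPWORDS = frozenset({"the", "of", "and", "in", "a", "an", "to"})


def heading_word_frequencies(headings, stopwords=DEFAULT_STOPWORDS):
    """Count how often each non-stopword appears across section headings."""
    counts = Counter()
    for heading in headings:
        # Lowercase and split on anything that is not a letter.
        for word in re.findall(r"[a-z]+", heading.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


# Made-up headings standing in for one article's table of contents.
sample_headings = ["History", "Early history", "History of the name", "See also"]
frequencies = heading_word_frequencies(sample_headings)
print(frequencies.most_common(3))
```

Running the same count on headings from successive revisions of an article (or successive editions of a printed encyclopedia) would give a rough picture of how its structure changes over time.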

Identifying trending topics of yesteryear

In a paper titled “Temporal Wikipedia search by edits and linkage”[2], the authors develop a method to identify Wikipedia articles associated with a topic around a given date, based on changes in the length of the article as well as patterns among the other articles to which it links. The paper expands on prior work in temporal information retrieval and anomaly detection, and uses modified versions of the HITS and PageRank algorithms to return a list of the most relevant documents for a topic on a date. This work has implications not only for using Wikipedia data to identify currently trending topics, but also for identifying past trending topics retrospectively. A downloadable Java client allows test searches (for the months of September and October 2011) and the display of the resulting page networks.
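For readers unfamiliar with link-based ranking, the following is a minimal sketch of standard PageRank by power iteration. It is not the modified algorithm from the paper, which is not reproduced here; the function name, parameters, and the tiny link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a baseline share from the random-jump term.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-article link graph.
graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
for page, score in sorted(pagerank(graph).items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

A temporal variant along the lines described in the paper would presumably restrict or reweight the graph using edits made around the date of interest, but that step is left out of this sketch.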

Automatic detection of “infiltrating” Wikipedia admins

A paper titled “Manipulation Among the Arbiters of Collective Intelligence: How Wikipedia Administrators Mold Public Opinion”[3], to be presented at next month’s ACM Conference on Information and Knowledge Management (CIKM), makes a rather serious claim: “We find a surprisingly large number of editors who change their behavior and begin focusing more on a particular controversial topic once they are promoted to administrator status.” This reviewer does not find it shocking, as he wrote about this problem years ago. The authors note that those editors are difficult to identify based on their pattern of edits, but are more easily spotted by analyzing the pattern of votes at RfA. They also suggest that a relatively simple fix may be helpful: increasing the threshold of support votes required for a successful RfA may increase the quality of the Wikipedia admin corps.

One may, however, quibble with “enforcement of neutrality is in the hands of comparatively few, powerful administrators”, another attention-drawing claim in the abstract, which finds little discussion or support in the body of the paper. Discussions about NPOV topics are hardly limited to mop‘n’bucket wielders, and thus this claim, and the abstract as a whole, may exaggerate the importance of the findings. Some admins wait until getting the nearly-impossible-to-remove mop before becoming, well, regular editors. As long as they are not abusing their powers, this reviewer is not sure why we should care. What is more relevant, certainly, is how this entire process shows the inefficiency of RfA, which forces people to hide behind false “I am perfect” personas, as any sign of being a real person (i.e. making errors, being human, etc.) is often enough to threaten to derail that process. Still, this review is not the place for beating that nearly dead horse – but those interested in the RfA reform process should likely read this article in more detail.

Academic role models important for promoting the use of Wikipedia in higher education

“An Empirical Study on Faculty Perceptions and Teaching Practices of Wikipedia”[4] is a new paper in the emerging subfield of academics’ and educators’ attitudes towards Wikipedia, which we have covered before. This paper benefits from a respectable sample (about 800 respondents from the faculty body of the Open University of Catalonia). It confirms a number of previous findings, namely the importance of one’s perception of Wikipedia’s usefulness and quality, which is significantly and positively correlated with whether one will consider using it as a teaching resource. Correspondingly, poor knowledge about Wikipedia in particular, and about open access and collaborative knowledge creation models in general, is negatively correlated with views on Wikipedia. Having a respected figure (role model) who uses Wikipedia in teaching is also likely to influence others, through the usual informal peer networks. Individual characteristics (academic rank, teaching experience, age or gender) are not seen as significant. As the authors conclude, there is much work to be done in educating the worlds of education and academia about the basics of Wikipedia – something we should never take for granted.

Was Steve Jobs an inventor? WP:V as “delegated voice”

In a paper titled “Learning through Massively Co-Authored Biographies: Making Sense of Steve Jobs on Wikipedia through Delegated Voice”[5], the authors performed a qualitative analysis of discussions on the article’s talk page about whether or not to describe Steve Jobs as an “inventor”. They use the discussion as an example of Wikipedians’ use of WP:Verifiability to write articles from a “delegated voice”. While mostly critical of the limitations and contradictions that this approach to encyclopedia construction entails, they admit that Wikipedia articles “do indeed illustrate a variety of voices and points of view.” They draw a contrast with Encyclopedia Britannica’s entry on Steve Jobs, which does not contain any critical comments, while Wikipedia’s contains several nuanced discussions critical of Jobs’ life and work.


  • Historical gender ratio of Wikipedia biographies: Swedish Wikipedian LA2 has evaluated biography articles on the German and Swedish Wikipedia by birth date and gender.[6] The results show a ratio of more than 10:1 males to females for births before 1900, decreasing to less than 3:1 in the late 20th century.
  • Readability rating tool based on AFT data: A paper titled “Automatic Readability Classification of Crowd-Sourced Data based on Linguistic and Information-Theoretic Features”[7] uses readability ratings that Wikipedia readers submitted via the Article Feedback Tool to construct an automatic text readability test.
  • Wikipedia usage in Germany: According to an annual survey by ARD and ZDF (Germany’s two main public broadcasters),[8] 74% of German Internet users access Wikipedia at least occasionally (among academics, the number is 88%), and 32% use it once per week or more often.
  • BALL CHICKEN BRITISH WOMAN: A master’s thesis submitted last year to the University of Georgia, titled “A Large Scale Study of Edit Patterns in Wikipedia and its Applications to Vandalism Detection”[9] contains a wealth of statistical results about vandalism edits on the English Wikipedia, for example a “List of top 25 vandal words”, starting as follows: Ball, chicken, British, woman, hole, handicap, meat, kiss…
  • Automatic classification of edits: A paper titled “Automatically Classifying Edit Categories in Wikipedia Revisions”[10] presents a system to automatically classify Wikipedia edits into categories such as the fixing of spelling mistakes, the adding of citations, markup changes or vandalism.
  • Analysis of product comparison matrices on Wikipedia: A conference paper from the International Conference on Automated Software Engineering[11] analyzes more than 300 “Product comparison matrices (PCMs)” on Wikipedia (such as in the article Comparison of webmail providers).
  • Legendary, acclaimed, world-class text analysis method finds you promotional Wikipedia articles really easily: Training a machine learning algorithm to distinguish 13,000 articles in Category:All articles with a promotional tone from articles not tagged as promotional, and from good or featured articles, three researchers from the University of Texas at Austin conclude that “stylometric features … work very well for detecting promotional articles in Wikipedia.”[12] Among the many features used in the classifier is the “percentage of special phrases such as peacock terms (‘legendary’, ‘acclaimed’, ‘world-class’), ‘weasel terms’ (‘many scholars state’, ‘it is believed/regarded’, ‘many are of the opinion’, ‘most feel’, ‘experts declare’, ‘it is often reported’), editorializing terms (‘without a doubt’, ‘of course’, ‘essentially’).”
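To give a feel for the “percentage of special phrases” feature mentioned in the last item above, here is a rough sketch. It is not the researchers' implementation: the function name and the choice to normalize by total word count are assumptions, and the phrase lists contain only the examples quoted above rather than the full lists used in the paper.

```python
import re

# Only the example phrases quoted in the newsletter item; the paper's
# complete lists are not reproduced here.
SPECIAL_PHRASES = {
    "peacock": ["legendary", "acclaimed", "world-class"],
    "weasel": [
        "many scholars state", "it is believed", "it is regarded",
        "many are of the opinion", "most feel", "experts declare",
        "it is often reported",
    ],
    "editorializing": ["without a doubt", "of course", "essentially"],
}


def phrase_percentages(text):
    """Per category, phrase occurrences as a percentage of the word count."""
    lowered = text.lower()
    total_words = len(re.findall(r"[a-z0-9'-]+", lowered))
    if total_words == 0:
        return {category: 0.0 for category in SPECIAL_PHRASES}
    percentages = {}
    for category, phrases in SPECIAL_PHRASES.items():
        hits = sum(
            len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
            for phrase in phrases
        )
        percentages[category] = 100.0 * hits / total_words
    return percentages


print(phrase_percentages("The legendary band is, without a doubt, acclaimed."))
```

In a classifier like the one described, values of this kind would be only a few among many stylometric features fed to the learning algorithm.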


  1. Steven J. Jankowski: Wikipedia and Encyclopaedism: A Genre Analysis of Epistemological Values
  2. Julianna Göbölös-Szabó, András A. Benczúr: Temporal Wikipedia search by edits and linkage. TAIA’13 August 1, 2013, Dublin, Ireland.
  3. Sanmay Das, Allen Lavoie, Malik Magdon-Ismail: Manipulation Among the Arbiters of Collective Intelligence: How Wikipedia Administrators Mold Public Opinion. CIKM’13, Oct. 27–Nov. 1, 2013, San Francisco, CA, USA.
  4. Josep Lladós; Eduard Aibar; Maura Lerga; Antoni Meseguer; Julià Minguillon: An Empirical Study on Faculty Perceptions and Teaching Practices of Wikipedia
  5. Rughinis, C.; Matei, S.: Learning through Massively Co-Authored Biographies: Making Sense of Steve Jobs on Wikipedia through Delegated Voice. Control Systems and Computer Science (CSCS), 2013, 19th International Conference on 29-31 May 2013, Closed access
  6. User:LA2:
  7. Zahurul Islam and Alexander Mehler: Automatic Readability Classification of Crowd-Sourced Data based on Linguistic and Information-Theoretic Features
  8. Annual survey by ARD and ZDF
  9. Deepika Sethi: A Large Scale Study of Edit Patterns in Wikipedia and its Applications to Vandalism Detection
  10. Johannes Daxenberger and Iryna Gurevych: Automatically Classifying Edit Categories in Wikipedia Revisions
  11. Nicolas Sannier, Mathieu Acher, and Benoit Baudry: From Comparison Matrix to Variability Model: The Wikipedia Case Study. 8th IEEE/ACM International Conference on Automated Software Engineering (2013)
  12. Shruti Bhosale, Heath Vinicombe, Raymond Mooney: Detecting Promotional Content in Wikipedia

This newsletter is brought to you by the Wikimedia Research Committee and The Signpost

Archive notice: This is an archived post from, which operated under different editorial and content guidelines than Diff.


1 Comment

Thank you for the writeup, Piotr! I’m one of the authors of the “Manipulation Among the Arbiters of Collective Intelligence” paper. Two quick clarifications. We are not suggesting that simply raising the threshold for RfAs would help. In fact, it would probably make behavior changes more common: successful candidates who get a higher vote percentage are somewhat more likely to change their behavior upon becoming an administrator. I can see how our “just above” and “just below” threshold comparison could give the wrong impression. However, there isn’t anything special about the current threshold: the most significant difference between just-above-threshold users… Read more »