Wikimedia Research Newsletter, March 2012


Wikimedia Research Newsletter

Vol: 2 • Issue: 3 • March 2012

Predicting admin elections by editor status and similarity; flagged revision debates in multiple languages; Wikipedia literature reviewed

With contributions by: Tbayer, DarTar, Jodi.a.schneider, Njullien and Piotrus


How editors evaluate each other: effects of status and similarity

A team of social computing researchers based at Stanford and Cornell University studied how users evaluate each other in social media.[1] Their paper, presented at the 5th ACM Web Search and Data Mining Conference (WSDM ’12), focuses on three main case studies: Wikipedia, StackOverflow and Epinions. User-to-user evaluations, the authors note, are jointly influenced by the properties of the evaluator and the target; as a result, differences in properties between the target and the evaluator should be expected to affect the evaluation. The study looks specifically at how differences in topic expertise and status affect peer evaluations. The Wikipedia case focuses on requests for adminship (RfAs), the most prominent example of peer evaluation in Wikipedia and a topic that has attracted considerable attention in the literature (Signpost research coverage: September 2011, October 2011, January 2012). Similarity is measured based on article co-authorship, and status as a function of an editor’s number of contributions. Previous research by the same authors showed that the probability an evaluator will evaluate a target user positively drops dramatically when the status of the two users is very similar, and there is general evidence that homophily and similarity in editing activity have a strong influence on peer evaluation in RfAs. The study identifies two effects that jointly account for this singular finding:

  • “Elite” or high-status users are more likely to participate in evaluations about other users who are active in their areas of interest or expertise.
  • Low-status users tend to be judged differently than those with moderate or high status.

In a direct application of these results, dubbed ballot-blind prediction, the authors show how the outcome of an RfA can be accurately predicted by a model that simply considers the first few participants in a discussion and their attributes, without looking at their actual evaluations of the target.
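To make the ballot-blind idea concrete, here is a minimal sketch of the kind of vote-free features such a model could use. It is not the authors' actual model: it assumes similarity is the Jaccard overlap of the sets of articles two editors have worked on, and that status is the raw edit count. The names `Editor`, `jaccard` and `ballot_blind_features`, and the cutoff `k`, are hypothetical. The sketch stops at feature construction; a classifier would still have to be trained on past RfA outcomes.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Editor:
    name: str
    edit_count: int                             # assumed proxy for status
    articles: set = field(default_factory=set)  # articles edited (co-authorship proxy)


def jaccard(a: set, b: set) -> float:
    """Overlap of two article sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ballot_blind_features(candidate: Editor, voters: list, k: int = 5) -> dict:
    """Describe the first k voters relative to the candidate, ignoring their votes."""
    first = voters[:k]
    if not first:
        raise ValueError("need at least one voter")
    return {
        # How close the early voters are to the candidate's editing areas.
        "mean_similarity": mean(jaccard(candidate.articles, v.articles) for v in first),
        # Positive values mean the early voters outrank the candidate on average.
        "mean_status_gap": mean(v.edit_count - candidate.edit_count for v in first),
    }


# Toy data, invented purely for illustration.
candidate = Editor("C", 4000, {"Physics", "Chemistry", "Optics"})
voters = [
    Editor("V1", 9000, {"Physics", "Optics"}),
    Editor("V2", 1000, {"Cooking"}),
]
features = ballot_blind_features(candidate, voters, k=2)
print(features)
```

Note that the function never reads a support or oppose vote: only who turns up early, and how they relate to the candidate, enters the feature set.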

Sociological analysis of debates about flagged revisions in the English, German and French Wikipedias


At the center of debates on “Coercion or empowerment”: Icons signifying accepted (left) and not yet accepted (right) revisions under a flagged revisions scheme

In an article to appear in Ethics and Information Technology, Paul B. de Laat analysed debates occurring in the English, German and French Wikipedias about the evolution of the rules governing new edits.[2] As noted by the analysis of the English Wikipedia’s rules, by Butler et al., 2008,[3] these rules are numerous and have increased in number and complexity; they range from the more formal and explicit (intellectual property rights) to the more informal.

De Laat’s work is based on a study of the discussions around the proposal to introduce a system of reviewing edits before they appear on screen (flagged revisions, discussed on English Wikipedia at Wikipedia:Flagged revisions). It focuses on the perennial debate around the construction of knowledge commons theorized by Elinor Ostrom:[4] being a collective, open project, it must be accessible to most, but as its production becomes important for its “owners” (readers and producers), boundaries have to be set to protect its integrity. De Laat’s article describes and analyzes the tensions and permanent adjustments needed to manage these apparently opposed goals.

In a Weberian analysis of bureaucracy, applicable to Wikipedia policies, he shows that two views can be invoked to explain the intensity of the discussions. He summarizes the debate as a clash between (i) those who saw the flagged revisions as “a useful tool for curbing vandalism”, enabling and empowering users and editors, and (ii) those who denounced it as “a superfluous bureaucratic device that violates egalitarian principles of participation”, designed to introduce a more controlled and hierarchical environment. He muses that “an intriguing question that remains to be answered, of course, is: What brought the three language communities to ultimately choose or reject such a review system? Why is it that, each in their own ways, the Germans voted for acceptance, the French for rejection, while the English have been wavering all the time between acceptance and rejection?” (p. 11) This question, and Wikipedians’ views of flagged revisions, can shed light on what kind of community Wikipedia should be, according to various factions of editors. As De Laat answers it, “many of those who reject the system of review do so from a vision of Wikipedia as an unbounded community that shares knowledge without mutual control and suspicion, while many of those who embrace the review system do so because they have a vision of Wikipedia as an organization producing reliable knowledge that keeps vandalism outside its borders”. De Laat suggests that further research is needed to fully understand the factors behind the different Wikipedias’ decisions on flagged revisions, and offers a hypothesis to be tested: that “those whose mother tongue is German may possibly be more deferential to hierarchy than those who speak either French or English, and therefore may prefer the order and respectability introduced by a system of reviewing”.

In a paper published by the European chapter of the Association for Computational Linguistics,[5] Oliver Ferschke and coauthors describe a study of talkpages on the Simple English Wikipedia. This paper uses speech act theory and dialog acts as a theoretical framework for studying how authors use discussion pages to collaborate on article improvement. They have released a freely downloadable corpus of 100 segmented and annotated talk pages, called the Simple English Wikipedia Discussion Corpus and based on a new annotation schema for coordination-related dialog acts. Their schema uses 17 categories, grouped into these four top-level categories: article criticism, explicit performative announce, information content, and the interpersonal. The authors use their corpus to develop a machine-learning-based UIMA pipeline for dialog act classification, which they describe but which is not freely available. They provide a useful discussion of conversational implicature theory and good pointers to seminal and new research in dialog acts. (A longer, editable summary is available on AcaWiki.)

Majority of UK academics prohibit students from using Wikipedia, but use it just as frequently themselves

An article appearing in “Teaching in Higher Education”[6] “discusses the use of Wikipedia by academics and students for learning and teaching activities at Liverpool Hope University, [considering] the findings to be indicative of Wikipedia use at other British universities”. Having sent email invitations to all staff and students at the university, the authors received responses from 133 academics and 1222 students. 75% of the student respondents said they used Wikipedia for “some purpose”, which according to the authors indicates that Wikipedia use “has risen appreciably in a short period of time” among British university students, citing a 2009 study[7] which had put that number at only 17.1%. “However”, they cautioned, usage was “significantly lower than usage in the USA.”

Among the surveyed teaching staff, almost the same percentage (74%) used Wikipedia “for some purpose” as their students—but just 24% of them “tell their students to use Wikipedia for Learning and Teaching purposes, with 18% having not mentioned it to students and 58% having expressly told them not to.” The independence of academics’ answers to these two questions is highlighted by the authors as

“a key finding of the survey: there is little difference between academics that permit their students to use Wikipedia and those who do not in respect to their own use. In particular, amongst both groups, academics that used Wikipedia ‘frequently’ seem to exhibit similar usage profiles. It was indicated in the commentary that the critical difference was that they have the scholarly expertise to determine what material on Wikipedia was ‘correct’ and that which was not.”

In the conclusion the authors observe that “a significant proportion of what we would see as enlightened academics at Liverpool Hope and no doubt elsewhere realise that it is pointless to try to hold back the online tide of Wikipedia. Instead, they try to give guidance in the way that students consult it: for clarification, references, comparison and definitions.”

A systematic review of the Wikipedia literature

“The people’s encyclopedia under the gaze of the sages: A systematic review of scholarly research on Wikipedia”[8] is the title of a working paper which promises to be a major milestone in Wikipedia research. It is an attempt to synthesize a broad-based literature review of scholarly research on Wikipedia. The task of creating a comprehensive database of such publications has seen several efforts before and its difficulties were explored in a well-attended workshop at last year’s WikiSym conference (see the October issue of this research report).

The authors intend to release their findings in a “Web 2.0” format through their wiki by the end of May 2012. The current paper is impressive in scope, but at 71 pages badly in need of a table of contents (the current version does not seem to adhere to any consistent manual of style, with headings using different font sizes and even colors) and clarifications (the current distinction between findings on p.12 and discussion on p. 19 seems somewhat arbitrary; the authors at one point promise a discussion of over 2,000 articles and in other places talk of a sample of 139) – perhaps due to its genesis (see below). Keeping in mind this is just a draft paper, we hope the final paper will have an improved flow and transparency. The presented methodology is useful for those interested in learning how to analyze large, thematic bodies of work using online databases. In one of their major contributions, the authors intend to present an overview of Wikipedia research grouped by themes (keywords), such as for example discussing research done on “vandalism reversion”, “thesaurus construction” or “attitude towards Wikipedia”. While the current draft is not yet comprehensive, it shows much potential, and in practice their wiki, which already groups the content with categories, may prove more useful as a reference work.

As explained by one of the authors, the paper merges two existing efforts, both of which already published drafts last year. And by choosing as their platform, they embrace the work of a third party, Wikimedian User:Emijrp’s “Wikipapers” wiki on the same domain. This follows discussions between the three parties reported in the January issue of this research report (“New effort at comprehensive wiki research literature database”). On the wiki, the authors acknowledge the modest efforts of a fourth party, namely this research report (which just released a dataset of all publications covered until the end of 2011): “We do not include any items published after June 2011, after which the Wikimedia Research Newsletter was formally inaugurated; we’re letting them pick up from where we stop.”


  • External links in the English Wikipedia: A short paper by a team of Greek researchers presents statistics on the nature and quality of external links in the English Wikipedia, based on the October 2009 XML dump.[9] The analysis, although based on an outdated dataset, reveals insights into the distribution of links per article, the relation between external links and article length, and the proportion of dead links, which they quantify as 18% of the links in their corpus.
  • Wikipedia articles on StumbleUpon: Two short papers to be presented at “Searching 4 Fun!”, a workshop collocated with the upcoming European Conference on Information Retrieval, concern Wikipedia. “Serendipitous Browsing: Stumbling through Wikipedia”[10] examines which Wikipedia articles are being featured by users of the social bookmarking site StumbleUpon. Based on a sample consisting of a random selection of half of the articles from the October 2011 dump, 15.13% of the articles of the English Wikipedia are contained in StumbleUpon’s index (as opposed to less than 1% of both the French and the German Wikipedia, according to an initial investigation). The 100 articles with the most views by StumbleUpon users contained only one featured article, but twelve lists – among them the number one, the list of unusual deaths, which belongs to the “Bizarre/Oddities” category on StumbleUpon, as do four of the other top ten articles. The second, a position paper titled “Searching Wikipedia: Learning the Why, the How, and the Role Played by Emotion”,[11] proposes to examine users’ search behavior, employing diary studies and a custom-built Firefox extension asking Wikipedia readers to record details about their informational requirement and the motivating situation driving the search.
  • Citations of open access articles in Wikipedia: An ArXiv preprint by researchers based at UNC-Chapel Hill and the National Evolutionary Synthesis Center, studying “indicators of scholarly impact in social media” (or “altmetrics”), reports on the number of citations to the open access scholarly literature that can be found in Wikipedia.[12] The study suggests that 5% of all 24,331 articles ever published until November 2010 in the seven journals of open-access publisher Public Library of Science (PLoS) are cited in Wikipedia. More statistics on the number of scholarly citations in Wikipedia by language and by publisher are available via the Wikipedia cite-o-meter.
  • First results from Article Feedback v5: The Wikimedia Foundation reported new results from the first stage of experiments with a fully redesigned article feedback tool.[13] The full report indicates that 45% of all reader suggestions (sampled from a random list of AFT5-enabled articles) were considered useful via a blind assessment performed by Wikipedia editors. The report also identifies differences in the overall volume of feedback posted via different designs and finds that asking readers to suggest what they were looking for outperforms comments with ratings.
  • Augmenting Wikipedia articles with Europeana items: A paper titled “Enabling the Discovery of Digital Cultural Heritage Objects through Wikipedia”[14] proposes a mechanism that allows users “to browse Wikipedia articles, which are augmented with items from the cultural heritage collection. Using Europeana as a case-study we demonstrate the effectiveness of our approach for encouraging users to spend longer exploring items in Europeana compared with the existing search provision.”
  • Semantics for genes: Biochemists from the Gene Wiki project on Wikipedia report on “Building a biomedical semantic network in Wikipedia with Semantic Wiki Links”.[15] Among other things, the paper mentions the introduction of {{SWL}}, an attempt to emulate some aspects of Semantic MediaWiki using Wikipedia’s existing (non-semantic) MediaWiki version.
  • Live version of DBpedia: A paper[16] by the team behind the DBpedia project (which extracts structured data from Wikipedia) promises to explain the techniques behind a recent improvement, avoiding lags caused by infrequent updates. According to the abstract, “Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia.”


  1. Anderson, A., Huttenlocher, D., Kleinberg, J., & Leskovec, J. (2012). Effects of user similarity in social media. Proceedings of the fifth ACM international conference on Web search and data mining – WSDM ’12 (p. 703). New York, New York, USA: ACM Press. DOI PDF Open access
  2. de Laat, P. B. (2012). Coercion or empowerment? Moderation of content in Wikipedia as ‘essentially contested’ bureaucratic rules. Ethics and Information Technology, 1–13. Springer Netherlands. DOI Open access
  3. Butler, B., Joyce, E., & Pike, J. (2008). Don’t look now, but we’ve created a bureaucracy: The nature and roles of policies and rules in Wikipedia. Proceeding of the twenty-sixth annual CHI conference on Human factors in computing systems – CHI ’08 (p. 1101). New York, New York, USA: ACM Press. DOI PDF Open access
  4. Hess, Charlotte and Ostrom, Elinor (2006) A Framework for Analyzing the Knowledge Commons, in Hess, C., & Ostrom, E. (Eds.). Understanding Knowledge as a Commons: From Theory to Practice. MIT Press, 2006, pp. 41–81 Closed access
  5. Ferschke, O., Gurevych, I., & Chebotar, Y. (2012). Behind the Article: Recognizing Dialog Acts in Wikipedia Talk Pages. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012). PDF Open access
  6. Knight, C., & Pryke, S. (2012). Wikipedia and the University, a case study. Teaching in Higher Education, 1–11. Routledge. DOI Closed access
  7. Hampton-Reeves, S., Mashiter, C., Westaway, J., Lumsden, P., Day, H., Hewertson, H., & Hart, A. (2009). Students’ Use of Research Content in Teaching and Learning Behaviour. JISC, PDF Open access
  8. Okoli, C., Mehdi, M., Mesgari, M., Nielsen, F. Å., & Lanamäki, A. (2012). The people’s encyclopedia under the gaze of the sages: A systematic review of scholarly research on Wikipedia. SSRN eLibrary. SSRN. HTML Open access
  9. Tzekou, P., Stamou, S., Kirtsis, N., & Zotos, N. (2011). “Quality assessment of Wikipedia external links.” In J. Cordeiro & J. Filipe (Eds.), WEBIST 2011, Proceedings of the 7th International Conference on Web Information Systems and Technologies (pp. 248-254). PDF Open access
  10. Hauff, C., & Houben, G.-J. (2012). Serendipitous Browsing: Stumbling through Wikipedia. In D. Elsweiler, M. L. Wilson, & M. Harvey (Eds.), Proceedings of the “Searching 4 Fun!” workshop, collocated with the annual European Conference on Information Retrieval (ECIR2012) Barcelona, Spain, April 1, 2012. (pp. 21-24) PDF Open access
  11. Knäusl, H. (2012). Searching Wikipedia: Learning the Why, the How, and the Role Played by Emotion. In D. Elsweiler, M. L. Wilson, & M. Harvey (Eds.), Proceedings of the “Searching 4 Fun!” Workshop, collocated with the annual European Conference on Information Retrieval (ECIR2012) Barcelona, Spain, April 1, 2012. (pp. 14-15). PDF Open access
  12. Priem, J., Piwowar, H. A., & Hemminger, B. H. (2012). Altmetrics in the Wild: Using Social Media to Explore Scholarly Impact. arXiv. PDF Open access
  13. Florin, F., Fung, H., Halfaker, A., Keyes, O., & Taraborelli, D. (2012). Helping readers improve Wikipedia: First results from Article Feedback v5. Wikimedia Foundation blog. HTML Open access
  14. Hall, M. M., Clough, P. D., Lopez de Lacalle, O., Soroa, A., & Agirre, E. (2012). Enabling the Discovery of Digital Cultural Heritage Objects through Wikipedia. PDF Open access
  15. Good, B. M., Clarke, E. L., Loguercio, S., & Su, A. I. (2012). Building a biomedical semantic network in Wikipedia with Semantic Wiki Links. Database: The Journal of Biological Databases and Curation, 2012. DOI Open access
  16. Morsey, M., Lehmann, J., Auer, S., Stadler, C., & Hellmann, S. (2012). DBpedia and the Live Extraction of Structured Data from Wikipedia. Program: Electronic library and information systems, 46(2), 2. Emerald Group Publishing Limited. PDF Closed access

This newsletter is brought to you by the Wikimedia Research Committee and The Signpost

Archive notice: This is an archived post from, which operated under different editorial and content guidelines than Diff.


This is in regard to the Wikipedia write-up on Frank Smith, the psycholinguist. First, Keith Stanovich and Richard West’s study proved Smith’s ‘psycholinguistic theory’ of reading 180 degrees removed from reality (see ‘Romance and Reality’ by Stanovich, a classic article). Next, a letter from MIT signed by Steve Pinker and the rest of the linguists warns against Whole Language and specifically mentions Frank Smith’s error in judgement (that the invention of the alphabet is a handy tool for spelling but has nothing to do with the sounds of words). They point out the fact that the alphabet was invented to represent the…

The paper on external links contains some dubious statements. The authors believe that “a large amount of external links signifies incomplete article contents” and “Wikipedia editors are instructed to point to external resources if their content is proper in the article’s context and is not yet part of the article. Thus, one would expect lengthy articles to contain fewer links than short ones in the sense that the more text the article contains the decreased the need to link it with supplemental non-wiki material.”, which is completely wrong. Better articles need many external links to provide with lots of verifiable,…