Wikimedia Research Newsletter, October 2014

Vol: 4 • Issue: 10 • October 2014

Informed consent and privacy; newsmaking on Wikipedia; Wikipedia and organizational theories

With contributions by: Maximilian Klein, Piotr Konieczny, Kim Osman, Pine and Tilman Bayer

Tl;dr: Users, informed consent and privacy policies online

Reviewed by Kim Osman

In new research[1] carried out in light of proposed changes to data protection legislation in the European Union (EU), authors Bart Custers, Simone van der Hof, and Bart Schermer conducted a comparative analysis of the privacy policies of social media and user-generated content websites, along with a user survey (N=8,621 in 26 countries) and interviews in 13 different EU countries on awareness, values, and attitudes toward privacy online. The authors state that consent regarding personal data use is an important concept and observe, “There is mounting evidence that data subjects do not fully contemplate the consequences and risks of personal data processing.”

Custers, van der Hof and Schermer developed a set of criteria for giving informed consent about the use of personal data, including: “Is it clear who is processing the data and who is accountable?” and “Is the information provided understandable?” When existing privacy policies were assessed against these criteria, Wikipedia was the worst performing of the sites analyzed; the paper recommends that it make clear how minors are dealt with and provide additional clarity around security measures. It also notes that IP addresses may be traced, making “anonymous” Wikipedia users identifiable.

The study did acknowledge issues around self-presentation and identity in different online contexts, as well as the question of whether a site like Wikipedia actually needs an extensive privacy policy, since users assign different value to privacy criteria in these different contexts. The authors do note, however, “Wikipedia does collect opinions that may be attributable to individuals and that may be considered privacy sensitive.”

This paper is a well-researched summary of the privacy policies of online sites (including major international platforms like Facebook, Twitter and YouTube), and although from a European perspective (where data collection practices are arguably more stringent than in other places in the world), it raises important questions about how Wikipedia approaches its privacy policy in terms of informed user consent, and would be useful reading for anyone with an interest in how online practices are shaping approaches to user privacy.

Researchers requiring more information about ethics in online research can visit the Association of Internet Researchers’ wiki.


Holocaust articles compared across languages

We tell ourselves that Wikipedia works well for the most part, but that finding consensus might break down on controversial articles. Of all article topics, perhaps none is potentially more fraught than the Holocaust, and that is precisely what Rudolf Den Hartogh has tackled in his Master’s thesis “The future of the Past: A case study on the representation of the Holocaust on Wikipedia”.[2] It is an in-depth compare-and-contrast analysis of the Holocaust topic in the English, German, and Dutch Wikipedias. Several curious facts come out of this. For instance, the average vandalism rate on these articles is 4%, compared with 7% globally – as these articles have been locked at some point, although the Dutch version is no longer protected. Other analyses show edit activity over time, since the articles’ inception. The German version saw the height of its shaping 2 years after it was started in 2004, whereas the English and Dutch articles saw their main spurts 5 and 3 years later respectively. Moreover, the author finds “that there does not exist one representation of the Holocaust, but each language version has its own unique account of events and phenomena.” Finally, they “found that none of the Holocaust entries under study is rated ‘good quality’,” so we still have not definitively addressed the hardest parts of our encyclopedia.

Semantic role label features for all records, colours are based on event tag in the Lensing Wikipedia dataset.
(“SRL-Full-p40” by Jasneet.sabharwal, under CC-BY-SA-4.0)

Lensing Wikipedia

A project[3] with this title aims to extract date, location, event and role semantic data from historical English Wikipedia articles. Of course, making grand sense of that automatic extraction work requires visualization. Such visualization is difficult on high-dimensional data consisting of e.g. a date, location, multiple events and roles – all at the same time. A short proof of concept “Visualizing Wikipedia using t-SNE” by Jasneet Singh Sabharwal[4] has done just this using a Barnes-Hut simulation variation of the t-distributed stochastic neighbor embedding algorithm. This image shows the closeness of the semantic roles of features found in Wikipedia article text, with colors indicating similar events that articles are describing.

“Infoboxes and cleanup tags: Artifacts of Wikipedia newsmaking”

An article[5] in Journalism: Theory, Practice and Criticism looks at the use and abuse of cleanup tags and infobox elements as conceptual and symbolic tools. Based on ethnographic observations and several interviews, the author provides a lengthy description of the formative first three or so weeks of the 2011 Egyptian Revolution article. It is a valuable study of how articles are developed, and of the collaboration and conflicts that are common in high-activity articles. The author makes the useful observation that “Classification work… is intensely political” and “the editing of Wikipedia articles involves continuous linking and classifying.” The choice of words, categories, article titles, but also specific tags or infoboxes (though a particular example discussed – whether to use Template:Infobox uprising or not – seems to concern a template that does not, in fact, exist) can be quite controversial. The author also puts forth an interesting argument that removal of cleanup tags may give false impressions of stability in articles that are not yet stable, and that infoboxes carry significant, perhaps undue, weight compared to other elements of the article.

Wikipedia’s identity “based on freedom”

This paper[6] looks at Wikipedia through a number of organizational theory lenses, in particular theories of organizational identity. Of particular interest to Wikipedians is one of the aspects analyzed by the authors – the identity of the project. The authors state that “the organizational identity at Wikipedia is based on freedom”. Next, they discuss the utopian ideals of freedom (such as “anyone can edit”), as contrasted with the freedom-reducing tendencies of censorship, administrative control, and bureaucratization. The authors argue that the common solution to criticism of Wikipedia, within the community, is concealment and marginalization of said criticism. The authors point to the practical defanging of the Wikipedia:Ignore all rules policy, which has gone through a number of meaning shifts, in which it was redefined to be virtually toothless, even though the name remained the same. Another way that freedom is limited is through the end-justifies-the-means utopian vision of “free access [to Wikipedia] for everyone”, replacing the older “anyone can edit” freedom-of-editing meaning. Unfortunately, the authors’ discussion of “the subjugation of contesting voices” is very short on details and specifics; the authors allude to administrator power abuse, but fail to provide any specific discussion of how it occurs; an example they used of “deleted content” can be interpreted as nothing more sinister than admin ability to delete content that does not meet Wikipedia’s site policies, including uncontroversial content such as spam.

“Copyright or Copyleft? Wikipedia as a Turning Point for Authorship”

This paper[7] touches upon a very interesting yet understudied area: what Wikipedia’s existence means for copyright law. As the authors note, Wikipedia “appears to challenge some of the notions at the heart of copyright law.”

Critique of Wikipedia’s dispute resolution procedures

This paper[8] claims to present an ethnographic analysis and a strong critique of Wikipedia’s dispute resolution procedures, and states upfront its goal as “to tease out systemic discrimination or injustice”. The strongly worded abstract is attention-drawing, promising that “A number of flaws will be identified including the ability for vocal minorities to dominate the Wikipedia community consensus”. Unfortunately, while the paper provides a very detailed description of Wikipedia’s dispute resolution scene, it doesn’t seem to present any new data; its critique of “vocal minorities”, for example, is composed of a few sentences, and the entire argument is based on, and essentially a repetition of, a similar passage in Reagle’s Good Faith Collaboration book. While the paper is well written and presents a number of valid arguments, it does not seem to contribute anything new to our understanding of Wikipedia, being in essence a literature review focused on the topic of dispute resolution on Wikipedia. This reviewer finds that disappointing, considering that the almost tabloid-style abstract and the introductory section promise ethnographic research, which – like anything else going beyond synthesis of existing, published research – is sadly very much absent from the paper.

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

  • “Insights from the Wikipedia Contest (IEEE Contest for Data Mining 2011)”[9] (earlier coverage: “Predicting editor survival: The winners of the Wikipedia Participation Challenge”)
  • “A Piece of My Mind: A Sentiment Analysis Approach for Online Dispute Detection”[10] (constructs a dispute corpus from Wikipedia talk pages)
  • “Extracting Imperatives from Wikipedia Article for Deletion Discussions”[11] (without conclusions or published dataset, apparently)
  • “Use of Wikipedia by Legal Scholars: Implications for Information Literacy”[12]
  • “Guiding Students in Collaborative Writing of Wikipedia Articles – How to Get Beyond the Black Box Practice in Information Literacy Instruction”[13] (received the EdMedia Outstanding Paper Award)
  • “Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project”[14] (the project home page allows the live creation of a taxonomy graph for an arbitrary Wikipedia article)
  • “Analysis of the accuracy and readability of herbal supplement information on Wikipedia”[15]
  • “Maturity Assessment of Wikipedia Medical Articles”[16]
  • “Computer-supported collaborative accounts of major depression: Digital rhetoric on Quora and Wikipedia”[17]


  1. Custers, Bart; Simone van der Hof, Bart Schermer (2014-09-01). “Privacy Expectations of Social Media Users: The Role of Informed Consent in Privacy Policies”. Policy & Internet 6 (3): 268-295. doi:10.1002/1944-2866.POI366. ISSN 1944-2866.
  2. Den Hartogh, Rudolf (2014). The future of the Past: A case study on the representation of the Holocaust on Wikipedia (Masters). Erasmus University Rotterdam.
  3. Lensing Wikipedia. Simon Fraser University Natural Language Laboratory.
  4. Jasneet Singh Sabharwal: Visualizing Wikipedia using t-SNE
  5. Ford, Heather (2014-08-31). “Infoboxes and cleanup tags: Artifacts of Wikipedia newsmaking”. Journalism: 1464884914545739. doi:10.1177/1464884914545739. ISSN 1464-8849, 1741-3001. Closed access
  6. Kozica, Arjan M. F.; Christian Gebhardt, Gordon Müller-Seitz, Stephan Kaiser (2014-10-13). “Organizational Identity and Paradox: An Analysis of the ‘Stable State of Instability’ of Wikipedia’s Identity”. Journal of Management Inquiry: 1056492614553275. doi:10.1177/1056492614553275. ISSN 1056-4926, 1552-6542. Closed access
  7. Simone, Daniela (2013-07-01). “Copyright or Copyleft? Wikipedia as a Turning Point for Authorship”. Rochester, NY: Social Science Research Network. 
  8. Ross, Sara (2014-03-01). “Your Day in ‘Wiki-Court’: ADR, Fairness, and Justice in Wikipedia’s Global Community”. Rochester, NY: Social Science Research Network. 
  9. Desai, Kalpit V.; Roopesh Ranjan (2014-01-07). “Insights from the Wikipedia Contest (IEEE Contest for Data Mining 2011)”. arXiv:1405.7393 [physics, stat].
  10. Lu Wang, Claire Cardie: A Piece of My Mind: A Sentiment Analysis Approach for Online Dispute Detection Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 693–699, Baltimore, Maryland, USA, June 23-25 2014
  11. Fiona Mao, Robert E. Mercer, Lu Xiao: Extracting Imperatives from Wikipedia Article for Deletion Discussions. Proceedings of the First Workshop on Argumentation Mining, pages 106–107, Baltimore, Maryland, USA, June 26, 2014.
  12. Darryl Maher: Use of Wikipedia by Legal Scholars: Implications for Information Literacy. Master’s thesis, School of Information Management, Victoria University of Wellington, submitted June 2014
  13. Sormunen, E. & Alamettälä, T. (2014). Guiding Students in Collaborative Writing of Wikipedia Articles – How to Get Beyond the Black Box Practice in Information Literacy Instruction. In: EdMedia 2014 – World Conference on Educational Media and Technology. Tampere, Finland: June 23-26, 2014
  14. Flati, Tiziano; Daniele Vannella, Tommaso Pasini, Roberto Navigli (2014). “Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project”. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers): 945-955.
  15. Phillips, Jennifer; Connie Lam, Lisa Palmisano (2014-07-01). “Analysis of the accuracy and readability of herbal supplement information on Wikipedia”. Journal of the American Pharmacists Association 54 (4): 406-414. doi:10.1331/JAPhA.2014.13181. ISSN 1544-3191. Closed access
  16. Conti, Riccardo; Emanuel Marzini, Angelo Spognardi, Ilaria Matteucci, Paolo Mori, Marinella Petrocchi (2014). “Maturity Assessment of Wikipedia Medical Articles”. Proceedings of the 2014 IEEE 27th International Symposium on Computer-Based Medical Systems. CBMS ’14. Washington, DC, USA: IEEE Computer Society. pp. 281–286. doi:10.1109/CBMS.2014.69. ISBN 978-1-4799-4435-4. Closed access
  17. Rughinis, Cosima; Bogdana Huma, Stefania Matei, Razvan Rughinis (June 2014). “Computer-supported collaborative accounts of major depression: Digital rhetoric on Quora and Wikipedia”. 2014 9th Iberian Conference on Information Systems and Technologies (CISTI). pp. 1-6. doi:10.1109/CISTI.2014.6876968. Closed access

This newsletter is brought to you by the Wikimedia Research Committee and The Signpost

Archive notice: This is an archived post from, which operated under different editorial and content guidelines than Diff.