Wikimedia Research Newsletter, March 2015


Vol: 5 • Issue: 3 • March 2015

Most important people; respiratory reliability; academic attitudes

With contributions by: Piotr Konieczny, Anwesh Chatterjee and Tilman Bayer.

Most important people of all times, according to four Wikipedias

Most prominent person on the English, Chinese, Japanese, and German Wikipedia, according to the paper’s PageRank method

This social network analysis[1] looks at the entire corpus of Wikipedia biographies (with data from English, Chinese, Japanese and German Wikipedias). The authors created several thousand networks (unfortunately, this short conference paper does not discuss precisely how) and used the PageRank algorithm to identify key individuals.
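Since the paper does not describe how its networks were built, the following is only a minimal sketch of the general PageRank idea, run on a hypothetical toy graph of links between biographies. The node names, the damping factor of 0.85 and the iteration count are illustrative assumptions, not the authors' setup.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outgoing links): spread its rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: an entry "A": ["B"] means biography A links to biography B.
toy_links = {
    "Person A": ["Person B", "Person C"],
    "Person B": ["Person C"],
    "Person C": ["Person A"],
    "Person D": ["Person C"],
}
ranks = pagerank(toy_links)
top = max(ranks, key=ranks.get)
```

In this toy graph the biography that receives the most incoming links ends up with the highest rank. Which links count as edges, and how they are weighted, are exactly the methodological details the short paper leaves open.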

The authors attempt to answer the question “Who are the most important people of all times?” Their findings clearly show that different Wikipedias give different prominence to different individuals (the most prominent people, for the four Wikipedias, appear to be George W. Bush, Mao Zedong, Ikuhiko Hata and Adolf Hitler, respectively). The Eastern cultures seem to prioritize warriors and politicians; Western ones include more cultural (including religious) figures. Interesting findings concern globalization: “While the English Wikipedia includes 80% non-English leaders among the top 50, just two non-Chinese made it into the top 50 of the Chinese Wikipedia … Japanese Wikipedia is slightly more balanced, with almost 40 percent non-Japanese leaders”. Findings for the German Wikipedia are not presented. Though the authors don’t make that point, it seems that no women appear in the Top 10 lists presented. Overall, this seems like an interesting paper (it also received a writeup in Technology Review), though the brief form (two pages) means that many questions about methodology remain unanswered, and the presentation of the findings and the analysis are both very curt. On a side note, one can wonder whether this paper is truly related to anthropology, given that the only time this field is referred to in this work is when the authors mention that they are “replacing anthropological fieldwork with statistical analysis of the treatment given by native speakers of a culture to different subjects in Wikipedia.”

See also our earlier coverage of similar studies:

“Wikipedia a reliable learning resource for medical students? Evaluating respiratory topics”

A paper in Advances in Physiology Education[2] claims to assess the suitability of Wikipedia’s respiratory articles for medical student learning. Forty Wikipedia articles on respiratory topics were sampled on 27 April 2014. These articles were assessed by three researchers with a modified version of the DISCERN tool. Article references were checked for accuracy and typography. Readability was assessed with the Flesch–Kincaid and Coleman–Liau tools.
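For readers unfamiliar with the two readability measures, here is a minimal sketch of the standard published formulas for the Flesch–Kincaid grade level and the Coleman–Liau index. It assumes the word, sentence, syllable and letter counts have already been obtained; the paper's actual tooling and counting rules are not described here, and the function names are illustrative.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def coleman_liau_index(letters, words, sentences):
    """Coleman-Liau index from raw counts (approximate US school grade)."""
    letters_per_100_words = letters / words * 100
    sentences_per_100_words = sentences / words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


# Hypothetical passage: 100 words, 5 sentences, 150 syllables, 500 letters.
fk = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
cli = coleman_liau_index(letters=500, words=100, sentences=5)
```

Both scores map onto US school grades, so values of roughly 13 and above correspond to college-level text.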

The paper found a wide range of accuracy scores using the modified DISCERN tool, from 14.67 for “[Nail] clubbing” to 38.33 for “Tuberculosis”. Incorrect, incomplete or inconsistent formatting of references was commonly found, although this was not quantified in the paper. Readability of the articles was typically at a college level. On the basis of these findings, the paper declares Wikipedia’s respiratory articles unsuitable for medical students.

The researcher apparently uses an arbitrary unvalidated modification of the DISCERN tool to assess the accuracy of articles. The nature of this modification is not specified; nor is it available at the journal’s website as claimed in the paper.

The DISCERN tool does not assess accuracy; rather, it is designed to assess “information about treatment choices specifically for health consumers”. As such, the use of this tool is inappropriate to assess the suitability for medical students.

There is no acknowledgement that Wikipedia is an encyclopedia. Several of the DISCERN tool’s questions are unsuitable for an encyclopedia. DISCERN questions such as “Does it describe how each treatment works?” and “Does it describe the risks of each treatment?” would be answered on other Wikipedia pages, not on the disease article’s page. The author makes an a priori assumption that the medical textbooks used for comparison are perfect sources. The author does not assess those textbooks with the DISCERN tool.

The paper states: “[t]he number of citations from peer-reviewed journals published in the last 5 yr was only 312 (19%).” However, this is far superior to the number of citations in the textbooks listed. The chapter on “Neoplasms of the lung” in Harrison’s Principles of Internal Medicine (18th ed.) contains no citations at all. Seven sources are listed in its “Further readings” section, of which only one is from the last five years.

The claim that the article on “clubbing … had no references or external links” is incorrect. On 27 April 2014, Wikipedia’s article on “Nail clubbing” had ten references.

Several of the articles are at a rudimentary stage, containing limited information and lacking appropriate references. However, two articles, “Lung cancer” and “Diffuse panbronchiolitis”, were assessed by Wikipedia’s editors at the highest standard and awarded “Featured article” status. Five more articles, “Asthma”, “Chronic obstructive pulmonary disease”, “Pneumonia”, “Pneumothorax” and “Tuberculosis”, reached “Good article” standard. These articles are exceptionally detailed, accurate, and well-referenced. Azer’s paper makes no mention of the high quality of these articles.

The research uses an unvalidated tool for an inappropriate purpose without applying a suitable comparator, and inevitably draws incorrect conclusions.

Wikipedia is an encyclopedia. It is not a medical textbook; nor is it intended to replace medical textbooks. Rather, it should be used as a starting point by medical students. The quality of an individual article should be quickly assessed by the reader, and information can be confirmed in the references provided. Missing information should be sought from other sources, such as textbooks. Students should be encouraged to use Wikipedia alongside medical textbooks to assist their learning.

Disclosure: I (Axl) am a Wikipedia editor, a pulmonologist, the main author of Wikipedia’s “Lung cancer” article, and a major contributor to other respiratory articles.

Most academics are not concerned about Wikipedia’s quality – but many think their colleagues are

This recent study[3] is a valuable contribution to the small body of work on academics’ attitudes towards Wikipedia, and is the largest-scale survey in that field so far, with nearly 1,000 valid responses from the faculty at two Spanish universities. The authors find that Wikipedia is generally held in positive regard (nearly half of the respondents think it is useful for teaching, while less than 20% disagree; similar numbers use it for general information gathering, though the numbers are split at about 35% on whether they use it for research in their own discipline). Almost 10% of the respondents say they use it frequently for teaching purposes. The numbers of those who discourage students from using it and those who encourage students to consult the site are nearly equal, at about a quarter each. Almost half have no strong feelings on this, and fewer than 15% strongly disagree with students’ use of Wikipedia – suggesting that the past few years have witnessed a major shift in universities (less than a decade ago, stories of professors banning Wikipedia were quite common). Unsurprisingly, the faculty is much less likely to cite Wikipedia, with only about 10% admitting they do so.

Almost 90% of the academics think Wikipedia is easy to use, but only about 15% think editing is easy – with more than 40% disagreeing with that statement. Some 2% of respondents describe themselves as very frequent contributors to the site, and 6% as frequent. More than 40% have no thoughts on Wikipedia’s editing and reviewing system, which leads the authors to suggest that “most faculty do not actually know Wikipedia’s specific editing system very well nor the way the [site’s] peer-review process works”. Asked about Wikipedia’s quality, those who think its articles are reliable outnumber those who disagree by two to one (40% to 20%), with an even higher ratio (more than three to one) agreeing that Wikipedia articles are up to date. The respondents are equally divided, however, on whether the articles are comprehensive or not. The authors thus conclude that the impression that most academics are concerned about Wikipedia’s quality is not supported by their data. Nonetheless, the artifacts of Wikipedia’s early poor reception within academia linger: more than half of the respondents think the use of Wikipedia is frowned on by most academics, even though only 14% say they frown on it themselves.

The study goes beyond presenting simple descriptive statistics, giving us a number of interesting findings based on correlations: teaching use correlates most strongly with making edits (r=0.59), followed by the opinion that it improves student learning (r=0.47), perception of and use by colleagues (r=0.41), Wikipedia’s perceived quality (r=0.4), and its passive use (r=0.3). The researchers find that the use of Wikipedia is higher, and views of the site more favourable, in the STEM fields than in the “soft” social sciences. This also explains Wikipedia’s higher popularity among male instructors (which disappears when controlling for discipline and the correspondingly much lower share of women teaching in the STEM fields). Interestingly, the influence of age was not found to be significant: “faculty’s decision to use Wikipedia in learning processes does not follow the usual pattern of other Web 2.0 tools where young people tend to be more frequent users.”
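As a reminder of what an r value expresses, here is a minimal sketch of the Pearson correlation coefficient. It assumes the reported figures are ordinary Pearson correlations, which this summary does not confirm, and the survey scores below are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical 1-5 survey scores for six respondents (not data from the study).
editing_frequency = [1, 1, 2, 3, 4, 5]
teaching_use = [1, 2, 2, 3, 3, 5]
r = pearson_r(editing_frequency, teaching_use)
```

The coefficient ranges from -1 to 1. Values around 0.6 indicate a fairly strong positive association and values around 0.3 a weak one, and neither says anything about the direction of causation.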

Of immediate practical value to the Wikipedia community are the findings on what would help the respondents design educational activities using Wikipedia: 64% would like to see a “catalog presenting best practices”, with similar numbers (~50%) pointing to “getting greater institutional recognition”, “having colleagues explaining their own experiences”, and “receiving specific training”.

Wikipedia assignments at Finnish secondary schools

A conference paper titled “Guiding Students in Collaborative Writing of Wikipedia Articles – How to Get Beyond the Black Box Practice in Information Literacy Instruction”[4] (already briefly mentioned in our October issue) reports on the use of Wikipedia student assignments in a somewhat different environment than the usual American undergraduates: this one instead deals with Finnish secondary school students. The authors use the guided inquiry framework, postulating that “information literacies are best learned by training appropriate information practices in a genuine collaborative process of inquiry”, and asking how collaborative Wikipedia writing assignments fit into this approach. The findings tie in with the previous research on this subject: students are more motivated than in traditional writing assignments, develop skills in and understanding of wikis and Wikipedia (including its reliability) and more broadly encyclopedic writing. However, students are less likely to develop skills such as identifying reliable sources without specific additional instructions. The researchers note that “the limitation of encyclopaedic writing is that it is not intended to generate new knowledge but to synthesize knowledge from existing sources (i.e., a type of literature review)”; hence teachers who aim to develop skills in generating new knowledge might consider alternative assignments. The paper stresses the need to tailor the Wikipedia assignment (or any other) to the specific class.


Detecting the location of an editing controversy within a page

Researchers at Google, AT&T, Purdue University and the University of Trento have developed[5] an algorithm that “in contrast to previous works in controversy detection in Wikipedia that studied the problem at the page level […] considers the individual edits and can accurately identify not only the exact controversial content within a page, but also what the controversy is about and where it is located.” As an example, the paper names the article about Chopin where “our method detected not only the known controversy about his origin but also the controversies about his date of birth and his photograph by Louis-Auguste Bisson.”

7.8% of Germans use Wikipedia on any given day

In a survey[6] by the German state media authorities, 26.8% of all Germans who had been seeking information on the Internet on the preceding day had used Wikipedia for that purpose. In absolute terms, this means that 7.8% of Germans use Wikipedia on any given day to obtain information, compared to 11.2% for Facebook, 8.1% for YouTube, and 6.3% for Twitter.
A separate study[7] found that 40% of German teenagers use Wikipedia daily or several times per week (compared to 38% in 2013[supp 1]).
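The step from 26.8% to 7.8% can be checked with simple arithmetic. Assuming both figures refer to the same base population (an assumption, since the survey's exact definitions are not given here), they imply that roughly 29% of Germans had sought information on the Internet on the preceding day. The variable names are illustrative.

```python
# Figures quoted above from the survey.
share_of_seekers_using_wikipedia = 0.268      # among those who sought information online
share_of_all_germans_using_wikipedia = 0.078  # among all Germans

# If both shares describe the same day and population, their ratio is the
# share of Germans who sought information online on the preceding day.
implied_share_seeking_online = (
    share_of_all_germans_using_wikipedia / share_of_seekers_using_wikipedia
)

print(f"Implied share seeking information online: {implied_share_seeking_online:.1%}")
```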

Vandals’ lack of spelling discipline hampers automatic detection of vulgar words

A student project[8] at the University of Maryland, Baltimore County trained a vandalism detector on the well-known PAN 2010 vandalism corpus. The author concludes that compared to features based on the metadata of the revision (e.g. the size change, or whether the edit was made by an IP editor), or on quantitative features of the inserted text (e.g. the frequency of upper-case characters), “Language Features provide the least information gain. It is expected that language features would provide the maximum information gain. But the problem is if anyone wants to vandalize a page, he or she would not care to spell the words correctly and so in most cases vulgar/slang dictionaries fall short identifying the bad words.”
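Information gain, the measure used in that comparison, is the reduction in label entropy obtained by splitting the edits on a feature. Below is a minimal sketch with a made-up toy sample; the feature names and values are hypothetical and are not taken from the PAN 2010 corpus or from the project itself.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical sample of eight edits: 1 = vandalism, 0 = regular edit.
is_vandalism = [1, 1, 1, 1, 0, 0, 0, 0]
# Metadata feature: was the edit made by an IP editor?
by_ip_editor = [1, 1, 1, 0, 0, 0, 0, 1]
# Language feature: did the inserted text match a vulgar-word dictionary?
# Misspelled vulgarities escape the dictionary, so only one edit matches.
matches_dictionary = [1, 0, 0, 0, 0, 0, 0, 0]

gain_metadata = information_gain(by_ip_editor, is_vandalism)
gain_language = information_gain(matches_dictionary, is_vandalism)
```

In this toy sample the dictionary feature is precise but fires so rarely that it removes less uncertainty than the noisier metadata feature, which mirrors the effect the author describes.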

New Wikimedia open access policy

At the recent CSCW conference (see also an overview of Wikimedia-related events and presentations there), the Wikimedia Foundation announced its new Open Access Policy to ensure that all research work produced with support from the Foundation will be openly available to the public and reusable on Wikipedia and other Wikimedia sites. See also coverage in this week’s Signpost.

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

  • “Reproduction of male power structures in the online encyclopedia Wikipedia” (in German; original title: “Reproduktion männlicher Machtverhältnisse in der Online-Enzyklopädie Wikipedia”)[9]
  • “Links that speak: The global language network and its association with global fame”[10] From the abstract: “we use the structure of the networks connecting multilingual speakers and translated texts, as expressed in book translations, multiple language editions of Wikipedia, and Twitter, to provide a concept of language importance that goes beyond simple economic or demographic measures.” (See also coverage in the Economist)
  • “Queripidia: Query-specific Wikipedia Construction”[11] (demo)
  • “Using Wikipedia to enhance student learning: A case study in economics”[12] (preprint without paywall:[13])
  • “Automatically Assessing Wikipedia Article Quality by Exploiting Article–Editor Networks”[14]
  • “Quality assessment of Arabic web content: The case of the Arabic Wikipedia”[15]
  • “Wikipedia vs Peer-Reviewed Medical Literature for Information About the 10 Most Costly Medical Conditions”[16] (see also discussion and published rebuttal[17] by medical Wikipedia editors, and media coverage summary)
  • “Do Experts or Collective Intelligence Write with More Bias? Evidence from Encyclopædia Britannica and Wikipedia”[18] (cf. Harvard Business Review coverage and our reviews of related papers by the same authors: “Language analysis finds Wikipedia’s political bias moving from left to right”, “Given enough eyeballs, do articles become neutral?”)
  • “Improving Wikipedia-based Place Name Disambiguation in Short Texts Using Structured Data from DBpedia”[19]


  1. (2015-02-18) “Cultural Anthropology Through the Lens of Wikipedia – A Comparison of Historical Leadership Networks in the English, Chinese, Japanese and German Wikipedia”. arXiv:1502.05256 [cs].
  2. Azer, Samy A. (2015-03-01). “Is Wikipedia a reliable learning resource for medical students? Evaluating respiratory topics”. Advances in Physiology Education 39 (1): 5-14. doi:10.1152/advan.00110.2014. ISSN 1043-4046. PMID 25727464.
  3. Factors that influence the teaching use of Wikipedia in Higher Education (Article) (2014-12-11).
  4. Sormunen, E. & Alamettälä, T. (2014). Guiding Students in Collaborative Writing of Wikipedia Articles – How to Get Beyond the Black Box Practice in Information Literacy Instruction. In: EdMedia 2014 – World Conference on Educational Media and Technology. Tampere, Finland: June 23-26, 2014
  5. Siarhei Bykau, Flip Korn, Divesh Srivastava, Yannis Velegrakis: Fine-Grained Controversy Detection in Wikipedia.
  6. MedienVielfaltsMonitor Ergebnisse 2. Halbjahr 2014. Die Medienanstalten, Berlin, March 19, 2015 PDF
  7. JIM 2014: Jugend, Information, (Multi-) Media. Medienpädagogischer Forschungsverbund Südwest. Stuttgart, November 2014 PDF (in German, with English summary)
  8. Atul Mirajkar: Predicting Bad Edits to Wikipedia Pages. Master project, University of Maryland, Baltimore County. PDF
  9. Kemper, Andreas; Charlott Schönwetter (2015-01-01). “Reproduktion männlicher Machtverhältnisse in der Online-Enzyklopädie Wikipedia”. In Andreas Heilmann, Gabriele Jähnert, Falko Schnicke, Charlott Schönwetter, Mascha Vollhardt (eds.). Männlichkeit und Reproduktion. Kulturelle Figurationen: Artefakte, Praktiken, Fiktionen. Springer Fachmedien Wiesbaden. pp. 271-290. ISBN 978-3-658-03983-7.  Closed access
  10. Ronen, Shahar (2014-12-15). “Links that speak: The global language network and its association with global fame”. Proceedings of the National Academy of Sciences: 201410931. doi:10.1073/pnas.1410931111. ISSN 0027-8424. PMID 25512502.
  11. Laura Dietz, Michael Schuhmacher and Simone Paolo Ponzetto: Queripidia: Query-specific Wikipedia Construction PDF
  12. Freire, Tiago (2014-12-23). “Using Wikipedia to enhance student learning: A case study in economics”. Education and Information Technologies: 1-13. doi:10.1007/s10639-014-9374-0. ISSN 1360-2357.  Closed access
  13. Freire, Tiago; Li, Jingping (2014-02-11). “Using Wikipedia to Enhance Student Learning: A Case Study in Economics”. Rochester, NY: Social Science Research Network. 
  14. Li, Xinyi; Tang, Jintao; Wang, Ting; Luo, Zhunchen; Rijke, Maarten de (2015-03-29). “Automatically Assessing Wikipedia Article Quality by Exploiting Article–Editor Networks”. In Allan Hanbury, Gabriella Kazai, Andreas Rauber, Norbert Fuhr (eds.). Advances in Information Retrieval. Lecture Notes in Computer Science. Springer International Publishing. pp. 574-580. ISBN 978-3-319-16353-6.  Closed access Author copy: PDF
  15. Yahya, Adnan; Ali Salhi (2014). “Quality assessment of Arabic web content: The case of the Arabic Wikipedia”. 2014 10th International Conference on Innovations in Information Technology (INNOVATIONS). pp. 36-41. doi:10.1109/INNOVATIONS.2014.6987558.  Closed access
  16. (2014-05-01) “Wikipedia vs Peer-Reviewed Medical Literature for Information About the 10 Most Costly Medical Conditions”. JAOA: Journal of the American Osteopathic Association 114 (5): 368-373. doi:10.7556/jaoa.2014.035. ISSN 0098-6151. PMID 24778001.
  17. Anwesh Chatterjee, Robin M.T. Cooke, Ian Furst, James Heilman: Is Wikipedia’s medical content really 90% wrong? Cochrane blog, June 23, 2014
  18. Do Experts or Collective Intelligence Write with More Bias? Evidence from Encyclopædia Britannica and Wikipedia (2014-11-07). HBS Working Paper Number: 15-023, October 2014
  19. Yingjie Hu, Krzysztof Janowicz, Sathya Prasad: Improving Wikipedia-based Place Name Disambiguation in Short Texts Using Structured Data from DBpedia. GIR’14, November 04 2014, Dallas, TX, USA. PDF
Supplementary references and notes:
  1. JIM-STUDIE 2013. Jugend, Information, (Multi-) Media. Medienpädagogischer Forschungsverbund Südwest, 2013 PDF (in German, with English summary)

This newsletter is brought to you by the Wikimedia Research Committee and The Signpost



“Disclosure: I (Axl) am a Wikipedia editor, a pulmonologist, the main author of Wikipedia’s “Lung cancer” article, and a major contributor to other respiratory articles.” Yup. Another medical practitioner who believes a cancer article is excellent with information on the underlying mechanisms that can be called, at best, rudimentary. That’s a fundamental problem in many medical articles, and I’ve seen physicians engaged in more than one turf war trying their best to keep those pesky molecular biologists out of “their” articles, insisting that it’s sufficient to describe diagnosis and treatment while keeping the actual physiology on a cursory level. Excellence?…

tyelko, I accept your criticism that the “Pathophysiology” section of the “Lung cancer” article is lacking sufficient details. This is not an area of expertise for me, so I did my best to summarize the available information from the sources. I have certainly not tried to “keep those pesky molecular biologists out of my article”. Azer’s paper criticizes Wikipedia’s respiratory articles for being “inaccurate” and “unsuitable for medical student learning”. Despite the limited details on pathophysiology, do you believe that the content is inaccurate? Is the “Lung cancer” article unsuitable for medical student learning? I shall have another look at…

Sorry for replying only now, Axl, but somehow, I must have overlooked the notification for a new comment. My point was not about Azer’s paper but about your pointing out that the “lung cancer” article had been declared “featured”. You ADMIT now that the pathophysiology section is lacking in information, but the article reached featured status anyway – despite the fact that that normally requires that nothing substantial is missing from it. The whole process of “featured” and “good” status is highly problematic, as it is essentially a peer review within a community so small that there’s a vested conflict…