Growing free knowledge through open data

This Sankey diagram shows how readers reach the English Wikipedia article about London and where they go from there, based on the Wikipedia Clickstream data set. Graph by Ellery Wulczyn and Dario Taraborelli, CC0.

Wikipedia and Wikimedia projects are among the most visited repositories of human knowledge. They are also a unique source of data for understanding how we collaborate to create that knowledge, access it and share it with others.
The Wikimedia Foundation’s Research and Data Team has recently published a number of open data sets about Wikimedia projects, making them freely available to everyone – researchers, developers and community members – under a CC0 license. These aggregate data sets were collected to show general trends about how people use Wikimedia projects and do not include any personal information about users, as required by Wikimedia’s privacy policy.
We invite you to turn this data into useful insights, applications and visualizations, and to help our communities and projects thrive. If you have any questions about these releases, feel free to reach out to the Research and Data Team via the Analytics mailing list or our #wikimedia-research channel on IRC.
Dario Taraborelli
Senior Research Scientist, Research and Data Team Lead
Wikimedia Foundation
Open Data Sets
Scholarly citations in Wikipedia
A data set of citations to scholarly articles in the English Wikipedia. Includes all citations with DOIs and PubMed identifiers added to Wikipedia articles as of the most recent content dump.
Halfaker, A., Taraborelli, D. (2015). Scholarly article citations in Wikipedia. figshare.
doi:10.6084/m9.figshare.1299540
Wikipedia clickstream
This data set shows how people get to a Wikipedia article and which links they click on next. The most recent release contains 22 million (referer, resource) pairs, extracted from a total of 3.2 billion requests to the English Wikipedia. We wrote a step-by-step tutorial and IPython notebook to get you started with this data.
Wulczyn, E., Taraborelli, D. (2015). Wikipedia Clickstream. figshare.
doi:10.6084/m9.figshare.1305770
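To make the (referer, resource) pair structure concrete, here is a minimal Python sketch of how one might split such pairs into inbound and outbound flows for a single article, which is the kind of aggregation behind a diagram like the one for London. The three-field row layout (referer, resource, count), the function names and the sample rows are all assumptions made for illustration; check the released files and the tutorial for the actual column layout.

```python
from collections import defaultdict

# Hypothetical sample rows in the form (referer, resource, count).
# The titles and counts are made up for illustration and are NOT
# taken from the released data set.
SAMPLE_PAIRS = [
    ("other-google", "London", 500),
    ("England", "London", 120),
    ("United_Kingdom", "London", 80),
    ("London", "River_Thames", 60),
    ("London", "England", 40),
]


def flows_for(article, pairs):
    """Return (inbound, outbound) request counts for one article.

    inbound maps each referer to the number of requests that arrived
    at `article` from it; outbound maps each resource to the number of
    requests that left `article` for it.
    """
    inbound = defaultdict(int)
    outbound = defaultdict(int)
    for referer, resource, count in pairs:
        if resource == article:
            inbound[referer] += count
        if referer == article:
            outbound[resource] += count
    return dict(inbound), dict(outbound)


def top(flows, n):
    """Return the n largest flows, ties broken alphabetically."""
    return sorted(flows.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


inbound, outbound = flows_for("London", SAMPLE_PAIRS)
print("Top sources:", top(inbound, 2))
print("Top destinations:", top(outbound, 2))
```

For the full release, the same function could be fed rows streamed from the downloaded file instead of an in-memory list, so the 22 million pairs never need to be held in memory at once.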
Browser choices of Wikimedia users
This data set provides statistics on the top browsers and platforms used by readers and editors on Wikimedia projects, obtained from the Wikimedia HTTP request logs during a 90-day window. You can also explore this data online via this application.
Keyes, O. (2015). Browser Choices of Wikimedia Readers and Editors. figshare.
doi:10.6084/m9.figshare.1326739
Where in the world is Wikipedia?
This data set gives the proportion of traffic to Wikimedia projects that originates from each country, computed from all HTTP requests collected over the course of 2014. You can also explore this data online via this application.
Keyes, O. (2015). Geographic Distribution of Wikimedia Traffic. figshare.
doi:10.6084/m9.figshare.1317408
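As a rough illustration of what a per-country proportion means, the Python sketch below turns raw request counts into shares of total traffic. The function name, the country codes and the counts are hypothetical; the released data set already contains the computed proportions, and the method used to produce them may differ from this simple division.

```python
def traffic_shares(requests_by_country):
    """Convert per-country request counts into proportions of the total."""
    total = sum(requests_by_country.values())
    if total == 0:
        raise ValueError("no requests to apportion")
    return {country: n / total for country, n in requests_by_country.items()}


# Hypothetical counts, made up for illustration only.
SAMPLE_COUNTS = {"US": 50, "DE": 30, "JP": 20}

shares = traffic_shares(SAMPLE_COUNTS)
for country, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {share:.1%}")
```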
Wikipedia Article Feedback corpus
The Article Feedback experiment invited readers to participate in Wikipedia by leaving comments on articles to help editors improve them. This data set includes over 1.5 million messages posted to the English, French and German Wikipedias during the pilot.
Florin, F., Mullie, M., Taraborelli, D. (2014). Wikipedia Article Feedback corpus. figshare.
doi:10.6084/m9.figshare.1277784


