Data Competition: Announcing the Wikipedia Participation Challenge

We are pleased to announce the launch of the Wikipedia Participation Challenge, a data modeling competition to develop an algorithm that predicts future editing activity on Wikipedia. The competition is hosted by Kaggle, a platform for data modeling and prediction competitions. The Participation Challenge is open to community members and anyone else who is interested in analyzing Wikipedia data. This is the first of two data competitions the Wikimedia Foundation will sponsor this year.
The goal of this competition is to gain a better understanding of the factors that encourage or discourage people from editing Wikipedia. Increasing the number of active editors is one of our strategic priorities. Both the Wikipedia communities and the Wikimedia Foundation stand to benefit from models that quantify the factors that determine whether a Wikipedia editor is likely to continue contributing. The competition asks contestants to develop a model to predict the number of edits a given editor will make in six months’ time.
The data used in this competition comes from the publicly available English Wikipedia XML data dump. An anonymous donor has generously contributed $10,000 as prize money. There will be a Grand Prize for the best prediction, as well as special prizes awarded for the use of open source software. The Grand Prize winner will also be given the opportunity to present their prediction model at the 2011 IEEE International Conference on Data Mining. The competition starts today and will continue until September 20, 2011.
Head over to our competition portal, download the data, and start crunching! And don’t forget to follow us on Twitter: #wikichallenge and @dvanliere.
Howie Fung
Senior Product Manager, Wikimedia Foundation
Diederik van Liere
Research Consultant, Wikimedia Foundation

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

10 Comments

We can’t get pending changes (or any initiative to prevent the massive vandalism that occurs daily), but things like this and http://www.wired.co.uk/news/archive/2011-01/10/making-wikipedia-more-welcoming are the focus.

Re JS: Pending changes is available; it is just that the community on the English-language wiki rejected it. I would like to see it implemented, or alternatively some sort of flagged revisions, as has been deployed in many language versions of Wikipedia such as German. But we can’t blame the Foundation if one of its projects opts out of an anti-vandalism tool, nor should we exaggerate and say we can’t get any initiative to deal with vandalism when so much has been done in recent years. The improved edit filters are preventing much if not most vandalism from happening…

JS, massive vandalism by anons is easy to deal with. The bots have massively cut down on the load, as have widespread rollbacker and undo. After a few months, you don’t even notice it. What you notice are other editors fact-bombing your articles, putting them up for deletion, chopping them up in futile reformattings, and deleting random lines. That’s the incredibly dispiriting part of working on Wikipedia. Look at the parting messages in WP:MISSING. How many of them are complaining about the petty anonymous vandalism? Now how many sound like my little summary just now… I will be *very* interested…

Reading more, I’m pretty troubled by the selection of data: http://www.kaggle.com/c/wikichallenge/forums/t/674/sampling-approach What’s the point of predicting only about recent editors, whose ranks have already been thoroughly harrowed by the endless tightening of policy and rise of deletionists? Wikipedia already has a horrendous reputation for screwing over contributors*, so anyone who does much editing (and whose departure would be noticed by the criterion) is self-selecting now. * just the other day cryonics researcher Mike Darwin told me he had no interest in contributing because he was sure all his contributions would be reverted under an extremely narrow reading of WP:RS, and…

[…] WikiViz 2011 is the second of two data challenges the Wikimedia Foundation is organizing this summer. If you are interested in building predictive models of Wikipedia editor activity, check out the Wikipedia participation challenge […]

[…] – WikiMedia announces ‘a data modeling competition to develop an algorithm that predicts future editing activity on […]

Here you go:
Are you nerdy? If yes, Wikipedia editing += 3
Are you in a relationship? If yes, Wikipedia editing -= 2
Do you have kids? If yes, Wikipedia editing -= (number of kids)
Are you currently out of work? If yes, Wikipedia editing += 2
Are you upper or middle class? If yes, Wikipedia editing += 1
Are you a young male between the ages of 15 and 35? If yes, Wikipedia editing += 2
Listen, WMF, it’s obvious: people edit Wikipedia if they care, if they have the time, and if they have the ability. A…

Jason, what you offer is speculation, not data modeling, which is a whole different game. Your cynicism and preemptive, gratuitous over-generalizations are unwarranted. Data modeling is about trying to produce an abstract representation of patterns that demonstrably ARE in the data. The validity of a model is measurable, in several ways. The interpretation depends on what additional assumptions one brings to the model and, like all argumentation, is open to analysis and critique. Why do you appear to assume bad faith about this whole initiative? The WMF people I know are earnest, eager, serious, hard working, and want…

[…] particular one that correctly predicts who will stop editing and who will continue to edit (see the call for submissions). The response was overwhelming, with 96 participating teams, comprising in total 193 people who […]