Over the past few months, the Wikimedia Foundation has been gearing up a variety of new initiatives, and measuring success has been on our minds. It should come as no surprise that we’ve been building an Analytics Team at the same time. We are excited to finally introduce ourselves and talk about our plans.
We’ve got quite a few projects under way (and many more ideas), and we’d like to briefly go over them — expect more posts in the future with deeper details on each.
First up: a revamp of the Wikimedia Report Card. This dashboard gives an overview of key metrics representing the health and success of the movement: pageviews, unique visitors, number of active editors, and the like.
Kraken: A Data Services Platform
But we have bigger plans. Epic plans. Mythical plans. A generic computational cluster for data analytics, which we affectionately call Kraken: a unified platform to aggregate, store, analyze, and query all incoming data of interest to the community, built so as to keep pace with our movement’s ample motivation and energy.
How many Android users are there in India that visit more than ten times per month? Is there a significant difference in the popularity of mobile OS’s between large cities and rural areas of India? Do Portuguese and Brazilian readers favour different content categories? How often are GLAM pictures displayed off-site, outside of Wikipedia (and where)?
As it stands, answering any of these questions is, at best, tedious and hard. Usually, it’s impossible. The size of the success of Wikimedia projects is a double-edged sword, in that it makes even modest data analysis a significant task. This is something we aim to fix with Kraken.
More urgently, however, we don’t presently have infrastructure to do A/B testing, measure the impact of outreach projects, or give editors insight into the readers they reach with their contributions. From this view, the platform is a robust, unified toolkit for exploring these data streams, as well as a means of providing everyone with better information for evaluating the success of features large and small.
This points toward our overarching vision. Long-term, we aim to give the Wikimedia movement a true data services platform: a cluster capable of providing realtime insight into community activity and a new view of humanity’s knowledge to power applications, mash up into websites, and stream to devices.
Privacy: Counting not Tracking
The Kraken is a mythical Nordic monster with many tentacles, much like any analytics system: analytics touches everything — from instrumenting mobile apps to new user conversion analysis to counting parser cache lookups — and it needs a big gaping maw to keep up with all the data coming in. Unfortunately, history teaches us that mythical cephalopods aren’t terribly good at privacy. We aim to change that.
We’ll be talking a lot more about the technical details of the system we’re building, so check back in case you’re interested or reach out to us if you want to provide feedback about how to best use the data to answer lots of interesting questions while still preserving users’ privacy. This post only scratches the surface, but we’ve got lots more to discuss.
Talk to Us!
Sound exciting? Have questions, ideas, or suggestions? Well then! Consider joining the Analytics mailing list or #wikimedia-analytics on Freenode (IRC). And of course you’re also very welcome to send me email directly.
We’re definitely excited about where things are going, and we are looking forward to keeping you all up to speed on all our new developments.
Finally, we are hosting our first Analytics IRC office hours! Join us on July 30th, at 12pm PDT (3pm EDT / 9pm CEST) in #wikimedia-analytics to ask all your analytics and statistics related questions about Wikipedia and the other Wikimedia projects.
David Schoonover, Analytics Engineer
Andrew Otto, Analytics Engineer
Erik Zachte, Data Analyst
Diederik van Liere, Product Manager