Wikimedia engineering March 2011 report

Major news this month includes:

  • The publication of a Product whitepaper by the Strategic product team (and the associated update from Sue Gardner) that will guide future engineering efforts.
  • The return of Brion Vibber, Wikimedia’s first employee, as Lead Architect for MediaWiki.
  • The deployment of Article Feedback 2.0 to the English Wikipedia, and of Upload Wizard 1.0 to Wikimedia Commons.

Events

Upcoming events

  • Berlin Hackathon 2011 (May 13-15, Berlin) — Daniel Kinzler announced the dates and location of the Berlin Hackathon. Registration is open until April 10. Participants are also listing topics to work on.
  • Summer of Code 2011 — Sumana Harihareswara sent a call for students for the upcoming Summer of Code. Developers are now signing up as students and mentors, and projects are being discussed. Read the dedicated article to learn more and join us.
  • Wikimania (August 2-7, Haifa, Israel) — This year’s Wikimania will be preceded by two days of hacking (August 2-3); the conference itself (August 4-7) will also include technology tracks.

Personnel

Job openings

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
The following positions have opened this month:

The following positions are still open:

In addition, we hope to post the following positions over the next few months:

  • Rich Text Editor Engineer
  • Release Engineer
  • Technical Writer

Short news

Operations

Site operations

Virginia Data Center — Installation of a world-class primary data center for Wikimedia Foundation websites.

  • Status: The last pieces of hardware arrived at the data center and were racked. The network routers and switches were set up, and their configuration is about 60% done. The first servers are being brought up while we wait for our network connectivity to be installed. We expect to serve limited live traffic and services starting in May.
  • Program manager: Mark Bergsma

Media Storage — Improvement of our media storage architecture to accommodate expected increase in media uploads.

  • Status: A test cluster of three machines running OpenStack Swift will soon be deployed, and will serve a small portion of media traffic. Contractor Russ Nelson is also developing MediaWiki FileRepo support for Swift, so new media uploads can be pushed to the Swift cluster directly.
  • Program manager: Mark Bergsma

Testing environment

Virtualization test cluster — Environment to deploy temporary machines for testing and experimentation, for use by WMF staff and volunteers working on important projects (as capacity allows).

  • Status: The virtualization test cluster hardware, whose deployment was slightly delayed, is now ready for service. Ryan Lane released version 1.2 of his OpenStackManager extension and wrote detailed documentation on the setup. He will finish deploying the virtual test cluster in the first weeks of April.
  • Program manager: Mark Bergsma

Backups and data archives

Backups — Improvement of backup coverage of Wikimedia-hosted data.

  • Status: Backup coverage of Wikimedia-hosted data will increase significantly as soon as connectivity between our two primary data centers is available and data can be copied and replicated. Because reliability, fail-over and backups are the primary goals of the new primary data center, setting up live replicas and frequent backups of all our data will be the top priority among service deployments there.
  • Program manager: Mark Bergsma

Data Dumps — Improvement of processes to create and provide public copies of public Wikimedia data.

  • Status: The dumps server is back: its hardware has been repaired, it is running again, and we have started to move data over as a live backup of the XML dumps. The new server for the English Wikipedia dumps has arrived and is being set up. The January run of the English Wikipedia dumps completed in March, and its history files are available for download in two formats. The March run is almost complete, and its history files are already available in one format. We’re also working with Google to enable regular mirroring of the most recent dumps to Google storage for download.
  • Program manager: Mark Bergsma

Short news

  • Thumbnail issues — Our existing, non-scalable media storage architecture hit a performance limit again, which slowed down image thumbnail downloads around Monday, March 28. This is a known problem that will finally be resolved by the Media Storage redesign described above. In the meantime, we have been mitigating the problems by fine-tuning the performance and behavior of the existing systems and by adding memory to the current media servers. As a temporary solution, we are also working on deploying a second thumbnail server to take on some of the load.

Features Engineering

Content Quality and Editorial Tools


[Image: Phase 2 of the Article feedback feature]

Article Feedback (phase 2) — A feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.

Article feedback (extended review) — An interface for quality reviews of Wikipedia content.

  • Status: The “Open wiki review system” is now being considered as a possible evolution of the Article feedback feature. It would offer an interface for submitting detailed quality reviews, as well as a system to sort and assess those reviews. Ways to surface quality indicators for readers are also being explored.
  • Commissioned by: Erik Möller

Pending Changes — A feature to allow changes made by logged-out and new users to be reviewed before they appear as the primary version of an article.

  • Status: Development is in maintenance mode; work will resume when developer resources become available and after the English Wikipedia community makes a decision about the future of this trial. Steven Walling requested additional data to help the community reach a consensus.
  • Program manager: Alolita Sharma

Personal image filter — A feature to allow users to selectively hide media files on a wiki.

Discussions and Interactions


[Image: Interface for giving a kitten with the Wikilove script]

Wikilove 0.1 — A user script to encourage praise and virtual gifts between users.

  • Status: Because many automated patrolling tools and gadgets focus on making it easy to warn or reprimand users, Ryan Kaldari wrote a user script to encourage friendly behavior between editors. For example, it is now possible, on the English Wikipedia and other wikis, to give a “virtual kitten” to another editor. The script was adapted for use by the Russian and Tamil communities, and Ryan is supporting other communities that want to use it.
  • Program manager: Alolita Sharma

Multimedia Tools

Upload wizard — A feature that provides an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.

Community feature prototyping

As the first engineer “embedded” in the Community department, Trevor Parscal completed the first experiment, related to the location and appearance of the edit link. The results are not available yet, but will be published in the coming weeks. He’s now turning to the account creation improvement project (and the associated A/B testing) with Frank Schulenburg & Lennart Guldbrandsson.
Nimish Gautam and Roan Kattouw also provided support for A/B testing and deployment, respectively.

Engineering support

Editor survey — Integration work between LimeSurvey and MediaWiki to support the upcoming Editor survey.

  • Status: In preparation for the upcoming Editors survey conducted by the Global development department, work was done to integrate the survey software (LimeSurvey) with Wikimedia’s infrastructure. Arthur Richards and Nimish Gautam worked on the back-end to allow LimeSurvey to pull information directly from our database, and automatically provide useful stats about editors, hence simplifying and shortening the survey. Ryan Kaldari worked on integrating LimeSurvey with CentralNotice.
  • Program manager: Alolita Sharma

Other projects

[Image: Example form used in the proposed style guide]

Wikimedia Labs

Media projects — A set of features to improve media handling and key infrastructure support tools, many developed with Kaltura, such as Metavid, MwEmbed, and the Video Editor.

General Engineering

MediaWiki development and tools

MediaWiki 1.17 release — The upcoming MediaWiki release.

  • Status: Developers continued to fix bugs discovered after the deployment of MediaWiki 1.17 to Wikimedia sites. A few issues remain, notably related to the new installer and the support of alternative database management systems. We plan to release a beta in early April.
  • Program manager: Rob Lanphier

Code review — Review of changes made to the MediaWiki code.

  • Status: After the 1.17 code review sprint, the number of unreviewed new revisions started to increase again (see the automatically generated chart). Mark Hershberger started to assign name tags to revisions, to help developers track reviews that are requested from them.
  • Program manager: Rob Lanphier

Bugzilla 4.0 upgrade — Upgrade of our bug tracker to the latest version of Bugzilla.

  • Status: Priyanka Dhanda coordinated with Rob Halsell to prepare for the upgrade. A prototype was set up, the Vector skin was cleaned up, and some old tweaks were moved into extensions. Chad Horohoe also used the prototype to try out a summary report script shared by the KDE community.
  • Program manager: Rob Lanphier

Performance optimization

PoolCounter — A MediaWiki extension to avoid parser deadlocks on high-traffic pages.

  • Status: Tim Starling deployed this extension, written by Platonides, which controls the number of simultaneous parses of a single page (to avoid the “Michael Jackson” effect, where many requests try to re-render the same popular page at once). It was later disabled because of a bug that has since been fixed; Platonides also added integrated statistics to the tool. We plan a second deployment attempt early in the week of April 4.
  • Program manager: Rob Lanphier
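The idea behind PoolCounter can be illustrated with a small sketch: when many requests arrive for the same uncached page, only a limited number are allowed to parse it at once, and the others wait and then reuse the stored result. The Python below is a minimal, hypothetical in-process model; the class `PerKeyLimiter`, the dictionary standing in for the parser cache and the page key are illustrative assumptions. The real extension coordinates work across servers and has its own protocol, which this does not reproduce.

```python
import threading
import time


class PerKeyLimiter:
    """Hypothetical sketch: cap concurrent work per key (e.g. per page title).

    This is an in-process illustration, not the PoolCounter extension's API.
    """

    def __init__(self, max_workers_per_key: int = 1) -> None:
        self._lock = threading.Lock()      # guards creation of per-key semaphores
        self._sems: dict = {}
        self._max = max_workers_per_key

    def _semaphore(self, key: str) -> threading.Semaphore:
        with self._lock:
            if key not in self._sems:
                self._sems[key] = threading.Semaphore(self._max)
            return self._sems[key]

    def run(self, key, cache, work):
        """Return cache[key], computing it with work() under the per-key cap."""
        if key in cache:                   # fast path: page already rendered
            return cache[key]
        with self._semaphore(key):         # at most max_workers_per_key parse at once
            if key not in cache:           # another worker may have finished while we waited
                cache[key] = work()
            return cache[key]


# Demo: 20 concurrent requests for one popular page trigger a single parse.
parse_calls = []


def expensive_parse() -> str:
    parse_calls.append(1)                  # record each real parse
    time.sleep(0.05)                       # stand-in for a slow parse
    return "<rendered html>"


limiter = PerKeyLimiter(max_workers_per_key=1)
cache = {}
threads = [
    threading.Thread(target=limiter.run, args=("Michael_Jackson", cache, expensive_parse))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(parse_calls))  # 1
```

Re-checking the cache after acquiring the semaphore is what lets waiting requests reuse the first parse instead of repeating it.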

Ehcache deployment — Deployment of a disk-backed object cache to increase parser cache hit ratio.

  • Status: Tim Starling investigated Wikimedia’s low parser cache hit ratio and suggested increasing the parser cache size to reduce Apache CPU usage. After researching available options for disk-backed object caches, he selected Ehcache and wrote a MediaWiki client for it. Our test deployments showed promising results, but also surfaced additional problems that we need to sort out.
  • Program manager: Rob Lanphier
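The reasoning behind a larger parser cache can be shown with a toy model: once the set of frequently requested pages fits in the cache, the hit ratio rises sharply and fewer pages have to be re-parsed. This sketch assumes a simple in-memory LRU eviction policy and a made-up access pattern; it is not Ehcache's disk-backed storage or eviction logic, and the numbers are illustrative rather than Wikimedia measurements.

```python
from collections import OrderedDict
from typing import Hashable, Sequence


def lru_hit_ratio(accesses: Sequence[Hashable], capacity: int) -> float:
    """Replay an access sequence against an LRU cache and return hits / accesses."""
    cache: "OrderedDict[Hashable, bool]" = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True               # miss: the page would be re-parsed
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used entry
    return hits / len(accesses)


# Five popular pages requested round-robin, four times over (illustrative only).
accesses = [0, 1, 2, 3, 4] * 4
print(lru_hit_ratio(accesses, capacity=3))  # 0.0: working set does not fit
print(lru_hit_ratio(accesses, capacity=5))  # 0.75: only the first 5 requests miss
```

With a cache slightly too small for the working set, every request misses; making it large enough turns most requests into hits, which is the effect a bigger disk-backed cache aims for.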

Wikimedia analytics

udp2log — A custom data analytics logging system.

  • Status: A second logging machine was installed and a load balancer set up to handle the amount of data. Data is now being collected, sampled, filtered and cleaned up. The long-term plan is still to use multicast, in order to allow for growth.
  • Program manager: Rob Lanphier
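The collect, sample and filter steps can be sketched as a chain of generators. This assumes 1-in-N systematic sampling and made-up log lines of the form `host path`; the function names are hypothetical and do not reflect udp2log's actual configuration or log format.

```python
from typing import Callable, Iterable, Iterator


def sample_every_nth(lines: Iterable[str], n: int) -> Iterator[str]:
    """Keep one line out of every n (systematic 1-in-n sampling)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for i, line in enumerate(lines, start=1):
        if i % n == 0:
            yield line


def filter_lines(lines: Iterable[str], keep: Callable[[str], bool]) -> Iterator[str]:
    """Keep only the lines for which keep(line) is true."""
    return (line for line in lines if keep(line))


def clean(lines: Iterable[str]) -> Iterator[str]:
    """Strip surrounding whitespace and drop empty lines."""
    for line in lines:
        stripped = line.strip()
        if stripped:
            yield stripped


# Made-up request log: every third request goes to de.wikipedia.org.
raw = [
    f"{'de' if i % 3 == 0 else 'en'}.wikipedia.org /wiki/Page_{i}\n"
    for i in range(1, 13)
]
result = list(clean(filter_lines(sample_every_nth(raw, 2), lambda l: l.startswith("en."))))
print(result)
```

Because each stage is a generator, lines stream through one at a time, so the pipeline never needs to hold the full log in memory.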

A/B testing — A set of tools to perform A/B testing on Wikimedia sites.

Report card — A monthly report of key metrics to measure community health.

  • Status: Erik Zachte tweaked his code for page view statistics. Future improvements include mining the CentralAuth database to identify accounts belonging to the same user across wikis, and using this information to refine editor counts.
  • Program manager: Rob Lanphier

Technical communications

Development process improvement — A project to increase transparency and organize Wikimedia Foundation’s engineering efforts more efficiently.

  • Status: Guillaume Paumier revived this project and focused on summary pages and versions & phases for Wikimedia-funded engineering projects. The goal is to make it easier to find this information and keep it up-to-date, for the benefit of staff, volunteer developers and users.
  • Program manager: Rob Lanphier

Wikimedia blog overhaul — A project to consolidate and improve the Wikimedia blogs.

  • Status: After assessing the current state of the Wikimedia blogs, Guillaume Paumier worked with the Communications team and other departments to collect requirements. A technical proposal was then written and a prototype set up. Implementation should happen shortly.
  • Project manager: Guillaume Paumier

Other projects

  • MediaWiki 1.17 deployment — Some bugs and other minor issues were fixed following the deployment of MediaWiki 1.17 to Wikimedia sites.
  • Test framework deployment — Work on this automated test environment for MediaWiki (based on Selenium and PHPUnit) is currently on hold. It will resume when the virtualization cluster is in place, and resources become available.
  • OpenWebAnalytics — We’re wrapping up our work on OWA until we’re able to hire our new dedicated analytics team. In the short term, we’re focusing our efforts on A/B testing and other immediate needs, allowing the future analytics team to map out a long-term strategy.
  • API maintenance — Sam Reed continued to work on the backlog of bugs and feature requests. He is also investigating appropriate APIs for monitoring system health.
  • Shell bugs — Site requests that require shell access to the servers are mostly handled by Rob Halsell and a few dedicated volunteers. Priyanka Dhanda is going to join the team and help out where possible.
  • Access to Subversion — Rob Lanphier, Priyanka Dhanda and Chad Horohoe have joined Tim Starling to handle requests for commit access to Subversion.
  • Migration to Git — Migrating from Subversion to Git was discussed on the wikitech-l list and issues were raised. The engineering staff is interested in supporting this migration once consensus is formed amongst developers.
  • Heterogeneous deployment — The deployment of MediaWiki 1.17 across Wikimedia sites confirmed the need for a way to target software changes and upgrades to specific sets of wikis. We expect to make progress on this by the time MediaWiki 1.18 is deployed.
  • Software deployments tracking — A new page on the wikitech wiki now tracks recent and upcoming software changes, in addition to the server admin log.
  • Wikistats — Erik Zachte checked the source code of many of his tools (which provide general statistics on Wikimedia wikis) into our version control system.
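Heterogeneous deployment comes down to tracking which software version each wiki runs, so that an upgrade can target a chosen group of wikis while the rest stay on the default. The class below is a hypothetical sketch of that bookkeeping; the wiki names and version labels are illustrative, and it does not describe Wikimedia's actual deployment tools.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class DeploymentMap:
    """Hypothetical record of which MediaWiki version each wiki runs."""
    default_version: str
    overrides: Dict[str, str] = field(default_factory=dict)

    def version_for(self, wiki: str) -> str:
        """Wikis without an override run the default version."""
        return self.overrides.get(wiki, self.default_version)

    def upgrade(self, wikis: Iterable[str], version: str) -> None:
        """Move only the listed wikis to a new version."""
        for wiki in wikis:
            self.overrides[wiki] = version


deploy = DeploymentMap(default_version="1.17")
deploy.upgrade(["testwiki", "mediawikiwiki"], "1.18")   # pilot group first
print(deploy.version_for("testwiki"))  # 1.18
print(deploy.version_for("enwiki"))    # 1.17
```

Keeping per-wiki overrides on top of a single default means a new version can be rolled out to a pilot group, checked, and then extended, without touching every wiki at once.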

Mobile

Mobile — All things Mobile and Wikimedia.

Offline

Wikipedia version tools — Support and development of a series of tools to select Wikipedia content for offline use.

  • Status: We finished assessing the existing tools and are actively working with their original author (User:CBM) to plan our next steps. The project will focus on making it easier to create collections for schools, and is an excellent fit for a Summer of Code project. We are also talking with one of the most active offline project members (User:Walkerma) to make sure our use cases capture what’s needed.
  • Program manager: Tomasz Finc

OpenZim for Collections — Integration of OpenZim into the Collections extension.

  • Status: After a successful deployment, we collected both email feedback and bugs. We are now exploring where else we might engage with PediaPress for further work to improve the workflow of our offline projects.
  • Program manager: Tomasz Finc

Kiwix UX study — Evaluation of the user experience of the Kiwix mobile app to access offline Wikimedia content.

  • Status: We finished our first development sprint of the Kiwix UX improvements. Our next step is to work with testers from Wikimedia Kenya, Wikimedia India and WMF staff members to find bugs in the beta. If you would like to help us, please sign up as a tester. We’re now looking at adding an integrated download manager to facilitate the download of new openZim collections.
  • Program manager: Tomasz Finc

This article was written by Mark Bergsma, Tomasz Finc, Alolita Sharma, CT Woo, Rob Lanphier & Guillaume Paumier. See the full revision history. A wiki version is also available.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

2 Comments

Awesome summary of the ongoing projects. I feel enlightened 🙂

thanks very nice !