Wikimedia engineering April 2011 report

Major news this month includes:

  • the completion of the editor survey, which was supported by the engineering staff;
  • major progress on Article Feedback 3.0 and Upload Wizard 1.0, which will both be deployed in May;
  • mobile projects taking off, with field research in India and progress on the new mobile platform;
  • work on the budgeting exercise for the 2011-2012 fiscal year.

Logo of the Berlin Hackathon
The tech crowd is preparing for the Berlin meet-up in May.

Events

Upcoming events

  • Berlin Hackathon 2011 (May 13-15, Berlin) — This event will be almost entirely devoted to hacking, with short presentations happening throughout the weekend. The overall schedule is now available. One of the main topics at the Hackathon will be the parser work planned to support rich text editing features. There has already been much discussion of this topic on the mailing lists, indicating great interest in the general MediaWiki community. We are exploring ways to broadcast discussions from the Hackathon.
  • Wikimania (August 2-7, Haifa, Israel) — The engineering staff was encouraged to submit proposals for Wikimania, and they did so on a variety of topics to provide the rest of the community with opportunities to learn and discuss their work.
  • Check out the Software deployments page on the wikitech wiki for up-to-date information on the upcoming deployments to Wikimedia sites. This page has been populated since mid-April and will be maintained as another way for the community to see what Foundation engineering is accomplishing.

Personnel

Job openings

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
The following positions have opened this month:

The following positions are still open:

In addition, we hope to post the following positions over the next few months:

  • Release Engineer
  • Technical Writer

Short news

Operations

Site operations

Virginia Data Center — Installation of a world-class primary data center for Wikimedia Foundation websites.

  • Status: We expect our connectivity between Tampa and Ashburn to be installed in the next few days. This will allow all data to be copied to the new data center for backup and fail-over purposes. During the Hackathon in Berlin, we will address many of the challenges in making all of our services redundant and making optimal use of our new data center.
  • Program manager: Mark Bergsma

Media Storage — Improvement of our media storage architecture to accommodate expected increase in media uploads.

  • Status: A test wiki is pushing new media uploads to the Swift cluster, and producing pages which fetch from it as well. For thumbnail generation, there are too many handlers to try to teach them all about Swift. Thus, this month’s work is on fetching the original into the local filesystem, running the handler, then pushing the thumb that it wrote back into Swift.
  • Program manager: Mark Bergsma
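
The fetch, run, push-back approach described in the status above can be sketched as follows. This is only a minimal illustration under stated assumptions: `InMemoryObjectStore`, `make_thumbnail` and `toy_handler` are hypothetical names invented here, the store is a plain dictionary rather than the real Swift API, and the "handler" merely truncates bytes instead of scaling an image.

```python
import os
import tempfile


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store such as Swift (not its real API)."""

    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = data

    def get(self, name):
        return self._objects[name]


def make_thumbnail(store, original_name, thumb_name, handler):
    """Fetch the original locally, run a file-based handler, push the thumb back."""
    with tempfile.TemporaryDirectory() as workdir:
        src_path = os.path.join(workdir, "original")
        dst_path = os.path.join(workdir, "thumb")

        # 1. Fetch the original from the object store into the local filesystem.
        with open(src_path, "wb") as f:
            f.write(store.get(original_name))

        # 2. Run the unmodified handler, which only knows about local paths.
        handler(src_path, dst_path)

        # 3. Push the thumbnail that the handler wrote back into the object store.
        with open(dst_path, "rb") as f:
            store.put(thumb_name, f.read())


def toy_handler(src_path, dst_path):
    """Placeholder 'scaler': keeps the first 4 bytes instead of resizing an image."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read()[:4])


store = InMemoryObjectStore()
store.put("Example.jpg", b"0123456789")
make_thumbnail(store, "Example.jpg", "thumb/Example.jpg", toy_handler)
```

The point of the design is that the handler stays untouched: only the wrapper knows that the files ultimately live in an object store.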

Testing environment

Virtualization test cluster — Environment to deploy temporary machines for testing and experimentation, for use by WMF staff and volunteers working on important projects (as capacity allows).

  • Status: Production hardware is set up, and the initial configuration of the software is done. We should have a basic environment ready to demo in time for the Berlin Hackathon. Ryan Lane gave a keynote about this at the OpenStack Developer’s conference in late April (see the slides).
  • Program manager: Mark Bergsma

Backups and data archives

Backups — Improvement of backup coverage of Wikimedia-hosted data.

  • Status: Now that connectivity between our two data centers is finally being installed, we can start making use of the new hardware and storage space to ensure full backup coverage of all data. We expect to have live replication or daily backups of all important data by the end of May.
  • Program manager: Mark Bergsma

Data Dumps — Improvement of processes to create and provide public copies of public Wikimedia data.

  • Status: We are investigating some kernel messages on the old dumps server, but in the meantime Google has come through with the account setup we need for copying dumps to Google storage. The April run of the English Wikipedia dumps completed in just over two weeks, even after having to rerun two pieces. The next run will be on the new, more powerful server.
  • Program manager: Mark Bergsma

Short news

  • A new search indexer was installed last month, resolving the space issues we had with the older server.
  • Five new database machines and a new snapshots generation server were installed and are being deployed.
  • We enhanced our WatchMouse setup with more service uptime monitoring and reporting.
  • We upgraded our etherpad software and server.
  • We encountered (and resolved) a few production issues in April:
    • The thumbnail server experienced a ZFS problem (a memory leak bug) after a surge in uploads.
    • Our Squid software upgrade caused a caching problem and was reverted.
    • We experienced site performance issues when we enabled click-tracking for our Article Feedback Tool, so we disabled it.

Features Engineering

Content Quality and Editorial Tools

Article Feedback — A feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.

  • Phase 2: Trevor Parscal implemented the expiration of ratings, added error handling mechanisms and fixed IE bugs. Roan Kattouw reviewed and deployed the code to production.
  • Phase 3: Timo Tijhof and Trevor Parscal implemented the EmailCapture extension, a tool that allows unregistered users rating an article to leave their e-mail address if they want to be contacted later by the Community department. Weekly deployments were performed by Roan Kattouw. The team started to work on the dashboard, a summary page to surface general rating trends.
  • Program manager: Alolita Sharma
Wireframes of the proposed system
The extended review and feedback system will allow users to flag specific issues.

Article feedback (extended review) — An interface for quality reviews of Wikipedia content.

  • Status: The specifications and wireframes are now stable, and this phase of the project is considered completed. The system will provide an expanded interface for readers to provide feedback, to praise authors and to report abuse. A “quality page” will show aggregated summary data, as well as a list of reviews & praise. Users will be able to promote particularly relevant reviews to the talk page of the article reviewed. The system will also include a mechanism for credentialed experts belonging to a specific organization to attach their credentials to the review.
  • Commissioned by: Erik Möller

FlaggedRevs — A feature to allow changes made by logged-out and new users to be reviewed before they appear as the primary version of an article.

  • Status: Aaron Schulz continued to refactor the extension and to fix bugs. He improved the API error messages, added other features to the API, and worked on performance improvements.
  • Program manager: Alolita Sharma

Discussions and Interactions

Liquid Threads — A feature that brings threaded discussions capabilities to Wikimedia projects and MediaWiki.

  • Status: Lead developer Andrew Garrett wrote a new object model for LiquidThreads, with support for channels, topics, posts, summaries and respective version objects. He also began work on integrating the new object model with the rest of the extension, starting with a more maintainable reimplementation of the display layer. Integration with the rest of the extension proved to require more time than expected, and an updated schedule was published.
  • Program manager: Alolita Sharma

WikiLove — An extension to encourage praise and virtual gifts between users.

Multimedia Tools

Upload wizard — A feature that provides an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.

  • Status: Neil Kandalgaonkar and Ryan Kaldari continued to fix bugs, and added new features as well. The upload wizard now offers a configurable & localizable license picker; it is also possible to abort uploads, and recover from errors. A built-in feedback form allows users to report bugs and issues directly. Roan Kattouw reviewed and deployed the new code every week in April. Last, Brandon Harris provided design recommendations to improve the interface.
  • Program manager: Alolita Sharma

Engineering support

Editor survey — Integration work between LimeSurvey and MediaWiki to support the Editor survey.

  • Status: Ryan Kaldari and Arthur Richards supported the Editor survey conducted by the Global development department in early April. Ryan completed the CentralNotice hooks to display the banner once for every logged-in user, and developed the link between the respondents and their user login. Arthur worked on the back-end and infrastructure of the survey software, LimeSurvey.
  • Program manager: Alolita Sharma

Other projects

Wikimedia Labs

Media projects — A set of features to improve media handling and key infrastructure support tools, many developed with Kaltura, such as Metavid, MwEmbed, and the Video Editor.

Special projects

Mobile

Mobile projects — All things Mobile and Wikimedia.

  • Status: We expanded on the features and roadmap definition, and started diagramming the interface. We are currently reviewing a features list with the community.
  • Program manager: Tomasz Finc

Mobile Research — A research project to help determine our Mobile strategy.

  • Status: Mani Pande and Parul Vora led a series of 30 interviews in New Delhi and Bangalore with Wikipedia readers and editors to assess mobile user experience and needs. Preliminary results indicate that many respondents prefer to access Wikipedia on their phone instead of on their computer. However, technical and editorial issues can make this difficult; for example, limited bandwidth causes articles with many or large images to load very slowly. Readers also stated that scrolling was tedious, and emphasized their preference for good introductory summaries. Last, users expressed the wish to be able to download Wikipedia articles or save them to read later.
  • Program manager: Tomasz Finc

Mobile site rewrite — Port of our Ruby-based mobile gateway to PHP.

Fundraising support

2011 Fundraiser — Support and development for the annual fundraiser of the Foundation.

Offline

Wikipedia version tools — Support and development of a series of tools to select Wikipedia content for offline use.

  • Status: Yuvi Panda was accepted as one of the Foundation’s students for Google Summer of Code. Arthur Richards will be mentoring him to port the existing collection tools to a MediaWiki extension.
  • Program manager: Tomasz Finc

OpenZim for Collections — Integration of openZim into the Collections extension.

  • Status: PediaPress patched the current extension so that generated openZim files now have a navigable table of contents. More than a thousand openZim files were downloaded in the last week alone.
  • Program manager: Tomasz Finc
The new download manager allows users to browse collections.

Kiwix — Improvement of the user experience of the Kiwix app to access offline Wikimedia content.

  • Status: We ran through a second development sprint, working on a content download manager to download offline archives directly from Kiwix (see mockups). We also worked with the Wikimedia operations team to connect the content manager with the Wikimedia infrastructure. Since Wikimedia adopted this project, downloads of Kiwix have almost doubled.
  • Program manager: Tomasz Finc

General Engineering

MediaWiki development and tools

MediaWiki 1.17 release — The upcoming MediaWiki release.

Code review — Review of changes made to the MediaWiki code.

Bugmeistering — Management of our bug tracker.

  • Status: Mark Hershberger reached out to other open-source communities (like Mozilla) to look for best practices in bug management and workflow; he started to experiment with a new “unprioritized” value for the “priority” field. He has also been organizing weekly bug triage sessions at different times to allow for participation from different timezones.
  • Program manager: Rob Lanphier

Summer of Code 2011 — A sponsored community program allowing students to join the community as developers.

  • Status: More than 25 proposals were submitted. Sumana Harihareswara announced the eight students and projects that were selected for this year’s Google Summer of Code. The projects include interface improvements using AJAX, extension release management and work on Semantic MediaWiki. Students and mentors have now entered the “community bonding” period. (Read more.)
  • Program manager: Rob Lanphier

Parser & gadgets — Groundwork for the next generation visual editor of MediaWiki.

  • Status: Brion Vibber is laying the groundwork for exploratory tools for the upcoming parser work, integral to the future Visual editor. He created a JavaScript tool to compare the parse tree and output of several parsers. On a related note, he also worked on tools to facilitate the development and use of gadgets, for example by embedding a JavaScript syntax highlighting editor.
  • Program manager: Rob Lanphier

Performance optimization

PoolCounter — A MediaWiki extension to avoid parser deadlocks on high-traffic pages.

  • Status: This extension was deployed and is now in production. We’ve observed a reduction of roughly 2% in total parse time due to the pool counter being active. We believe the biggest benefits come when there is a lot of editing and view traffic directed at a single page. While we don’t have good metrics yet for proving that assertion, we’ve had a few major events that might have triggered performance issues prior to PoolCounter that didn’t pose a problem for us.
  • Program manager: Rob Lanphier
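
The situation described above, where many requests hit a single page at once, can be illustrated with a small single-process analogy. This is a sketch of the general idea only, not the PoolCounter extension's actual design or API: the names `render`, `slow_parse`, `cache` and `key_locks` are hypothetical, and threads stand in for concurrent requests. Only one worker parses a given page while the others wait and then reuse the cached result.

```python
import threading
import time

cache = {}                      # page title -> rendered HTML
key_locks = {}                  # page title -> lock serializing parses of that page
cache_lock = threading.Lock()   # guards the two dictionaries above
parse_calls = []                # records every expensive parse, for the demo


def get_key_lock(key):
    """Return the lock for this page, creating it on first use."""
    with cache_lock:
        return key_locks.setdefault(key, threading.Lock())


def render(key, parse):
    """Return cached HTML, letting only one caller at a time parse a given page."""
    with cache_lock:
        if key in cache:
            return cache[key]
    with get_key_lock(key):
        # Re-check: another worker may have finished parsing while we waited.
        with cache_lock:
            if key in cache:
                return cache[key]
        html = parse(key)
        with cache_lock:
            cache[key] = html
        return html


def slow_parse(key):
    """Stand-in for an expensive wikitext parse."""
    parse_calls.append(key)
    time.sleep(0.05)
    return "<html>" + key + "</html>"


results = []
threads = [
    threading.Thread(target=lambda: results.append(render("Main_Page", slow_parse)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this toy run, eight simultaneous requests for the same page trigger a single parse; without the per-page lock, each of them could have parsed the page independently.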

Disk-backed object cache — Deployment of a disk-backed object cache to increase the parser cache hit ratio.

  • Status: Issues that arose during the testing of EHcache convinced Tim Starling to use another tool. His next trial will involve implementing a thin caching layer on top of a MySQL-based disk store. Implementation is planned to happen after the MediaWiki 1.17 release.
  • Program manager: Rob Lanphier
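
As a rough illustration of what a thin caching layer over a database-backed store can look like, here is a minimal sketch. It rests on several assumptions: SQLite (from the Python standard library) stands in for MySQL, and the class name `DiskBackedCache`, the table `cache_entries` and its columns are hypothetical; none of this reflects the planned implementation.

```python
import sqlite3
import time


class DiskBackedCache:
    """Key-value cache with expiry, stored in a SQL table (hypothetical schema)."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to keep entries on disk.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS cache_entries ("
            "keyname TEXT PRIMARY KEY, value BLOB, expires_at REAL)"
        )

    def set(self, key, value, ttl_seconds):
        """Store bytes under key, replacing any previous entry."""
        self._db.execute(
            "INSERT OR REPLACE INTO cache_entries VALUES (?, ?, ?)",
            (key, value, time.time() + ttl_seconds),
        )
        self._db.commit()

    def get(self, key):
        """Return the stored bytes, or None if the key is missing or expired."""
        row = self._db.execute(
            "SELECT value, expires_at FROM cache_entries WHERE keyname = ?",
            (key,),
        ).fetchone()
        if row is None or row[1] < time.time():
            return None
        return row[0]


parser_cache = DiskBackedCache()
parser_cache.set("Main_Page", b"<p>rendered</p>", ttl_seconds=60)
parser_cache.set("Stale_Page", b"<p>old</p>", ttl_seconds=-1)
```

A disk-backed store of this kind can hold far more entries than memory allows, which is how it could raise the parser cache hit ratio mentioned in the project description.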

Wikimedia analytics

udp2log — A custom data analytics logging system.

  • Status: Nimish Gautam completed a patch for our Squids to implement multicast logging, but issues with the Squids upgrade (which caused a site outage) delayed the deployment of the patch. The operations team is now reviewing the patches to diagnose the issue before redeploying.
  • Program manager: Rob Lanphier

A/B testing — A set of tools to perform A/B testing on Wikimedia sites.

Technical communications

Development process improvement — A project to increase transparency and organize Wikimedia Foundation’s engineering efforts more efficiently.

A collage of the four banners used for the different blogs
Wikimedia now offers topic-specific blogs.

Wikimedia blog overhaul — A project to consolidate and improve the Wikimedia blogs.

  • Status: Rob Halsell set up a test blog and documented the blog in our configuration management system. This will facilitate backups and redeployment, and further streamline and automate our operations processes. Technical issues with the back-end delayed the implementation, but Rob resolved them with Ryan Lane’s help. Deployment happened on May 7 and went smoothly. This project is considered completed, even though we’ll continue to improve the blogs incrementally in the future (read more).
  • Project manager: Guillaume Paumier

Other projects

  • Bugzilla upgrade to 4.0 — Priyanka Dhanda fixed a few bugs following the upgrade to Bugzilla 4.0 back in March. Our bug tracker is pretty stable now.
  • OpenWebAnalytics — Integrating a full-fledged OWA framework with our infrastructure proved to be difficult, so we decided to scale down our efforts. A postmortem will be published, notably to help the new dedicated analytics team decide if they want to use individual components of OWA for specific uses like heatmaps.
  • API maintenance — Besides general maintenance and bug fixing, Sam Reed started work on app-level system health monitoring, by creating a job queue monitor.
  • Shell bugs — Mark Hershberger organized triage meetings specifically for shell bugs, with Priyanka Dhanda, Rob Halsell and CT Woo. Priyanka and Rob have been moving through the backlog of issues filed there.
  • Access to Subversion — The team (composed of Rob Lanphier, Priyanka Dhanda, Chad Horohoe and Tim Starling) are now meeting briefly every Wednesday to go through the commit access requests.
  • Migration to Git — The migration to Git will be a major topic of discussion during the upcoming Berlin Hackathon.
  • Heterogeneous deployment — Priyanka Dhanda is working on a project plan. Implementation is scheduled to happen after the deployment of the disk cache component.
  • Report card — Erik Zachte, Nimish Gautam and Erik Möller are investigating visualization toolkits to use in the report card (a monthly report of key metrics to measure community health). Additionally, they are streamlining and modularizing the report creation process.
  • HipHop support — Tim Starling implemented basic support for HipHop for PHP in MediaWiki, and invited other developers to improve and continue his work. We will pick this work back up later this year after the completion of some of the other projects above.

This article was written by Mark Bergsma, Tomasz Finc, Danese Cooper, Alolita Sharma, CT Woo, Rob Lanphier & Guillaume Paumier. See full revision history. A wiki version is also available.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.
