Wikimedia engineering March 2012 report


Major news in March include:


Engineering metrics in March:

Some metrics were disrupted this month by the move to git.

Events

Recent events

  • Chennai Hackathon (17 March 2012, Chennai, India) — Yuvaraj Pandian and volunteer Srikanthlogic held this one-day hackathon for experienced developers. Yuvaraj’s report praised the 21 participants for coming up with 13 completed hacks, including 2 core MediaWiki patches, 3 Tamil Wikipedia userscript updates, and 2 new deployed tools.

Upcoming events

  • Berlin hackathon (1–3 June 2012, Berlin, Germany) — Registration opened in March for this three-day “inreach” hackathon for the Wikimedia technical community, including MediaWiki developers, Toolserver users, bot writers and maintainers, Gadget creators, and other Wikimedia technologists. The event, hosted by Wikimedia Deutschland, will mostly involve focused sprints, bugbashing, and other coding, with a few focused tutorials and trainings on Git, Lua, Gadgets changes, or other topics of interest. Wikimedia Deutschland will also use this event to consult on and discuss the Wikidata structured data project. Developers are encouraged to register now, and to mention in the registration form if they will need financial subsidies or help with accommodation or visas. Developers who will need that sort of assistance are urged to register as soon as possible, preferably before May 1st.

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

New hires

  • Pau Giner joined the Product team as Interaction Designer (announcement).
  • Wikimedia Deutschland announced the composition of their team working on the Wikidata project (announcement).

Operations

Site infrastructure

  • Ashburn data center — Mark Bergsma completed the Squid to Varnish conversion for image caching, and successfully deployed Varnish on 8 servers in our Ashburn data center for about half a day. During that time, he monitored and assessed the behavior of the software and the impact on the servers. Where there are currently 24 Squid servers, 8 Varnish servers would provide sufficient capacity to replace them. However, there are concerns about overloading the network interface cards and the risk of concentrating too much cache on each server. Mark is now working on improving the Varnish implementation and possibly adding a few more servers. Also, because the Ashburn data center seems to be experiencing a higher server outage ratio than the Tampa site, Rob Halsell reviewed and added extra earth grounding to the cabinets, as a precaution. We are monitoring the situation to see if that does reduce server issues. Peter Youngmeister and Jeff Green are making good progress in testing, preparing and bringing up the Ashburn Search clusters. Full scale testing has just started and results have been quite promising. In the coming weeks, they will conduct limited trial deployments of the Ashburn clusters, running in parallel to the ones at the Tampa data center. The Ashburn data center added network peering, and Leslie Carr peered over 10 other big sites/ISPs with our network shortly after that, thus reducing latency for many of our users, especially in Europe, Japan and Hong Kong (and reducing bandwidth costs too).
  • Amsterdam data center — Mark Bergsma restacked, re-arranged and decommissioned servers, and started racking the new router and switches. The actual network switchover at Evoswitch is still to be scheduled; however, Mark did replace the old core router in Vancis and deployed the new one there.
  • Media Storage — After addressing earlier issues with the Swift deployment, Ben Hartshorne re-deployed it and it has been stable since. Ben removed the original testing hardware from the cluster and added the final production node, bringing the total to 5 new Swift nodes serving as the thumbnails object store at Tampa. Swift is also now running in the Labs environment and ready to be used by other Labs projects that interact with Swift in production. Volunteer attention to the Swift Labs cluster is welcome to improve monitoring, analyze the configuration, and in any other way understand this component of our infrastructure better.

Testing environment

  • Wikimedia Labs — Gluster project storage is now available. In total 71TB are available for use. Each project has a default quota of 300GB that can be increased on request. Soon, public datasets (such as XML dumps) will also be available within Labs. There were two Labs downtime events this month. Both were due to glusterfs instance storage. The first was due to a limitation in the FUSE filesystem (with regard to recreating deleted directories) and was relatively short (roughly 2 hours). The second was due to malfunctioning hardware, which caused the glusterfs storage to go into a split-brain situation that was unresolvable. There was no data loss, but the instance’s images had to be recovered manually from gluster’s backend. Total downtime for the second outage was roughly 24 hours. Andrew Bogott has finished his work on the SharedFS support in Nova, with a gluster driver. It is proposed for inclusion in Nova for the Folsom release; this will be discussed at the upcoming OpenStack design summit. Andrew has begun work on adding support for updating MediaWiki on Nova changes.

Backups and data archives

  • Data Dumps — We sorted out the network issues to our mirror sites on our end by replacing a switch. We set up a new host to hold a copy of all uploaded media for copying to our mirror sites, and the first copy of this media to an external mirror is now underway. Mirror sites will also be able to pick up a list of dump files to copy (the last 1, 2 or 5 good dumps) in a few different formats, produced by a new script. The first copy of recent dumps to a gluster share available to Labs users is available, but already out of date; one process is too slow, so a script is being tested that will dispatch copy requests to several processes running at once. Christian Aistleitner is now working on PHPUnit tests for the maintenance scripts used for the dumps. We’ve improved our process for deployment of new versions of the XML dump scripts, so that new code can be rolled out more often.
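
The idea of dispatching copy requests to several workers at once, mentioned in the Data Dumps item, can be sketched as follows. This is a minimal illustration and not the script being tested by the dumps team: the function names (`copy_one`, `copy_in_parallel`), the default worker count, and the use of Python worker threads in place of separate processes are all assumptions made for the example.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_one(src: Path, dest_dir: Path) -> Path:
    """Copy a single file into dest_dir and return the path of the copy."""
    return Path(shutil.copy2(src, dest_dir / src.name))


def copy_in_parallel(sources, dest_dir, workers=4):
    """Dispatch one copy request per file to a pool of workers.

    Hypothetical helper: worker threads stand in for the several
    processes described in the report.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in input order and re-raises any copy error.
        return list(pool.map(lambda s: copy_one(Path(s), dest_dir), sources))
```

File copies are mostly I/O-bound, so a handful of workers can keep a slow link or disk busy where a single sequential loop would fall behind; the right worker count depends on the storage and network involved.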

Other news

  • Performance engineer Asher Feldman published an article explaining how site performance is measured at the Wikimedia Foundation. He notably presented Graphite and a limited version available at http://gdash.wikimedia.org.
  • Operations engineer Ryan Lane, who is leading the Wikimedia Labs project, was featured on the Wikimedia Blog this month.
  • We started investigating the possibility of a caching center on the West Coast of the US. We believe it would improve the experience for users in Asia and America’s West Coast.
  • Readers reported intermittent performance issues on March 25th. Tim Starling investigated and determined it was a network problem. Leslie Carr quickly found the root cause, redirected the traffic and thus resolved the problem. Rob Halsell later replaced the problematic fiber and transceiver.

Features Engineering

Editing tools

  • Visual editor — A big decision in March was to move forward with contentEditable (CE), implemented by Wikia developers Inez Korczynski and Christian Williams, instead of Editable Surface (ES). Trevor Parscal and Roan Kattouw focused on the data model. Rob Moen worked on the user interface, first on right-to-left support in ES, then on getting the UI working in CE. Gabriel Wicke and Audrey Tang continued their work on Parsoid and need to decide on RDFa vs. microdata. They created a dump grepper with syntax highlighting, and used it to analyze existing wikilink/image syntax use.

Editor engagement

Multimedia Tools

  • TimedMediaHandler — Michael Dale and Jan Gerber have TimedMediaHandler set up on beta. It is running into issues related to the Labs beta setup that are preventing the test plan from being run. Labs and QA leads are working with them to get to the point where the tests can be run. QA support has been lined up.

MediaWiki infrastructure

Feature support

  • 2012 Wikimedia fundraiser — The team continued to work on GlobalCollect recurring donations, with the code review remaining to be done. They also engaged in cleanup after an eventually successful upgrade of our production instance of CiviCRM from 3.4 to 4.1.1, the migration to git, and Mingle training. There was an issue with an imbalance of chargebacks, due to a spinning down of the Winter fundraising flagging fraud in GlobalCollect, that was resolved.

Internationalization and Editor Engagement Experimentation

  • Internationalization and localization tools — The team started to develop (with UI/UX contractors) the UI for a Universal language selector for desktop and mobile. They also added keymaps for language support to Narayam, added Lohit font updates from upstream to WebFonts, fixed bugs, reviewed code for localization support in MediaWiki 1.19, and discussed language support metrics. Niklas Laxström migrated the Translatewiki.net workflow to reflect the move to git/gerrit.
  • Editor Engagement Experimentation — The newly created, cross-functional Editor Engagement Experimentation team will focus on engineering for experimentation around strategies to reverse stagnating/declining participation in Wikimedia projects, and will effectively launch on April 16. It will be composed of people from the Community and Engineering/Product departments, tasked specifically with conducting small, rapid experiments designed to improve editor retention. This is intended to go beyond the projects that are already being worked on; the purpose of this team will be to identify the possible changes we don’t yet know about. The engineering team will report to Alolita Sharma, with two new software developer positions to be hired in the current fiscal year.

Mobile and Special Projects

Mobile

Offline Projects

  • Kiwix UX initiative — The team decided not to use Mozilla Gecko as the platform to port Kiwix to Android; an alternative is cordova-qt. Work continued on Kiwix 0.9 RC1, the largest release ever made for Kiwix. New ZIM files are regularly released for offline reading using Kiwix. In particular, for the first time this month, a full ZIM version of the English Wikipedia was made available, containing about 4 million articles, 11 million redirects, and 300,000 math images (see online demo).

Platform Engineering

MediaWiki Core

  • MediaWiki 1.19 — We have now finished deploying MediaWiki 1.19 to all Wikipedia sites, including the Chinese language wikis (zh*). However, we are monitoring some post-deploy issues. We are keeping an eye on site performance; there’s been a slight regression in our parser cache hit rate. The new diff colors have been temporarily reverted, and Trevor Parscal and Timo Tijhof plan to look into the subject. Marcin Cieślak and Aaron Schulz have cleaned up areas where the CheckUser feature briefly stopped working properly.
  • Continuous integration — This activity was somewhat deprioritized in March in favor of the git migration. Nonetheless, Jenkins is now running the PHPUnit test suite and reporting test results in the Gerrit interface. This will help catch possible culprits as soon as a patch is submitted. Timo Tijhof wrote workflow specifications for continuous integration. Over the course of April, the Jenkins/Gerrit interaction will be polished and we will start looking at Selenium and bringing Testswarm back into action.
  • Git conversion — We’ve now moved MediaWiki core and WMF-deployed extensions over to Git and Gerrit, and for those directories Subversion is now read-only. We’ve communicated links and workflow planning, and the new procedure to add and remove people from Gerrit project owner groups. A summary of the move was published in the Wikipedia Signpost.
  • SwiftMedia — Swift is deployed for thumbnails. There are still some corrupted thumbnails in the Squid cache, but all known issues with new thumbnail corruption have been resolved. Work is underway to test and deploy Swift for original images, with work scheduled to complete in late May.

Wikimedia analytics

  • Report Card — The analytics team is fine-tuning the interface of the new Report Card. The test site in Labs is currently unavailable. The team is working towards showcasing a first Report Card prototype by April 6th, the date of the next metrics meeting for the Wikimedia Foundation. This prototype will replicate readers and pageviews. The team will also make a serious attempt at getting editor data up and running, and add the ability to add and signal benchmarks, for the April 6th meeting.

Technical Liaison; Developer Relations

Future

The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts. In March, a particular focus of the engineering management team was also the annual goal and budgeting process.


This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.


This article was edited on December 1st, 2012. The following content was changed: the number of processed shell requests was corrected.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.
