Wikimedia engineering June 2013 report

Major news in June include:

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Engineering metrics in June:

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

  • Sean Pringle joined the Technical Operations team as our Storage and Database Engineer (announcement).
  • Brian Wolff joined the Wikimedia Platform Engineering group as a software developer for the summer, working on multimedia contribution and review (announcement).
  • Ken Snider joined the Technical Operations team as an international contractor, poised to fill the Director of Technical Operations position (announcement).
  • Toby Negrin joined the Engineering department as Director of Analytics (announcement).

Technical Operations

Site infrastructure

As part of our capacity planning work, Mark Bergsma upgraded most of our Varnish infrastructure (in EQIAD & ESAMS) with newer and faster servers. He will add new mobile Varnish servers in ESAMS next month. Rob Halsell and Daniel Zahn are pushing ahead with the migration of the other applications from Tampa to EQIAD. New Parsoid application and Varnish servers were also deployed in anticipation of the coming VisualEditor deployment. Meanwhile, Alexandros Kosiaris is starting the backup project work; read more about the project and the technology.
Mark also put in the finishing touches to deploy all the new network infrastructure at ESAMS. With help from Mark and Leslie Carr, we finally got approval from ARIN for some new IPv4 addresses, needed for our new ULSFO buildup.
Many people are refactoring Puppet code with the ultimate goal of having everything organized into Puppet modules. Andrew Bogott, Antoine Musso and Alexandros are setting up an automated testing infrastructure to support these efforts.

Data Dumps

Our GSoC student, Petr Onderka, is set up in gerrit and committed his first contributions to the Incremental Dumps project; you can follow his code, read his progress reports and check the current discussion on the mailing list. Additionally, we hold IRC meetings on weekdays at about 4:15 pm (UTC) in #wikimedia-tech; lurkers and contributors are welcome.

Wikimedia Labs

Wikimedia Labs saw a lot of improvements in June, including the deployment of AJAX improvements for OpenStackManager to wikitech (added actions: console output; improvements: reboot), and a new interface for displaying quotas for projects in OpenStackManager. We ensured that all instances were properly running Puppet and Salt; many instances were running puppetmaster::self and needed to have local puppet repo merges or rebases. We upgraded Salt everywhere and re-issued keys to fix a vulnerability in Salt. The team also worked on stabilizing the NFS server. We encountered a kernel bug with NFS; we have changed the scheduler from cfq to deadline and decreased the read and write sizes of clients to 8k. Progress has been made towards making the Labs database replicas available to the Labs at large (as opposed to only the Tool Labs project). Last, much work has been done towards user request fulfillment in Tool Labs, including work towards WSGI support.

Features Engineering

Editor retention: Editing tools

VisualEditor

In June, the VisualEditor team completed the major new features that we prioritised over the past few months, in preparation for making VisualEditor available to most Wikipedia users in July. We have built an editor that is capable of letting users edit the majority of content without needing to use wikitext: text support, as well as adding and editing inclusions of references, templates, categories and media items. The deployed alpha of VisualEditor was updated four times as part of the transition to weekly deployments (1.22-wmf6, 1.22-wmf7, 1.22-wmf8 and 1.22-wmf9), with several mid-deployment releases to patch urgent issues as the code was developed. Part of this involved running an A/B test for new user accounts on the English Wikipedia, with half of the users being opted in to VisualEditor ahead of the wider release. More generally, we made a number of user interface improvements and fixed a number of bugs uncovered by the community.
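
A 50/50 split like the one used for the new-account test can be sketched as follows. This is only an illustration, not the code used in production: the function names and the rule of bucketing on the parity of the numeric user ID are assumptions made for this example.

```python
def in_visualeditor_bucket(user_id: int) -> bool:
    """Return True if the account falls in the test group.

    Hypothetical rule: even user IDs get VisualEditor,
    odd user IDs form the control group.
    """
    if user_id <= 0:
        raise ValueError("user_id must be a positive integer")
    return user_id % 2 == 0


def split_accounts(user_ids):
    """Partition an iterable of user IDs into (test, control) lists."""
    test, control = [], []
    for uid in user_ids:
        (test if in_visualeditor_bucket(uid) else control).append(uid)
    return test, control
```

Bucketing on a stable property of the account, rather than on a random draw at each page view, keeps every user in the same group for the whole duration of the experiment.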

Parsoid

Early this month, we deployed Parsoid to the new cluster and started to track all edits and template / image updates from all Wikipedia sites, which is close to the full load we’ll see when VE is deployed to all of them. Our earlier optimization work paid off as the Parsoid cluster and the associated Varnish caches are handling the load very well. The extra load we put on the API cluster is low enough to not cause a problem. As expected, the VisualEditor deployment to the English Wikipedia hardly showed up in the load graphs.
Despite being very short-staffed this month (only two full-time developers), the absence of performance issues left us enough time for a lot more polishing before the VisualEditor release on July 1. As a result, the release went very well with clean diffs on almost all pages.
While more work is left to do, it is now clear that we have fundamentally achieved our goal of a clean translation between WikiText and HTML + RDFa. This not only enables visual HTML editing, but also makes Wikipedia’s content easily accessible in a standardized format. It also opens up new opportunities for MediaWiki’s core architecture, which we’ll pursue this fiscal year.

Editor engagement features

Notifications

In June, we released more features and bug fixes for Notifications on the English Wikipedia and mediawiki.org. Ryan Kaldari added a confirmation button for the ‘Thanks’ feature, and updated notification fly-outs to show diff links for talk page and interactive notifications, based on a design by Vibha Bamba. Benny Situ continued development of HTML Email notifications and deployed a variety of feature updates. Erik Bernhardson developed a special ‘Suppressed’ content feature, while Matthias Mullie developed a range of new metrics dashboards. Dario Taraborelli and Aaron Halfaker ran a week-long A/B test of new user activity; results show that new users who received Echo notifications made more edits than those who did not, but their edits were reverted slightly more often. Fabrice Florin led the planning process for Notifications, as outlined in the 2013 roadmap, and hosted a day-long roundtable discussion to improve editor engagement features in collaboration with Wikipedia users (see Echo demo and Q&A video on YouTube). Later this summer, we plan to start deploying Notifications on more wiki projects, starting with Meta and the French Wikipedia. To learn more, visit the project portal, read the FAQ page and join the discussion on the talk page.

Article feedback

In June, we deployed final features and bug fixes for the Article Feedback Tool (AFT5) on the English, French and German Wikipedias. Matthias Mullie released an opt-in feature to enable or disable feedback on a page, based on designs by Pau Giner and specifications by Fabrice Florin. In collaboration with Dario Taraborelli, Matthias also developed an updated set of metrics dashboards showing how the new moderation tools are being used: for example, about half of moderated feedback is marked as ‘no action needed’, while about a tenth is marked as ‘useful’ (these results are generally consistent across different languages). The team also supported a wider deployment of AFT5 on over 40,000 articles on the French Wikipedia, as well as a poll by the German community, which elected not to adopt the tool. Now that feature development has ended for this project, we plan to make AFT5 available to other wiki projects in coming weeks, as outlined in the release plan. For tips on how to use Article feedback, visit the testing page, and let us know what you think on this talk page.

Editor engagement experiments

Editor engagement experiments

In June, the Editor Engagement Experiments (E3) team continued work on its experiments related to onboarding new Wikipedians, and launched several new extensions to Wikimedia projects.
First, the new Campaigns extension was added to all wikis. This analytics tool helps identify internal or external sources of new registrations, by adding a “campaign” name to the signup page URL. This month, E3 began running campaigns to learn about how many anonymous editors sign up on the top 10 Wikipedias, as well as how many sign up via the invitation to “Join Wikipedia” on the login page (see the list of active campaigns and analysis). Another piece of analytics infrastructure by the team is the new CoreEvents extension, which houses logging of MediaWiki core activity, like preference updates and page saves across all projects.
For the Getting Started project, the team conducted usability testing (see results and documentation) of new designs. E3 also refactored and refined the guided tours extension in June, including adding usability enhancements like new interface animations, support for community tours, and bug fixing. The team also planned and began work on an experiment to deliver guided tours to all first-time editors.
The team also assisted with A/B testing and research for VisualEditor before its July 1 launch date, assisting with experimental design, EventLogging instrumentation, and other work. After the VisualEditor launch, E3 started a week-long micro-survey of newly-registered users on English Wikipedia, to give us a first systematic look at the gender diversity of those creating accounts.
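
The campaign tagging that the Campaigns extension relies on (a “campaign” name added to the signup page URL) can be sketched roughly as below. This is not the extension’s code: the helper names, the example.org URLs and the campaign name `loginCTA` are hypothetical, and the use of a `campaign` query parameter is an assumption based on the description above.

```python
from collections import Counter
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def add_campaign(signup_url: str, campaign: str) -> str:
    """Return signup_url with a campaign=<name> query parameter set."""
    parts = urlsplit(signup_url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query["campaign"] = [campaign]  # overwrite any existing campaign tag
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def campaign_of(url: str):
    """Return the campaign name carried by a URL, or None if untagged."""
    values = parse_qs(urlsplit(url).query).get("campaign")
    return values[0] if values else None


def count_signups(urls):
    """Count signup page URLs per campaign; untagged ones go to '(none)'."""
    return Counter(campaign_of(u) or "(none)" for u in urls)
```

For instance, `add_campaign("https://example.org/w/index.php?type=signup", "loginCTA")` yields a link whose registrations can later be attributed to that (hypothetical) campaign.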

Support

2012 Wikimedia fundraiser

The initial work on the Adyen payments gateway was finally completed and deployed to production, though we have not yet used the gateway in a campaign. Plans for a mobile fundraising campaign and workflow continued to move forward: We expect to do the first mobile-targeted campaign in mid to late July. Some last-minute tweaking was done to the payments cluster in preparation for the resumption of continuous fundraising on July 1, coinciding with the start of the fiscal year. Payments listener (thulium) deploy was completed, db1013 was moved into the firewalled fundraising cluster and rebuilt as a fundraising QA server, and work continued on the new CiviCRM server (barium). Fundraising backups were overhauled.

Mobile

Wikipedia Zero

This month, the team launched Wikipedia Zero with Dialog in Sri Lanka, patched logic and user interface bugs, enhanced the configuration editor, expanded logging and debugging for identification of anomalous access, further decoupled ZeroRatedMobileAccess from MobileFrontend, and proposed ESI- and JavaScript-based software re-architecture.

Mobile Web Photo Upload

This month, we focused on improving education around uploads, including an interactive Commons tutorial and first-time user copyright and scope check. We also released our “Nearby” feature to production, allowing users to find articles near them that are in need of images, take photos and upload them via mobile.

Mobile Nav

In beta, we started working on an update to our site and article navigation, including design tweaks to the left navigation menu and a new in-article contributory navigation that combines article actions (edit, upload, and watch) with a talk page link. We also experimented with Echo integration and successfully got Notifications up and running on the English Wikipedia mobile site. We hope to push all of this work to production next month.

Platform Engineering

MediaWiki Core

MediaWiki 1.22

In June, the Platform Engineering group switched to a weekly deployment cycle for MediaWiki to the Wikimedia Foundation servers. This means that we have halved our previous cycle of two weeks. As a result, we are now progressing through wmfXX versions of MediaWiki at a faster rate. In June, MediaWiki versions 1.22-wmf6 through wmf9 were branched and deployed.

Git conversion

Chad Horohoe and Christian Aistleitner upgraded our Gerrit instance from a pre-release version of 2.6 to a pre-release version of 2.7 in the last week of June. They also published a new version of the Bugzilla/Gerrit integration plugin. Details about new functionality can be found in the Gerrit 2.7 draft release notes.

Multimedia

In June, we started expanding our multimedia team: Fabrice Florin joined as product manager, and Brian Wolff began a summer contract as software engineer. We started work on improving the display of images in galleries and are now planning our next development steps in consultation with community members. Some of the first features under consideration include file curation and feedback tools, as well as media viewers, new video formats and other platform improvements, to be prioritized based on user feedback and technical feasibility. We are also recruiting for two more positions: a multimedia systems engineer and a senior software engineer. Please spread the word about this unique opportunity to create a richer multimedia experience for Wikipedia and MediaWiki sites!

Admin tools development

In June, the team worked on making the last changes to enable global AbuseFilter rules, and on the global account renaming tool. Some additional work was done on Single User Login finalisation, which will mean that all user accounts will be global across all of Wikimedia’s public wikis, allowing for cross-wiki notifications and better tools for editors.

Search

Work has largely shifted from supporting MWSearch/lsearchd to investigating and implementing Solr. Nik Everett and Chad Horohoe have begun writing an extension to implement Solr searching for MediaWiki, and much of the initial basic functionality is complete. Peter Youngmeister and Andrew Bogott will be handling the operations tasks for the new setup. Initial operations tasks will involve packaging Solr 4 and working with Chad to puppetize the whole design. Additionally, we’re going to investigate ElasticSearch, as it has been suggested as an alternative to Solr.

Auth systems

In June, the team worked with the Wikimedia Foundation’s user experience team to improve SUL2. The improvements were pushed to test wikis on July 1, and will be rolled out to other wikis in July. Implementation of OAuth is well underway, and planned for roll-out in July as well.

HipHop deployment

A Labs instance of MediaWiki running on HipHop is now available at http://hhvm.wmflabs.org.

Security auditing and response

The team continued to respond to reported security issues, and gave security-oriented tech talks on emerging DoS techniques and using OWASP’s ZAP tool for vulnerability scanning.

Quality assurance

Quality Assurance

This month saw a QA focus on automated browser tests. Besides creating new tests and new builds, and reporting issues identified by tests, we conducted a training session in San Francisco to create automated tests for the Wikilove feature. We continue to support all WMF software development projects, with the VisualEditor being a particular focus in June.

Beta cluster

Max Semenik wrote a script to synchronize CSS from production on beta. Steinsplitter and Antoine Musso fixed the AbuseFilter configuration to have a global list of filters on the labswiki. Filters should be configured there and will be used by all the wikis. The PHP fatal errors caught by the wmerrors extension are now sent to the beta udp2log instance, which will greatly improve our troubleshooting process.

Continuous integration

Timo Tijhof and Antoine Musso triaged continuous integration bugs. Antoine has set up a Jenkins slave and migrated most jobs to it. It will be very easy to add new servers.

Browser testing

This month, the QA team added new browser tests for UniversalLanguageSelector and for Mobile (contributed by the Language engineering and Mobile engineering teams, respectively), as well as browser test contributions from volunteers. We created new builds in Jenkins to run browser tests against IE10. We created tests for VisualEditor, including some with our intern from the Outreach Program for Women.

Analytics

Analytics infrastructure

We made significant progress with our preparations for replacing udp2log with Kafka in our logging infrastructure. The C library librdkafka now supports the 0.8 protocol, a first version of varnishkafka is ready to replace varnishncsa, the Apache Kafka project released their first beta of Kafka 0.8, and we have a Debianized and Puppetized version. We keep adding new metrics and alerts to monitor all the different parts of the webrequest dataflows into Kraken. We expect to keep making improvements in the coming months, until we have a fully reliable data pipeline into Kraken. We also continued our efforts to move Kraken out of beta: we puppetized Zookeeper, JMXtrans, and the Hadoop client nodes for Hive, Pig and Sqoop. We started reinstalling the Hadoop Datanode workers with a fully puppetized Hadoop installation; so far, we have replaced three nodes, and we’ll replace the other seven in the coming weeks. Last, we enabled Jenkins continuous integration for the Grantmaking & Evaluation dashboards.

Analytics Visualization, Reporting & Applications

This month, we completed the end-user documentation of UserMetrics (v1). We rebranded UserMetrics as Wikimetrics, and we will slowly start to use that as the new name when referring to UserMetrics v2 or UserMetrics replatforming. We focused on laying the foundation of Wikimetrics: a new database design, a new job queue design and lots of unit tests. In addition, we started porting over some of the features of UserMetrics v1 (like the ‘namespace edits’ metric and UI components), and we added user roles (so users can only see their own metrics) and authentication using OAuth. Last, we fixed some minor issues in UserMetrics v1, including the handling of user names containing commas, single quotes and double quotes.
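
User names containing commas or quotes are a classic source of bugs when cohorts are exchanged as comma-separated text, because a naive split on “,” breaks a name like “Smith, John” in two. The sketch below shows one common way to handle this with Python’s standard csv module; it is an illustration only, not the UserMetrics fix, and the function names and sample user names are hypothetical.

```python
import csv
import io


def dump_usernames(names):
    """Serialize user names as one-column CSV, quoting where needed."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    for name in names:
        writer.writerow([name])
    return buf.getvalue()


def parse_usernames(csv_text: str):
    """Return the user names found in the first column of CSV text."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    return [row[0] for row in reader if row]
```

Writing and reading through the same CSV dialect means that commas, apostrophes and double quotes inside a name survive a round trip unchanged.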

Data Releases

We delivered many analyses in June, including one of an Arabic cohort using UMAPI v1. Erik Zachte provided an analysis of Commons uploaders, and we provided the Wikipedia Zero team with a number of datasets to help them track adoption of the Wikipedia Zero project across the globe. We supported the VisualEditor and Editor Engagement teams with experimental design, data modeling and data analysis for two controlled experiments: a test of the impact of notifications and a first test of the impact of VisualEditor on new contributors. The tests were carried out in June and the reports are being updated with the results of the analysis. We started using the EE-dashboard instance on Labs to host dashboards related to editor engagement projects that were previously hosted on the Toolserver (see the metrics and features dashboards for the English Wikipedia). Last, we worked with the Features engineering team to expand MediaWiki’s instrumentation and collect data on cluster-wide user preference changes and edit-related events to support VisualEditor analysis.

Engineering community team

Bug management

Andre Klapper published the Bugzilla administrator policy and documented the specific tasks for which Bugzilla admin rights are actually needed (which might also be helpful for other projects using Bugzilla). He started publishing weekly “Bugzilla tips and best practices” blog posts and reproposed introducing a “PATCH AVAILABLE” status in Bugzilla (as requested by several developers at the Amsterdam Hackathon) whilst work is ongoing to fulfill prerequisites. On the code side of Bugzilla, a new Bugzilla frontpage went live, providing useful links. Furthermore, the misleading term “login” was replaced by “email address”, it is now possible to set the “Assigned” status directly when filing a new bug report, and smaller issues with the “Weekly Bugzilla Report” email sent to the wikitech-l mailing list were fixed. In Bugzilla’s taxonomy, open tickets in the dormant “Wiktionary tools” product were retriaged and the product closed for new bug entry, and security-related components in Bugzilla were reorganized after a meeting with the Wikimedia Foundation’s security engineer.

Mentorship programs

The 20 Google Summer of Code and the 1 Outreach Program for Women interns have completed the bonding period (with 3 exceptions, 2 of them justified) and they are now working on their projects. One OPW accepted candidate declined her participation due to a job offer. Monthly status updates are available for these projects:

We also met with SocialCoding4Good, who are relaunching their activities, and we refreshed the Wikimedia page. We expect this to become a regular channel for new technical contributors working in corporations with social/training programs.

Technical communications

In June, work on this topic mostly focused on perennial activities like Tech news and ongoing communications support to engineering staff, as Guillaume Paumier was lent to the VisualEditor deployment effort, working on communications, documentation and liaising with the French Wikipedia.

Volunteer coordination and outreach

The decision to focus on fewer, better-executed activities based on demand seems to be working out, although it’s too soon to confirm the trend. Browser test automation is the number one priority for recruiting new contributors, and any help to succeed here is welcome. We created the QA mailing list as an umbrella to host people and discussions focusing on software quality assurance in all its aspects. We have more than 40 subscribers and an initial flow of activity. We had a successful first Browser Test Automation Workshop, with 40 participants in San Francisco and a few more online; we will iterate on this model. We have also helped organize a Tech Talk on Attack vectors & MediaWiki and OWASP ZAP, and the upcoming Solr-based Search. The project to get automated community metrics based on vizGrimoire and provided by Bitergia has been approved, and a first prototype can be seen at http://korma.wmflabs.org. The project effectively starts on July 1 and includes a one-year period of maintenance. We agreed with the Analytics team that they will assume responsibility for this area during this period.

Kiwix

The Kiwix project is funded and executed by Wikimedia CH.

Development of a new MediaWiki HTML dumper in nodeJS has started. This tool exports Wikipedia articles in static files based on the Parsoid output. This solution looks really promising, and new JavaScript developers are welcome.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

June in Wikidata was all about the sister projects. The development team published proposals for how Wikidata can support Commons and Wiktionary. Additionally, they worked on the ability of Wikidata to store language links to Wikivoyage in addition to Wikipedia; as a result, Wikivoyage will soon also be able to manage their language links via Wikidata. Another important step was the deployment of the geocoordinate datatype. This makes it possible, for example, to indicate the location of a city. Geocoordinates that are already in Wikidata can be seen on this map (huge version, updated daily).
In a blog entry, Denny Vrandečić explained his understanding of the relation of Wikidata and the truth.
In other news, further development of Wikidata has been supported through a large donation by the search engine company Yandex.

Future

The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts. Annual goals for the 2013–2014 fiscal year are currently being drafted.

This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.
