Wikimedia engineering report, December 2013


Major news in December includes:

  • a retrospective on Language Engineering events, including the language summit in Pune, India;
  • the launch of a draft feature on the English Wikipedia, to provide a gentler start for Wikipedia articles.

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Engineering metrics in December:

  • 152 unique committers contributed patchsets of code to MediaWiki.
  • The total number of unresolved commits went from around 1230 to about 1386.
  • About 25 shell requests were processed.

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

  • Sherah Smith joined the Wikimedia Foundation as Fundraising Engineer in Features Engineering (announcement).
  • Kunal Mehta joined the Wikimedia Foundation as contractor in Features Engineering (announcement).
  • Andrew Green joined the Wikimedia Foundation as a contractor in Features Engineering, working on the Education Program (announcement).

Technical Operations

Datacenter RFP

As part of our ongoing work in selecting a new location for our next Datacenter, members of the team traveled to several candidate locations throughout the US to tour facilities, meet facility staff, and otherwise continue the selection process. Following this process, we have been able to shortlist our bid proposals, and have begun the final selection process. We hope to complete bid selection and legal review in January.
Work continues on migrating our remaining services to our Ashburn datacenter. In December, we worked on consolidating and migrating databases, fundraising infrastructure, and Labs, and made progress on updating the configuration (puppetization) of several miscellaneous services. Additionally, “triage” of the hardware within the facility was performed, with an eye towards what will be delivered to Ashburn, what will end up in our new facility, and what will be decommissioned.

Wikimedia Labs

Andrew Bogott purged empty projects and stale instances, resulting in more accurate usage statistics for Labs:

  • Number of projects: 140
  • Number of instances: 403
  • Amount of RAM in use (in MBs): 1,592,832
  • Amount of allocated storage (in GBs): 21,525
  • Number of virtual CPUs in use: 797
  • Number of users: 2,425
Tool Labs saw a bump in usage as the winter holidays provided an opportunity for volunteers to migrate tools from the Toolserver and work on new projects; there are now 531 tools managed by 435 users, ranging from simple database queries to elaborate editing adjuncts using the new OAuth infrastructure.
Work for the impending migration of Labs to the Ashburn data center is well on its way: hardware is set up, the new storage servers are configured, and a lot of fresh OpenStack puppet manifests are in progress.

Features Engineering

Editor retention: Editing tools

VisualEditor

In December, the VisualEditor team worked to continue the improvements to the stability and performance of the system, and to add new features. The deployed version of the code was updated three times (1.23-wmf6, 1.23-wmf7 and 1.23-wmf8). Most of the team’s focus was on major new features and fixing bugs. There is now basic support for rich copy-and-paste from external sources into VisualEditor, and a basic tool to insert characters not available on users’ keyboards. Work also continued on a dialog for quickly adding citation templated references, and on some major infrastructure changes, splitting out the core of VisualEditor from the MediaWiki-specific items like the transclusion editor.

Parsoid

In December, the relentless Parsoid team continued squashing bugs and incompatibilities; see our deployments page for details. During the node 0.10 migration, we ran into some issues caused by changed garbage collector behavior, and rolled back to 0.8. We spent some time investigating and fixing this; initial testing on our round-trip testing setup indicates that this is now fixed.
Our testing infrastructure is now exercising the entire stack including the web server, which will help to make sure that we also catch issues in HTTP libraries before deployment.
We wrote several RFCs about embracing a service architecture, PHP bindings for services, a general-purpose storage service based on our Rashomon revision store, and a public content API based on this.
Part of the team worked on a new PDF rendering infrastructure using Parsoid HTML, node and PhantomJS. Part of the team has also been mentoring two Outreach Program for Women (OPW) interns.

Core Features

Flow

In December, we deployed Flow to a few selected pages in production (Talk:Flow and Talk:Sandbox on mediawiki.org) and collected feedback about the features and design to date from the community. The results of the feedback period are summarized at Flow/Research#Experienced Users. Throughout the feedback period, we worked on implementing design changes – such as a more compact view of the board and different affordances for topic and post actions, as well as different visualizations of history information – based on the comments of users testing the software. We also began a straw poll about launching Flow as a beta trial in the discussion spaces of WikiProject Breakfast, WikiProject Hampshire, and WikiProject Video Games on English Wikipedia. Based on the outcome of these polls, we hope to deploy Flow to those pages in January.

Growth

In December, the Growth team spent time working on product development and research for upcoming Wikipedia article creation improvements. First and foremost, the team fulfilled a request from the English Wikipedia community to launch the new Draft namespace there. Pau Giner and others on the team simultaneously began design work on future improvements to drafts functionality (see blog post), including recruiting for usability testing sessions.

Support

Wikipedia Education Program

In December, we improved and fixed issues with the current Education Program extension, and continued preliminary work towards a new version of the software. We added a message on Special:Contributions about users’ participation in courses, fixed a bug involving course undeletion and tweaked related styling, addressed a breaking change in core, improved i18n (in collaboration with Language Engineering) and began work on notifications for course-related events. We also fleshed out more ideas about the new version and possible synergies with other existing and proposed functionality, and reached out to other teams for input on this.

Mobile

Wikipedia Zero

During the last month, the team implemented a global landing page redirector for mobile Wikipedia website access, added support for staged configuration submittals, enhanced interstitials based on input from the field from Wikipedia Zero markets, amended compression proxies support, analyzed USSD/SMS service and partner launch-related access, and added general bugfixes. The team also worked toward a generic JSON configuration extension for use by extensions like ZeroRatedMobileAccess, and started on an HTML5 webapp proof of concept as an option for rebooting the Firefox OS app.

Mobile web projects

We’ve been working on finishing the redesign of the overlays. Additionally, we’ve continued work on mobile on-boarding. The “Keep going” feature has been changed to a workflow that asks users to add blue links and includes a tutorial. This is consistent with what we’ve learned about how guiding users helps them accomplish more edits, and it fits into the kind of micro-contribution workflow that we want to experiment with. We’ve also worked on an A/B test displaying an edit guider for users signing up from the left nav menu. This mirrors the edit guider that displays for users signing up through the edit call to action. It is also consistent with the behavior that the desktop site will be displaying to users as a result of the OB6 A/B test.

Presentation slides about mobile apps

Wikimedia Apps

The team added saved pages, article navigation, and language support to the mobile Wikipedia app. During the quarterly planning meeting, it was decided to postpone photo uploads from our market release plan in favor of text editing.

Language Engineering

Development of the TwnMainPage extension was completed; Translatewiki.net now has a new user registration process and a new dashboard for translators that provides insight into a user’s activity compared to that of other users.
Plural rules for MediaWiki have been updated to comply with CLDR version 24. There were consequences for existing translations in Russian, languages that fall back to Russian, Serbian, Belarusian and Ukrainian. These have been updated semi-automatically, and past contributors have been informed and asked to help in reviewing the updates.
MediaWiki Language Extension Bundle 2013.12 was released. It is compatible with MediaWiki 1.21 and MediaWiki 1.22. The MediaWiki Language Extension Bundle provides an easy way to bring comprehensive language support to your MediaWiki. The bundle is a collection of a select few MediaWiki extensions needed by any wiki that aims to be multilingual.
A performance issue in the Translate extension that prevented use of the status field for translatable pages on Meta-Wiki was resolved.
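
To illustrate what a plural rule decides, here is a minimal Python sketch of category selection for Russian integers. It follows the rule set as commonly published by CLDR (categories "one", "few" and "many" for whole numbers); the function name is hypothetical, fractional values (which fall into "other" in recent CLDR versions) are not handled, and this is not MediaWiki's actual implementation. The CLDR 24 data remains the authoritative definition.

```python
def russian_plural_category(n: int) -> str:
    """Return the plural category for a non-negative integer in Russian.

    Sketch only: integers only, based on the commonly published CLDR rules.
    """
    if n < 0:
        raise ValueError("expected a non-negative integer")
    mod10, mod100 = n % 10, n % 100
    # "one": ends in 1, except numbers ending in 11 (1, 21, 101, ...)
    if mod10 == 1 and mod100 != 11:
        return "one"
    # "few": ends in 2-4, except numbers ending in 12-14 (2, 23, 104, ...)
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"
    # "many": every other integer (0, 5-20, 25-30, 111, ...)
    return "many"


if __name__ == "__main__":
    for number in (1, 2, 5, 11, 21, 112):
        print(number, russian_plural_category(number))
```

A translation keyed to the wrong category produces grammatically wrong output for whole ranges of numbers, which is why a rule change like this one requires existing translations to be reviewed.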

Platform Engineering

MediaWiki Core

Search

We’ve continued our aggressive roll-out of Cirrus as a Beta Feature. You can now search 52% of pages, including Commons and Wikidata, via CirrusSearch. We’ve fallen behind somewhat on our goal to make Cirrus the primary search engine. Right now, we only handle about 1.5% of search traffic. While we will be switching more wikis over to Cirrus as the primary search back-end in January, the theme of the month really is adding Cirrus as a Beta Feature to more wikis, including the English Wikipedia. We’re not sure how many wikis we’ll be able to add before we consider ourselves out of hardware space. We’re planning on 50% more servers in February, so we’ll likely be able to finish adding wikis then.

Site performance and architecture

Ori Livneh presenting about site performance at the monthly WMF Metrics meeting (slides)

The team wrapped up the Puppetization of Graphite and its migration to Ashburn, and configured Travis CI to run MediaWiki’s test suite under HHVM on each commit to core. They also added an initial HHVM role for MediaWiki-Vagrant and re-wrote MediaWiki’s profiling data aggregator to be more performant. Prior to the rewrite, it was constantly saturated and would drop data; the rewrite reduced average CPU utilization by more than two thirds.

Auth systems

The team implemented performance fixes for CentralAuth to reduce the number of calls by anonymous users.

Wikimania Scholarships app

All critical functionality and several stretch goals were reached for the “final” version, which deployed to production on 2013-12-19. Chad Horohoe stepped in in the final days leading up to launch and helped save the i18n features that were scheduled to be scrapped due to time constraints. Siebrand Mazeland and the wonderful volunteers at translatewiki.net are providing translations at a rapid pace. Bryan Davis also put in some extra hours to clean up the look and feel of the application with a new Bootstrap-based theme. The application period for Wikimania 2014 will open at 2014-01-06T00:00:00Z and continue until 2014-02-17T23:59:59Z. The team will continue to monitor and support the product through the application period, and the subsequent review and approval process of the Scholarship Committee.

Security auditing and response

We continued to respond to reported security issues, and completed security reviews of Flow, the Wikimania Scholarships app, and the GLAM Wiki Toolset.

Admin tools development

The team made several small improvements, including log entries, the addition of global groups to Special:CentralAuth, and the addition of global edit count to Special:MultiLock.

Release & QA

In December, the latest and greatest version of MediaWiki was released, 1.22. This was led by Mark Hershberger and Markus Glaser, working as the MediaWiki release team, along with help from the Wikimedia Foundation Release and QA team (specifically Greg Grossmeier and Antoine Musso). Of course, this was only possible because of the great work by all of the MediaWiki developers.
The QA team, along with the Multimedia team, is working on API-level tests, starting with UploadWizard. This is close to being done. Another API-level testing activity covers Parsoid, with help from the VE and CI (Antoine) teams.
You can take a look at the first draft of the updated Development and Deployment process flow chart.

Quality assurance

In December, the Quality Assurance team worked particularly closely with the Mobile team, both supporting automated testing and helping fix issues with Beta labs and with Jenkins. We continued to work with the teams from Language engineering, VisualEditor, Flow, Multimedia, Wikidata, and Search, and participated in the Google Code-In event. We are in the process of creating new support not only for automated browser testing, but also for API testing, test data creation, and monitoring of both test and production environments.

Beta cluster

Parsoid on the Beta cluster is now based on the mediawiki/services/parsoid repository and is properly self-updating whenever a change is merged in that repository via a Jenkins job. Beta labs played a key role in finding and fixing some significant errors that, in combination, were causing users to see 503 errors in production, particularly on large pages and for Mobile users. For one thing, some timeouts on the Varnish caches had been set too low. We had increased those for the text Varnish servers but had not done so for Mobile Varnish servers. A tricky bug was also causing parts of large pages to be parsed multiple times. Last, the browser tests that incurred the 503 errors should have been capable of ignoring them. Thanks to Beta labs, the Varnish server timeouts are now correct, the multiple-parsing bug is addressed and the browser tests for MobileFrontend are running correctly.

Browser testing

Besides ongoing regression testing of Wikipedia features in cross-browser tests, in December we made the first steps for new abilities like testing geolocation for Mobile tests, testing and monitoring upload ability in production, adding the ability to create test data via the API, running tests in PhantomJS on the WMF Jenkins server, and monitoring the Beta labs test environment for fatal errors.

Engineering Community Team

Bug management

Quim Gil and Andre Klapper continued to run and coordinate Google Code-In for Wikimedia. Andre’s draft for a Bugzilla etiquette received lively feedback and discussion. On the technical side, Daniel Zahn prepared the migration of bugzilla.wikimedia.org to WMF’s new data center by turning the existing rudimentary Bugzilla puppet code into a puppet module and automatically generating documentation on doc.wikimedia.org. As part of this preparation, Daniel and Andre also eliminated nearly all Perl CPAN modules (in Bugzilla’s /lib subfolder) on the new server by using default distribution packages instead. Furthermore, Andre worked on a preliminary patch to display some common queries on the Bugzilla front page.

Project management tools review

Andre Klapper and Guillaume Paumier kicked off an evaluation of Wikimedia’s project management tools. Guillaume prepared a consultation page with topics for stakeholders and improved it together with Andre. It will initially be sent to the teampractices mailing list and individual stakeholders. To make it easier to gather input, talking to individual stakeholders via Hangouts and holding an IRC discussion are also being considered.

Mentorship programs

Wikimedia’s first participation in the Google Code-In program required a lot of dedication from the ECT members, and from about a dozen mentors and other contributors who helped create and review tasks. Students completed about 200 tasks. The GCI momentum and the lessons learned will help us organize a better gateway for new contributors, which was a main reason for us to join this program. We also believe that the experience acquired will help us make future editions just as successful with less work.
Round 7 of the FOSS Outreach Program for Women started, and all projects are on track so far.

We joined Facebook Open Academy almost at the last minute thanks to a reminder from developer Tyler Romeo. Six projects were accepted, which will be developed by teams of university students during the first half of 2014.

Technical communications

In December, Guillaume Paumier’s primary focus was on creating and assigning tasks for the Google Code-in program, mentoring students and reviewing their work. The students worked on writing discovery reports, adding TemplateData to widely-used Wikipedia templates, and converting manually-translated pages on mediawiki.org to pages using the Translate extension. Guillaume also continued to provide ongoing communications support for the engineering staff, and to assemble and publish the weekly technical newsletter, which is now delivered across wikis using MassMessage. Last, he compiled readability metrics for all past issues of the newsletter, as well as translation and subscriber metrics.

Volunteer coordination and outreach

We reached all our goals for submissions at FOSDEM in Brussels (February 1–2): a fully scheduled Wikis devroom, a main track session (The Wikipedia Stack) by Erik Moeller, and the Wikimedia stand coordinated by Dimitar Dimitrov. Our hiring process for a technical writer contractor was unsuccessful. After screening dozens of candidates and interviewing several of them, our three final candidates declined for various reasons. Without time to hire a writer before the Architecture Summit 2014, we decided to put the search on hold for now.

Multimedia

In December, Mark Holmquist and Gergő Tisza updated the beta version of the Media Viewer, based on new designs by Pau Giner. This new version now features next and previous arrows, as well as faster image load and an enhanced metadata panel, as shown on this demo page.
Fabrice Florin managed product development, spearheaded the team’s Multimedia Quarterly Review meeting, hosted more roundtable discussions and presented a Multimedia Vision 2016 to get more community feedback about our goals, with help from volunteer Aaron Arcos.
Bryan Davis, Aaron Schulz and other team members helped Dan Entous and David Haskiya release a first version of the GLAM Toolset for batch uploads by museum curators. We also started work on fixing bugs for the Upload Wizard, which we’ll aim to improve as our primary focus this quarter.
Last but not least, we are delighted to welcome Gilles Dubuc, who is joining our multimedia team as senior software engineer. To discuss these features and keep up with our work, we invite you to join the multimedia mailing list.

Analytics

Kraken

In late December, the Analytics team partnered with Operations to enable log delivery over Kafka (distributed message bus). All logs from the edge caches serving mobile traffic are now delivered via Kafka into a data warehouse on our Hadoop infrastructure. We’re seeing 3−4K messages per second, with a maximum of 8K/sec over Christmas. This is a significant step towards our goals of building an infrastructure that can be used for analysis of all of our page views.
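
To put the message rates above into perspective, the following back-of-the-envelope Python sketch converts a per-second rate into a daily volume. The function name is hypothetical, and the figures assume the rate is sustained for a full 24 hours, which the report does not claim (the 8K/sec value in particular is a peak, not an average).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def messages_per_day(rate_per_second: int) -> int:
    """Daily message volume if the given rate were sustained all day."""
    return rate_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    # Rates taken from the report: 3-4K messages/sec typical, 8K/sec peak.
    for label, rate in (("low", 3_000), ("high", 4_000), ("peak", 8_000)):
        print(f"{label}: {messages_per_day(rate):,} messages/day")
```

Under that assumption, the typical range corresponds to roughly 260 to 350 million messages per day from the mobile edge caches alone.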

Wikimetrics

The team added a small but important feature to Wikimetrics in December: the ability to authenticate against MediaWiki OAuth. This allows users to sign up for Wikimetrics without relying on a third party for authentication and is an early adoption of MediaWiki OAuth.

Data Quality

The team continues to spend a large amount of time on data quality. The primary effort in December was in isolating and fixing an error in WikiStats that inflated page views from July to December by a significant amount. The error was patched in early December and the statistics were recalculated. There were also issues with Wikipedia Zero traffic and an outage caused by a single point of failure in the legacy infrastructure.

Research and Data

This month, we kicked off a series of monthly research showcases as an opportunity for the team to share what we’re learning about Wikimedia editors and projects, and new features and programs the Foundation is rolling out. Aaron Halfaker presented research on anonymous editors. The first showcase was targeted at an internal audience but we’re considering making future showcases open to anyone via a public stream.
We analyzed the cause and impact of major over-reporting on page views in the last months of 2013. We filtered bogus traffic from the data, and published updated reports.
We also continued work on metrics standardization and presented the rationale for this project and the results of the initial round of analysis we conducted.
This month also saw the completion of the third volume of the research newsletter, which this year covered a total of 196 publications reviewed by volunteer contributors. A retrospective of research covered in the newsletter in 2013 will be published later in January.

Offline

PDF rendering

Work started on this project, which aims to replace the back-end renderer for the Collection extension (mwlib). This is the renderer that creates the PDFs for the ‘Download to PDF’ sidebar link and creates books (downloadable in multiple formats and printable via PediaPress), using Special:Book. One of the goals is to take advantage of Parsoid to do the parsing from wikitext. The team worked on the new parser, the ‘Collection Offline Content Generator’. The team will continue to work on this project over the coming weeks. Read more in the mailing list thread.

Kiwix

The Kiwix project is funded and executed by Wikimedia CH.
We have released two new versions of Kiwix for Android this month (1.7 & 1.8), providing many new features; most of them were developed by young new developers as part of the Google Code-in program. Work continues around tools based on Parsoid output, especially as we need to rewrite the ZIM-related code for the MediaWiki offline toolchain, currently under heavy re-engineering. We have compiled download stats for 2013, and for the first time we have reached 700,000 downloads of the Kiwix app a year. Work to digitally sign the OSX and Windows binaries is ongoing and is the last step before releasing 0.9rc3. We have started experimenting with porting Kiwix-plug to RaspberryPi, and it looks good. Lots of new ZIM files were generated; we now generate a ZIM file of Wiktionary, as well as ZIM files without pictures.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

The Wikidata development team continued to work on quantity values, including localization, support for scientific notation and the user interface. They also worked on performance by improving caching and database handling. DataValues Serialization 0.1 was released, as well as Ask Serialization 1.0 and Wikibase DataModel 0.6. A new DataModel serialization component was started, which will allow authors and people analyzing dumps to have the deserialization task solved for them.

Future

The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.

This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.
