Wikimedia engineering report, November 2013


Major news in November include:

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Engineering metrics in November:

  • 146 unique committers contributed patchsets of code to MediaWiki.
  • The total number of unresolved commits went from around 1122 to about 1230.
  • About 29 shell requests were processed.

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

  • Jeff Hall joined the Platform engineering group as a member of the QA team (announcement).
  • Aaron Arcos joined the Platform engineering group as a volunteer developer working with the Multimedia team (announcement).
  • Dario Taraborelli was promoted to the position of Senior Research Scientist, Research and Data team lead (announcement).
  • Aaron Halfaker was promoted to the position of Research Scientist (announcement).
  • Moiz Syed joined the User Experience team as User Experience Designer (announcement).

Technical Operations

Wikimedia Labs

A new dynamic proxy system has been deployed on Labs; it allows the admin of any project to arrange for public web access and a dedicated DNS hostname for a project instance without requesting an IP address. Labs staff and volunteers will now be reclaiming quite a few IPs as existing projects migrate to a dynamic proxy setup.
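As a rough illustration of the idea (not the actual Labs implementation), a dynamic proxy can be thought of as a table that maps public hostnames to private instance backends, so that no instance needs its own public IP address. The class name, hostnames and addresses below are all hypothetical.

```python
from typing import Dict, Optional


class DynamicProxyTable:
    """Hypothetical sketch: maps public hostnames to private instance backends."""

    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}

    def register(self, hostname: str, backend: str) -> None:
        # A project admin points a hostname at an instance; no public IP is allocated.
        self._routes[hostname.lower()] = backend

    def unregister(self, hostname: str) -> None:
        # Removing a route is a no-op if the hostname was never registered.
        self._routes.pop(hostname.lower(), None)

    def resolve(self, host_header: str) -> Optional[str]:
        # Strip an optional port and normalize case before the lookup.
        hostname = host_header.split(":", 1)[0].lower()
        return self._routes.get(hostname)


# Made-up hostname and private address, for illustration only.
table = DynamicProxyTable()
table.register("myproject.example.org", "http://10.0.0.12:80")
print(table.resolve("MyProject.example.org:443"))  # http://10.0.0.12:80
```

In a setup like this, a single shared proxy address can serve many projects, which is why individually assigned IPs can be reclaimed as projects migrate.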
The WMF has hired a short-term contractor, Mike Hoover, to assist with the migration of Labs infrastructure from Tampa to our new datacenter in Ashburn. Mike has spent a lot of time exploring the existing infrastructure and running test setups; soon he will start to configure the new OpenStack nodes in production.
Andrew Bogott has been working on cleaning up stale and unused resources. He’s working on some automatic documentation that will help users track the status of their projects and instances with an eye towards predicting the impact of the coming migration.
Labs suffered two outages: a brief, self-inflicted network failure, and a longer one during which one of the virtualization hosts failed. Both outages were swiftly resolved, but there’s a bit of lag as some tools and services failed to come back properly afterwards due to poor distribution of virtual servers (among other things, both the grid master and the shadow [backup] master were on the same server).
Preparation for the move to the Ashburn data center is well in progress, with the new storage server being physically configured this week as well as the new hardware servers for user databases (including the new PostgreSQL instance intended for OpenStreetMap).

Features Engineering

Editor retention: Editing tools

VisualEditor

In November, the VisualEditor team continued to improve the stability and performance of the system, and add new features. The deployed version of the code was updated three times (1.23-wmf3, 1.23-wmf4 and 1.23-wmf5). Most of the team’s focus was on fixing bugs, and on some major infrastructure changes, splitting out the OOJS and OOJS-UI libraries from VisualEditor to make them available to other teams. Much of the team travelled to the Open Source Language Summit in Pune, India to learn more about how to improve VisualEditor for a variety of languages, scripts, users and systems. Two new members of the QA team joined in to help improve VisualEditor – Jeff Hall and Rummana Yasmeen, and thanks to them, the automated browser tests have expanded in breadth and depth of coverage. Work continued on major new features like full rich copy-and-paste from external sources, a dialog for quickly adding citation templated references, and a tool to insert characters not available on users’ keyboards. The editor was made available by default on just over 100 additional Wikipedias as part of the continuing roll-out. VisualEditor was also enabled for opt-in testing on Swedish Wiktionary and Wikimedia Sweden’s wiki, the first time it has been available on a non-Wikipedia production wiki.

Parsoid

November saw the deployment of major changes to the DOM spec in coordination with the VisualEditor team. Link types are now marked up by semantics rather than syntax, interwiki links are detected automatically, categories are marked as page properties and more. During the deployment, we found that the newer libraries used by the web service front-end were buggy. We reverted the library upgrade and contributed fixes upstream. This incident prompted us to work on tests for the HTTP web service to catch issues like this in continuous integration.
After these issues were sorted out, we resumed incremental improvements and fixes. Editing support for magic words and categories was improved, several dirty diff issues were fixed and the API was refined for page-independent wt2html and html2wt conversion. See our deployment page for details.
Cassandra load testing for the Rashomon storage service continued and uncovered several issues that were reported back upstream. With Cassandra 2.0.3 the 2.0 branch is now stabilizing in time to make deployment in December feasible. Cassandra is now stable at extremely high write loads of around 900 revisions per second, which is more than 10 times the load we experience in production.

Core Features

Notifications

In November, we deployed Notifications on the German and Italian Wikipedias, completing our worldwide release of this tool. Fabrice Florin, Denis Barthel, Jan Eissfeldt, Erica Litrenta and Keegan Peterzell managed the community outreach for these final releases, while Benny Situ oversaw the technical deployments. Community response to Notifications has been generally favorable on all wikis. While feature development has now ended for this project, we expect new notifications and features to be developed by other teams in coming months. To learn more, visit our project hub, read the help page and join the discussion on the talk page.

Flow

This month, the Flow team finished out the feature set for our minimum viable product. We added watchlist integration, the ability to see board, topic, and post histories, and did a first round of community feedback and testing with our product to date. We also prepared for release to production wikis in December by working on Operations and Security needs.

Growth

Growth

In November, the Growth team primarily worked on refactoring the GuidedTour and GettingStarted extensions, including development of an API for the latter. This public API will be used by the Growth team, the Mobile team and others to deliver editing tasks to users across a variety of Wikipedia interfaces.
The team also spent significant time on the research and design preparations for its anonymous editor acquisition and Wikipedia article creation projects. This included participating in a community Request for Comment about a potential Draft namespace for articles, requirements gathering, and working on a Draft namespace patch.
Matthew Flaschen and Pau Giner attended the Wikimedia Diversity Conference and presented (along with Jared Zimmerman and Vibha Bamba) on how diversity relates to the team’s engineering and product work.

Support

Wikipedia Education Program

This month, we improved a feature that was built in October (allowing instructors to assign articles to student editors), completed a new feature (allowing instructors to add users as students) and started another one (displaying information about student editors’ courses on Special:Contributions). We fixed some bugs, and kept up with changes in MediaWiki core. We also continued preliminary work—started last month—towards renewing the UX and broadening the extension’s scope.

Mobile

Wikipedia Zero

During the last month, the team monitored the rollout of Wikipedia Zero via text (USSD/SMS) in partnership with Airtel Kenya and Praekelt for the first pilot of the program. Additionally, Yuri Astrakhan promoted the program abroad.
The team also prepared code and configuration for approval, finalized IP addresses for zero-rating and deployed bugfixes for the Wikipedia app for Firefox OS. We added support for simpler JSON in configuration files, enhanced performance and redirect features and constrained ZeroRatedMobileAccess extension loading to guard against repeats of last month’s configuration bug.

Mobile web projects

The onboarding A/B test resulted in an edit guider, which is now available. The overlay UI overhaul, currently in beta, is planned to become available on the main site. User profiles are also being tested in beta.

Language Engineering

Team highlights for this past month include a very successful Open Source Language Summit in Pune, India co-organized with Red Hat. More than 60 developers joined in to collaborate and work together on improving language support for Wikipedia on the web and mobile. Work sprints on integration of input methods in VisualEditor, Indic Fontbook specification, mobile input methods and content translation were held.
The team also fixed several issues related to performance and saving preferences in the Universal Language Selector (ULS), and deployed the fixes. Other completed tasks include creating a class for interlanguage links so that the Autonym font is used only for autonym items. The team also worked on collating documentation about the initial inclusion requests for each web font served through ULS; this information is also documented in the font.ini file of each font in the repository.

Platform Engineering

MediaWiki Core

DevOps Sprint 2013

The DevOps sprint participants focused their efforts on monitoring-related work, specifically getting Logstash into production and puppetizing/migrating Graphite (both still in progress). Cache-related fixes were made so that users no longer see outdated versions of pages when using non-canonical URL forms. A fix was also made to the Commons upload process to update all articles that use the uploaded file, as users would expect.

Search

Before November 18, we were spinning up an aggressive plan to add many new wikis to CirrusSearch. On November 18, we had multiple incidents that caused us to roll all wikis using CirrusSearch back to Lucene; we’ve spent the rest of November implementing fixes for all issues discovered on the 18th. That is now done and we plan to switch all wikis that used to have CirrusSearch back to running it as a secondary search engine on December 2. We’ll attempt to restart our aggressive plan as soon as we’re comfortable with it again.

Site performance and architecture

We ran a controlled experiment to test the impact of module storage on performance. We expect to publish our findings within a week. We puppetized Graphite and MediaWiki’s profiling log aggregator and migrated them to our Ashburn data center. Finally, we started working on a replacement profiling log aggregator that will process and visualize profiling data from both client-side and server-side code.

Auth systems

Our preliminary version of OAuth is now live on all Wikimedia wikis. Since the rollout, five OAuth consumers have been accepted. We’re hopeful many more consumers will be proposed.

Wikimania Scholarships app

Work is progressing towards a planned launch of the application on 2013-12-19. The source code has been imported into an internal Git repository and is now being managed via Gerrit. A Bugzilla component has been created under the Wikimedia product to track defects and feature requests. Several changesets are in review to complete the basic functionality of the application and prepare for an internal security review.

Security auditing and response

We released a security update to MediaWiki to fix a number of issues in core and extensions. Security reviews of Limn, GWTools and Flow extensions are in progress.

Admin tools development

This activity is still officially on hold. However, progress on the global rename user tool continued, as well as implementing global CSS/JS.

Quality assurance

Quality Assurance

November saw significant improvements to the QA documentation on mediawiki.org contributed by both staff and volunteers. Participants in the Google Code-in program made even more contributions, to both documentation and browser test code. The QA team welcomed new staff members Rummana Yasmeen and Jeff Hall, who made immediate contributions to the VisualEditor project and to the browser test automation.

Beta cluster

In November, the Beta cluster saw greatly improved support for testing Parsoid, the parsing engine behind VisualEditor. The Beta cluster also continues to provide a real-world simulation for the Flow project in advance of Flow’s limited release scheduled for December. Beta continues to be the main test environment for MobileFrontend, CirrusSearch, and many other Wikimedia software projects.

Browser testing

In November, we added significant browser test coverage for the Flow project, and the addition of Jeff Hall to WMF staff brought a focus to testing VisualEditor. Browser tests now reside in ten different repositories across WMF projects. November saw increased browser test coverage for the Language, VisualEditor, and Flow projects, among others. The diversity of browser tests in project repositories has been a force behind great improvements in infrastructure, with code shared among the projects now residing in the repository at mediawiki/selenium.

Engineering Community Team

In November, the Engineering community team held their second monthly showcase, as well as their quarterly review for the July–September period.

Bug management

Andre Klapper and Quim Gil prepared and organized Wikimedia’s participation in Google Code-in. This included supporting mentors and students by writing documentation and importing tasks. On the code side, Andre cleaned up Wikimedia Bugzilla’s custom CSS (removing 16 CSS files and keeping 6), prepared and tested patches for upgrading Wikimedia Bugzilla from version 4.2 to 4.4, updated the Greasemonkey triagescripts (e.g. stock answers to ping assignees), and synced the “WeeklyReport” Bugzilla extension code with upstream. WMF’s Operations team installed new SSL certificates for bugzilla.wikimedia.org. The “shellpolicy” keyword in Bugzilla was renamed to “community-consensus-needed” and the “wikidata” keyword was removed. Furthermore, Andre created a draft for a Bugzilla etiquette.

Mentorship programs

We successfully started Wikimedia’s first participation in Google Code-in. Six candidates were selected as new interns for Round 7 of the FOSS Outreach Program for Women.

We also confirmed the participation of Wikimedia in the Facebook Open Academy program.

Technical communications

In November, Guillaume Paumier’s primary focus was on preparing for the Google Code-in program, and mentoring students once the program started. In two weeks, 18 students worked on writing discovery reports (candid essays from the perspective of newcomers to the Wikimedia technical community); among them, seven completed their task successfully. Guillaume also assembled and published the weekly technical newsletter and provided ongoing communications support for the engineering staff.

Volunteer coordination and outreach

Erik Moeller’s talk “The Wikipedia stack” was accepted for the main track session at FOSDEM. The call for proposals for the Wikis devroom at FOSDEM was extended until December 15. Wikimedia applied for a stand.
A Request for Proposals for a technical writer contractor was also sent. Lastly, we helped establish a routine around Architecture meetings.

Multimedia

Multimedia

In November, Mark Holmquist and Gergő Tisza developed a second beta version of the Media Viewer, based on new designs by Pau Giner. For a more immersive experience, this next version displays larger images, as shown in the demo.
We also released Beta Features on all Wikimedia wikis, where it is already used by thousands of users. This experimental program invites users to try out new features before they are released widely, then give feedback to developers. To use Beta Features, click on the small ‘Beta’ link next to your ‘Preferences’ on your site, or test the latest version on MediaWiki.org.
Fabrice Florin managed product development, led the creation of the Multimedia Vision 2016 (with Pau Giner), hosted roundtable discussions and updated the team’s multimedia plans, based on community and team feedback.
Bryan Davis, Aaron Schulz and Chris Steipp reviewed new code for the upcoming GLAM Toolset for batch uploads by museum curators. We also welcomed Aaron Arcos as a volunteer software engineer, who is joining our multimedia team full-time through Spring 2014.
To discuss these features and keep up with our work, we invite you to join the multimedia mailing list. We are also recruiting for a senior software engineer position on our team.

Analytics

Kraken

We continued to make progress on event delivery via Kafka. We identified and tested solutions for issues encountered with event delivery from the Amsterdam data center. We also tested solutions to fix Ganglia logging issues.

Wikimetrics

We concluded Phase 1 of Wikimetrics, by implementing asynchronous cohort validation, editor survivor and threshold metrics.

Data Quality

We identified issues with over-counting page views, and deployed a fix in November. Data from July onward were restated.

Research and Data

This month, we started work on metrics standardization, one of the team’s quarterly goals. We published a number of supportive analyses of new user acquisition, activation and retention as well as “active editors” to assess issues and potential benefits of new definitions. The outcome of this analysis will inform design decisions for new dashboards focused on editor engagement.
In collaboration with the Platform team, we ran an A/B test to determine performance gains of localStorage. The results indicate that the use of localStorage significantly improves the site’s performance for the end user: Module storage is faster. Readers whose pages load slower tend to browse less. Mobile browsers don’t seem to benefit substantially from caching.
We published the results of a test designed to explore if displaying a short tutorial could improve the first-edit completion rate of newly-registered users on mobile devices. The results support the hypothesis, indicating that edit guiders are a good onboarding strategy for new mobile users.
We ran an analysis of anonymous editor acquisition as background research for new onboarding strategies designed by the Growth team and found that editors who edit as an IP right before registering an account are our most productive newcomers.
On November 9, 2013 we hosted the inaugural Labs2 Wiki Research Hackathon: it was the first in a series of global events meant to “facilitate problem solving, discovery and innovation with the use of open data and open-source tools” (read the full announcement). Highlights from the event are available in the latest issue of the Research Newsletter. We are planning to host a new hackathon in Spring 2014 and we are actively seeking volunteers to host local and virtual meetups.

Kiwix

The Kiwix project is funded and executed by Wikimedia CH.

We have released two new versions of Kiwix for Android this month (1.5 & 1.6), providing many new features; most of them were developed by young new developers as part of the Google Code-in program. We have also released a new and unique tool to easily create a ZIM file yourself from data on your hard drive; the tool is stable and can now be used. Work continues around tools based on Parsoid output, especially as we need to rewrite the ZIM-related code for the MediaWiki offline toolchain, currently under heavy re-engineering.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

Wikidata developers held an office hour to give a status update and answer questions (read the log). In addition, they worked on ranks, ordering of statements and the quantities datatype. The quantities datatype is needed, for example, to enter the number of inhabitants of a country in Wikidata. It is available for testing now on http://test.wikidata.org. Ranks will allow for certain statements to be marked as preferred or deprecated. This is for example useful to indicate a previous mayor of a city, or the number of inhabitants of a country in 1900.
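To make the idea of ranks more concrete, here is a minimal sketch of how ranked statements could be filtered. It is not Wikidata’s actual code: the `Statement` class, the “normal” default rank, the selection rule and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Statement:
    value: str
    rank: str = "normal"  # assumed values: "preferred", "normal" or "deprecated"


def best_statements(statements: List[Statement]) -> List[Statement]:
    """Return preferred statements if any exist, otherwise the normal ones.

    Deprecated statements are never returned (an assumed rule for this sketch).
    """
    preferred = [s for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s.rank == "normal"]


# Hypothetical data: the current mayor is marked as preferred, while the
# previous mayor stays recorded at the default rank.
mayors = [
    Statement("previous mayor (hypothetical)"),
    Statement("current mayor (hypothetical)", rank="preferred"),
]
print([s.value for s in best_statements(mayors)])
```

With a rule like this, historical values such as a former mayor or a 1900 population figure can stay in the data without being presented as the current answer.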
Magnus Manske wrote a gadget that allows you to additionally show Wikidata search results when doing a search on Wikipedia. He also extended the Reasonator tool to now also work for cities. Until now, it only supported people and species.

Future

The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.

This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.
