Wikimedia engineering report, February 2014

Major news in February includes:

  • a call for volunteers to test the upcoming multimedia viewer;
  • improvements to VisualEditor’s media and template editors;
  • the launch of the Flow discussion system on two pilot talk pages on the English Wikipedia;
  • the launch of guided tours to 31 more language versions of Wikipedia, including all of the top 10 projects by number of page views;
  • improvements to the tools and process used to deploy code to Wikimedia production sites;
  • the release of the first archive of the entire English Wikipedia with thumbnails, for offline use.

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Engineering metrics in February:

  • 149 unique committers contributed patchsets of code to MediaWiki.
  • The total number of unresolved commits went from around 1320 to about 1453.
  • About 22 shell requests were processed.

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

  • Leila Zia joined the Analytics team as Research Scientist (announcement).
  • Faidon Liambotis was promoted to Principal Operations Engineer (announcement).
  • YuFei Liu joined the UX Design team as Visual Design Intern (announcement).
  • Following changes in the Language engineering team, Amir Aharoni is now the Acting Product Manager, and Runa Bhattacharjee the ScrumMaster (announcement).

Technical Operations

Datacenter RFP

Final negotiations with the 3 remaining data center bidders were completed in February, and the Wikimedia Operations team will make a decision in the first week of March. Expect a public announcement soon.

Wikimedia Labs

Labs metrics in February:

  • Number of projects: 129
  • Number of instances: 458
  • Amount of RAM in use (in MBs): 1,812,992
  • Amount of allocated storage (in GBs): 24,540
  • Number of virtual CPUs in use: 906
  • Number of users: 2,714

The Wikimedia Labs infrastructure in the eqiad data center has been deployed with the OpenStack Havana release, and testing completed in February. Labs users will have 2 weeks to migrate their own projects & instances starting in March. During the last two weeks of March, the Wikimedia Operations team will handle the transfer of the remaining instances that have not been migrated by users themselves.

ulsfo redeployment

During a short deployment of our West Coast data center ulsfo in October 2013, we found several reliability problems with some of our network service providers, which forced us to take this site out of service until they could be resolved. We have worked since to improve reliability and increase redundancy of network transit and transport to this site. As of the week of February 3rd, ulsfo is in full production usage again, and is now serving traffic for the US west coast, Oceania and large parts of Asia. A blog post is being prepared describing the improvements in user-perceived site performance.

eqiad data center capacity expansion

The Wikimedia Foundation has expanded the capacity of its main data center site eqiad in Ashburn, Virginia by 33%. A fourth row of racks has been added, and all power & networking infrastructure has been installed and configured in February. The added rack space is available for new equipment as of February 24th.
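
As a quick check of the 33% figure, here is a small sketch of the arithmetic. It assumes the site previously had three rows of equal size; the report only states that a fourth row was added, so the row counts and the function name `capacity_growth_percent` are illustrative.

```python
def capacity_growth_percent(rows_before: int, rows_after: int) -> float:
    """Return the percentage increase in rack rows, assuming equal-sized rows."""
    return (rows_after - rows_before) / rows_before * 100


# Assumption: three equal rows before the expansion, four after.
growth = capacity_growth_percent(3, 4)
print(f"{growth:.0f}%")  # prints "33%"
```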

Features Engineering

Editor retention: Editing tools

VisualEditor

In February, the VisualEditor team continued their work on improving the stability and performance of the system, and added some new features and simplifications. Media item editing is now much richer, allowing the setting of position, alt text, size (or setting as default size) and type for most kinds of media item. When adding links, redirects and disambiguation pages are now highlighted to help editors select the right link, and changing the format or style of some text was tweaked to make editing clearer and more obvious. Adding and editing template usages is now a little smoother, auto-focussing on parameters and making them clearer to use. Page settings have expanded to set redirects, page indexing and new section edit link options. The extensive work to make insertion of “citation” references based on templates quick, obvious and simple neared completion. The deployed version of the code was updated four times in the regular releases (1.23-wmf13, 1.23-wmf14, 1.23-wmf15 and 1.23-wmf16).

Parsoid

In February, the Parsoid team continued with bug fixes and improved image support. See the deployment page for a summary of deployments and fixed bugs in February.
Part of the team has continued to mentor two Outreach Program for Women (OPW) interns. This program ends mid-March. Others are mentoring a group of students in a Facebook Open Academy project to build a Cassandra storage back-end for the Parsoid round-trip test server.
We have a first version of a Debian package for Parsoid ready. This package has yet to find a home (a repository) from which it can be installed. Once it does, installing Parsoid will be as easy as apt-get install parsoid.

Core Features

Flow

This month, Flow was launched on the talk pages of two English Wikipedia WikiProjects that volunteered to be a part of the first trial, WikiProject Breakfast and WikiProject Hampshire. We’ve continued to iterate on the front-end design of the discussion system based on user feedback, releasing a new visual treatment during the trial and starting work on a front-end rewrite for better cross-browser and mobile compatibility (to be released sometime in March). We also spent time making sure Flow integrates better with vital MediaWiki tools and processes (e.g., suppression and checkuser) and improving the handling of permalink URLs.

Growth

Growth

Slides of the quarterly review.

In February, the Growth team first focused on releasing the new Wikipedia onboarding experience on additional projects. The GettingStarted extension was deployed to 30 Wikipedias, including all of the top 10 projects by number of page views. This marks the first time its task suggestions and guided tours were available outside English projects. The GuidedTour extension was also deployed to those projects (as a dependency of GettingStarted), as well as the Czech Wikipedia and se.wikimedia.org. Late in the month, the team also presented its work at its first Quarterly Review of the 2014 calendar year (see slides and minutes).

Support

Wikipedia Education Program

For the first half of the month, we focused on the current Education Program extension. We fixed many old and new bugs—including a few remaining database-related problems—and improved the UI for editing courses. Also, two Facebook Open Academy students started work on new notifications for the extension. In mid-February the team shifted our focus to creating new software for many kinds of collaborative editing, including, but not limited to, Education Program courses. The first phase of this work, called editor campaigns, is being carried out with the Growth team.

Mobile

Wikimedia Apps

We’ve worked primarily on enabling wikitext editing, specifically enabling logged-out editing, logged-in editing, logging in and creating accounts.

Mobile web projects

We’ve been working on bringing VisualEditor to tablets (currently in alpha). This is a requirement for redirecting tablets to mobile later on. Specifically, we’ve been working on enabling inspectors, especially the link inspector. We’ve also been fixing a variety of bugs to ensure that the basic editing functionality works as expected.

Wikipedia Zero

During the last month, the team added zero-rating for HTTPS for select carriers in cooperation with the Operations team. In collaboration with the Mobile Apps team, we integrated Wikipedia Zero into the forthcoming rebooted versions of the Android and iOS apps, including API and client-side code for zero-rating detection. We updated the legacy Firefox OS app with bugfixes from January (make spinner background opaque, remove mozmarket.js legacy JS); we also prepared other bugfixes for that app (keep last page browsed on low memory crash, avoid text overlaying <select> dropdown, ensure ‘X’ clicks stop processing and do not send the user to the Main Page). Discussion with the Operations team and Platform Engineering continued on the ideal portal hosting approach concurrent with sprint planning; portal work is probably deferred until the hosting strategy is formalized. The team also started work on the core API to allow dynamic category pages based on search terms, as well as continuing the discussion on core ResourceLoader features, in support of a proof of concept HTML5 webapp riding atop MobileFrontend. We also started a patch to make contributory features (not just banners and rewritten URLs) present for Wikipedia Zero users on carriers supporting HTTPS zero-rating. Last but not least, Yuri Astrakhan performed extensive analytics work on pageviews and page bandwidth consumption for gzip-capable Wikipedia Zero clients across all Wikipedia Zero-scoped partner pageviews; Yuri also conducted additional analytics work on SMS/USSD data.

Wikipedia Zero (partnerships)

In February, we launched Wikipedia Zero with MTN South Africa (Opera Mini browser only). MTN South Africa responded directly to the kids of Sinenjongo High School with an open letter to the students and the youth of South Africa. They said they agree that Wikipedia could give a boost to their education system, and that offering Wikipedia Zero is a small thing that could change everything (see video on YouTube).
We also launched Wikipedia Zero with Safaricom, the largest operator in Kenya. We now have three partners in Kenya, covering 90% of all mobile subscribers. South Africa is our 23rd country to launch, and Safaricom is our 27th operator partner.
The Mobile Partnerships team attended Mobile World Congress in Barcelona, where we met with existing operator partners, prospective partners and tech companies who want to support the mission. At the conference, our Wikipedia Text pilot with Airtel Kenya and the Praekelt Foundation was nominated as a finalist for the GSMA Global Mobile awards in the education category.

Language Engineering

Language tools

UniversalLanguageSelector was re-enabled with webfonts disabled by default. Research is ongoing to see whether they can and should be re-enabled by default at least for some languages.
More convenient shortcuts were added by Niklas Laxström to the Translate extension.
Kartik Mistry and Amir Aharoni are working on stabilizing the browser tests for all the language extensions and on setting up more robust online staging sites.

Milkshake

Several bugs were fixed in jquery.webfonts.

Language Engineering Communications and Outreach

Runa Bhattacharjee is setting up a Test Case Management System to facilitate manual testing inside the team and to help volunteer translators test new versions of language tools and report the results.

Content translation

The prototype ContentTranslation server was created in Node.js, mostly by Santhosh Thottingal and David Chan. The server will be responsible for syncing the translations between all the languages, storing translated parallel texts (using Redis), and retrieving and caching the results of language tools queries (machine translation, translation memory, dictionaries, segmentation, etc.). Some front-end components for the translation interface were made, mostly by Sucheta Goshal and Amir Aharoni.
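
The caching role described above can be illustrated with a minimal cache-aside sketch. This is not the ContentTranslation code (which is written in Node.js); it is a Python illustration in which a plain dictionary stands in for Redis, and the names `ToolResultCache` and `fake_machine_translation` are hypothetical.

```python
from typing import Callable, Dict, Tuple


class ToolResultCache:
    """Cache-aside wrapper: look in the cache first, call the tool on a miss."""

    def __init__(self, tool: Callable[[str, str, str], str]) -> None:
        self._tool = tool
        # A dict stands in for Redis here; keys are (source, target, text).
        self._store: Dict[Tuple[str, str, str], str] = {}
        self.misses = 0

    def lookup(self, source: str, target: str, text: str) -> str:
        key = (source, target, text)
        if key not in self._store:
            # Cache miss: query the tool once and remember the result.
            self.misses += 1
            self._store[key] = self._tool(source, target, text)
        return self._store[key]


def fake_machine_translation(source: str, target: str, text: str) -> str:
    """Hypothetical stand-in for a slow language tool query."""
    return f"[{source}->{target}] {text}"


cache = ToolResultCache(fake_machine_translation)
first = cache.lookup("en", "es", "Hello")
second = cache.lookup("en", "es", "Hello")  # served from the cache
print(first == second, cache.misses)  # prints "True 1"
```

Keying on the language pair as well as the text keeps results for different translation directions apart; a real deployment would also need an expiry policy, which this sketch leaves out.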

Platform Engineering

MediaWiki Core

HipHop deployment

Work is starting back up on this project, with the goal of having at least one production service running on HipHop by the end of the quarter. Tim Starling is working with the HHVM upstream to finish off a compatibility layer for running Zend extensions (ext_zend_compat) under HipHop, with the goal of using it for our Lua module. Ori Livneh is working on packaging and deployment issues, as well as generally wrangling the overall development effort. Aaron Schulz is starting to investigate what is needed for wmferrors support.

Release & QA

Wikimedia development and deployment flowchart

The Release and QA team had their latest quarterly review on February 13. Highlights from the meeting include:

  • We will be hiring two new positions (a QA Automation Engineer and a Test Infrastructure Engineer).
  • We will process through all pain points from the Development and Deployment process review.
  • We will continue performing incremental improvements to the current deployment script (known as “scap”) to better inform future deployment tooling work.
  • We will create a way for tests to create fake/stub data (for use in throw-away/one-off test instances).
  • We will make our browser tests more accurate across testing and production environments.

Notable progress on things with visuals includes an updated Development and deployment flowchart (opposite), as well as an auto-generated version.

Admin tools development

While this workstream is still officially on hold, the related Global CSS/JS extension to provide per-user global modules was deployed to beta labs for testing. Additionally, patches were contributed by volunteer developers.

Search

This month, almost all LuceneSearch and MWSearch bugs have either been closed as problems that are fixed in CirrusSearch, or moved to the CirrusSearch component. We then prioritized all CirrusSearch bugs. After clearing out any remaining high priority issues, engineering work for an update to the design of the search results page is due to commence on March 10.

Wikimania Scholarships app

The application automatically transitioned from the active scholarship collection period to the review-only period on 2014-02-17. No major issues were reported for February. The back-end features of the application were demoed for the IEG team as part of their information gathering process for implementing a more structured review tool for grants.

Deployment tooling

The month of February saw a lot of work on WMF deployment tooling.
To see a real life example of what it looks like to deploy code on the WMF server cluster, watch this screencast created by Bryan Davis. That shows you what the person deploying the code sees when doing a localization (translations) update. A deployment that includes new changes to the code (e.g. MediaWiki and extensions) on the servers would be different.
The suite of tools that makes up the current MediaWiki deployment tooling continues to be updated and rewritten in Python. You can follow this work in the repository’s history.
The updated Development and Deployment Process flowchart is now created using Blockdiag, a Python library for converting text into flow charts. You can see the current draft in the newly-minted Release Engineering repository.
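
To give a sense of how a text-based flowchart source can be produced, here is a small sketch that emits blockdiag's `blockdiag { a -> b; }` syntax from a list of edges. The stage names and the helper `to_blockdiag` are hypothetical and do not reflect the actual Wikimedia flowchart or the contents of the Release Engineering repository.

```python
from typing import Iterable, Tuple


def to_blockdiag(edges: Iterable[Tuple[str, str]]) -> str:
    """Render (source, target) pairs as blockdiag source text."""
    lines = ["blockdiag {"]
    for source, target in edges:
        lines.append(f"  {source} -> {target};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical stages, not the actual Wikimedia flowchart.
stages = [("commit", "review"), ("review", "merge"), ("merge", "deploy")]
print(to_blockdiag(stages))
```

The resulting text can be saved to a file and rendered with the blockdiag command-line tool. Keeping the diagram as text means changes to the process can be reviewed like any other code change.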
There is now a matrix showing the requirements for deployment tooling for 3 projects (MediaWiki, Parsoid (and related), and ElasticSearch (and related)). This is not a fixed document and will grow/change as more is learned.

Security auditing and response

MediaWiki 1.22.3, 1.21.6, and 1.19.12 security updates were released. We started a review of the Hadoop infrastructure and the Popups extension.

Quality assurance

Quality Assurance

In February, we updated our 3rd-party Jenkins instance to use Jenkins job builder configuration rather than Jenkins templates. Now our 3rd-party Jenkins builds match the WMF Jenkins build scheme, giving us maximum flexibility for when and how these jobs are run in the future. Also, we laid the groundwork for several significant new test features to be announced in the near future.

Beta cluster

Not much happened on the beta cluster besides the usual maintenance and its use to detect nasty bugs before they land on the production cluster. It is being used successfully for staging various features, bugfixes and extensions as well as for browser tests tracking regressions. Next month will see the beta cluster migrating from the pmtpa datacenter to the eqiad datacenter.

Continuous integration

Two instances in labs have been added as Jenkins slaves. They are equipped with tox and pip to let us test Python software while fetching dependencies from PyPI (bug 44443).
Nik Everett made the CirrusSearch browser tests runnable on a labs instance that has Elasticsearch. The job is now triggered from Gerrit and is being improved.
The experimental Meetbot instance set up by Antoine back in November has been overhauled and is now maintained by the community in the tools-labs project (thank you Tim Landscheidt).
Several Debian packages are now built automatically via Jenkins thanks to an effort by Carl Fürstenberg https://integration.wikimedia.org/ci/view/Ops-DebGlue/ . It helped package Parsoid, among others.

Browser testing

Our test coverage of MediaWiki extensions continues to prove itself. In February, using the automated browser tests running against beta labs and test2wiki, we found and fixed several critical errors that would have disrupted production wikis severely if they had been released.

Multimedia

Multimedia

Presentation slides about Media Viewer

In February, the multimedia team continued to focus on Media Viewer v0.2, getting it ready for a wider release next quarter. Gilles Dubuc, Mark Holmquist, Gergő Tisza and Aaron Arcos released a variety of new features, such as: permissions, file usage, pre-loading of images, previews during load and an improved full-screen experience. We also started development on a better ‘Use this file’ panel, including share, embed and download features. Pau Giner designed this panel, as well as a new Zoom feature for next quarter’s v0.3 version of Media Viewer. We invite you to test the latest version (see the testing tips) and share your feedback.
Fabrice Florin managed product development for Media Viewer and prepared the release plan for a gradual deployment of Media Viewer out of beta in coming months, based on the team’s latest development goals. We also hosted an IRC chat to discuss Media Viewer with the rest of the community and plan our next steps together. Lastly, the video RfC we started last month was closed with a community recommendation to not support the proprietary MP4 video format on our sites; as a result, we will only support open video formats like WebM and Ogg in the next version (v0.3) of Media Viewer. For more updates, we invite you to join the multimedia mailing list.

Engineering Community Team

Bug management

Bugzilla got upgraded from version 4.2.7 to 4.4.1, which fixed numerous bugs. Daniel Zahn puppetized Bugzilla and (together with Sean Pringle) moved Wikimedia Bugzilla to a new server. Bugzilla now displays useful queries and personal information on its front page. Its table of duplicates now displays bug resolutions (to identify popular WONTFIXed requests) and priorities as columns. The Bugzilla etiquette was finalized (read the announcement). In Bugzilla’s taxonomy, the MobileFrontend components were restructured and the Windows and MacOS entries in Bugzilla’s “OS” dropdown were reordered to list recent versions first. Andre Klapper refreshed the Annoying little bugs page by adding a section covering common questions and issues of new contributors, based on Google Code-In experience.

Project management tools review

After summarizing community input into consolidated requirements, Andre Klapper and Guillaume Paumier listed the different options mentioned during the consultation process. Those go from keeping the status quo to changing a single tool, to consolidating most tools into one. They also continued to research the main candidates by reading articles and testing demo sites. Once the list of options has been shortened collaboratively, the community RFC will start.

Mentorship programs

The six ongoing FOSS Outreach Program for Women projects all made good progress, and are headed to completion by the end of the program on March 10. For more details, check their dedicated reports:

Getting Facebook Open Academy projects up to speed is becoming even more complex than expected, but we are getting there slowly. All students and mentors met at the kick-off hackathon at Facebook headquarters on February 7–9 (see Marc-André Pelletier’s report).
Wikimedia applied to Google Summer of Code 2014 and we were accepted. We also confirmed our participation in FOSS Outreach Program for Women round 8. We are organizing both programs simultaneously under a common umbrella, as we did last year with great success.

Technical communications

In February, Guillaume Paumier continued to provide ongoing communications support for the engineering staff, and contributed to writing, simplifying, publishing and distributing the weekly technical newsletter. He also edited essays from Google Code-in students for publication on the Wikimedia blog.

Volunteer coordination and outreach

Wikimedia completed its more ambitious participation in FOSDEM (Brussels) with mild success. The Wikis devroom (co-organized with the XWiki and Tiki projects), the Wikimedia stand, and The Wikipedia Stack main track session achieved their basic goals in terms of participation and quality, but at the same time we got many ideas to do better next year. There was more progress on the tech community metrics front, and we now have interesting data gathered around our five key performance indicators: Who contributes code; Gerrit review queue; Code contributors new and gone; Bugzilla response time, and Top contributors.

Architecture and Requests for comment process

We held several architecture meetings to review Requests for Comment on IRC, and continued discussion and implementation of work begun at the architecture summit in January. We also worked on improvements to the architecture guidelines and on a draft of performance guidelines for developers.

Analytics

Kraken

We continue to make progress on the Hadoop/Kafka roll-out. We’ve encountered some issues with cross-data center latencies with Varnish-Kafka that we are currently debugging. We are also testing the Kafka-tee component that provides backwards compatibility for udp2log subscribers. Finally, we are finishing a report for the Mobile team on browser breakdowns using Kafka-provided data on Hadoop.

Limn

We’ve rolled out some minor changes that make creating dashboards easier and more intuitive.

Wikimetrics

Work progresses on enhancing Wikimetrics into a more flexible general tool. This month we completed work on a Vagrant deployment environment which will make it easier for the community to work on Wikimetrics. We’ve also made progress on the scheduler, reporting enhancements and a deployment issue.

Data Quality

We’ve fixed the following production issues:

  • Resolved the missing sampled-1000 tsv file for 2014-02-06 on stat1002;
  • Wikipedia Zero team investigated a ~30% increase in the number of lines in the zero tsvs between the 20140218 and 20140220 files;
  • Wikipedia Zero team investigated a slight drop in zero requests around 2014-02-08;
  • Data for ULSFO Cache performance prepared for Ops blog post.

Research and Data

Video of the February 2014 Research Showcase

This month, we welcomed Leila Zia as the newest addition to the team. Leila joins the Foundation as a research scientist after completing a PhD in management science and engineering at Stanford University. Her work will initially focus on modeling editor lifecycles to better understand what affects their survival and retention.
We hosted the first public Research and Data showcase, a monthly showcase of research conducted by the team and other researchers in the organization. This month, we presented two studies on Wikipedia article creation trends and on the measurement of mobile browsing sessions. The showcase is hosted at the Wikimedia Foundation and live streamed on YouTube every 3rd Wednesday of the month at 11:30am Pacific Time.
We attended the 17th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW ’14) in Baltimore. Research on Wikipedia and wiki-based collaboration has been a major focus of CSCW in the past, and this year three Wikipedia research papers were presented. We hosted a session to discuss collaboration opportunities for researchers interested in tackling problems of strategic importance for Wikimedia (a detailed CSCW ’14 report will follow on wiki-research-l).
We started creating public documentation for data sources and tools used by the team for research and data analysis and porting docs previously hosted on internal wikis (for example: analytics/geolocation).
We continued to provide ad-hoc support to various teams at the Foundation and worked closely with the Growth and Mobile teams to prepare and review results for their respective quarterly reviews.

Kiwix

The Kiwix project is funded and executed by Wikimedia CH.

For the first time, we have released a ZIM file of the entire Wikipedia in English with all encyclopedic articles and thumbnails (download the 40GB file via torrent). In our announcement, we’ve also explained how we generate those archives and advertised the tools we’ve been working with, like mwoffliner and zimwriterfs. This month, a student also worked on the creation of ZIM files containing TED talks. The internship is now over and was a success; ZIM files will be published soon. Preparation work for our Usability Hackathon has started.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

Wikisource now has access to the data in Wikidata, such as ISBNs and an author’s date of birth. The Lua interface for Wikidata has been extended significantly to make it more powerful and easier to use. Support for article badges has seen more work and is now missing mostly the user interface part. Loading time of items on Wikidata has been improved drastically. Everyone is asked to provide input for the upcoming redesign of Wikidata’s user interface.

Future

The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.

This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.
