Main deployment of MediaWiki 1.17 to Wikimedia sites complete


We have been running MediaWiki 1.17 on all Wikimedia wikis for almost a day now, and things seem to be in pretty good shape. We still have a lot of issues to fix, including a problem with disabling the enhanced toolbar in preferences and some issues with categories (see below). Many of the problems involve JavaScript, where code that isn't compatible with ResourceLoader has to be replaced. We have a migration guide for developers of gadgets and other MediaWiki customizations, and we encourage anyone having problems with gadgets to consult it. Our developers are continuing to find and fix problems.
Based on early (admittedly very subjective) reports, ResourceLoader is already paying dividends: navigating around the site seems much zippier in many cases. We hope this is your experience as well.
We still have some deployment work left to do around this release. In addition to the bugfixes, we want to reintroduce the category improvements that Aryeh Gregor made last summer. We had to temporarily remove these because they required schema changes that would have made it difficult to do the type of deployment that we did. Now that we're confident we're staying with MediaWiki 1.17, we should be able to deploy these improvements soon. Some of the category bugs you see now may actually be related to this plan, so the good news is that this coming update may fix them. We also plan to update ArticleFeedback now that we're on the newer codebase, and we'll probably update some other extensions as well.
If you are interested in the deployment, there’s much more below…

Our deployment window was 6:00–12:00 UTC on February 16. A common question about this deployment was "why did you do it at that time?", with many people pointing out that it was prime (something) time in their timezone. As the old saying goes: "it's always 5 o'clock somewhere". The release started at 6:00 UTC, which corresponds to 5pm in Australia (see?), around noon in Sri Lanka, 7am in western Europe, 1am in the eastern U.S., and 10pm the previous evening in the western U.S. We had people working on this release in all of those places, so even finding a time that worked for our small group was tough, let alone one that worked for everyone who uses our websites.
At 6:00 UTC, the banner went up, but we had some maintenance to finish first. Once that was done, we switched over eowiki and nlwiki at around 6:30 UTC, fully expecting the load on the site to climb pretty high and leave us with some heavy debugging to do. As it turned out, though, despite our collective skepticism, the bugs the developers had found and fixed over the past week did in fact account for much of last week's load problems.
We were stunned for a bit, but then decided to try even bigger wikis. We deployed to frwiki, dewiki, jawiki, Commons, enwiktionary, and zhwiki next. Some redirects were broken on frwiki, so we had to back out the changes there. We saw some rather obscure PHP errors in the logs ("canary mismatch on efree()"), and there were plenty of other problems unrelated to load to fix. The developers spent a couple of hours finding bugs, fixing them, and deploying the fixes. We didn't get everything, but we got things to a respectable place. One performance-related change we did make was to increase the size of the APC cache.
By this point, it was around 10:00 UTC, and we felt we might be able to finish the deployment during this window. The only way to know for sure was to try enwiki, so we did. Almost immediately, bits.wikimedia.org (which, as of 1.17, serves style sheets) and its Varnish caches melted down. Much debugging followed, but unfortunately it took a while for these machines to come back up. Things were not fully recovered until 11:00 UTC.
After we had a chance to figure out what happened, make some adjustments, and discuss, we decided to give enwiki another try.  This time, bits.wikimedia.org held up, and the site seemed to be pretty zippy.  Given that success, with only 30 minutes left in our deployment window, we decided to get the job done.  We deployed to the remaining wikis, finishing up the deployment to 1.17.
When I talk about what "we" did, there's a pretty big "we". In addition to the staff, there are a lot of community members to acknowledge here as well, though we know many of them only by their IRC handles. Ariel Glenn, Chad Horohoe, Mark Bergsma, Roan Kattouw, and Tim Starling did much of the heavy lifting, with many community members (Platonids, thedj, pawelx, Happy-melon, Bryan Tong Minh) and other staff (Sam Reed, Aaron Schulz) providing help on IRC with code review and debugging ideas. Guillaume Paumier and I provided updates like this one (and occasionally chimed in asking the developers and ops people "are you sure you want to do that?"), and Nadeesha Weerasingh from Calcey tested the final output with much help from the community. Many community members (shizhao, strainu, YMS, waihorace, Romaine, aokomoriuta, TheForums, aharoni, among others) pointed out various project-specific problems and helped us fix them, sometimes within minutes of the initial deployment.
pawelx in particular did something very cool: suggested a one-line fix that cut the load on our Varnish boxes in half.
This kind of thing really shows the power of community-based open source.  With everyone involved in the deployment on the #wikimedia-tech IRC channel, we were able to maintain a tight communication loop with a lot of people, and had great real-time visibility into problems as they happened.
Thanks for your patience during this release cycle, and thanks for pulling together to make this happen!  We hope you are enjoying the improved performance, and look forward to working together on the next release!

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.


5 Comments

Regarding RL (ResourceLoader), a lot of problems with site JS can be caused by document.write, which can be fixed by using the appropriate jQuery functions instead. I compiled a list yesterday and found over 600 instances that need fixing, so if anyone feels bored, that would be a nice task for them.
http://toolserver.org/~bryan/stuff/document.write.html
Note that this does not include user JS.

Thanks for the clear overview of a complex process, Rob. I hope we will be able to move to a more continuous-integration style of deploying MediaWiki soon, so that finding the needle in a haystack of thousands of revisions can be reduced to finding the bees' nest in the haystack.

It’s Platonides, not Platonids

diebuche :
It’s Platonides, not Platonids

Platonids was the nickname he was using at the time.
Guillaume’s last name was spelled wrong, though, let me see if WordPress will let me correct it.

Roan Kattouw :

Guillaume’s last name was spelled wrong, though, let me see if WordPress will let me correct it.

Indeed it did. Guillaume’s name is now spelled correctly.