We had 52 minutes of downtime on the English-language Wikipedia site today; only en.wikipedia.org was affected. Our master database server was thrown into a funky state in which hundreds of access threads were stuck in the “statistics” state — which seems to be MySQL’s way of saying “I’ve fallen and I can’t get up”.
It’s unclear exactly what set it off, but basically nothing works until you restart MySQL. After switching the site to an alternate master database, all has been well.
At 52 minutes from start of event, this took us a bit longer to resolve than I’d like: we had to percolate through a couple of levels of alert calls before we finished diagnosing it and getting the DB switch pushed through. (Sorry to wake you up early, Tim!)
A similar event in future should be fixable within a few minutes, thanks to Tim’s work on making the master-switch system more foolproof. We’re fixing up our internal documentation so all our site ops will now know how to run the database master switch script next time!

— brion

4 Comments

Pharos

15 years ago

#570

I have a brilliant idea!
From now on, our downtime screen should say, “This Wikipedia is broken. We recommend looking up this subject in your local library; while you’re at it, kindly take down notes and add them to the Wikipedia article later.”

pfctdayelise

#571

No donation link? A wiki was down; are donations up?

brion

#572

In this case the donations page would have worked fine… we don’t always want a link though, since some sitewide outages would leave that broken too. 🙂

Fred from France

#573

Well, it is doing it again: only partial access, and it's including the wikinews servers this time, with no access to the wikinews page. I vote conspiracy theory. Is it safeguarded against malicious flooding? ~~~~

Downtime on en.wikipedia.org resolved
