10/10/10 Outage

Around 18:00 UTC today, all Wikimedia projects experienced an unplanned outage. It was caused by a cascade of events that originated with the image scalers and eventually spread through our web servers and load balancers, due to an apparent bug in the PyBal code. The situation was remedied by restarting key servers and rebalancing the load between subsystems. Full service availability was restored at 19:30 UTC.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.