Current events and traffic spikes


News agencies today are reporting that pop star Michael Jackson has been hospitalized and may have died. We can all think back on how the King of Pop has touched our lives, but today we can also see how high-profile news events can affect a web site… See also past events such as the Popedotting and the 2008 US election.
Here at the office we first noticed something was going on when IM services such as AOL Instant Messenger started logging people out; we quickly saw that our own servers were hitting load spikes too…
Server CPU load spike (likely several more to come):
[Image: load-spike]
The actual traffic load spike is subtler; server effects can be disproportionate to the actual traffic:
[Image: traffic-spike.png]
Update 22:53 UTC:
The traffic is pretty much holding steady but we’ve still been seeing intermittent load spikes:
[Image: load-spike2.png]
These are at least in part due to one of our memcached internal data cache servers going wonky and swapping due to overuse of memory from text storage running on the same node. We’ve reduced traffic on the node and restarted it to even out its memory usage. (Thanks Domas!)
Update 23:00 UTC:
You may see intermittent messages like “(Cannot contact the database server: Unknown error (10.0.6.24))” as temporary database overloads cascade around the system. Sorry for the inconvenience while we work the kinks out; just wait a few minutes and try again…
Update 23:43 UTC:
We believe a large chunk of the CPU overload is due to cache swarming — many visitors simultaneously causing a re-render of the page due to an expired cache version. I’ve put in a temporary hack which will reduce the amount of rendering, but may cause some people to see out of date copies of the page.
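As a rough illustration of the idea behind that kind of workaround (not the actual change made on our servers), here is a minimal Python sketch: when a cached page expires, only one request re-renders it, and anyone arriving in the meantime gets the old copy. The class name `StaleWhileRenderCache`, the `render` callback and the `ttl_seconds` parameter are all hypothetical names made up for this example.

```python
import threading
import time


class StaleWhileRenderCache:
    """Toy page cache: on expiry one caller re-renders, others get the stale copy."""

    def __init__(self, render, ttl_seconds):
        self._render = render        # hypothetical expensive page render function
        self._ttl = ttl_seconds
        self._entries = {}           # key -> (value, expires_at)
        self._rendering = set()      # keys that some caller is re-rendering now
        self._lock = threading.Lock()

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and entry[1] > now:
                return entry[0]      # still fresh
            if entry is not None and key in self._rendering:
                # Expired, but a re-render is already under way:
                # hand back the out-of-date copy instead of rendering again.
                return entry[0]
            # Note: with no old copy at all, concurrent callers still all render.
            self._rendering.add(key)
        try:
            # The lock is NOT held here, so other visitors are not blocked.
            value = self._render(key)
            with self._lock:
                self._entries[key] = (value, now + self._ttl)
            return value
        finally:
            with self._lock:
                self._rendering.discard(key)
```

The trade-off is the same one mentioned above: far fewer simultaneous renders of a hot page, at the cost of some visitors briefly seeing an out-of-date copy.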
Update 2009-07-02:
Here’s a link to Domas’s blog post with technical details on the cache swarming problem.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.


10 Comments

So based on those two spikes, Fawcett was 1/3rd as popular as Jackson?

Yeah, I got “server couldn’t connect to database” a few times when I looked at Talk:Michael Jackson. He knocked Wikipedia off air momentarily!

Yeah, he was -that- great 🙂

re: jps
Remember, Michael Jackson is on the front page of Wikipedia, whereas Farrah isn’t. That partially helps with the traffic boost.

Here at $midsizedsmartphoneco, the web traffic did the same thing right as the news went out, as did SMS traffic.
Not much of the load went towards Wikipedia, but it was a pretty widespread effect across many sites.

When I got the error several times and learned about the death from a friend, I immediately joked “The site is dying because of Michael Jackson”.
Jeee, it was actually true!

I posted somewhat more verbose tech explanation at http://dammit.lt/2009/06/26/embarrassment/ in case anyone wonders 🙂

What’s causing the error message this time?

Giants27 :

What’s causing the error message this time?

That doesn’t appear to be MJ-related. Primary database fell down and needed to be switched out: http://techdiff.wikimedia.org/2009/07/downtime-on-en-wikipedia-org-resolved/

MJ is definitely the king. When I listen to his records I feel as if he is still alive. For sure one of the best music superstars ever born!