How the new data center in São Paulo is already lowering Wikipedia load times.
One second ago, people around the world accessed Wikipedia 5,500 times.
What those people likely don’t know is that the pages on their screens were delivered by a data center owned and run by the Wikimedia Foundation, the non-profit that operates Wikipedia and other Wikimedia projects. The Foundation’s global network of data centers makes Wikipedia articles and other Wikimedia content load quickly, securely, and privately, no matter where on Earth you are.
The Foundation recently opened a new data center in São Paulo, Brazil. It is the newest of the Wikimedia Foundation’s seven data centers around the world and the first in South America. Thanks to the new center, the average time it takes a reader in Brazil to load Wikipedia has dropped by one-third of a second. That matters, because every extra moment a page takes to load is a chance for a reader to get frustrated and become less likely to use Wikipedia in the future.
Opening a data center is fascinatingly complex. About 12 Wikimedia Foundation staff spent over ten months navigating legal concerns and regulatory hoops, working around extended equipment delays, and physically installing servers in the data center to make the new center possible.
Let’s break down how the Foundation accomplished this endeavor.
Just what is a data center? Why are there “servers” in it?
You may have seen one of the many Hollywood movies where the main character needs to break into a room full of neatly arranged electronic towers and install a device to foil the villain’s evil plans. Take Tron: Legacy or Mission: Impossible, for example.
These fictional scenes leave something to be desired in terms of accuracy, but they get the idea across: those towers are servers that hold a platform’s digital data, and when many servers are networked together in one facility, you have a data center. In real life, whenever you try to open a page on Wikipedia, your request goes to the Wikimedia Foundation data center nearest to you, which then sends the page back to your device.
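To make “nearest” a little more concrete, here is a toy sketch that picks the closest data center by straight-line (great-circle, or haversine) distance. It is an illustration under stated assumptions, not how Wikimedia actually routes traffic: the city list and rounded coordinates below are assumptions chosen to match the countries named in this article, and real systems typically make this choice when your browser looks up the site’s address (DNS), based on network location rather than exact coordinates.

```python
import math

# Illustrative site list: (city, latitude, longitude) in degrees.
# These cities are ASSUMPTIONS matching the countries named in the article,
# not an authoritative list of Wikimedia data centers.
DATA_CENTERS = [
    ("Ashburn, US", 39.04, -77.49),
    ("Dallas, US", 32.78, -96.80),
    ("San Francisco, US", 37.77, -122.42),
    ("Amsterdam, NL", 52.37, 4.90),
    ("Marseille, FR", 43.30, 5.37),
    ("Singapore, SG", 1.35, 103.82),
    ("São Paulo, BR", -23.55, -46.63),
]

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_data_center(lat, lon, sites=DATA_CENTERS):
    """Return (city, distance_km) for the site closest to the reader."""
    best = min(sites, key=lambda s: haversine_km(lat, lon, s[1], s[2]))
    return best[0], haversine_km(lat, lon, best[1], best[2])


# A reader in Rio de Janeiro (approximate coordinates).
city, km = nearest_data_center(-22.91, -43.17)
print(f"{city}: {km:.0f} km away")
```

Calling `nearest_data_center` with the `DATA_CENTERS` list minus São Paulo shows why a new site helps: the same reader is then matched to a site thousands of kilometres away.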
How do the Wikimedia Foundation’s data centers work? Why are they important?
The Wikimedia Foundation maintains seven data centers located in the United States, Singapore, the Netherlands, France, and now Brazil.
Most of these data centers serve you “cached” versions of Wikimedia content. That means each data center tries to keep a copy of a page on file after someone requests it for the first time. This lets us respond quickly when the next person asks for that page and send it to them with a minimum of lag.
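The caching idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple least-recently-used (LRU) policy and a fixed capacity; production caches use dedicated caching software and more elaborate rules. The names `EdgeCache` and `slow_origin` are hypothetical stand-ins for a caching data center and for the faraway servers that hold the original pages.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy LRU cache standing in for a caching data center.

    The eviction policy and capacity are assumptions for illustration only.
    """

    def __init__(self, fetch_from_origin, capacity=1000):
        self._fetch = fetch_from_origin  # hypothetical callable: title -> page
        self._capacity = capacity
        self._pages = OrderedDict()      # title -> page, oldest use first
        self.hits = 0
        self.misses = 0

    def get(self, title):
        if title in self._pages:
            # Cache hit: serve the stored copy and mark it as recently used.
            self._pages.move_to_end(title)
            self.hits += 1
            return self._pages[title]
        # Cache miss: make the slow trip to the origin, then keep a copy.
        self.misses += 1
        page = self._fetch(title)
        self._pages[title] = page
        if len(self._pages) > self._capacity:
            self._pages.popitem(last=False)  # evict the least recently used page
        return page


def slow_origin(title):
    # Hypothetical stand-in for fetching a page from a faraway data center.
    return f"<html>{title}</html>"


cache = EdgeCache(slow_origin, capacity=2)
cache.get("Brazil")  # first reader: miss, fetched from the origin
cache.get("Brazil")  # next reader: hit, served from the local copy
print(cache.hits, cache.misses)  # prints "1 1"
```

Only the first request for a page pays the cost of the long trip; every later reader nearby is served from the stored copy until it is evicted to make room for other pages.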
Still, caching cannot overcome the physical limits of distance. Before the data center in Brazil opened, someone living in Rio de Janeiro took twice as long to load a Wikipedia article as someone in New York City, because their requests had to travel that much farther to reach the nearest data center.
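A back-of-the-envelope sketch shows why distance matters so much. Light in optical fiber travels at roughly two-thirds of its speed in a vacuum, about 200 km per millisecond, and loading a page takes several back-and-forth exchanges (for example, to set up a secure connection) before content arrives. The distances and the number of round trips below are assumptions for illustration, not measurements from Wikimedia’s network.

```python
# Light in optical fiber covers roughly 200 km per millisecond.
# Real cable routes are longer than straight lines and add equipment delays,
# so the results below are lower bounds, not predictions.
FIBER_KM_PER_MS = 200.0


def round_trip_ms(distance_km):
    """Minimum time for a signal to reach a data center and come back."""
    return 2 * distance_km / FIBER_KM_PER_MS


def network_floor_ms(distance_km, round_trips):
    """Lower bound on network time for a load needing several round trips."""
    return round_trips * round_trip_ms(distance_km)


# Hypothetical readers: one 400 km from the nearest data center and one
# 8,000 km away, each needing an assumed 4 round trips.
for km in (400, 8000):
    print(km, "km ->", network_floor_ms(km, round_trips=4), "ms")
```

Because the delay grows in direct proportion to distance, doubling the distance doubles this floor, and no amount of faster hardware at either end can remove it. Only moving the data closer to the reader can.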
As part of our commitment to knowledge equity, the Wikimedia Foundation has been steadily opening servers outside the United States since 2012. Each new location lowers the average loading time for all of the regions it serves.
What goes into building a new data center?
So much. Let’s break it down.
Alright, let’s start with the legal matters.
First up, the Foundation has to select a location for the new data center. This involves months of work by the legal team to vet the laws and regulations governing each candidate location. Wikimedia sites collect only a vanishingly small amount of personal data from the people who visit them, and any location the Foundation selects has to meet our high privacy standards. The legal team also has to answer more mundane questions, like tax liabilities.
Not coincidentally, privacy is also the reason the Foundation operates its own data centers, even as much of the tech industry has moved to cloud computing. For Wikimedia, it’s a simple choice: we believe in user privacy, and we believe that you should be able to read anything on our sites without fearing that a company, government, or anyone else is snooping on what you are interested in.
On top of all that, the São Paulo data center presented the Foundation with new challenges, because we had to find vendors willing to work with us even though the Foundation is not listed on Brazil’s National Registry of Legal Entities (CNPJ for short). This affected our equipment purchases, our delivery plans, and even our acquisition of IP addresses.
What else goes into selecting a location?
Wikimedia Foundation data centers need to be situated in a city where a number of submarine communications cables come ashore. These fiber optic cables run along sea floors all around the world and are the backbone of the internet.
How do you build a data center?
With a hardworking team of Wikimedia Foundation staff and a dream!
Each data center requires a significant amount of purchasing and coordination before anything can be sent to the chosen location. This includes:
- Physical hardware like servers, routers, switches, cabling, and power equipment
- A data center colocation provider to host that hardware
- Redundant network circuits for peering, transport, and transit
This work is more complicated today than it was in the early days of Wikipedia. But the Wikimedia Foundation’s continued focus on improving its technical infrastructure means the Foundation can now worry less about when and how often Wikipedia will break and go down. Instead, it can devote its resources to improving the experience of readers and editors around the world, for example by reducing their loading times.
Unfortunately, all that equipment cannot install itself. Servers do not put themselves on racks, plug themselves into the correct ports, or swap out cables that turn out to be faulty. Nor can they label their ports so that anyone doing future maintenance understands how and why they are wired. That is why, for each data center the Foundation has opened in recent years, staff members have traveled to the site to set everything up.
Having people on-site during installation also helps when someone needs to work out solutions for when, not if, things go wrong.
For example, to set up the São Paulo data center, the Foundation shipped four pallets of equipment for staff members to install over the course of one week. The first three pallets cleared customs and reached the data center on time. The fourth, which held all of the team’s routers, switches, and cables, did not: it was held up by mandatory documentation and pending approvals. It arrived only on the last day of the team’s planned work, so they completed two to three days of work in the final 12 hours before their flights home.
Setting up a new Wikimedia data center is complex and delicate work. Twelve staff members spent upwards of 1,600 hours over ten months to study legal issues, select a location, find vendors who would work with us, order the equipment, send it to the right destination, have some of it not arrive in time, and get everything installed anyway. And that’s just the beginning of the journey. We hope that this data center will provide fast and reliable access to Wikipedia across the region for many years to come.
This is just one of the reasons why the Wikimedia Foundation exists. It takes on daunting tasks behind the scenes so that people everywhere can contribute to and access the sum of all knowledge.