Opening our operations with Wikimedia Labs

16 April 2012 by Ryan Lane

For the past year and a half we’ve been working on a project named Wikimedia Labs, which enables us to invite our community to contribute to how our sites are run. Labs is a cloud computing environment using OpenStack for development, testing and deployment of Wikimedia’s infrastructure as a whole, enabling us to treat our infrastructure as an open source software project.

The problems we’re solving

When Wikipedia and its sister projects started, volunteers had root level access on our infrastructure. They were the only roots and most of the infrastructure they built is still in use today. Our lenient access policy made us flexible, so changes could happen quickly. Also, the sites were smaller and had far fewer users, and large, fundamental changes could be made in production.
Growth has made us less willing to give out root access to volunteers. Because of the size of our sites, downtime is less acceptable. But having fewer volunteers means we have fewer ideas, which in turn reduces our ability to make changes quickly. We haven’t had a new volunteer root in years. We haven’t even had a new volunteer with shell access. Engaging volunteers and enabling them to easily contribute is a wider problem as well.
Our software development community scales with volunteers. Unfortunately, operations doesn’t scale in a similar way right now. We’re limited to the staff operations engineers we currently have. The staff is great, but because operations can’t scale to meet the needs of a rapidly growing number of developers, operations is a bottleneck. Furthermore, our access policy prevents volunteer developers from learning how our infrastructure works.
This leads to a situation where our staff developers and volunteer developers can’t easily collaborate. Our volunteers also have no way of appropriately testing their changes, since our infrastructure is complex and difficult to replicate. This means it’s harder to take contributions, which further slows the pace of changes on our sites.

The approach to solving the problem

Our solution is to open up access to our infrastructure as widely as possible. We are making it possible for volunteers to use and modify our infrastructure just as they do our software.
Operations work can be done via configuration management tools, orchestration, and cloud computing. Thanks to this, operations can be treated like a software development project. If it can be a software project, then we can make it an open source project, allowing us to open infrastructure development to the wider community.
Opening up infrastructure development to the software development community allows developers to bypass operations for work when operations is too busy to get to a task quickly enough. It also makes it possible to take contributions from volunteer operations engineers, bringing back the flexibility we originally had.
However, going from staff-only to volunteer-based is hard. We still have the same concerns about site availability, so we must take a new approach.
Our first step was to release our Puppet configuration. We spent nearly a month going through the configuration, pulling out sensitive info and ensuring we didn’t release anything that would open us to major vulnerabilities.
Next we created an infrastructure, called Labs, where volunteers can test their work, document it, and eventually have it deployed live to production. Our goal is to build a self-sustaining operations community using this infrastructure in a way similar to how the operations staff team currently works.

A technical overview of what Labs is

TL;DR: Labs is an environment where users can create and manage entire computing and networking infrastructures.
Labs has its own set of terminology, so to understand this technical overview, it helps to read the terminology guide first.
Labs allows users to create instances (virtual machines) that multiple users can use and fully control. Instances live inside projects. Each project is maintained by a community and reflects a real world project, like testing and developing MediaWiki extensions for Incubator.
Projects are a security concept. A project is a grouping of resources, like instances, firewall rules (via security groups), IP addresses, DNS entries, etc. It’s also a grouping of users and roles: a project has members, and roles that allow users to perform certain actions. Membership in a project generally allows a user to access the instances that exist in the project, and often that access is full root level privileges. Membership in a project’s sysadmin role allows a user to create, delete, and reconfigure instances. Membership in a project’s netadmin role allows a user to manage IP addresses, DNS records, and security groups.
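The relationship between project membership and roles can be sketched in a few lines of Python. This is only an illustration, not how Labs or OpenStack implement it: the class, method, and action names are all hypothetical, and it assumes the netadmin role mentioned in the example workflow is the one that covers network resources.

```python
from dataclasses import dataclass, field

# Hypothetical action names for illustration; these are not OpenStack API calls.
ROLE_ACTIONS = {
    "sysadmin": {"create_instance", "delete_instance", "reconfigure_instance"},
    "netadmin": {"manage_ip_addresses", "manage_dns_records", "manage_security_groups"},
}


@dataclass
class Project:
    """A grouping of users and roles (resources are omitted to keep the sketch short)."""
    name: str
    members: set = field(default_factory=set)
    roles: dict = field(default_factory=dict)  # role name -> set of user names

    def add_member(self, user, roles=()):
        """Add a user to the project, optionally granting roles."""
        self.members.add(user)
        for role in roles:
            if role not in ROLE_ACTIONS:
                raise ValueError(f"unknown role: {role}")
            self.roles.setdefault(role, set()).add(user)

    def can_access_instances(self, user):
        """Plain membership is enough to log in to the project's instances."""
        return user in self.members

    def can(self, user, action):
        """A member may perform an action only if one of their roles allows it."""
        if user not in self.members:
            return False
        return any(
            action in actions and user in self.roles.get(role, set())
            for role, actions in ROLE_ACTIONS.items()
        )


search = Project("search")
search.add_member("alice", roles=("sysadmin", "netadmin"))
search.add_member("bob")  # member only: can use instances, cannot manage them
```

In this model, bob can access the project's instances but cannot create or delete them, while alice can do both and can also manage the project's network resources.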
Here’s an example workflow in Labs, where a user wants to create and test a new search infrastructure:

Request a new project called search, in which they hold the sysadmin and netadmin roles. When the project is created, document its purpose on the project’s page.
Create a web security group, with ports 80 and 443 open to the world.
Create three small instances (each with 2 CPUs and 2GB RAM), with one of the three using the web security group. That instance will be the search frontend, and the other two will be indexers.
Configure the three instances by hand, documenting the process on the project’s page.
When the service is ready to be demoed, request a floating IP address. Once the IP address is granted, associate it with the search frontend.
After the IP address is associated, add a DNS entry for it, allowing end-users to access the search infrastructure by host name.
If the demoed infrastructure is successful, puppetize the configuration of the instances and push the puppet configuration to the puppet repository for review.
Upon successful review, the code is merged and deployed to production. The project continues to be used for maintenance and feature testing before production.
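The resource side of this workflow can be modelled as plain data. The Python sketch below is a simplified illustration rather than real Labs or OpenStack code: the class and function names are hypothetical, the IP address and host name are documentation placeholders, and it assumes an instance is reachable from outside only once it has a floating IP and only on ports opened by its security groups.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SecurityGroup:
    name: str
    open_ports: tuple  # ports open to the world


@dataclass
class Instance:
    name: str
    cpus: int
    ram_gb: int
    security_groups: list = field(default_factory=list)
    floating_ip: Optional[str] = None


def build_search_project():
    """Create the web security group and the three small instances."""
    web = SecurityGroup("web", open_ports=(80, 443))
    frontend = Instance("search-frontend", cpus=2, ram_gb=2, security_groups=[web])
    indexers = [Instance(f"search-indexer{i}", cpus=2, ram_gb=2) for i in (1, 2)]
    return [frontend] + indexers


def publish(frontend, floating_ip, hostname, dns):
    """Associate a floating IP with the frontend and point a DNS entry at it."""
    frontend.floating_ip = floating_ip
    dns[hostname] = floating_ip
    return dns


def publicly_reachable_ports(instance):
    """Simplified rule: no floating IP means nothing is reachable from outside."""
    if instance.floating_ip is None:
        return set()
    return {port for group in instance.security_groups for port in group.open_ports}


instances = build_search_project()
frontend = instances[0]
# Placeholder values: 203.0.113.0/24 and example.org are reserved for documentation.
dns = publish(frontend, "203.0.113.10", "search.example.org", {})
```

After the publish step, only the frontend exposes ports 80 and 443; the two indexers have no floating IP and stay internal to the project.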

The reception and current use of Labs

We launched Labs as a closed beta in October 2011, and are still in closed beta at this time. At the time of this writing, Labs has 79 projects, 129 instances, and 264 users.
So far the community is very vibrant. Here’s a small selection of community-maintained projects:

Bots: A bots infrastructure, for hosting bots used to edit Wikimedia sites.
Deployment-prep: A clone of our production infrastructure, meant to be used for testing software deployments before they go to production. You can be a beta tester for changes on the beta wikis.
Nagios: A volunteer-maintained Nagios monitoring system for Labs.
Hugglewa: A project used to develop a web-based version of Huggle.
Wikidata-dev: A project used to develop and demo the exciting new Wikidata project.
Mobile: A project used to develop and demo changes to our mobile infrastructure and software.
Maps: A project used to build OpenStreetMap infrastructure. This project is a collaboration of our staff and volunteer developers and developers of OpenStreetMap to add OpenStreetMap support to Wikimedia projects.

There are many, many more, and we add new projects every day.

How to help

Labs is community built and maintained. We’d love for you to help out! If you’d like to help, please request an account and come talk with us about what you’d like to help with.
We can easily be found on #wikimedia-labs on Freenode (IRC); look for Ryan_Lane, andrewbogott, sarasmollett, paravoid, and sumanah. You can also subscribe to our mailing list, labs-l, and send us email there.
Ryan Lane
Operations Engineer
