Ever wondered how the Wikimedia servers are configured?

Well, wonder no longer! To configure the Wikimedia servers, we use Puppet, a configuration management system, which lets us write code that manages all of our servers like a single large application. Of course, to really know how our servers are configured, you’d need to see our Puppet configuration.
Good news: we’ve just released our Puppet configuration in a public Git repository.

What is and isn’t included

Nearly everything is included in the repository. We did spend a few weeks removing private and sensitive material, which we keep in a private repository that is only available to Wikimedia staff and volunteers with root access.
This, of course, means that the Puppet configuration, as released, won’t completely work. The public repository makes references to files and manifests in the private repository. To make the repository work, you’ll need to fill in the missing information. There isn’t very much in the private repository, though, so that task should be fairly easy.

The point of making this repository public

We have a couple of reasons for making this repository public:

  1. It shares knowledge with the world
  2. It lets us treat operations like a software development project

Both reasons align with our mission, but we were already sharing most of this knowledge via wikitech. The second reason aligns more closely with our mission, as it lets the world be directly involved in our operations efforts.

Labs and community-oriented operations

The release of this Puppet repository is the first step in the Wikimedia Test/Dev Labs project. We’ll be going further than just making the repository readable by the world. Part of the Test/Dev Labs project is to create a clone of our production cluster. This clone will run a branch of the Puppet repository.
Developers and operations engineers, both staff and community, will be able to push changes to the test branch of the Puppet repository, which will manage the cloned cluster. They’ll then be able to push these changes for review to the production branch of the Puppet repository. The staff operations engineers can then code-review the changes and push them out to the production systems.
Just as they can edit the Wikimedia content, the site interface, and the site’s software (MediaWiki), community members will be able to edit the site’s architecture as well.

Accessing the repository

Since this is a public Git repository, you can do an anonymous git clone like so:

git clone https://gerrit.wikimedia.org/r/p/operations/puppet

You can browse the repository through the gitweb interface. You can see the code review activity via Gerrit.
Ryan Lane
Operations Engineer

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.


Random idea: why don’t we publish boilerplate versions of the private files with placeholders for passwords etc such that you can replace those placeholders with your own private data and have things Just Work?

That’s a good idea, and we’ll likely get around to doing that soon.
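
The placeholder idea suggested above could look roughly like the following. This is only a minimal sketch in Python, not anything from the Wikimedia repository: the file contents, the placeholder names (`DB_PASSWORD`, `SMTP_PASSWORD`) and the values are all hypothetical.

```python
from string import Template

# Hypothetical public boilerplate: the structure of a private file is
# published, but every secret is replaced by a named placeholder.
boilerplate = "db_password=${DB_PASSWORD}\nsmtp_password=${SMTP_PASSWORD}\n"

# Hypothetical private data that each site supplies for itself.
private_values = {
    "DB_PASSWORD": "example-only",
    "SMTP_PASSWORD": "example-only-2",
}

# substitute() raises KeyError if any placeholder has no value, so a
# half-filled file is never produced silently.
rendered = Template(boilerplate).substitute(private_values)
print(rendered, end="")
```

Failing loudly on a missing value is a deliberate choice here: a configuration with an unfilled placeholder is harder to debug than one that refuses to render.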

We did the same a year ago for the Mageia project (for the same type of reason, plus a couple of others like transparency and trust, and to help us recruit sysadmins), and we faced the same issue with passwords.
So we used extlookup with a default value of x, so anybody could test (in an insecure way). I have heard that hiera is also nice for that: you could publish a default file and override it without fiddling with git.

Ryan et al
Great to see these released. Let us know if you’d like any help, advice or code review from Puppet Labs. Happy to help set something up for you.

@James We’d love to have some help. We aren’t ready just yet to give out accounts, but will be soon. I’ll let you guys know when it’s ready.

Hi, Great to see you using Puppet, and publishing your recipes! I work on the puppet packages for Debian, I hope you find them useful. I noticed that you have not embraced the best practices model of using modules and instead have everything in manifests. Modules are a really interesting way of abstracting out your puppet code that opens the door for great collaboration with other groups who are also working on module development. You can share development on abstract pieces of your infrastructure, without having to give any access. It’s a very interesting way to be involved in collaborative…

@micah: Well, until now we weren’t planning on sharing our puppet config. If you’d like, you can help us with things like this.

Depending on what version of puppet you’re using, you could also use extlookup; store the values in either host- or domain- or site-specific csv files, use extlookup(“super-secret-password”) and then you can simply store the extlookup data in a separate repo. That way you could still publish a sensible “common.csv” with such amusing fields as:

@Robin: That’s a pretty great idea. I may look at implementing that soon.
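
The layered lookup described in this exchange can be sketched as follows. This is a small Python imitation of the idea, not Puppet's extlookup implementation: it searches CSV sources in precedence order, returns the first match, and falls back to a default. The CSV contents, the `ntp-servers` key and the function signature are assumptions made for illustration.

```python
import csv
import io

def extlookup(key, sources, default=None):
    """Return the first value found for key, searching sources in order."""
    for text in sources:
        for row in csv.reader(io.StringIO(text)):
            if row and row[0] == key:
                # One value comes back as a string, several as a list.
                return row[1] if len(row) == 2 else row[1:]
    if default is None:
        raise KeyError(key)
    return default

# Hypothetical data: a private host-specific file and a public common.csv.
private_host_csv = "super-secret-password,s3cr3t\n"
public_common_csv = (
    "super-secret-password,x\n"
    "ntp-servers,ntp1.example.org,ntp2.example.org\n"
)

# Most specific source first, published defaults last.
precedence = [private_host_csv, public_common_csv]
print(extlookup("super-secret-password", precedence))
print(extlookup("super-secret-password", [public_common_csv]))
```

With only the public file available, the lookup still succeeds and returns the harmless default, which is what lets an outsider test the published configuration without the private repository.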

[…] Foundation using this environment. I recently set up Gerrit, imported our puppet repository, and released it to the public, as a first major step in the opening of the Labs […]

[…] Wikimedia, which oversees the operation of the free encyclopedia Wikipedia, has opened access to a Git repository containing the files […]


How do you deal with Puppet change management and testing configurations? For example, the next website release now requires a change to an Apache conf file, a PEAR module, and crontab.

We’ll be figuring that out soon enough ;). This is a work in progress. If you are willing to help and learn with us, please contact me.

It seems access is limited now, right?
$ git clone https://gerrit.wikimedia.org/r/p/operations/puppet.git
Initialized empty Git repository in /tmp/puppet/.git/
fatal: https://gerrit.wikimedia.org/r/p/operations/puppet.git/info/refs download error – The requested URL returned error: 403

It’s working for me… It’s possible we were doing some maintenance.


Thumbs up, Roan! I was just wondering the same.
(A link to a daily archive might also be a good idea. Will be simpler for git non-speakers.)

Also fails for me:
(0) hikari ~/workdir/tweak $ git clone https://gerrit.wikimedia.org/r/p/operations/puppet
Initialized empty Git repository in /home/jon/workdir/tweak/puppet/.git/
error: The requested URL returned error: 403
warning: remote HEAD refers to nonexistent ref, unable to checkout.
(0) hikari ~/workdir/tweak $
Accessing https://gerrit.wikimedia.org/r/p/operations/puppet.git/info/refs directly in iceweasel fails, too.

Hmm… I was able to clone when I switched to an almost-identical machine, but with newer software. In particular, the machine with git couldn’t clone, but the machine with git could. That might be it, Yuri.

[…] the title “Ever wondered how the Wikimedia servers are configured”, the people at Wikipedia themselves show us those server configurations on their blog […]

[…] Ever wondered how the Wikimedia servers are configured? […]

Thank you, guys. It was buggy version of git-core package (1:
git-core 1: works perfectly.

Great to know. Thanks for the update Yuri!

[…] fairly detailed public documentation on how this is implemented. We’ve also very recently released our puppet configuration in a public git repository, so you can see the exact configuration as […]

[…] It was launched in 2005 by Puppet Labs. The founder and current CEO of which is Luke Kanies. Wikipedia is one of many high scaling websites that uses Puppet to manage its […]

“gitweb interface” link is broken. Instead of https://gerrit.wikimedia.org/r/gitweb?p=operations%2Fpuppet.git, it should link to https://phabricator.wikimedia.org/diffusion/OPUP/, I think.