Yesterday we flipped a switch: editors can now use Lua, an innovative programming language, to generate sections of wiki pages on all our sites. We’d like to talk about what this means for the open source community at large, for Wikimedians, and for our future.
Why we did this
When we started digging into the causes of slow pageload times a few years ago, we saw that our CPUs ate a lot of time interpreting templates — useful bits of markup that programmatically told MediaWiki to reuse little bits of text. Templates are everywhere on our sites. Good Wikipedia articles heavily use the citation templates, for instance, and you’ve seen the ubiquitous infoboxes on every biography. In fact, editors can write code to generate substantial portions of wiki pages. Hit “View source” sometime to see how.
But, because we’d never planned for wikitext to become a programming language, these templates were terribly inefficient and hacky — they didn’t even have recursion or loops — and were terrible for performance. When you edit a complex article like Tulsi Gabbard, with scores of citations, it can take up to 30 seconds to parse and display the page. Even as we worked to improve performance via caching, query profiling, new hardware, and other common means, we sometimes had to advise our community to remove functionality from a particular template so pages would render faster.
This wouldn’t do. It was a terrible experience for our users and especially hard for our editors, who had to wait for a multi-second roundtrip after every “how would this page look?” preview.
So our staffers and volunteers worked on Scribunto (from the Latin for “they shall write”), a MediaWiki extension to allow editors to embed Lua scripts instead of wikitext for templating. And volunteers and Foundation staffers have already started identifying pages that are slow to render and converting the most inefficient templates. We have 488,731 templates on English Wikipedia alone right now. The process of turning many of those into Lua scripts is going to affect everyone who reads our sites — and the Scribunto project has already started giving back to the Lua community.
Us and Lua
For instance, our engineer Brad Jorsch wrote mw.ustring.lua, a Unicode module reusable by other Lua developers. This library is good news for people who write templates in non-Latin characters, and for anyone who wants a version of Lua’s standard String library where the methods operate on characters in UTF-8 encoded strings rather than bytes.
And with Scribunto, we empower those frustrated Wikimedians who have been spending years breaking their knuckles making amazing things in wikitext; as they learn how much easier it is to script in Lua, we hope they’ll be able to use those skills in their hobbies, schools, and workplaces. They’ll join forces with the graduates of Codecademy, World of Warcraft, and the other communities that teach anyone to program. New programmers with basic knowledge of computer science who want to do something real with their new skills will find that Lua scripting on Wikimedia sites is a logical next step for them. Our implementation only differs slightly from standard Lua.
And since Scribunto is an extension that any MediaWiki administrator can install, we hope the MediaWiki administrators out there will enjoy using Lua to more easily customize their wikis for their users.
Structured data and new ways to display it
Scribunto lays the foundations for exciting work to come when the Wikidata structured data project comes further online (the Wikidata interface is still in development and being deployed in phases). We know that Lua will be an attractive way to integrate Wikidata information into pages, and we hope a lot of (currently) unstructured data will get structured, helping new applications emerge.
Now that Lua and Wikidata are more mature, we can look forward to enabling more functionality and plugging in more libraries. And as we continue deploying Wikidata, people will make interesting improvements that we currently can’t predict. For instance, right now, each citation is hard to programmatically dissect; the Cite template takes many unstructured parameters (“author1,” “author2,” etc.) We structure these arguments by convention, but the data’s not structured as CS folks would have it, and can’t be queried via APIs, remixed, and so on.
But in the future, we could have citations stored in Wikidata and then put together onto article pages using Lua, or even assembled into other various reasonable forms (automatically generated bibliographies?) using Lua, and it will be more easy for Zotero users to discover. That’s just one example; on all our sites over the next few years, things will change from the status quo in a user-visible way. The old math and geography templates were inefficient and hard to hack; once rewritten, they’ll run faster and perhaps editors will use them more. We might see galleries, automatic data analyses, better annotated maps, and various other interesting processes and queries embedded in Wikimedia pages.
Open for change
So, now is the first time that the Wikimedia site maintainers have enabled real coding that affects all readers. We’re letting people program Wikipedia unsupervised. Anyone can write a chunk of code to be included in an article that will be seen by millions of people, often without much review. We are taking our “anyone can edit” maxim one big step forward.
If someone doesn’t like the load time of a webpage, they can now actually improve it themselves. Just as we crowdsourced building Wikipedia, now we’re crowdsourcing bits of infrastructure improvement. And this kind of massively multiplayer, crowdsourced performance improvement is uniquely us.
Wikitext templates could do a lot of things, but Lua does them better and faster, and now mere mortals can do it. We’re aiming to help our users learn to program, to empower themselves, and to help each other and help our readers.
We hope you’ll join us.
Sumana Harihareswara, Engineering Community Manager