Inventing as we go: building a visual editor for MediaWiki

Visual_Editor-logo
We at the Wikimedia Foundation, in conjunction with Wikia, are building the VisualEditor for Wikipedia and all other sites based on the MediaWiki software. This is taking us some time, and we often get asked what’s involved in this, just how hard it is, and why it’s taking us so long, so we thought we’d explain some of the intricacies in the most challenging software project the Wikimedia Foundation has ever worked on.

What are we trying to do?

We are creating software that will let users load, edit and save Wikipedia articles visually, bypassing the existing system that requires our users learn “wikitext,” a complex markup code. Instead, the articles they’re editing will look the same as when they’re reading them, and any changes they make will be obvious in their effects before they press save — just like writing a document in a word processor.

Haven’t people done this before?

Yes and no. There are lots of visual text editors out there, and even a few open source ones that can edit Web pages quite well, but they are insufficient for our needs for a number of reasons.
One criterion that is hugely important to us is that our editor should work with lots of languages. This is not just a matter of supporting certain right-to-left languages, or a few based on ideograms, but being able to use any and all of the 290 languages we currently provide. We also want to be able to do so seamlessly in the same documents to support our multi-lingual communities. Some of the languages we are committed to working with have very little software support, and we are often one of the few sources of written material for them, or at least, one of the largest.
Another issue is that wikitext has grown over the past 12 years to have a large number of rich and complicated features that are not just “simplified ways of writing HTML.” Though we originally intended many of these to be used only occasionally, they have often evolved to be at the core of how MediaWiki pages are now written by many Wikimedia users and more widely.
These advanced features include content transclusion (pulling in a “live” copy of one page or part of it into another), templates (transclusion with parts defined at the second page rather than the source), and parser functions (pages that show different things depending on hundreds of potential options, like the day of the week or whether another page exists). Attempting to retrofit this into an existing editor would have been exceptionally difficult, and more work than starting from scratch.

VisualEditor and Parsoid module stack
How VisualEditor and Parsoid work together (click to enlarge)

What is the parser?

Finally, we need to not just edit pages, but to read and save existing wiki pages in the old wikitext format, and continue to work with it in parallel to the new editor. We can’t throw away the huge amount of work our communities have done over the past 12 years, so we need to re-write the “translator.” This means making a two-way “parser” — a bit of software that converts wikitext into HTML and back again. Until now, we have only had a one-way parser; the second stage, converting back from what people want to write to wikitext, has had to be done by our editors in their heads.
This means that we’ve had to have an entirely separate project – the Parsoid – that can translate in both directions: from wikitext to Web pages and also back again. This is not remotely an easy task; you have to be very attentive in replacing a parser, as it’s hugely important that we don’t break anything. The old parser, and the wikitext “language” it interprets, just grew organically as people had ideas, and no one really designed it. There are nearly two billion versions of pages in Wikimedia’s wikis alone, and this lack of design means that there are a huge number of little rules we need Parsoid to follow to avoid “dirty-diffs” — issues where the wikitext would be broken by people using the VisualEditor.

Parsoid HTML+RDFa content model
Explaining the HTML+RDFa content model used by Parsoid (click to enlarge)

We use automated testing to “round-trip” 100,000 randomly chosen pages from the English Wikipedia (as a reasonable sample of wiki content in the Latin alphabet): taking the wikitext, converting it to HTML format and back to wikitext, and comparing the result. This helps us by identifying many issues to fix so that using Parsoid does not cause articles to break. Right now we get about 80 percent of these articles to round-trip without any differences at all (up from 65 percent in October), an additional 18 percent round-trip with only minor differences (whitespace, quote style etc.), and the remaining 2 percent of pages have differences that still need fixing (down from about 15 percent in October).

Learn more

As you can see, to make a visual editor that our users need is a large amount of work. No existing technologies could do what we wanted to, so we have needed to work very hard to make sure that we deliver this. We look forward over the next few months to showing off how editors will be able to use VisualEditor and Parsoid for a much better experience, free of learning any complicated codes, letting them focus on the content and not the tool they use to write it.
If you’re interested, we have a brief presentation about how Parsoid and VisualEditor work, and what they will look like on Wikipedia.
James ForresterProduct Manager, VisualEditor and Parsoid

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

18 Comments
Inline Feedbacks
View all comments

According to the time table First English Wikipedia deployments and iteration should start now, by July 2013 Visual Editor should be enabled by default for (almost) every wikipedia/mediawiki instance.
https://www.mediawiki.org/wiki/Wikimedia_Engineering/2012-13_Goals
Any updates?

[…] sound that complicated, and thus shouldn’t take too long, right?In response, Wikimedia has shared an explanation of why it believes this is the “most challenging software project the Wikimedia Foundation […]

[…] http://diff.wikimedia.org/2012/12/07/inventing-as-we-go-building-a-visual-editor-for-mediawiki/ Warum es so super fucking complicated ist, einen WYSIWYG-Editor für Wikipedia zu bauen (und warum wir uns noch ein bisschen gedulden müssen). […]

Torsten,
Should we treat this as a formal media request?
thanks,
Matthew Roth
WMF Communications team

The 2011/2012 targets were, “Develop Visual Editor. First opt-in user-facing production usage by December 2011, and first small wiki default deployment by June 2012.” Those dates passed, and the new 2012/2013 targets then were: “Launch a new default environment for Wikipedia projects that does not require markup. Limited English Wikipedia release for real-world editing in December 2012. Deployed to majority of Wikimedia wikis and ready for default usage by July 2013.” December 2012 is upon us. Would it be correct to assume that the limited English Wikipedia release will now now take place this month? If so, do you have… Read more »

Matthew: Nope. No deadlines, no articles planned, just curious.
Greetings to HaeB. 😉

I don’t like that Wikipedia has such a close relationship with Wikia. I personally would like to have Andreas’ questions answered.
Thanks,

Sounds great. I would be interested to know how it’ll deal with Wikipedia’s existing microformats.

PS:
According to the slides, the deployment will start in a few days.
“On Tuesday 11 December we will launch VisualEditor on a few Wikipedias”
https://docs.google.com/presentation/d/1bsP3zJB-K4KSmMABD-BwJT9d9tMbvo5OujZBpWR94VM/edit#slide=id.p28

I was disappointed, again, that there was no discussion of creating a Table namespace, as exists with some other wikis (I saw one demonstrated years ago). That would significantly simplify the overall problem of a WYSIWYG editor, because a specialized table editor could be developed as a separate module.

I’m as well interested to hear an answer over the details about the relationship between Visual Editor and Wikia that Andreas raised.

why is all wiki p, wiki m ‘micro-unreadable’now ?
what do I do to make it PC
readable ?

http://terrychay.com/article/response-to-questions-concerning-the-visualeditor.shtml I feel somewhat responsible for putting James Forrester on the hot seat concerning this article as I was the one who asked him to take out time from his busy schedule to explain some of the challenges faced by the team concerning the project and manage expectations somewhat concerning the release. I want to thank you for your questions and concerns. They are very important, and realize if I’m not answering them satisfactory, it’s mostly-likely due to a failing on my part, not as some slight to you. If, on the other extreme, my answers seem too overwrought, please… Read more »

terry chai: Thank you.

Thanks for the post. Here is a translation in French: Créer un éditeur visuel pour MediaWiki : en quoi est-ce complexe ?

[…] Autor: Ferdinand Thommes  Quelle: Wikimedia « Vorige News | Nächste News […]

I’d like to see a distributable made available to end users more readily than grabbing latest source code. Any chance of that happening soon?

[…] a test version does exist, it remains a dream deferred. Yet a goofy feature called the MoodBar was released in 2012 to collective puzzlement, only to be […]