Get live updates to Wikimedia projects with EventStreams


Photo by Mikey Tnasuttimonkol, CC BY-SA 4.0.

We are happy to announce EventStreams, a new public service that exposes live streams of Wikimedia events. And we don’t mean the next big calendar event like the Winter Olympics or Wikimania. Here, an ‘event’ is a small piece of data, usually representing a state change. An edit to a Wikipedia page that adds some new information is an ‘event’, and could be described like this:

{
    'event-type': 'edit',
    'page': 'Special Olympics',
    'project': 'English Wikipedia',
    'time': '2017-03-07 09:31',
    'user': 'TheBestEditor'
}

This means: “a user named ‘TheBestEditor’ added some content to the English Wikipedia’s Special Olympics page on March 7, 2017 at 9:31am”.
While composing this blog post, we sought visualizations that use EventStreams, and found some awesome examples.
Open now in Los Angeles, DataWaltz is a physical installation that “creates a spatial feedback system for engaging with Wikipedia live updates, allowing visitors to follow and produce content from their interactions with the gallery’s physical environment.” You can see a photo of it at the top, and a 360 video of it over on Vimeo.
Sacha Saint-Leger sent us this display of real-time edits on a rotating globe, showing off where they are made.

Ethan Jewett created a really nice continuously updating chart of edit statistics.

A little background—why EventStreams?

EventStreams is not the first service from Wikimedia to expose RecentChange events as a stream. irc.wikimedia.org and RCStream have existed for years.  These all serve the same data: RecentChange events.  So why add a third stream service?
Both irc.wikimedia.org and RCStream suffer from similar design flaws. Neither service can be restarted without interrupting client subscriptions. This makes it difficult to build comprehensive tools that cannot afford to miss an event, and makes the services hard for WMF engineers to maintain. Neither is easy to use: both require several programming setup steps just to start subscribing to the stream. Perhaps more importantly, these services are specific to RecentChanges, meaning that they cannot serve other types of events. EventStreams addresses all of these issues.
EventStreams is built on the W3C standard Server-Sent Events (SSE). SSE is simply a streaming HTTP connection that carries event data in a particular text format. Client libraries, usually called EventSource, assist with building responsive tools, but because SSE is really just HTTP, you can use any HTTP client (even curl!) to consume it.
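To make the "particular text format" concrete, here is a minimal sketch of a parser for a complete chunk of SSE text. It handles only the `event`, `id`, and `data` fields and comment lines, and it is not a full implementation of the standard. The function name `parseSseText` and the sample payload (including the made-up ID `42`) are hypothetical, chosen only for illustration.

```javascript
// Parse a complete chunk of SSE text into a list of events.
// Simplified sketch: real clients parse incrementally as bytes arrive.
function parseSseText(text) {
  const events = [];
  // Events are separated by a blank line.
  for (const block of text.split(/\r?\n\r?\n/)) {
    const event = { type: 'message', id: null, data: [] };
    for (const line of block.split(/\r?\n/)) {
      // Skip empty lines and comments (lines starting with a colon).
      if (line === '' || line.startsWith(':')) continue;
      const colon = line.indexOf(':');
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? '' : line.slice(colon + 1);
      // A single leading space after the colon is not part of the value.
      if (value.startsWith(' ')) value = value.slice(1);
      if (field === 'event') event.type = value;
      else if (field === 'id') event.id = value;
      else if (field === 'data') event.data.push(value);
    }
    // Blocks without any data line do not produce an event.
    if (event.data.length > 0) {
      events.push({ type: event.type, id: event.id, data: event.data.join('\n') });
    }
  }
  return events;
}

// Hypothetical sample of what one event might look like on the wire.
const sample = [
  ': this line is a comment',
  'event: message',
  'id: 42',
  'data: {"title": "Special Olympics", "user": "TheBestEditor"}',
  '',
  '',
].join('\n');

const parsed = parseSseText(sample);
console.log(parsed[0].id, JSON.parse(parsed[0].data).title);
```

In practice an EventSource client does this parsing for you; the sketch is only meant to show that nothing more exotic than lines of text over HTTP is involved.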
The SSE standard defines a Last-Event-ID HTTP header, which allows clients to tell servers about the last event that they’ve consumed. EventStreams uses this header to begin streaming to a client from a point in the past. If EventSource clients are disconnected from servers (due to network issues or EventStreams service restarts), they will automatically reconnect, send this header to the server, and resume from where they left off.
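The resume behavior can be illustrated with a toy in-memory model. This is a sketch of the idea only, not how EventStreams or any EventSource library is actually implemented: `serveFrom`, `ResumingClient`, and the numeric string IDs are all hypothetical, and the toy server replays its whole log when no ID is supplied, which a real service need not do.

```javascript
// Hypothetical server side: replay every event after the given ID.
// With no ID (or an unknown one) this toy replays the whole log.
function serveFrom(log, lastEventId) {
  const index = log.findIndex((event) => event.id === lastEventId);
  return log.slice(index + 1);
}

// Hypothetical client side: remembers the ID of the last event it consumed.
class ResumingClient {
  constructor() {
    this.lastEventId = null;
    this.received = [];
  }

  // Headers to send when connecting or reconnecting.
  requestHeaders() {
    return this.lastEventId === null ? {} : { 'Last-Event-ID': this.lastEventId };
  }

  consume(event) {
    this.received.push(event.data);
    this.lastEventId = event.id;
  }
}

// A made-up event log with IDs '1' through '5'.
const log = ['a', 'b', 'c', 'd', 'e'].map((data, i) => ({ id: String(i + 1), data }));
const client = new ResumingClient();

// First connection: the client sees two events, then the connection drops.
serveFrom(log, client.requestHeaders()['Last-Event-ID'])
  .slice(0, 2)
  .forEach((event) => client.consume(event));

// Reconnect: the header tells the server where to pick up again.
serveFrom(log, client.requestHeaders()['Last-Event-ID'])
  .forEach((event) => client.consume(event));

console.log(client.received.join('')); // prints "abcde"
```

The point of the model is that the client ends up with every event exactly once, even though the connection dropped partway through.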
EventStreams can be used to expose any useful streams of events, not just RecentChanges. If there’s a stream you’d like to have, we want to know about it. For example, ORES revision score events may soon be exposed in their own stream. The service API docs have an up-to-date list of the (currently limited) available stream endpoints.
We’d like all RecentChange stream clients to switch to EventStreams, but we recognize that there are valuable bots out there running on irc.wikimedia.org whose maintainers we might not be able to find. We commit to supporting irc.wikimedia.org for the foreseeable future.
However, we believe the list of (really important) RCStream clients is small enough that we can convince or help folks switch to EventStreams.  We’ve chosen an official RCStream decommission date of July 7 this year.  If you run an RCStream client and are reading this and want help migrating, please reach out to us!

Quickstart

EventStreams is really easy to use, as shown by this quickstart example in JavaScript. Navigate to http://wikimedia.org in your browser and open the developer console (in Google Chrome: More Tools > Developer Tools, then click ‘Console’ in the panel that opens below the page you are visiting). Then paste the following:

// This is the EventStreams RecentChange stream endpoint
var url = 'https://stream.wikimedia.org/v2/stream/recentchange';

// Use EventSource (available in most browsers, or as an
// npm module: https://www.npmjs.com/package/eventsource)
// to subscribe to the stream.
var recentChangeStream = new EventSource(url);

// Print each event to the console
recentChangeStream.onmessage = function(message) {
    // Parse the message.data string as JSON.
    var event = JSON.parse(message.data);
    console.log(event);
};

You should see RecentChange events fly by in your console.
That’s it! The EventStreams documentation has in-depth information and usage examples in other languages.
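As one possible next step, a handler could keep only the events you care about. The sketch below assumes that each RecentChange event carries a `wiki` field (such as `enwiki`) and a `type` field (such as `edit`); those field names are not described in this post, so check them against the EventStreams documentation. The helper `makeFilteredHandler` and the sample messages are hypothetical.

```javascript
// Build an onmessage-style handler that parses each message as JSON
// and passes only matching events to `callback`.
function makeFilteredHandler(predicate, callback) {
  return function (message) {
    const event = JSON.parse(message.data);
    if (predicate(event)) callback(event);
  };
}

// Assumed field names: `wiki` and `type` (verify against the docs).
const isEnglishWikipediaEdit = (event) =>
  event.wiki === 'enwiki' && event.type === 'edit';

const seen = [];
const handler = makeFilteredHandler(isEnglishWikipediaEdit, (event) => {
  seen.push(event.title);
});

// Made-up messages standing in for the live stream.
[
  { wiki: 'enwiki', type: 'edit', title: 'Special Olympics' },
  { wiki: 'dewiki', type: 'edit', title: 'Berlin' },
  { wiki: 'enwiki', type: 'log', title: 'Special:Log' },
].forEach((event) => handler({ data: JSON.stringify(event) }));

console.log(seen); // only the English Wikipedia edit remains
```

In the browser console, the same handler could be attached to the live stream with `recentChangeStream.onmessage = handler;`.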
If you build something, please tell us, or add yourself to the Powered By EventStreams wiki page.  There are already some amazing uses there!
Andrew Otto, Senior Operations Engineer, Analytics
Wikimedia Foundation

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.
