This post by Emile Aben is cross-posted from RIPE Labs, a blog maintained by the Réseaux IP Européens Network Coordination Centre (RIPE NCC). In addition to being the Regional Internet Registry for Europe, the Middle East and parts of Central Asia, the RIPE NCC also operates RIPE Atlas, a global measurement network that collects data on Internet connectivity and reachability to assess the state of the Internet in real time. Wikimedia engineer Faidon Liambotis recently collaborated with the RIPE NCC on a project to measure the delivery of Wikimedia sites to users in Asia and elsewhere using our current infrastructure. Together, they identified ways to decrease latency and improve performance for users around the world.
During RIPE 67, Faidon Liambotis (Principal Operations Engineer at the Wikimedia Foundation) and I got into a hallway conversation. Long story short: We figured we could do something with RIPE Atlas to decrease latency for users visiting Wikipedia and other Wikimedia sites.
At that time, Wikimedia had two active locations (Ashburn and Amsterdam) and was preparing a third (San Francisco) to better serve users in Oceania, South Asia, and the west coast of the US and Canada. Wikimedia wanted to quantify the effect that turning up this third location would have on network latency for users worldwide.
Wikimedia runs its own Content Delivery Network (CDN), mostly for privacy and cost reasons. Like most CDNs, it uses a technique called GeoDNS to balance traffic geographically across its points of presence (PoPs): when a DNS resolver makes a request on a user's behalf, the user is directed to a specific data center based on their own IP address or that of their resolver. This requires the authoritative DNS servers for Wikimedia sites to know where best to direct each user. Wikimedia uses gdnsd as its authoritative DNS server, which answers those queries dynamically based on a region-to-datacenter map.
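To make the idea of a region-to-datacenter map concrete, here is a minimal sketch in Python. It is not gdnsd configuration syntax and not Wikimedia's actual map: the names `REGION_TO_POP`, `CONTINENT_DEFAULT`, `FALLBACK_POP` and `pick_pop`, as well as every entry in the tables, are assumptions made purely for illustration.

```python
# Hypothetical region-to-data-center map. Illustrative only: this is NOT
# gdnsd syntax and NOT Wikimedia's real configuration.
REGION_TO_POP = {
    "IN": "amsterdam",
    "RU": "amsterdam",
    "CN": "san_francisco",
    "KR": "san_francisco",
    "SG": "san_francisco",
}

# Assumed continent-level defaults, used when a country has no entry of its own.
CONTINENT_DEFAULT = {
    "EU": "amsterdam",
    "NA": "ashburn",
    "OC": "san_francisco",
}

# Assumed last-resort choice when nothing else matches.
FALLBACK_POP = "ashburn"


def pick_pop(country: str, continent: str) -> str:
    """Return the PoP a client (or its resolver) in this region is sent to.

    The most specific match wins: country first, then continent, then fallback.
    """
    if country in REGION_TO_POP:
        return REGION_TO_POP[country]
    return CONTINENT_DEFAULT.get(continent, FALLBACK_POP)


if __name__ == "__main__":
    for country, continent in [("IN", "AS"), ("NZ", "OC"), ("ZZ", "??")]:
        print(country, continent, "->", pick_pop(country, continent))
```

In a real deployment the country and continent would come from a GeoIP lookup on the resolver's (or client's) address, which this sketch leaves out.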
Some call this ‘stupid DNS tricks’, others find it useful to decrease latency towards websites. Wikimedia is in the latter group, and we used RIPE Atlas to see how this method performs.
One specific question we wanted answered is where to “split Asia” between the San Francisco and the Amsterdam Wikimedia locations. Latency obviously depends on physical distance, but also on the choice of upstream networks. For example, these choices determine whether packets to destinations on the “other side of the world” tend to be routed clockwise or counter-clockwise.
We scheduled latency measurements from all RIPE Atlas probes towards the three Wikimedia locations we wanted to look at, and visualised which data center showed the lowest latency for each probe. You can see the results in Figure 1 below.
This latency map shows the locations of RIPE Atlas probes, coloured by which Wikimedia data center has the lowest latency measured from that probe:
- Orange: the Amsterdam PoP has the lowest latency
- Green: the Ashburn PoP has the lowest latency
- Blue: the San Francisco PoP has the lowest latency
Probes where the lowest latency is over 150ms have a red outline. An interactive version of this map is available here. Note that this is a prototype to show the potential of this approach, so it is a little rough around the edges.
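The colouring rule described above can be sketched in a few lines of Python. This is not the prototype code behind the map: the function and type names (`classify_probe`, `ProbeMarker`), the PoP keys and the sample round-trip times in the usage example are all made up for illustration. Only the colours and the 150 ms threshold come from the description above.

```python
from typing import Dict, NamedTuple, Optional

# Colours as described for the map; the PoP keys themselves are hypothetical.
POP_COLOURS = {
    "amsterdam": "orange",
    "ashburn": "green",
    "san_francisco": "blue",
}

# Probes whose best latency exceeds this value get a red outline.
HIGH_LATENCY_MS = 150.0


class ProbeMarker(NamedTuple):
    best_pop: str
    colour: str
    best_rtt_ms: float
    red_outline: bool


def classify_probe(rtts_ms: Dict[str, Optional[float]]) -> Optional[ProbeMarker]:
    """Pick the lowest-latency PoP for one probe and derive its map marker.

    `rtts_ms` maps a PoP name to the measured latency in milliseconds, or to
    None when that measurement failed. Returns None if nothing was measured.
    """
    measured = {pop: rtt for pop, rtt in rtts_ms.items() if rtt is not None}
    if not measured:
        return None
    best_pop = min(measured, key=measured.get)
    best_rtt = measured[best_pop]
    return ProbeMarker(
        best_pop=best_pop,
        colour=POP_COLOURS[best_pop],
        best_rtt_ms=best_rtt,
        red_outline=best_rtt > HIGH_LATENCY_MS,
    )


if __name__ == "__main__":
    # Made-up sample values, not real RIPE Atlas results.
    sample = {"amsterdam": 210.0, "ashburn": 230.5, "san_francisco": 165.2}
    print(classify_probe(sample))
```

Which summary of the raw measurements to feed in (minimum, median, or something else) is a separate choice that this sketch deliberately leaves open.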
Probes located in India clearly have lower latency towards Amsterdam. Probes in China, South Korea, the Philippines, Malaysia and Singapore showed lower latency towards San Francisco. For other locations in South-East Asia the situation was less clear, but that is also useful information to have, because it shows that directing users to either the Amsterdam or the San Francisco data center seems equally good (or bad). It is also interesting to note that all of Russia, including the two easternmost probes in Vladivostok, has its lowest latency towards Amsterdam. For the Vladivostok probes, Amsterdam and San Francisco are almost the same distance away, give or take 100 km. Nearby probes in China, South Korea and Japan have their lowest latency towards San Francisco.
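The distance comparison for Vladivostok can be checked with a quick great-circle calculation. The sketch below uses the haversine formula on a spherical Earth. The coordinates are approximate city-centre values that I am assuming here, not the locations of the probes or the data centers, so the result is only a rough sanity check.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate city-centre coordinates (latitude, longitude); assumed values.
VLADIVOSTOK = (43.12, 131.89)
AMSTERDAM = (52.37, 4.90)
SAN_FRANCISCO = (37.77, -122.42)

to_amsterdam = great_circle_km(*VLADIVOSTOK, *AMSTERDAM)
to_san_francisco = great_circle_km(*VLADIVOSTOK, *SAN_FRANCISCO)

if __name__ == "__main__":
    print(f"Vladivostok to Amsterdam:     {to_amsterdam:.0f} km")
    print(f"Vladivostok to San Francisco: {to_san_francisco:.0f} km")
    print(f"Difference:                   {abs(to_amsterdam - to_san_francisco):.0f} km")
```

With geographic distance this close, the measured preference for Amsterdam has to come from the network paths rather than from geography.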
There is always the question of whether you can draw conclusions from a low number of samples, and of how representative RIPE Atlas probe hosts are of a larger population. Having some data is better than no data in these cases though, and if a region has a low number of probes, that can always be fixed by deploying more probes there. If you live in an underrepresented region, you can apply for a probe and make this better.
Backed by this measurement data, Wikimedia has gradually started serving Oceania, South Asian countries, and those US states and Canadian provinces for which RIPE Atlas measurements showed the lowest latency towards San Francisco from its San Francisco caching PoP. The geo-config that Wikimedia runs is publicly available here.
As for the code that created the measurements and the latency map: this was all prototype-quality code at best, so I originally planned to find a second site where we could repeat the exercise, to see whether we could generalise the scripts and visualisation before sharing them.
At RIPE 68 there was interest in even this raw prototype code for doing things with data centers, latency and RIPE Atlas, so we ended up sharing the code privately, and we have already heard of progress made with it. In the meantime we’ve put the code that created the latency map up on GitHub. Again: it’s a prototype, but if you can’t wait for a better version, please feel free to use and improve it.
If you have an interesting idea but no time, or other things are stopping you from implementing it, please let us know! You can always chat with us at a RIPE meeting or a regional meeting, via email, or through any other channel. We don’t have infinite time, but we can definitely try things out, especially ideas that will improve the Internet and/or the lives of network operators.