In preparation for enabling HTTPS on Wikimedia Foundation sites, we’ve recently enabled protocol relative URLs on test.wikipedia.org. Protocol relative URLs are needed to make the site work properly in both HTTP and HTTPS modes.
What are protocol relative URLs?
Normal URLs look like: http://test.wikipedia.org/wiki/Main_Page or https://test.wikipedia.org/wiki/Main_Page. Both of these URLs define the protocol that will be used. Protocol relative URLs look like this: //test.wikipedia.org/wiki/Main_Page. Dropping the protocol from the URL allows the browser to assign the current protocol to the URL. So, if you are visiting the site in HTTPS mode, links will point to HTTPS, and if you are visiting the site in HTTP mode, links will point to HTTP.
Why are protocol relative URLs needed?
We need to use protocol relative URLs for a couple reasons:
- All requests are served by our caching layer (squid or varnish). If you are browsing the site in HTTPS mode, and another user is browsing the same pages in HTTP mode, two versions of those pages will be stored in our cache, as the links are different between the two modes. This splits our cache, which makes it less efficient and more expensive to operate.
- When browsing in HTTPS mode, we want to ensure links point to the correct protocol. When pages are parsed, things like interwiki links are created by the parser. If we do not use protocol relative URLs, then links will point to either HTTPS or HTTP, which will cause users to switch modes randomly.
How does this affect me?
It shouldn’t. Things should continue to work as before. We are currently testing this out on some internal wikis, and have enabled it on test.wikipedia.org so that the entire community will have a couple weeks to test it out before we enable it on all projects.
API users, especially, should test thoroughly. The API, in most cases, will not output protocol-relative URLs, but will continue to output http:// URLs no matter whether you call it over HTTP or HTTPS. This is because we don’t expect API clients to be able to resolve protocol relative URLs correctly, and that the context of these URLs (which is needed to resolve them) will frequently get lost along the way.
The exceptions to this are:
- HTML produced by the parser will have protocol-relative URLs in <a href=”…”> tags etc.
- prop=extlinks and list=exturlusage will output URLs verbatim as they appear in the article, which means they may output protocol-relative URLs
If you are getting protocol-relative URLs in some other place in the API, that’s likely a bug.
If you notice any issues related to protocol relative URLs, in the API or not, please let us know.
Note: we’ve also enabled HTTPS on test.wikipedia.org; so, please do test protocol relative URLs in HTTP and HTTPS modes. There is at least one known bug with regards to HTTPS mode and redirects, which will be fixed soon. More to come on this in a later post.