Upload-by-URL for test.wikipedia.org

Since we increased the upload filesize limit to 100MB on the main wikis a few months ago it’s been easier to upload large images and medium-size video clips, but there’s always something that’s just a leeeeetle over the limit… MediaWiki’s upload form does have an option for pulling a file from an external web site, which isn’t subject to the HTTP POST size limits in the Squid->Apache->PHP system.
We hadn’t been able to deploy it initially on Wikimedia sites because the web servers are walled off and don’t have direct access to the internet; further, we were worried about safety, given security reports about how the cURL library can follow malicious redirects to local filesystem resources.
On investigation, Tim found that cURL is safe in the default case — you need to explicitly enable redirect following to be exploitable, which we don’t. We also have an HTTP proxy which our internal servers can use to reach outside files… I’ve made some tweaks to Special:Upload to support the proxy setting, and it’s now enabled on test.wikipedia.org:
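The two safeguards described above — fetching through an internal HTTP proxy and refusing to follow redirects — can be sketched roughly like this. (A Python sketch, not MediaWiki’s actual PHP code; the proxy address is a made-up example.)

```python
import urllib.error
import urllib.request

# Hypothetical internal proxy address, for illustration only.
PROXY = "http://proxy.internal.example:8080"

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow any redirect, like cURL in its default mode."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        raise urllib.error.HTTPError(
            req.full_url, code, "redirects are not followed", headers, fp)

# Route all traffic through the proxy and disable redirect following.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": PROXY, "https": PROXY}),
    NoRedirect(),
)

def fetch(url: str) -> bytes:
    """Fetch a remote file via the proxy; any redirect aborts the request."""
    with opener.open(url, timeout=30) as resp:
        return resp.read()
```

The point of the `NoRedirect` handler is simply that a redirect from the remote site becomes an error instead of being silently followed, which closes off the redirect-to-local-resource trick.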
[Image: upload-url-form.png]
My very first URL-uploaded file was a screenshot from one of my blog posts. Spiffy!
The default configuration limits URL uploads to sysops, so for now you’ll need to be a sysop on Test Wikipedia to try it out. If everything seems fairly problem-free we’ll start rolling this out a bit more widely for Commons and other sites.
The upload-by-URL functionality is also needed for future-facing work by Michael Dale to allow an on-wiki media picker to fetch freely-licensed files from Flickr, Archive.org, and other places.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.


Is the URL saved and displayed after the upload? That would be a nice feature, as it simplifies the search for the source if no source is explicitly specified. It may even be possible to automatically fill out parts of the Information template…

Yay! That’s exciting news…

@Church of emacs —
No, currently the source URL isn’t retained as metadata. Would be a good thing to add, especially if we open it to general usage!

Instead of wasting server space with duplicate data, why not just transclude the image from the URL onto Wikipedia.

Simply transcluding the remote file might sound nice in a web-centric way, but there are a few big problems:

* We want to be able to redistribute files used in our articles, including for offline use; if we don’t have them, that’s a lot harder.
* The source site might not be able to handle the level of hotlinking traffic, causing performance problems or cost to the site owner. (Some sites disallow hotlinking outright.)
* The file might be changed without warning by the owner of the remote site; we need to retain a change history.
* The file might…

Would this allow multiple uploads? Instead of clicking upload many times it would be nice having this URL images fetcher have a queue upload feature which would upload the images after each other and report to the user after uploading them all.

I’m afraid it would make the problem of uploading copyrighted pictures from news sources and blogs worse, though…

With the source URL recorded as metadata, it would probably actually help in identifying those — if people are more likely to plop in a URL instead of saving and uploading, we know the source right off.

So, will the limit still be 100MB? Or is it raised if this method is used?

Currently the experimental limit for upload-by-URL is 500MB … but I think it has to download in 30 seconds. 🙂 Needs more work.
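The kind of cap described in that reply could look something like this — a minimal sketch (not the actual MediaWiki implementation) that streams a download and aborts once it exceeds a byte limit or a wall-clock deadline:

```python
import io
import time

MAX_BYTES = 500 * 1024 * 1024   # 500MB experimental limit mentioned above
MAX_SECONDS = 30                # rough 30-second download deadline

def download_capped(stream, max_bytes=MAX_BYTES, max_seconds=MAX_SECONDS):
    """Read from a file-like object, enforcing size and time limits."""
    start = time.monotonic()
    buf = io.BytesIO()
    while True:
        chunk = stream.read(64 * 1024)
        if not chunk:
            return buf.getvalue()
        buf.write(chunk)
        if buf.tell() > max_bytes:
            raise ValueError("file exceeds the upload-by-URL size limit")
        if time.monotonic() - start > max_seconds:
            raise TimeoutError("download did not finish within the deadline")
```

Checking the limits per chunk, rather than trusting the remote server’s Content-Length header, means a misreported or missing length can’t blow past the cap.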

Having this feature available to the ordinary user on commons will of course make it much more likely that people move files from their respective wikis to commons, which is imho good & desirable.