Wikimedia Commons has more than 17 million freely licensed media files available for anyone’s use in practically any way, so long as the terms of the license are upheld. The ever-growing database speaks to the passion that Wikimedians have for sharing free media to the benefit of anyone around the world.
Making sense of this massive amount of uploaded images, video and audio files, however, hasn’t always been easy or straightforward. One of the challenges, as Commons contributor Brian Wolff can attest, is that metadata isn’t well integrated into the database, even though such integration is standard in most image databases. On-wiki descriptions generally aren’t included in the file’s internal metadata, which can result in the loss of important information when the file is reused outside of Wikimedia sites. Additionally, the data that is in the file’s internal metadata is ignored by search and cannot be used programmatically inside the wiki.
When a media file is uploaded to Commons today, a table is added to the file page with some information, such as EXIF values for aperture and shutter speed, but Wolff believes a lot more can be done with the metadata. In the long term, he expects people will use Wikimedia Commons search to find only files with a certain license, taken with a certain camera, or taken on a certain date.
“Editing file metadata in our current setup is not happening as well as it should at the moment, which is sad,” Wolff said. Currently, people who want metadata to be included in a file need to download the file, edit the metadata, and re-upload it. “But one approach that’s been suggested is just to use one of the existing programs like ExifTool and put an interface on top of it in MediaWiki.”
Wolff thinks this shouldn’t be hard to accomplish, as he doesn’t see any major technical challenges that couldn’t be surmounted. This would let the updated metadata stick with the file, instead of just being on a wiki page. Since files rarely stay within the wiki, if the metadata is not included inside the file, it gets lost as soon as the file leaves the website.
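To make the suggested approach a little more concrete, here is a minimal sketch of the kind of thin wrapper an interface could sit on top of. It only builds an argument list for the `exiftool` command line, using ExifTool's `-TAG=VALUE` syntax for writing a tag. The function name `build_exiftool_command` and its validation rules are hypothetical and are not part of MediaWiki or ExifTool.

```python
from typing import Dict, List


def build_exiftool_command(path: str, tags: Dict[str, str]) -> List[str]:
    """Build an argument list asking exiftool to write `tags` into `path`.

    Hypothetical helper for illustration only; it does not run anything.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    if path.startswith("-"):
        # Avoid the path being mistaken for a command-line option.
        raise ValueError("path must not start with '-'")

    args = ["exiftool"]
    for name, value in sorted(tags.items()):
        # Accept plain tag names such as "Artist" or group-qualified
        # names such as "XMP:Description"; reject anything else.
        if not name.replace(":", "").isalnum():
            raise ValueError(f"unexpected tag name: {name!r}")
        args.append(f"-{name}={value}")
    args.append(path)
    return args


if __name__ == "__main__":
    command = build_exiftool_command(
        "photo.jpg",
        {"Artist": "Example Uploader", "Copyright": "CC BY-SA 3.0"},
    )
    print(command)
```

Passing the resulting list to `subprocess.run(command, check=True)` would perform the edit on a machine where `exiftool` is installed. Building a list rather than a shell string keeps user-supplied values from being interpreted by a shell.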
Wolff first became interested in Commons during a Google Summer of Code internship with the Wikimedia Foundation in 2010. He had been a regular Wikinewsie since 2004, but he said it had been difficult to make the jump from Wikinews contributor to MediaWiki developer.
“Google’s Summer of Code 2010 seemed like a good opportunity to actually become a member of the MediaWiki community, so to speak,” he said. “Before that, I was kind of a bit on, well not the outskirts, but I was very much a newbie and I was kind of stumbling around. It served as an opportunity to become integrated in the community.”
Wolff’s computer skills have grown considerably since the Winnie-the-Pooh computer game his parents gave him to help him learn to read. In May, he completed an undergraduate degree in computer science, with a math minor, at Dalhousie University in Halifax, Nova Scotia. This summer, he will be back at the Wikimedia Foundation working on Commons, where he will explore a number of issues that should make the database more useful and user-friendly.
One of his projects will be image patrolling features for Commons, which will allow admins to check new uploads in an organized fashion as they come in. Normal pages can be patrolled using new page patrol or the more recent page curation tools, but no equivalent tools exist for patrolling images. This is important, as many of the pictures Commons receives aren’t appropriate for a database of educational material that can be freely used by everyone. Many people just upload pictures of themselves, which is acceptable if the person happens to be famous, or if the person is an editor who plans to use the photo on their own user page, but for the average internet user, that’s not the type of content accepted on Commons. Additionally, many people try to upload files copied directly from commercial websites, which is generally not allowed, since these files are usually owned by somebody else.
Beyond this, Wolff said he will explore a number of other areas that, taken separately, might seem trivial. “Personally, I think there’s a lot of kind of little things that each individually don’t matter much but, combined, would be really useful,” he said.
According to Wolff, this can include things like making upload log entries include a hash of the image, so that tool makers can more easily associate log entries with actual images, or allowing people to specify which page is displayed when putting a PDF file in an image gallery. Some other projects he’s working on:
- Experimenting with a different image gallery layout, both for category listings and for galleries that users create with the gallery tag. He notes that this project is still very experimental and may change significantly.
- Another thing he may consider trialling: an optional gallery mode on category listings for the subcategory section, where each subcategory gets a representative image from that category, instead of just showing a textual link.
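As a rough illustration of why a hash in upload log entries would help tool makers, the sketch below groups log entries by the hash of the uploaded file and looks up the entries for a given file. The choice of SHA-1 and the entry fields `title` and `sha1` are assumptions made for the example; the article does not say which hash or log format would be used.

```python
import hashlib
from typing import Dict, List


def file_hash(data: bytes) -> str:
    """Return a hex digest of the file contents (SHA-1 is an assumption)."""
    return hashlib.sha1(data).hexdigest()


def index_log_by_hash(entries: List[dict]) -> Dict[str, List[dict]]:
    """Group hypothetical upload log entries by their 'sha1' field."""
    index: Dict[str, List[dict]] = {}
    for entry in entries:
        index.setdefault(entry["sha1"], []).append(entry)
    return index


if __name__ == "__main__":
    image = b"example image bytes"
    # Hypothetical log entries; a real log would carry more fields.
    log = [
        {"title": "File:Example.jpg", "sha1": file_hash(image)},
        {"title": "File:Other.jpg", "sha1": file_hash(b"other bytes")},
        {"title": "File:Example copy.jpg", "sha1": file_hash(image)},
    ]
    index = index_log_by_hash(log)
    # Every log entry that refers to exactly these bytes.
    for entry in index[file_hash(image)]:
        print(entry["title"])
```

Because identical bytes always produce the same digest, a tool could match a log entry to a file even after the file has been renamed or re-uploaded under a different title.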
If you have any questions or comments for Brian, you can reach him on irc.freenode.net at #mediawiki and #wikimedia-commons or at his talk page.
Profile by Donna Peterson, Communications Volunteer
Interview by Matthew Roth, Global Communications Manager