Learn how to edit and upload to Wikimedia Commons with OpenRefine

Illustration of a teacher in front of a class of students, behind her at the blackboard, the OpenRefine and Commons logos
Illustration for OpenRefine-Wikimedia Train-the-Trainer course (Sandra Fauconnier, CC0, via Wikimedia Commons)

“OpenRefine is a free data wrangling tool that can be used to process, manipulate and clean tabular (spreadsheet) data and connect it with knowledge bases (“spreadsheets on steroids” / “a swiss army knife for data”).”

OpenRefine has been one of the major tools for batch uploads, especially for Wikidata, and an important way to keep our movement (especially the Culture and Heritage part) sustainable. Over the past few years, we have invested in OpenRefine to increase its Wikimedia reach, especially by adding functionalities related to images and structured data on Wikimedia Commons.

This support took the form of two funded projects:

The first funded project added Structured Data on Commons functionalities in OpenRefine, finally making it possible to edit structured data and wikitext on Wikimedia Commons using a single tool and upload images with SDC. 

Structured Data on Commons (or SDC) brings linked open data from Wikidata to the media on Wikimedia Commons, making the images more accessible, findable, and machine-readable. Therefore, it is an important addition to the open data ecosystem and imperative for cultural and heritage institutions that want to share their images and truly take part in the Open GLAM space – the global network that participates in the sharing of cultural heritage.
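
To give a sense of what "machine-readable" means here, the following is a minimal sketch in Python of a single "depicts" statement in the general Wikibase JSON shape that structured data uses. The helper name `depicts_statement` is hypothetical and only for illustration. OpenRefine users do not write this by hand, since the tool generates statements from a schema, and the exact payload it sends may differ.

```python
import json


def depicts_statement(item_id: str) -> dict:
    """Build a Wikibase-style 'depicts' statement that points at a Wikidata item.

    Hypothetical helper for illustration only; not part of OpenRefine.
    """
    # Wikidata item IDs are the letter Q followed by digits, e.g. "Q146".
    if not (item_id.startswith("Q") and item_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": "P180",  # P180 is the "depicts" property
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {"entity-type": "item", "id": item_id},
            },
        },
    }


# Example: a file that depicts a house cat (Q146 on Wikidata).
statement = depicts_statement("Q146")
print(json.dumps(statement, indent=2))
```

Because the statement links to a Wikidata item rather than to a free-text label, the same file can be found by people searching in any language.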

After the development of the Wikimedia Commons extension for OpenRefine, it became clear that it would be important to support the adoption of the new functionalities through outreach and training. OpenRefine is not an easy tool with a low barrier to entry. Rather, like a Swiss army knife, it serves several purposes and therefore has many particularities and tricks. More than that, Wikidata and Structured Data on Commons are also very complex environments, with their own sets of rules, including some that are still being decided – this is especially the case for SDC.

Screenshot of an OpenRefine project on a web browser, with several lines of metadata and the illustration of a plant
A typical Wikimedia Commons project in OpenRefine (Sandra Fauconnier, CC BY 4.0, via Wikimedia Commons)

To help potential users of OpenRefine overcome the high barrier to entry, we invested in training: one course for interested users and contributors who may be starting from scratch, and another to prepare advanced OpenRefine users to deliver training, courses, and workshops to others, thereby supporting wider adoption of the tool.

#1 Train-the-Trainer course (advanced):

The Train-the-Trainer program 2023-24 was planned, designed, and delivered by Sandra Fauconnier. As the main coordinator of the previous projects, Sandra was the best-positioned person to prepare and coordinate a group of advanced users of OpenRefine. (Note: At the end of this post, you can find a quick interview with Sandra to learn more about her process for accomplishing all this work.)

At first, the course was planned for only 8 people. However, given the high number of applications and the need for OpenRefine trainers covering different groups and languages, it was decided that 16 trainers would be accepted.

The training was self-paced and ran for six months, from November 2023 to April 2024. Participants worked through seven categories, each divided into several activities. For example, in one of the activities from the first category, 01 General OpenRefine onboarding 💎, participants were asked to take the Library Carpentry: OpenRefine course, if they hadn’t already, to make sure they understood OpenRefine for Wikidata.

Categories: 

  • 01 General OpenRefine onboarding 💎
  • 02 Wikidata editing with OpenRefine 📄
  • 03 Wikimedia Commons editing with OpenRefine 🏞️
  • 04 Specialized OpenRefine and Wikimedia Commons tasks 🧠
  • 05 Interaction with OpenRefine ecosystem; help other people 👥
  • 06 Create and improve training and documentation materials 🎓
  • 07 Teach own OpenRefine-SDC training + related support tasks 🧑‍🏫

The complete curriculum for this course is available here

In addition to the planned activities, the group met once a month, in two separate sessions to accommodate different time zones. These meetings were a place to discuss questions and difficulties, as well as to demonstrate the work that had been accomplished. Ad hoc communication took place on a Discord server.

So far, 11 people (including me!) have completed the necessary tasks to qualify as a certified trainer:

  • Ada Jakubowska (Wikimedia Poland)
  • Bart Magnus (meemoo, Belgium)
  • Carla Toro Fernández (Wikimedia Chile)
  • Tamsin Braisher (Wikimedia Aotearoa New Zealand)
  • Giovanna Fontenelle (Culture & Heritage team, Wikimedia Foundation)
  • Jinoy Tom Jacob (Wikimedians of Kerala User Group)
  • Max Kristen (Kristbaum, Germany)
  • Lucas Nascimento Belo (Wiki Movimento Brasil)
  • Rute Correia (Wikimedia Portugal)
  • Sara Thomas (Wikimedia UK)
  • Will Kent (Wiki Education Foundation)

Certified OpenRefine trainers will be listed in this category (via a userbox) and on this page, where all the information related to OpenRefine and trainings will be available. Even if you have not followed the Train-the-Trainer course but are delivering trainings related to OpenRefine, you are welcome and encouraged to list yourself and to place the userbox on your user page. That way, other people can find and contact you.

Screenshot of Meta-Wiki with "Template:User OpenRefine Trainer", a blue diamond, and the phrase "This user teaches OpenRefine"
Screenshot of the OpenRefine Trainer template for the userbox used by certified trainers

#2 WikiLearn course (basic): 

Sandra Fauconnier also developed OpenRefine for Wikimedia Commons: the basics, an online course available at any time, for free, on the WikiLearn platform. Anyone with a Wikimedia account can enroll and follow it at their own pace, with computer-graded exercises. A certificate is awarded at the end to those who complete the course.

OpenRefine Wikimedia Commons course on WikiLearn (Sandra Fauconnier, CC BY 4.0, via Wikimedia Commons)

As this is an introductory course that covers not only OpenRefine but also Wikimedia Commons, the training is suitable for Wikimedians, Wikimedia affiliate staff, and partners, like GLAM staff or Wikimedians in Residence. Completing this training should take an average of 6 to 8 hours.

The course is divided into five parts. Here’s the complete outline:

0 – Getting started
– Welcome, and what we already expect you to know
– Self-assessment: Are you familiar with the basics needed for this course?
– Self-assessment feedback
– Course introduction
– Installing and running OpenRefine for Wikimedia Commons

1 – Wikimedia Commons and structured data: a refresher
– Wikimedia Commons, a repository of free media files
– File pages, wikitext, structured data
– OpenRefine and Wikimedia Commons
– Self-assessment: The foundations of Wikimedia Commons

2 – Uploading files to Wikimedia Commons with OpenRefine
– Checklist of things to know and do beforehand
– Preparing your data for upload
– Readying the data in an OpenRefine project
– The upload process
– Correcting mistakes
– Self-assessment: The OpenRefine upload process

3 – Editing existing files on Wikimedia Commons
– Preparations
– Readying the data in an OpenRefine project
– The editing process
– Correcting mistakes
– Self-assessment: The OpenRefine editing process

4 – Wrapping up
– Advanced tips and tricks, and congratulations!

Throughout the course, participants find not only detailed written guidelines but also subtitled videos. These short demos are also available on Wikimedia Commons in this category: Category:WikiLearn: OpenRefine for Wikimedia Commons: the basics.

The course is already available in three languages other than English, with more to come.

Gray hexagon with dark blue border, "COMPLETION OF COURSE" in light blue. In white, a diamond, "OpenRefine" and a Commons logo
Badge for completing WikiLearn course OpenRefine and Wikimedia Commons: the basics (Sandra Fauconnier, CC0, via Wikimedia Commons)

If you are interested in translating the course into your language, you can check the status of the current translations here and learn about the translation process in this detailed video by Asaf Bartov:

Video introducing the course content translation feature on WikiLearn and how to contribute translations via Meta (Asaf (WMF), CC BY-SA 4.0, via Wikimedia Commons)

Mini interview with Sandra Fauconnier, OpenRefine’s Project Director:

  1. Why was it important to you to have OpenRefine trainings?

“I have seen that most people (including me!) need one or more introductions or tutorials before they are able to confidently and autonomously use OpenRefine. To make OpenRefine’s Wikimedia Commons features and extension as useful as possible for as many people as possible, we need active trainers and helpers around the world, and a variety of learning resources that accommodate diverse learning styles. I hope that this project has built the foundations for that, and I hope that it will help more Wikimedians and GLAMs contribute to Wikimedia Commons using SDC!”

  2. How did you organize yourself to write the WikiLearn course and coordinate the TTT course?

“For both courses I started with outlines, asking myself three questions: “If I would take this course, what would I like to learn?”; “What should every teacher or OpenRefine-Commons user know?”; and “What do I typically see people struggle with, and what therefore needs emphasis and extra explanation?”. After a while, in both courses, the program and details were adjusted based on input and requests I got from the participants and (in the case of the WikiLearn course) from a group of beta testers. This was extremely valuable!”

  3. And, finally, what would you still like to see developed or accomplished in OpenRefine or on Wikimedia Commons?

“I am a very firm believer in the potential of Wikimedia Commons as a knowledge platform in itself (beyond the narrow encyclopedic scope of Wikipedia) and would like to see the Wikimedia movement work on more ways to make more impact with media, not just in long-form encyclopedic text with small illustrations. Coming from a GLAM background myself, and having worked on media databases for over 20 years, I think structured data on Commons is a huge step forward to make this possible. I would really like to see us invest in making it shine.”

If you want to learn more about OpenRefine, you can check its official website and its pages on Meta-Wiki, Wikimedia Commons, or Wikidata. You can also leave comments on OpenRefine’s community forum or join the Telegram group.
