OpenRefine 3.9: New Features and Improvements for Wikimedians

OpenRefine, a widely used open-source tool for data cleaning and transformation, recently released version 3.9. The release brings several improvements and new features for all users, many of them aimed specifically at Wikimedians.

Part of this work was carried out under the Commons:OpenRefine/Training 2023-24 grant with the support of Wikimedia Sweden and the involvement of a volunteer contributor.

1. OpenRefine Enhancements for Wikimedia Integration

Support for Large Media Files

Users can now upload media files over 100MB to Wikimedia Commons and other Wikibases, expanding the possibilities for integration and use of multimedia data.

Simplified File Import

Files can now be imported into OpenRefine by simply dragging and dropping them into the interface, making the process more intuitive and efficient.

Improved Multiple File Import

The import of multiple files has been enhanced with alphabetical and chronological sorting options, facilitating the organization and analysis of datasets composed of various files.

Wikibase Integration

The Wikibase editing operation now reports results directly in the grid, making it easier to track the changes made.

More Precise Editing Error Messages in Wikibase

Error messages during editing operations in Wikibase have been improved to provide more detailed and useful information. This makes it easier to identify and correct issues during the editing process.

Uploading New Versions of Media Files

It is now possible to replace previously uploaded media files with updated versions, making it easier to maintain and update multimedia content on Wikimedia Commons.

New Appearance for Wikibase Notices

The Wikibase notice interface has been redesigned to provide a more intuitive and visually appealing experience. This update makes it easier to identify and understand notices during editing operations.

Description Length Limitation

Wikibase descriptions are now limited to 250 characters in OpenRefine, which prevents the errors that longer descriptions would otherwise trigger during editing.

Prevention of Identical Facet Creation in Reconciliation and Editing Operations

OpenRefine now avoids creating identical facets during reconciliation and editing operations in Wikibase. This avoids redundancy and helps maintain data integrity.

Display of Property Names in Wikibase Issue Reports

Wikibase issue reports now include the names of the properties involved, allowing for a more detailed and efficient analysis of identified problems.

QuickStatements Export

The QuickStatements export has been fixed, resolving persistent issues that prevented data from being exported correctly. Users can now reliably export statements to QuickStatements, which facilitates data integration and updates in Wikidata.

Additional Fixes and Improvements

In addition to the improvements above, version 3.9 addresses various issues and enhances overall usability, including:

  • Fixing button label swaps in the Wikibase schema editor.
  • Correcting inconsistent placement of the “Extract/Apply” buttons.
  • Improving the handling of the “badtags” error in Wikibase.
  • Ensuring thread safety in date parsing within Wikibase.
  • Fixing inappropriate rate limiting when editing Wikimedia Commons.

2. General Improvements to OpenRefine

Clustering Enhancements

The clustering functionality has received significant improvements:

  • Candidate Exclusion: Users can now exclude candidates from a cluster during the grouping process, offering greater control over the results.
  • Custom Distances and Keyers: Support for user-defined distances and keyers has been introduced, allowing for more precise customization of the clustering process.
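
To make the idea of a keyer more concrete, here is a minimal Python sketch of key-collision clustering. It is not OpenRefine's code: `fingerprint` is a simplified stand-in for a fingerprint-style keyer, and `cluster_by_key` and its `excluded` parameter are hypothetical names chosen for illustration. How custom keyers and distances are actually defined inside OpenRefine 3.9 is not shown here.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint-style keyer (an illustration, not OpenRefine's own)."""
    # Lowercase, trim, and strip accents by dropping non-ASCII characters.
    normalized = unicodedata.normalize("NFKD", value.strip().lower())
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Split on punctuation and whitespace, then sort the unique tokens.
    tokens = re.split(r"\W+", ascii_only)
    return " ".join(sorted(set(t for t in tokens if t)))


def cluster_by_key(values, keyer=fingerprint, excluded=()):
    """Group values whose keys collide, skipping any value listed in `excluded`."""
    buckets = defaultdict(list)
    for value in values:
        if value in excluded:
            continue  # the user removed this candidate from clustering
        buckets[keyer(value)].append(value)
    # Only groups with at least two distinct spellings are worth reviewing.
    return [group for group in buckets.values() if len(set(group)) > 1]


names = ["Tom Cruise", "Cruise, Tom", "tom cruise", "Brad Pitt"]
clusters = cluster_by_key(names)
without_candidate = cluster_by_key(names, excluded=("Cruise, Tom",))
# A user-defined keyer is just another function from a value to a key.
by_prefix = cluster_by_key(["Paris", "PARis-sud", "Lyon"], keyer=lambda v: v[:3].lower())
```

Swapping in a different `keyer` changes which values are considered duplicates, which is the kind of control that user-defined keyers are meant to provide.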

History Deletion Warnings

To prevent accidental data loss, a warning dialog has been added to alert users before deleting entries in the history.

New GREL Functions

The GREL expression language has been expanded with new functions:

  • levenshteinDistance: Calculates the edit distance between two strings, useful for identifying similarities and differences in text.
  • zip: Allows combining two lists into a list of pairs, facilitating operations that require handling multiple lists simultaneously.
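
For readers unfamiliar with these two operations, the following Python sketch shows what an edit distance and a zip compute. It illustrates the concepts only: the function names `levenshtein_distance` and `zip_lists` are hypothetical, and details such as how the GREL versions treat lists of unequal length are assumptions (this sketch stops at the shorter list).

```python
def levenshtein_distance(a, b):
    """Count the single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def zip_lists(xs, ys):
    """Combine two lists into a list of pairs (stops at the shorter list)."""
    return [list(pair) for pair in zip(xs, ys)]


distance = levenshtein_distance("kitten", "sitting")
pairs = zip_lists(["Q42", "Q64"], ["Douglas Adams", "Berlin"])
```

A small distance between two cell values suggests that they may be variant spellings of the same thing, while pairing lists is handy when, for example, identifiers and labels are stored in separate columns.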

Date Support in Jython Expressions

OpenRefine now supports date manipulation in Jython expressions, expanding the possibilities for transforming and analyzing temporal data.
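
As a rough idea of the kind of logic this enables, here is a plain Python sketch of two date calculations. It assumes that date values reach the expression as `datetime` objects, which the release notes summarized here do not confirm, and the helper names `days_between` and `shift_by_days` are hypothetical.

```python
from datetime import datetime, timedelta


def days_between(start, end):
    """Number of whole days from start to end (negative if end is earlier)."""
    return (end - start).days


def shift_by_days(value, days):
    """Return a new date moved forward (or backward) by the given number of days."""
    return value + timedelta(days=days)


elapsed = days_between(datetime(2024, 1, 1), datetime(2024, 3, 1))
shifted = shift_by_days(datetime(2024, 2, 28), 2)
```

Inside OpenRefine, an expression would typically work on the current cell's value rather than on hard-coded dates like the ones above.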


For a comprehensive list of updates and fixes in this version, please visit the official release page on GitHub: OpenRefine 3.9. A webinar about these features is also available on Wikimedia Commons.

If you plan to use OpenRefine to upload media to Wikimedia Commons, we recommend reading the article “Learn how to edit and upload to Wikimedia Commons with OpenRefine,” which explains the process in detail.

You can also visit the official website and explore the OpenRefine pages on Meta-Wiki, Wikimedia Commons, or Wikidata. Additionally, feel free to share your ideas on the OpenRefine community forum or join the Telegram group.
