A shorter version of this post was published on the Biodiversity Heritage Library blog.



The Biodiversity Heritage Library (BHL) is an international consortium dedicated to making historical works in biodiversity available for free, for all. This major biodiversity infrastructure (with over 60 million scanned pages) has a mutualistic relationship with Wikimedia content through the hands of BHL staff and Wiki volunteers. This post tells a bit of the story, in particular how BHL got its first Wikimedian in Residence (WiR) and how that work relates to larger discussions on knowledge equity and the resilience of GLAM institutions.

In November 2024, I had the immense pleasure of starting as a Wikimedian in Residence at BHL. I was hired to help organize the metadata, to connect beautiful public domain illustrations to the Wikimedia projects, and to support a growing community of volunteers and institutions who care deeply about biodiversity and free knowledge. The core task of the job was to support the BHL-Wiki working group, a forward-thinking collaboration between Wikimedians and the Biodiversity Heritage Library.

First, a bit of background. The scope of this Wikimedian-in-Residence position was the result of community-driven recommendations from the Unifying Biodiversity Knowledge to Support Life on a Sustainable Planet white paper, written by JJ Dearborn. The effort was spearheaded by four key people: JJ Dearborn, Jake Orlowitz, Giovanna Fontenelle and Siobhan Leachman, who identified Structured Data on Commons as a key piece in this puzzle, drafted a scope of work, and started the selection process. I was lucky to be selected by the team and to start working on this absolute dream job.



The first activity on the list was to organize BHL’s Wiki presence. Because BHL is an open library with rich public domain content, Wikimedia volunteers had been using its content on Wiki long before any formal partnership. The Commons BHL page was created by Gaurav in 2011 with the “hopes to facilitate a partnership between the BHL and the Wikimedia Commons to their mutual benefit.” The original description said that in order to convince the BHL to commit scarce resources to this task, the group wanted to start by showing them the value of the Wikimedia Commons, as well as making a mailing list available to discuss BHL/Wikipedia collaborations. It is fair to say they were successful, even if it took 13 years! The biggest image import occurred around 2014-2015, when Fæ uploaded most of the BHL images to Commons, using both Flickr and the BHL Application Programming Interface. This effort made 300,000+ BHL-derived files available for reuse on Commons. A great situation to start with!

Besides Commons, contributions were spread around Wikidata and multiple Wikipedias, so we created a beautiful multi-project page on Meta-Wiki. The page now connects the dots between the two communities. In a subpage, I also started recording Status Updates for the Wikimedian-in-Residence work as it happened, so anyone can go back and follow the process step by step.

After organizing the Meta-Wiki page, I started to move towards the data side, developing a little game on Toolforge, BHL Arena, to explore the diversity of BHL content on-wiki. The game is about picking a favourite between two random BHL images. It also gathers information about which images people like the most, creating a prioritization mechanism for what to bring to the Wikimedia platforms.

The BHL Arena app, try it out!

At the same time, there was a need to measure the impact of the BHL image collection in the Wikimedia ecosystem. There are many different tools that GLAMs use for tracking contributions, each one doing slightly different things. For BHL, I implemented a small dashboard leveraging the Commons Impact Metrics API (CIM API). The dashboard is also usable for all GLAM partnerships covered by the CIM API, and it is open for exploration. Through it, we saw that BHL images get around twenty million views on Wikimedia Projects every month. The BHL website itself gets around 1.5 million views per month. It is remarkable that by seeding content in the wikiverse, we can extend the reach of BHL by an order of magnitude! As an extra, in the image below, it is possible to see that there was a big spike in April 2025, with around 38 million views.

Monthly pageviews for BHL content on Wiki through the Commons Impact Metrics dashboard.

As much as I would like to say that the spike is a consequence of the Wikimedian-in-Residence work, it is not. The dashboard also lets us dig a bit deeper and see which pages (and files) contribute to the counts, and the spike was due to news about the purported de-extinction of the dire wolf. People flocked to the Wikipedia page for the species, which alone received 11M views in April. As the page includes several BHL images representing the phylogeny of the wolf, global views for BHL content on Wiki reached an all-time peak. This little detour shows that through the dashboard, we can spot trends and anomalies and investigate their causes and points of improvement. We invite you to play with it, either for BHL or for the other available GLAM categories.
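To make the idea of spotting anomalies concrete, here is a minimal sketch of how a month like April 2025 could be flagged automatically. It is not the dashboard's code: the function name `find_spikes`, the 1.5× threshold, and all numbers other than the roughly 20M baseline and the 38M peak mentioned above are assumptions made up for illustration.

```python
from statistics import median


def find_spikes(monthly_views, factor=1.5):
    """Return the months whose views exceed `factor` times the median month."""
    baseline = median(monthly_views.values())
    return [
        month
        for month, views in monthly_views.items()
        if views > factor * baseline
    ]


# Illustrative numbers only: the post reports around 20M views in a typical
# month and around 38M in April 2025; the remaining values are invented.
views = {
    "2025-01": 19_500_000,
    "2025-02": 20_200_000,
    "2025-03": 21_000_000,
    "2025-04": 38_000_000,
    "2025-05": 20_800_000,
}

print(find_spikes(views))
```

Using the median rather than the mean as the baseline keeps a single extreme month from hiding itself by inflating the average.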

With that solid basis in place, my focus moved on to the key tasks of this project:

1 – Consolidate a Metadata Model for BHL Images on Commons

2 – Create workflows for adding the information to Wikimedia Commons

3 – Update structured data for at least 5000 images

4 – Help to organize events promoting the reuse of these images

5 – Communicate about what was done

The working group had been working on the Metadata Model for some time, and it already covered many of the corner cases. In order not to get analysis paralysis, we started a versioning system for the model. The versioning was done on Google Spreadsheets at first (Minimum BHL Image Data Model – v 0.1.6), to keep it a bit more under control during the Wikimedian-in-Residence period. Now, the model is available on-wiki under https://commons.wikimedia.org/wiki/Commons:Biodiversity_Heritage_Library/Modeling, where it can be updated by the community in the future.

At the same time, we wrote a tutorial on how to use OpenRefine to add SDC to BHL Images. It adapts, with BHL-specific notes, the great documentation of the CommonsExtension (thank you, Sandra!). OpenRefine is an outstanding way to batch-add information in a semi-automatic fashion, streamlining curation. User:Ambrosia10 has been using and refining this workflow intensively and has improved information for thousands of BHL Images in that way!

While I do think that OpenRefine is magic, it does require a human in the loop. For very large-scale editing, of the kind needed for the 300,000+ images in the BHL collection, we needed more automated pipelines. We started using WikibaseIntegrator as a backbone to update the Structured Data on Wikimedia Commons with higher throughput.
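For readers curious about what such a pipeline ultimately writes, here is a rough sketch of a single "depicts" (P180) statement in the Wikibase JSON shape. It deliberately does not use WikibaseIntegrator, which hides this structure behind its own classes; the helper names and the item ID `Q12345` are placeholders invented for this example, not values from the project.

```python
import json


def depicts_statement(qid):
    """Build one 'depicts' (P180) statement in Wikibase JSON form."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P180",  # depicts
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(qid[1:]),
                    "id": qid,
                },
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def build_edit_payload(qids):
    """Serialize several depicts statements as one JSON 'data' payload."""
    return json.dumps({"claims": [depicts_statement(q) for q in qids]})


# "Q12345" is a placeholder item ID, used only to show the structure.
print(build_edit_payload(["Q12345"]))
```

A payload like this would typically be sent to the Commons API (for instance via `action=wbeditentity` with the file's `M`-prefixed MediaInfo ID), but authentication, rate limits, and edit summaries are left out of this sketch.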

The design-build-test cycles and the gradual improvement of the code base were crucial for the success of the project, but the details might be a bit dry for this blog post. Suffice to say that the code parsed and integrated 3 different resources:

The Flickr API, which worked as a source of taxonomic names and artist identifications, curated by volunteers over the past 10+ years

The BHL API, which provided the metadata about each work, such as the institutions that held and digitized the images and other pieces curated by BHL over the years. It also provided more taxonomic names, as an output of the Optical Character Recognition + Named Entity Recognition pipelines from BHL

And the GBIF API, which provided an automatic way to match scientific names from the past to their currently accepted names
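As an illustration of the name-matching step, the sketch below shows how a response from GBIF's public species match endpoint (`https://api.gbif.org/v1/species/match`) could be resolved to the accepted taxon. This is not the project's code: the helper names are invented, and the sample response contains made-up keys that only mimic the shape of a real answer.

```python
from urllib.parse import urlencode

GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def match_url(name):
    """Build the request URL for matching one scientific name."""
    return f"{GBIF_MATCH_URL}?{urlencode({'name': name})}"


def accepted_key(match):
    """Return the GBIF key of the accepted taxon, or None if nothing matched."""
    if match.get("matchType") == "NONE":
        return None
    if match.get("synonym"):
        # Historical names often resolve to a synonym; follow it to the
        # currently accepted taxon.
        return match.get("acceptedUsageKey")
    return match.get("usageKey")


# Made-up response for a historical name; the keys are illustrative only.
sample = {
    "usageKey": 111,
    "acceptedUsageKey": 222,
    "status": "SYNONYM",
    "synonym": True,
    "matchType": "EXACT",
}

print(match_url("Adansonia digitata"))
print(accepted_key(sample))
```

Keeping the URL building and the response handling separate makes it easy to test the resolution logic without calling the network.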

On top of all of that, I adapted a command-line-based curation workflow (using the package wdcuration) to turn information encoded as strings into things, i.e., the Wikidata IDs representing those institutions.

As work on the project continued, another important source of information became apparent: Commons itself. The categories on Wikimedia Commons, curated over the decades, are a rich source for structured data, provided good heuristics are in place.

For the initial pilot, we focused on categories for botanical illustrations on Commons, which are extremely well curated. A bot script parsed these categories, looking for taxon names, and added the illustrations to the matching Wikidata taxon items wherever an image was missing. Before the bot script, 15,869 BHL Images were used on Wikidata, and right after the run, 21,250 BHL Images were used (~5,400 more, a 34% increase). We used two properties, picking the most suitable one in each case: either image (P18) or reference illustration (P13162), depending on previous coverage. These images end up flowing to countless Wikipedia projects. For example, in the Portuguese-language Wikipedia, there were 959 BHL Images before the script was run. After it, there were 1,449 (~500 more, a 51% increase). We also used it to add over 50,000 “depicts” statements on Commons to the botanical illustrations, getting much positive feedback and even a barnstar for the bot. This is a nice stimulus for future work, extending Category-to-Structured-Data inferences beyond just the botanical illustrations.
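The category heuristic can be sketched roughly as follows. This is only an illustration and not the bot's actual code: it assumes category titles of the form "Adansonia digitata - botanical illustrations", and both the regular expression and the function name are invented for this example.

```python
import re

# Assumed naming pattern: "<Genus> [species] - botanical illustrations",
# with an optional "Category:" prefix.
CATEGORY_PATTERN = re.compile(
    r"^(?:Category:)?"
    r"(?P<taxon>[A-Z][a-z]+(?: [a-z][a-z-]+)?)"
    r" - botanical illustrations$"
)


def taxon_from_category(category):
    """Extract a genus or binomial name from a category title, if present."""
    found = CATEGORY_PATTERN.match(category)
    return found.group("taxon") if found else None


for title in [
    "Category:Adansonia digitata - botanical illustrations",
    "Adansonia - botanical illustrations",
    "Botanical illustrations by artist",
]:
    print(title, "->", taxon_from_category(title))
```

A strict pattern like this errs on the side of skipping categories it does not understand, which matters when the extracted name is going to be turned into a statement on thousands of files.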

Using this mix of strategies, we reached the original project milestone (adding Structured Data to over 5,000 files on Commons) by the end of March, in time for the #1Pic1Bio events, organized by the Wikimedia Foundation in partnership with BHL. These events were dedicated to increasing the usage of BHL images on Wikipedia. They took place in Spanish on March 26, in French on March 28, and in Portuguese on April 2, and were hosted by me, Giovanna, Siobhan, and Lidia Ponce de la Vega. Each event took around 1h45, brought different insights on the relation between Wikimedia communities and BHL, and led to meaningful interactions. As the events were related to equity, diversity, and inclusion, they were sponsored by the Wikimedia Foundation, but not the Smithsonian, to comply with U.S. federal orders.

Alongside the changes in U.S. federal policies, around that time, inklings of a big change in the governance model of BHL began to surface. A few months later, there was official confirmation that the Smithsonian Institution could no longer host the administrative functions of the Biodiversity Heritage Library. This decision creates new opportunities for BHL to look for a more international model of governance, but also marks the beginning of a transition period, where future steps are not immediately known.

It was an interesting time to be a Wikimedian-in-Residence in the Biodiversity Heritage Library and to see all these changes firsthand. On one side, the landscape of transitions made it impossible to extend the contract for this work (which I would have loved, by the way, as the team and mission made this experience extraordinary). But on the other side, it also opens up avenues for a new future for the Library, perhaps freer to explore the equity, diversity, and inclusion aspects and to re-signify the biodiversity heritage treasure.

In some ways, having the Wikimedia community working on BHL content is more important than ever. The images, in particular, are beautiful and inspiring, and connect us to a wider meaning, both to our past as a society and as a little dot on the tree of life. One joy of being a Wikimedian-in-Residence with such a vast and open collection is that, now that the contract is over, I am highly trained to keep doing meaningful work with the BHL collection as a volunteer. And crucially, to help my fellow volunteers keep doing meaningful work too!

In that sense, I believe that (besides adding 5-star linked open structured data for 18,000+ files), there are four things made during this residency that are particularly meaningful for the future:

1 – The data model, as it incorporates the BHL-Wiki discussions into a structured data system and can be used as a reference in the future.

2 – The code+workflow to import SDC from the BHL, Flickr, and GBIF APIs, which needs some technical expertise to run, but can work semi-automatically. It may even become a bot in the future.

3 – The code+workflow to add depicts statements from categories for taxonomic illustrations, which may be safe to do beyond just botanical illustrations.

4 – The BHL Image Explorer tool, to navigate the collection and enrich Wikimedia projects.

The BHL Image Explorer main interface.

I left the explorer for last, so you can open a new tab, close this one, and go explore it: https://bhl-gallery.toolforge.org. This little tool was developed for the #1Pic1Bio events, and its functionalities matured through user feedback. Using it, you can navigate the images from the Biodiversity Heritage Library, filtering by taxa and location, and see where these images could be used on Wikipedia pages (currently supporting English, French, Spanish, and Portuguese Wikipedias).

Some tech details on the tool include:

Wikidata autocomplete: The taxon search uses Wikidata as a backend for the taxon selection. This means you can search by common names (e.g. Baobab instead of Adansonia digitata). Only candidates with GBIF identifiers are shown, as these IDs are needed for the taxonomic processing.

Click-based navigation: The tool also has a clickable taxonomic hierarchy box, enabling users to navigate by just clicking around. The species names are also clickable, redirecting the user to the BHL Explorer page for the species.

Distribution map: A map of GBIF Occurrences is displayed next to the taxonomy, giving some visual feedback on where a taxon is expected to occur. This may help users to tell whether a species is merely present in South America or exclusive to it, for example.

Wikimedia reuse counter: Image boxes now show a Wikimedia reuse counter for each of the images, giving an idea of their impact within the Wikimedia ecosystem. Most images have 0 global Wikimedia uses, which means a lot of opportunities for volunteers! :)
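To give a flavour of the Wikidata autocomplete described above, here is a minimal sketch of the two pieces involved: a search request against Wikidata's `wbsearchentities` API module, and a filter that keeps only candidates carrying a GBIF taxon ID (P846). It is an assumption-laden illustration, not the Explorer's code: the function names are invented, and the sample entities use placeholder IDs and values.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
GBIF_TAXON_ID = "P846"  # Wikidata property for the GBIF taxon ID


def search_url(term, language="en", limit=10):
    """Build a wbsearchentities request for a (possibly common) name."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def with_gbif_id(entities):
    """Keep only the entity IDs whose claims include a GBIF taxon ID."""
    return [
        qid
        for qid, entity in entities.items()
        if GBIF_TAXON_ID in entity.get("claims", {})
    ]


# Placeholder entities shaped like a wbgetentities response; IDs are made up.
candidates = {
    "Q100": {"claims": {"P846": [{"id": "example"}]}},  # taxon with GBIF ID
    "Q200": {"claims": {"P31": [{"id": "example"}]}},   # something else
    "Q300": {},                                          # no claims at all
}

print(search_url("Baobab"))
print(with_gbif_id(candidates))
```

In practice the filter could also be expressed as a single SPARQL query; the two-step version here is just easier to follow and to test offline.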

If you want to use the explorer and run into any issues or want some guidance, just let me know! In these six months, I have grown a passion for the BHL Image collection, and I really want to see it everywhere (e.g. every time I open a new tab). In an age of cheap, AI-generated illustrations, there is something grand in seeing these sweat-and-blood pieces of art representing Nature in extreme detail. These pictures tell a thousand tales of the biodiversity-loving nature of humankind, and in that way, bridge old times with the pressing matters of the biodiversity crisis we are facing.

The first step for taking care of something is to see, acknowledge, and admire. That is what iNaturalist does for the biodiversity around us today. And that is what the Biodiversity Heritage Library collection does for admiring biodiversity through human history.

So what now? Well, there are plenty of fun, impactful things for biodiversity heritage aficionados to do in the Wikimedia ecosystem. Some of those are listed on the BHL Meta-Wiki page; others are out there to be invented. I believe that, more than the tangible results of this residency, we managed to create new, meaningful ways for the Wiki community to flourish through the Biodiversity Heritage Library content. Hopefully, our work on these beautiful illustrations from the past will plant a seed towards a richer linked-open-data future for Wikimedia Commons.

Have fun!

