The Monthly Wikimedia Research Showcase is a public showcase of recent research by the Wikimedia Foundation’s Research Team and guest presenters from the academic community. The showcase is hosted at the Wikimedia Foundation every 3rd Wednesday of the month at 9:30 a.m. Pacific Time / 18:30 CET.
Theme: The Free Knowledge Ecosystem
Video: YouTube
The evolution of humanitarian mapping in OpenStreetMap (OSM) and how it affects map completeness and inequalities in OSM
By Benjamin Herfort, Heidelberg Institute for Geoinformation Technology
Mapping efforts of communities in OpenStreetMap (OSM) over the previous decade have created a unique global geographic database, which is accessible to all with no licensing costs. The collaborative maps of OSM have been used to support humanitarian efforts around the world as well as to fill important data gaps for implementing major development frameworks such as the Sustainable Development Goals (SDGs). Besides the well-examined Global North – Global South bias in OSM, the OSM data as of 2023 shows a much more spatially diverse spread pattern than previously considered, which was shaped by regional, socio-economic and demographic factors across several scales. Humanitarian mapping efforts of the previous decade have already made OSM more inclusive, helping to diversify and expand the spatial footprint of the areas mapped. However, methods to quantify and account for the remaining biases in OSM’s coverage are needed so that researchers and practitioners will be able to draw the right conclusions, e.g. about progress towards the SDGs in cities.
Dataset reuse: Toward translating principles to practice
By Laura Koesten, University of Vienna
The web provides access to millions of datasets. These data can have additional impact when used beyond the context for which they were originally created. But using a dataset beyond the context in which it originated remains challenging. Simply making data available does not mean it will be or can be easily used by others. At the same time, we have little empirical insight into what makes a dataset reusable and which of the existing guidelines and frameworks have an impact.
In this talk, I will discuss our research on what makes data reusable in practice. This is informed by a synthesis of literature on the topic, our studies on how people evaluate and make sense of data, and a case study on datasets on GitHub. In the case study, we describe a corpus of more than 1.4 million data files from over 65,000 repositories. Building on reuse features from the literature, we use GitHub’s engagement metrics as proxies for dataset reuse and devise an initial model, using deep neural networks, to predict a dataset’s reusability. This demonstrates the practical gap between principles and actionable insights that might allow data publishers and tool designers to implement functionalities that facilitate reuse.