Opportunities to improve integration between Wikisource and Wikidata

This blog post is the second in a two-part series about integration between Wikidata and Wikisource. While the first part covered a Wikimedia community initiative that makes use of Wikidata's data on Wikisource, this part focuses on opportunities and applications for further work in this space.

Bibliographic metadata is used on Wikisource in various forms and processes, including, but not limited to, verifying copyright status, populating index pages and categorization. However, in the majority of language editions, most of this data is currently entered into Wikisource manually.

While this works, it is not ideal. For instance, if there is a mistake in the data, correcting it on either Wikidata or Wikisource will not fix it on the other platform unless it is corrected there as well. This is where Wikidata can be leveraged best, since one of the main reasons for its creation was to curate data for the various Wikimedia projects centrally. In the previous post, we discussed modules that let index pages on Wikisource retrieve their bibliographic data from Wikidata.

Here are a few ideas for opportunities and applications where Wikidata and Wikisource could be integrated further. Please note that this list is not exhaustive; it is only meant to indicate directions for further work in this area.

  • Web application to create and add Wikidata QIDs to index pages: The modules mentioned above only work if Wikidata QIDs have been added to the index pages. While a bot can scan the main pages, retrieve the corresponding Wikidata items and add them to the index pages, it is limited to items that already exist on Wikidata and are linked to the corresponding main pages on Wikisource. To overcome this limitation, a tool could help contributors semi-automate the process. Roughly, the tool's workflow would be:
    • Extract all index pages on a Wikisource edition that are not yet linked to Wikidata.
    • For each of them, show a list of possible matches based on a string search on Wikidata.
    • If there is a match, the user selects one of the items. The QID is then added to the corresponding index page on Wikisource, and the index page link is added to the Wikidata item as the value of the index page property.
    • If there are no matches, users will be able to create a Wikidata item with the help of predefined fields, similar to a Cradle form, and link it.
  • Editing Wikidata items from index pages on Wikisource: Although the current modules display data retrieved from Wikidata on index pages, any changes to that data still have to be made on Wikidata itself before they show up there. This may discourage newcomers from contributing to Wikisource. One possibility is to let users edit or add data on Wikidata through a popup or a page overlay, so that they do not have to leave the site.
  • Header template on the main page of a work: The header template on a work's main pages presents the title, year of publication and author(s). While these are currently added manually as strings and wikilinks, they could also be rendered from data retrieved from Wikidata.
An example header: Popular Science Monthly, Volume 31, May 1887
  • License template on Wikisource: It is customary, and generally considered good practice, to add to the main page of a work the license under which the source file was released and uploaded to Wikimedia Commons. While this information is not really available on Wikidata, it is available through Structured Data on Commons, where several bots have been working to add structured licensing and authorship data to all files. With the help of this data, the corresponding license templates could be displayed on the main page of a work.
  • Author pages: Author pages on Wikisource could also be generated automatically from the author's Wikidata item, for example by listing the works created or written by the author.
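
As a rough illustration of the search step in the index-page tool described above, the sketch below derives a search string from an index page title and builds a request to Wikidata's `wbsearchentities` API action, which looks up items by label or alias, then parses the response into candidate matches. It is a minimal sketch under stated assumptions: deriving the search string by stripping the namespace prefix and a `.djvu`/`.pdf` extension is a hypothetical heuristic, the function names are illustrative, and the HTTP request itself, the contributor's selection and the authenticated edits that write the QID back are left out.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def title_from_index_page(index_title: str) -> str:
    """Hypothetical heuristic: 'Index:Some work.djvu' -> 'Some work'."""
    name = index_title.split(":", 1)[-1]  # drop the (possibly localized) namespace prefix
    stem, dot, ext = name.rpartition(".")
    return stem if dot and ext.lower() in {"djvu", "pdf"} else name


def build_search_url(search: str, language: str = "en", limit: int = 10) -> str:
    """Build a wbsearchentities request that looks up items by label or alias."""
    params = {
        "action": "wbsearchentities",
        "search": search,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def parse_candidates(response_text: str) -> list[tuple[str, str, str]]:
    """Turn a wbsearchentities JSON response into (QID, label, description) tuples."""
    data = json.loads(response_text)
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in data.get("search", [])
    ]


if __name__ == "__main__":
    query = title_from_index_page("Index:Popular Science Monthly Volume 31.djvu")
    # Fetch this URL with any HTTP client, pass the body to parse_candidates()
    # and show the resulting list to the contributor for confirmation.
    print(build_search_url(query))
```

Leaving the final choice to the contributor matters here, because a plain string search can return several items with the same title, such as different editions of a work.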
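
For the header template idea, the sketch below reads the relevant statements from a single entity object in the JSON that Wikidata's `Special:EntityData` endpoint returns: title (P1476), publication date (P577) and author (P50). It is a hedged sketch in Python rather than the Lua used by the existing modules; the function names, the fallback to the item's label when no title statement exists, and the simple year extraction are assumptions, and turning author QIDs into names and wikilinks would need a further lookup that is not shown.

```python
from typing import Any


def statement_values(claims: dict[str, Any], prop: str) -> list[Any]:
    """Return the data values of all non-deprecated statements for a property.

    Statements are kept in their stored order; preferring 'preferred'-rank
    statements is not handled in this sketch.
    """
    values = []
    for statement in claims.get(prop, []):
        if statement.get("rank") == "deprecated":
            continue
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":  # skip "novalue"/"somevalue" snaks
            values.append(snak["datavalue"]["value"])
    return values


def header_fields(entity: dict[str, Any], lang: str = "en") -> dict[str, Any]:
    """Collect title, year and author QIDs for a Wikisource header from one entity."""
    claims = entity.get("claims", {})
    titles = statement_values(claims, "P1476")  # title: monolingual text values
    dates = statement_values(claims, "P577")    # publication date: time values
    authors = [v["id"] for v in statement_values(claims, "P50")]  # author: item IDs
    label = entity.get("labels", {}).get(lang, {}).get("value")
    return {
        "title": titles[0]["text"] if titles else label,
        # Wikidata times look like "+1887-05-00T00:00:00Z"; this slice assumes a
        # four-digit year in the Common Era, which covers most printed works.
        "year": dates[0]["time"][1:5] if dates else None,
        "authors": authors,
    }
```

The resulting fields could then be passed to the header template in place of the manually typed strings.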
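
For auto-generated author pages, one possible approach is to ask the Wikidata Query Service for all items whose author (P50) is the person's item and turn the result into a wikitext list. The sketch below assumes that P50 is the property to query (works by other kinds of creators, such as translators or editors, would need additional properties); the list format and function names are illustrative only.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def works_by_author_query(author_qid: str, lang: str = "en") -> str:
    """Build a SPARQL query for items whose author (P50) is the given QID."""
    if not (author_qid.startswith("Q") and author_qid[1:].isdigit()):
        raise ValueError(f"not a QID: {author_qid!r}")
    return f"""SELECT ?work ?workLabel ?date WHERE {{
  ?work wdt:P50 wd:{author_qid} .
  OPTIONAL {{ ?work wdt:P577 ?date . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
ORDER BY ?date"""


def query_url(query: str) -> str:
    """URL for a GET request that asks the query service for JSON results."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


def author_work_list(results: dict) -> str:
    """Render SPARQL JSON results as a hypothetical wikitext list of works.

    A work with several publication dates yields several rows, so a fuller
    version would deduplicate on ?work first.
    """
    lines = []
    for row in results["results"]["bindings"]:
        label = row["workLabel"]["value"]
        year = row["date"]["value"][:4] if "date" in row else "n.d."
        lines.append(f"* ''{label}'' ({year})")
    return "\n".join(lines)
```

A production version would also need to follow the query service's usage policy, for example by sending a descriptive User-Agent header with each request.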

As mentioned above, these are only a few of the many possibilities in this area. Another crucial step is to make these functionalities available as MediaWiki extensions that communities can easily enable on their wikis. Currently, the Lua modules have to be deployed individually on each wiki, with the necessary modifications, which can be problematic for communities that lack technical support. The biggest benefit of this work would be fewer discrepancies in the data about a bibliographic work across Wikimedia projects: errors could be fixed simply by editing the Wikidata item, and the correction would be reflected everywhere.



WikiCite is a Wikimedia initiative to develop open citations and linked bibliographic data to serve free knowledge. Initially a series of conferences and workshops in support of that goal, WikiCite is now a community of people and an ecosystem of projects focused on source metadata, leveraging the Wikidata platform: a free and open knowledge base that can be read and edited by both humans and machines, and that serves as central storage for the structured data of its Wikimedia sister projects.