How we know what we know: The Initiative for Open Citations (I4OC) helps unlock millions of connections between scholarly research


Just as a submarine far below the surface sends intelligence to stations on land, a web of scholarly citations underlies and connects our world of knowledge. Photo by Lt. Ed Early/US Navy, public domain/CC0.

Citations are the backbone of scholarly knowledge. They help researchers verify information, build on existing knowledge, and generate opportunities for new discoveries.
Citations are not only relevant to academia. They are the foundation for how we know what we know.
Until recently, the idea of creating a freely accessible repository of open citation data—i.e. data representing how scholarly works cite each other—has been hampered by restrictive and inconsistent licenses and by the lack of machine-readable reference data.
Today, we are proud to announce a key milestone toward unlocking the potential for open citation data.


The Wikimedia Foundation, in collaboration with 29 publishers and a network of organizations, including the Public Library of Science (PLOS), the Internet Archive, Mozilla, the Bill & Melinda Gates Foundation, the Wellcome Trust, and many others, announced the Initiative for Open Citations (I4OC), which aims to make citation data freely available for anyone to access.
Scholarly publishers deposit the bibliographic record and raw metadata for their publications with Crossref. Thanks to a growing list of publishers participating in I4OC, reference metadata for nearly 15 million scholarly papers in Crossref’s database will become available to the public without copyright restriction.[1] This data includes bibliographic information (like the title of a paper, its author(s), and publication date), machine-readable identifiers like DOIs (Digital Object Identifiers, a common way to identify scholarly works), as well as data on how papers reference one another. It will help draw connections within scientific research, find and surface relevant information, and enrich knowledge in places like Wikipedia and Wikidata.
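To make "data on how papers reference one another" concrete, here is a minimal sketch in Python. It assumes a work record shaped like the JSON that the Crossref REST API returns, with a `DOI` field for the paper and a `reference` list whose entries may carry a `DOI` of their own. The sample record, its DOIs, and the function name `citation_edges` are invented for illustration and do not describe any real paper.

```python
def citation_edges(work: dict) -> list[tuple[str, str]]:
    """Return (citing DOI, cited DOI) pairs found in one work record.

    Assumes a Crossref-style record: a top-level "DOI" and an optional
    "reference" list whose items may include a "DOI" key.
    """
    citing = work.get("DOI")
    if not citing:
        return []
    edges = []
    for ref in work.get("reference", []):
        cited = ref.get("DOI")
        if cited:
            # DOIs are case-insensitive, so normalize before comparing.
            edges.append((citing.lower(), cited.lower()))
        # References deposited without a DOI are skipped in this sketch.
    return edges


# Made-up record for illustration only.
sample_work = {
    "DOI": "10.1234/example.2017.001",
    "title": ["An illustrative paper"],
    "reference": [
        {"key": "ref1", "DOI": "10.1234/Example.2015.042"},
        {"key": "ref2", "unstructured": "A reference deposited without a DOI."},
        {"key": "ref3", "DOI": "10.5678/demo.1970.7"},
    ],
}

for citing_doi, cited_doi in citation_edges(sample_work):
    print(citing_doi, "cites", cited_doi)
```

Each pair is one link between two papers. Collected across many records, such pairs are the raw material for the discovery tools described in this post.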
Unlike scholarly articles themselves, citation data are generally not subject to copyright. Citation data typically rest in the public domain, free for anyone to access. Until recently, however, much of the citation data in the scientific research world has been difficult to find, surface, and access. “It is a scandal,” wrote David Shotton in Nature in 2013, “that reference lists from journal articles—core elements of scholarly communication that permit the attribution of credit and integrate our independent research endeavours—are not readily and freely available.”
Before I4OC started, publishers releasing references in the open accounted for just 1% of the publications registered with Crossref. As of the launch of I4OC, more than 40% of this data has become freely available.

As of March 2017, the fraction of publications with open references has grown from 1% to more than 40% out of the nearly 35 million articles with references deposited with Crossref (to date). Image by Dario Taraborelli, public domain/CC0.

Like sources cited within a Wikipedia article, references cited within a scholarly article can help build powerful discovery tools and a stronger foundation for open knowledge.
Volunteer contributors and software developers in the Wikimedia movement have been curating and incorporating scholarly citations into the Wikimedia projects for quite some time. The GeneWiki project has been linking reference sources to information about genes, proteins, and diseases in Wikipedia and Wikidata. Initiatives like WikiCite aim to create a bibliographic database in Wikidata to serve all Wikimedia projects. The LibraryBase project is building tools to better understand how information in Wikipedia is referenced and guide how editors identify and use references on Wikipedia. The WikiFactMine project is helping connect Wikidata statements in the field of biomedical sciences to scholarly literature. Programmatic initiatives such as 1lib1ref are engaging librarians to add missing citations to Wikipedia, and services like Citoid are simplifying the discoverability and creation of citations for free knowledge.
These projects depend on the availability of open bibliographic and citation data. We expect I4OC will substantially contribute to all these initiatives.

Example of a partial citation graph for Laemmli (1970), one of the most cited scholarly journal articles of all time. Graph generated from open citation data in Wikidata via a SPARQL query. Image by Dario Taraborelli, public domain/CC0.
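A query of the kind mentioned in the caption might look roughly like the sketch below. It is not the query used to produce the image. It assumes Wikidata's "cites work" property (P2860) and the standard label service of the Wikidata Query Service. The helper `build_citing_query` and the item ID `Q12345` are placeholders for illustration, not the actual identifier of the Laemmli paper.

```python
import re


def build_citing_query(qid: str, limit: int = 50) -> str:
    """Build a SPARQL query listing Wikidata items that cite the given item.

    Uses wdt:P2860 ("cites work"). The qid argument must look like "Q" followed
    by digits; anything else is rejected to avoid building a malformed query.
    """
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT ?citing ?citingLabel WHERE {\n"
        f"  ?citing wdt:P2860 wd:{qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


# Q12345 is a placeholder item ID, used only to show the shape of the query.
print(build_citing_query("Q12345", limit=10))
```

The printed text can be pasted into the Wikidata Query Service. Each result row is one edge pointing at the cited work, and a graph like the one pictured is built from many such rows.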

Over the coming months, the organizations involved in I4OC will be working with different stakeholders to raise awareness of the availability of open citation data and evaluate how it can be reused, analyzed, and built upon. We will provide regular updates on the growth of the public citations corpus, how the data is being used, additional stakeholders and participating publishers, and new services that are being developed.
Any publisher can freely license and share their reference data by enabling reference distribution via Crossref. For more information and details on how to get involved, please visit the I4OC website or follow @i4oc_org on Twitter.
A joint press release about the announcement is available on the I4OC website.
Dario Taraborelli, Director, Head of Research, Wikimedia Foundation
Jonathan Dugan, WikiCite organizing committee

[1] As of March 2017, nearly 35 million articles with references have been registered with Crossref. Citation data from the Crossref REST API will be made available shortly after the announcement.


Founding organizations

  • OpenCitations
  • Wikimedia Foundation
  • PLOS
  • eLife
  • DataCite
  • Centre for Culture and Technology, Curtin University

Participating publishers

  • American Geophysical Union
  • Association for Computing Machinery
  • BMJ
  • Co-Action Publishing
  • Cambridge University Press
  • Cold Spring Harbor Laboratory Press
  • Copernicus GmbH
  • eLife
  • EMBO Press
  • Faculty of 1000, Ltd.
  • Frontiers Media SA
  • Geological Society of London
  • Hamad bin Khalifa University Press (HBKU Press)
  • Hindawi
  • International Union of Crystallography
  • Leibniz Institute for Psychology Information
  • MIT Press
  • PeerJ
  • Pensoft Publishers
  • Portland Press
  • Public Library of Science
  • Royal Society of Chemistry
  • SAGE Publishing
  • Springer Nature
  • Taylor & Francis Group
  • The Rockefeller University Press
  • The Royal Society
  • Ubiquity Press, Ltd.
  • Wiley


Stakeholders

  • Alfred P. Sloan Foundation
  • Altmetric
  • Association of Research Libraries
  • Authorea
  • Bill & Melinda Gates Foundation
  • California Digital Library
  • Center for Open Science
  • Coko Foundation
  • Confederation of Open Access Repositories
  • ContentMine
  • Data Carpentry
  • Dataverse
  • dblp: computer science bibliography
  • Department of Computer Science and Engineering, University of Bologna
  • Dryad
  • Figshare
  • ImpactStory
  • Internet Archive
  • Knowledge Lab
  • Max Planck Digital Library
  • Mozilla
  • Open Knowledge International
  • OpenAIRE
  • Overleaf
  • Project Jupyter
  • rOpenSci
  • Science Sandbox
  • Wellcome Trust
  • Wiki Education Foundation
  • Wikimedia Deutschland
  • Wikimedia UK
  • Zotero

