Preserving Wikipedia citations for the future: Geoffrey Bilder

Geoffrey Bilder is working to prevent the death of hyperlinks on Wikipedia, also known as “link rot”. Photo by Helpameout, freely licensed under CC BY-SA 4.0.

Reading scholarly work in print involves frequent interruptions: footnotes and endnotes call for midstream re-evaluation as readers dart back and forth from passage to citation. This jumping around has become a natural part of reading on the Internet, where bright blue hyperlinks can be followed with a single click. Still, while hyperlinks may have smoothed out the jarring experience of reading printed scholarly text, they have one major disadvantage.

Geoffrey Bilder calls it “link rot”: the demise of hyperlinks that no longer point to their original resource.

Bilder has spent 15 years in the scholarly communication industry and has used Wikipedia since the early 2000s. He tells us that the average lifespan of a hyperlink has been a mere six years. After that time, many of the pages being linked to will have been taken down or moved to a new location, and the original link is no longer useful as a citation tool.

This widespread “link rot” troubles Bilder, who is concerned that broken citations can undermine the reliability of online scholarly work. To address this issue, he joined CrossRef, a non-profit organization that provides web resources and the infrastructure to keep hyperlinks working.

What the infrastructure does is rather simple: it decouples the location of an item on the web from its identifier. Citations point to the identifier, and the location recorded for that identifier can be updated whenever the item moves, so the link does not break.
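
As a rough illustration of that decoupling, here is a minimal Python sketch of a lookup table that maps a persistent identifier to its current location. The class name, the identifier, and the URLs are all hypothetical and are not CrossRef's actual system or API; the point is only that the citation stores the identifier and never needs to change when the item moves.

```python
class IdentifierRegistry:
    """Toy resolver (hypothetical): maps a persistent identifier to its current URL."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # Record where the item lives when it is first published.
        self._locations[identifier] = url

    def move(self, identifier, new_url):
        # Only the stored location changes; the identifier stays the same.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        # Look up the current location at the moment the link is followed.
        return self._locations[identifier]


registry = IdentifierRegistry()

# Made-up identifier and URLs, for illustration only.
citation = "10.5555/example-article"
registry.register(citation, "https://publisher.example/articles/42")
before = registry.resolve(citation)

# The publisher reorganizes its site; the citation text is untouched.
registry.move(citation, "https://new-host.example/content/42")
after = registry.resolve(citation)

print(before)
print(after)
```

A plain hyperlink would have kept pointing at the old address; here the same cited identifier resolves to the new one because only the registry entry was updated.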

“We can’t have references break in six years,” said Bilder. “We’re trying to preserve references and citations for … five thousand or several hundred years.”

Bilder, who is an avid user of Wikipedia as a resource, says the issue of correct online citations has become critical, due to a surge in websites that point to scholarly work online. Wikipedia is one of them: the encyclopedia is one of the top 10 web referrers to scholarly literature online.

Bilder adds that you might “think the National Institutes of Health or Scopus, a gigantic scholarly database, would far exceed Wikipedia in terms of web referral power, but Wikipedia is right up there!”

That’s despite the fact that only a fraction of Wikipedia articles cite any scholarly references.

“Because a lot of people have been referring to scholarly literature, we try to make sure that cross-publisher citations do not break in six years, are always up-to-date, and always point to the same material,” says Bilder.

In 2014, Bilder and his team launched an initiative to better integrate scholarly literature and scholarly identifiers into Wikipedia.

“What we are starting to realize is that a lot of the citation tools in Wikipedia have not been updated for a long time,” he says. “Since then we’ve been working on trying to get a real-time feed of DOI citations from all the different language Wikipedias.”

The idea behind the project, called DOI Event Tracking (DET), is to eventually feed the citation data from Wikipedia into a more general tool that will allow scholarly communities to track mentions of DOIs that occur outside of the formal scholarly literature.
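
To give a sense of what tracking DOI mentions might involve, here is a small Python sketch that pulls DOI-like strings out of a piece of article text. It is not the DET implementation: the function name, the sample text, and the DOIs in it are made up, and the regular expression is a simplified assumption that covers common modern DOIs but not every valid one.

```python
import re

# Simplified, assumed pattern: "10.", a 4-9 digit prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text):
    """Return the distinct DOI-like strings in text, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.findall(text):
        # Drop sentence punctuation that the pattern may have swallowed.
        doi = match.rstrip(".,;")
        if doi not in found:
            found.append(doi)
    return found


# Made-up sample text with illustrative DOIs.
sample = (
    "{{cite journal |title=Example |doi=10.5555/12345678 }} "
    "See also doi:10.5555/abc.def. "
    "Repeated: 10.5555/12345678"
)

dois = extract_dois(sample)
print(dois)
```

Running this over each edit in a feed of article changes would yield the kind of list of mentions that a tracking tool could then aggregate.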

Bilder hopes to work with more Wikipedians to help improve Wikipedia’s citation tools and eventually remove “link rot” from the site.

Profile by Yoona Ha, Assistant Storyteller Intern, Wikimedia Foundation
Edited by Victor Grigas, Storyteller and Video Producer, Wikimedia Foundation
Interview by Jan Novak, Wikimedia community volunteer

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

6 Comments

Surely a partnership with the Internet Archive – where all cited links are automatically generated as internet archive links. Then you have a stable, time-bound citation. I use Internet Archive links whenever I cite anything now, especially Government and News sites as they often restructure and kill old links.

Naturally, none of these solutions addresses the fact that much literature is still behind paywalls and difficult to link to. That’s a longer and more complicated struggle ahead.

Thanks for the writeup + comments. re: paywalls, the Wikipedia OA Signalling project is using the CrossRef API (amongst other techniques) to indicate when citations point to content that can be accessed without a paywall. While CrossRef will also soon be working to provide link citation backups via the Internet Archive (and other archives, if they are interested), we think it would be a mistake to *default* to citing content out of context (e.g. in the Internet Archive) because the surrounding paratext is often important in interpreting the target of a citation. So we will always try to link to…

> Surely a partnership with the Internet Archive – where all cited links are automatically generated as internet archive links.
IA is already doing this without asking anything. See https://www.mediawiki.org/wiki/Archived_Pages

Academic libraries have already created a solution to this challenge.