Wikipedia community and Internet Archive partner to fix one million broken links on Wikipedia

Photo by Diego Delso, CC BY-SA 3.0.

The Internet Archive, the Wikimedia Foundation, and volunteers from the Wikipedia community have now fixed more than one million broken outbound web links on English Wikipedia. For three years, the Internet Archive has monitored all new and edited outbound links from English Wikipedia and archived them soon after changes are made to articles. Combined with other web archiving projects, this means that as pages on the Web become inaccessible, links to archived versions in the Internet Archive’s Wayback Machine can take their place. On English Wikipedia, more than one million links now point to preserved copies of missing web content.
This story is a testament to the sharing, cooperative nature and resulting benefits of the open world.
What do you do when good web links go bad? If you are a volunteer editor on Wikipedia, you start by writing software to examine every outbound link in English Wikipedia to make sure it is still available via the “live web.” If, for whatever reason, it is no longer good (e.g. if it returns a “404” error code or “Page Not Found”) you check to see if an archived copy of the page is available via the Internet Archive’s Wayback Machine. If it is, you instruct your software to edit the Wikipedia page to point to the archived version, taking care to let users of the link know they will be visiting a version via the Wayback Machine.
That is exactly what Maximilian Doerr and Stephen Balbach have done. As a result of their work, in close collaboration with the non-profit Internet Archive and the Wikimedia Foundation’s Wikipedia Library program and Community Tech team, more than one million broken links have now been repaired. For example, footnote #85 from the article about Easter Island now links to the Wayback Machine instead of a now-missing page. Pretty cool, right?
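The check-and-replace loop described above can be sketched in a few lines of Python. This is a minimal illustration, not InternetArchiveBot's actual code: the function names `closest_snapshot` and `repair_links` are hypothetical, the liveness check and the archive lookup are passed in as callables so the sketch runs without network access, and the JSON shape assumed here follows the Wayback Machine's public availability API (`https://archive.org/wayback/available?url=...`).

```python
import json
from typing import Callable, Dict, Iterable, Optional


def closest_snapshot(availability_json: str) -> Optional[str]:
    """Return the archived URL from an availability-API style response.

    Assumes the response looks like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    Returns None when no usable snapshot is reported.
    """
    data = json.loads(availability_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def repair_links(
    links: Iterable[str],
    is_live: Callable[[str], bool],
    find_archived: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Map each dead link to an archived copy, when one exists.

    is_live and find_archived are hypothetical hooks: a real bot would
    issue HTTP requests here; the sketch leaves that to the caller.
    """
    replacements: Dict[str, str] = {}
    for url in links:
        if is_live(url):
            continue  # link still works on the live web, leave it alone
        archived = find_archived(url)
        if archived is not None:
            replacements[url] = archived  # point the citation at the archive
    return replacements


if __name__ == "__main__":
    # Stubbed data standing in for real HTTP checks and archive lookups.
    live = {"http://example.org/ok"}
    archive = {"http://example.org/gone": "http://web.archive.org/web/2016/http://example.org/gone"}
    print(repair_links(
        ["http://example.org/ok", "http://example.org/gone", "http://example.org/lost"],
        is_live=lambda u: u in live,
        find_archived=archive.get,
    ))
```

Links that are dead and have no archived copy are simply left out of the result, so an editor or a later pass can handle them separately.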
“We are honored to work with the Wikipedia community to help maintain the cultural treasure that is Wikipedia,” said Brewster Kahle, founder and Digital Librarian of the Internet Archive, home of the Wayback Machine. “By editing broken outbound links on English Wikipedia to their archived versions available via the Wayback Machine, we are helping to provide persistent availability to reference information. Links that would have otherwise led to a virtual dead end.”
“What Max and Stephen have done in partnership with Mark Graham at the Internet Archive is nothing short of critical for Wikipedia’s enduring value as a shared repository of knowledge. Without dependable and persistent links, our articles lose their backbone of reliable sources. It’s amazing what a few people can do when they are motivated by sharing, and preserving, knowledge,” said Jake Orlowitz, head of the Wikipedia Library.
“Having the opportunity to contribute something big to the community with a fun task like this is why I am a Wikipedia volunteer and bot operator. It’s also the reason why I continue to work on this never-ending project, and I’m proud to call myself its lead developer,” said Maximilian, the primary developer and operator of InternetArchiveBot.
So, what is next for this collaboration between Wikipedia and the Internet Archive? Well… there are nearly 300 Wikipedia language editions to rid of broken links. And, we are exploring ways to help make links added to Wikipedia self-healing. It’s a big job and we could use help.
Making the web more reliable… one web page at a time. It’s what we do!
Mark Graham, Director, Wayback Machine Project
Internet Archive

A huge thank you to Kenji Nagahashi, Vinay Goel, John Lekashman, Mark Graham, Maximilian Doerr, Stephen Balbach, the Wikimedia Foundation, Wikipedia community members, and Brewster Kahle.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

20 Comments

[…] and the Internet Archive have teamed up. “The Internet Archive, the Wikimedia Foundation, and volunteers from the Wikipedia community […]

Thank you all involved for your vision and dedication. This is excellent.

[…] to address this problem, the Wikimedia Foundation has formed a partnership with the Internet Archive, which has already allowed it to restore the readability of more than one million links that […]

Guess this would put an end to users using anchor link text for SEO purposes?

Great news for the English Wikipedia. Is there any plan for other language versions?

This is honestly one of the most pleasantly smart things I’ve read about in a long time. Thanks for all of your excellent work to bring us the best internet resource we’ve ever had!

Would this not be even better: Additionally, when finding a live link, ask the Archive to crawl it once to add it to the index (https://archive.org/web/). That way, you will be quite guaranteed to find a version later when the link goes dead.

I came here to share the same idea Johannes also had.
It would be great if all new links could be added to the Wayback Machine so as not to lose any of the sources.

In fact, this is very bad news, because what Wikipedia needs to stay on top of things is people who check whether the information in an item referenced with a dead link is still valid, or whether there is a more appropriate reference that should take the place of the dead link. If you only replace a dead link with an archived version, you make sure Wikipedia remains out of date.

[…] Wikipedia Community and Internet Archive teamed up to fix one million broken links on Wikipedia, check out the article here. Ever heard of link […]

[…] Vasya Atanasova Editor: Lachezar Iliev Based on a text by Mark Graham from the Wikimedia Foundation blog Working on the project: […]

For 3 years we have been crawling ALL Wikipedia links in every language. We at Internet Archive would next really like to work with other Wikipedia groups to have similar bots replacing dead links. Looking for partners to work with us!

Thank you for these comments and suggestions. As a matter of fact, the Internet Archive does capture and preserve all new/changed Wikipedia links to help ensure we will have them if the original, live versions go bad. And, yes, we are now working to extend this effort to other Wikipedias (and other platforms) worldwide. Please do continue to share your ideas about how we might work together to help make the web more reliable!

[…] The Wikipedia community and the Internet Archive unite to fix one million broken links…. […]

Do you know http://wikiwix.com? It has been an alternative solution on French Wikipedia since 2008 (100,000,000 links), and we could also store links for all the WMF projects.
See https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Prise_de_d%C3%A9cision/Syst%C3%A8me_de_cache 🙂

[…] Wikipedia community and Internet Archive partner to fix one million broken links on Wikipedia […]

[…] no longer work, to adjust them and have them link to a page in the archive. Meanwhile, more than a million links have already […]

When a domain expires, a speculator usually snaps it up, parks it, and sets robots.txt to exclude the domain. This causes Wayback Machine to deny access even to previously archived versions of documents on that domain. How should links to documents on domains that have expired and been snapped up, parked, and robot-excluded be fixed?

[…] of New Nigerian cinema. Many Nigerian news outlets do not maintain archives, which leads to a significant amount of lost knowledge and a much more time-consuming process for […]