Contributing to Wikipedia is a powerful experience—when you edit an article, the information is immediately available to anyone in the world. But imagine transcending the limits of language, where adding a fact or concept makes it instantly accessible to all of the hundreds of different language editions of Wikipedia, to any project within the Wikimedia universe, or to any entity on the Internet.
This is the promise of Wikidata, and it is already a reality. As Wikidata celebrates its fifth birthday, it has evolved from an experimental semantic web database into an inter-language linking hub for Wikipedia articles, and now into an engine for capturing relationships among numerical, text, visual and graphical content. Interestingly, even as it gains traction within the Wikimedia community, Wikidata’s impact is perhaps even more significant outside the Wikimedia universe: national libraries and world-class museums working with “linked open data” have turned to Wikidata as a crucial public-facing interlinking hub.
Why has Wikidata garnered such enthusiasm so quickly? It has become the most visible and user-friendly embodiment of the ideals of the semantic web—an evolution of the World Wide Web that its inventor, Tim Berners-Lee, has advocated for representing and sharing concepts across the internet.
Wikidata’s difference is that it deals with concepts. Human-readable labels provide language-specific references to these concepts. Take the example of Muammar Kaddafi, the former leader of Libya who has had dozens of variations in the Latinized spelling of his name. This has always posed a problem for anyone doing a search on the web. Where the traditional lexical web from 1994 struggles, Wikidata as an implementation of the semantic web uses a number to represent him (Q19878) as an item in the database. It records 118 spelling variations of his name and essential facts about his life, relationships, and positions of power as statements (“military rank” > “colonel”). It also contains a list of identifiers: pointers to dozens of other databases from libraries, catalogs or web sites that contain records about him. At the first-ever WikidataCon in Berlin this week, Theo Van Veen of the National Library of the Netherlands was bold enough to label his talk “Wikidata as [a] universal (library) thesaurus” for the world, proposing that Wikidata serve as the world’s most prominent central hub.
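The item structure described above (an ID, labels and spelling variants, statements, and external identifiers) can be sketched in a few lines of Python. This is a simplified illustration, not Wikidata's actual data model or API: the `Item` class, the `build_alias_index` and `resolve` helpers, the handful of spelling variants, and the placeholder catalog identifier are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Item:
    """Hypothetical, simplified stand-in for a Wikidata item."""
    qid: str                                                          # language-neutral ID, e.g. "Q19878"
    labels: Dict[str, str] = field(default_factory=dict)              # language code -> preferred label
    aliases: Dict[str, List[str]] = field(default_factory=dict)       # language code -> spelling variants
    statements: Dict[str, List[str]] = field(default_factory=dict)    # property -> values
    identifiers: Dict[str, str] = field(default_factory=dict)         # external catalog -> record ID


def build_alias_index(items: List[Item]) -> Dict[str, Set[str]]:
    """Map every known spelling (case-insensitively) to the IDs of matching items."""
    index: Dict[str, Set[str]] = {}
    for item in items:
        names = list(item.labels.values())
        for variants in item.aliases.values():
            names.extend(variants)
        for name in names:
            index.setdefault(name.casefold(), set()).add(item.qid)
    return index


def resolve(index: Dict[str, Set[str]], name: str) -> List[str]:
    """Return the sorted item IDs that a spelling refers to (empty if unknown)."""
    return sorted(index.get(name.casefold(), set()))


# Illustrative data only: a few of the many spellings, one statement from the
# article, and a made-up placeholder for an external identifier.
leader = Item(
    qid="Q19878",
    labels={"en": "Muammar Gaddafi"},
    aliases={"en": ["Muammar Kaddafi", "Moammar Gadhafi", "Muammar al-Qaddafi"]},
    statements={"military rank": ["colonel"]},
    identifiers={"example-library-catalog": "PLACEHOLDER-ID"},
)

index = build_alias_index([leader])
print(resolve(index, "moammar gadhafi"))
print(resolve(index, "Muammar Kaddafi"))
```

Both lookups lead to the same item ID, which is the point: the spellings are only labels, while statements and identifiers attach to the concept itself.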
Wikidata embraces this role with unique approaches and interfaces. Linking Wikidata to all available public data catalogs has been a pet project of Wikimedia veteran and programmer Magnus Manske. His “Mix’n’Match” tool provides a game-like interface that allows beginners with no Wikidata knowledge to connect concepts in Wikidata to those in other databases. The list of more than 500 databases in the tool is a who’s who of the world’s premier cultural and knowledge institutions.
At WikidataCon’s opening session, Lydia Pintscher, Wikidata’s program manager, provided some impressive numbers about Wikidata’s rise. It is the fastest-growing project within the movement, with more than 37 million items and a community of more than 1,400 very active editors. Edits on Wikidata account for roughly one in three edits across all Wikimedia projects. The interface that supports free-form database queries in Wikidata using the SPARQL language handles 8.5 million requests a day from users all over the Internet. The ability to create complex queries and visualizations (e.g. “Show me all politicians who are descendants of other politicians and have served in the military”) gives users capabilities they have not had before.
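As a rough sketch of what such a query might look like, the Python snippet below builds a SPARQL query approximating the “politicians descended from politicians with military service” example and turns it into a request URL for the public query endpoint. The property and item IDs (P106 for occupation, Q82955 for politician, P40 for child, P241 for military branch) are our reading of the Wikidata vocabulary and should be checked before use; “has a military branch statement” is only one possible way to model “served in the military.” The `build_request_url` helper is hypothetical, and the snippet deliberately stops short of sending the request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed IDs (verify on wikidata.org): P106 = occupation, Q82955 = politician,
# P40 = child, P241 = military branch.
QUERY = """
SELECT DISTINCT ?person ?personLabel WHERE {
  ?ancestor wdt:P106 wd:Q82955 .   # the ancestor is a politician
  ?ancestor wdt:P40+ ?person .     # one or more "child" links: a descendant
  ?person wdt:P106 wd:Q82955 .     # the descendant is also a politician
  ?person wdt:P241 ?branch .       # the descendant has a military branch statement
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 50
"""


def build_request_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Return a GET URL that asks the endpoint for JSON results (no request is sent)."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


url = build_request_url(QUERY)
print(url)
```

The `wdt:P40+` property path is what expresses “descendant” rather than just “child.” A transitive query like this can be expensive on a graph of this size, so the `LIMIT` is a precaution rather than a guarantee that the query will finish within the service's time limits.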
Perhaps the most visible use of Wikidata in Wikipedia today is through automated information boxes that access statements from Wikidata to directly create rich infoboxes.
Catalan Wikipedia has been a pioneer in this area, creating advanced infobox templates with interactive maps and detailed career information for biographies, all invoked with one line of wiki markup code. Today, 58% of its articles use Wikidata-driven infoboxes. The French Wikipedia articles about monuments and artists have adopted more than 40,000 Wikidata-driven infoboxes. The English Wikipedia, on the other hand, has only a few hundred of these automated infoboxes, as there is still intense debate about their use in its five million articles.
With Wikidata’s content available under a CC0 license, extensive reuse is allowed and encouraged. This has spawned projects like Histropedia, a customizable interactive visual timeline based on Wikidata, and means that you’ve probably used Wikidata before without knowing it. Google’s Knowledge Graph project, which helps power its search engine, makes extensive use of Wikidata. In fact, Google’s own linked open data project, Freebase, was discontinued because of the superiority of Wikidata’s approach and active community.
With the rapid growth and innovative uses of Wikidata, there are caveats and concerns. A high percentage of the statements in Wikidata items have no references whatsoever, or simply state “Wikipedia” as the source. The lack of references has led to friction in adopting Wikidata in certain projects, as there are questions over the centralization of factual information on Wikidata and a subsequent loss of control on Wikipedia. Will interfaces to modify and work with Wikidata within Wikipedia be developed? These are still critical questions that will require more user experience and developer experimentation.
The main strength of Wikidata—a set of factual claims, which can grow and adapt without a rigid central schema—is also its main weakness. Ontological modeling problems abound, with some domains of knowledge being healthy and consistent—paintings and monuments, for example—while others suffer from less attention or haphazard treatment.
Despite these issues, the future is bright for Wikidata within the Wikimedia movement and global community. There is now a “structured data for Commons” project to bring a Wikidata-oriented metadata system to the Commons multimedia repository, with a team of four employees and funding from the Sloan Foundation. WikiCite is a project to store citations using Wikidata’s methods, providing consistent, open citations for Wikimedia projects, academia and the world. The ability to do a federated search with Wikidata across multiple semantic web databases around the world is extremely powerful and is still being optimized.
Among the Wikidata team’s presentation slides is the 16th-century painting The Tower of Babel by Pieter Bruegel the Elder. The biblical tale of the Tower, from the Book of Genesis, relates a time when the “whole earth was of one language, and of one speech.” Pintscher sees that as an inspiration and an encapsulation of the challenges for Wikidata:
“Working across language and culture barriers in Wikipedia has been really hard. What we are trying to do with Wikidata is to make that possible. Have people from many different languages come together and work on the same data, no matter what language they speak. That’s hard but also amazing to do.”
Andrew Lih, Wikimedia District of Columbia
Robert Fernandez, Wikimedia District of Columbia