Mapping knowledge gaps makes otherwise invisible absences visible, allowing editors, event organizers, and developers to systematically identify what’s missing, excessive, poorly structured, or uncatalogued. Instead of relying solely on personal knowledge or random discovery, these methods can surface knowledge discrepancies more objectively, thereby enabling the Wikimedia community to direct editorial and technical efforts precisely to areas that most need attention.
You’ve jumped into the fourth post in our series; check out our previous posts:
● Post 1: An Overview of Wikipedia’s Structural Framework
● Post 2: Navigating Wikipedia’s Knowledge Discrepancies
● Post 3: Prioritizing Wikipedia’s Knowledge Gaps
The Wikimedia community, consisting of volunteers and staff, has developed numerous tools and workflows. Many of these, though originally intended for other purposes, can be adapted to map these diverse knowledge discrepancies. This post will categorize and describe a selection of tools and workflows based on the type of gap they help identify. It’s important to note that this discussion exclusively addresses gaps within Wikipedia itself, excluding sister projects like Wikimedia Commons or Wiktionary. Furthermore, we will focus strictly on tools, excluding individual user scripts or administrative mass-cleanup utilities, prioritizing those available in English for broader understanding.
Tools for Mapping Missing Content
Identifying where information is entirely absent, whether it’s a missing article or critical content within an existing one, is a fundamental aspect of gap mapping. Several tools exist to highlight these omissions:
- PetScan: A versatile tool that lets users build highly customized queries of Wikipedia data. By combining criteria such as page size, category membership, and Wikidata linkage, PetScan is widely used for mapping coverage gaps. For example, it can list all articles within a certain category that have empty sections, suggesting incomplete coverage.
- Listeria: Designed for generating and updating dynamic lists, Listeria can auto-populate tables of items from Wikidata. This functionality is particularly useful for identifying notable individuals, places, or concepts that have a presence in Wikidata but currently lack a corresponding Wikipedia article in a target language.
- SPARQL: As a powerful query language for Wikidata, SPARQL enables complex, sophisticated data retrieval. This allows users to find notable topics absent from specific language editions of Wikipedia by querying Wikidata. An example might be finding all Mexican climate activists born between 1980 and 2000 who have a Wikidata item but are missing an article in the French Wikipedia.
- WikiShootMe: A visually intuitive tool that maps Wikidata items missing corresponding images. For instance, it is effective for quickly finding public art installations that lack photographic documentation on Wikimedia Commons.
- FIST (Free Image Search Tool): This tool recommends images that are present in Wikimedia Commons or Wikidata but are currently missing from relevant Wikipedia articles, addressing visual content gaps directly.
- WikiCompleter: Identifies topics that are well covered in many Wikipedia language editions but conspicuously absent from a specific target language, highlighting translation and coverage gaps.
- Navboxes: Thematic navigational boxes, or “navboxes,” provide a visual and practical means of mapping gaps. These templates often contain internal links to existing articles within a specific theme or topic area, with red links visually indicating articles that are missing but considered relevant.
- Not In The Other Language: As its name suggests, this tool helps identify topics that have an article in one Wikipedia edition but are missing in another, facilitating cross-language content expansion efforts.
- WikiMap: Focused on geographical knowledge gaps, this tool lets users pinpoint locations that lack images, such as monuments in San Francisco without photographs.

Topics without an article on English Wikipedia are shown in red.
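The SPARQL example above can be sketched as a query against the Wikidata Query Service. The property and item IDs below (P31 "instance of", Q5 "human", P27 "country of citizenship", Q96 "Mexico", P106 "occupation", P569 "date of birth") are real Wikidata identifiers, but the occupation item for "climate activist" is left as a placeholder you would look up yourself, and the `MINUS` sitelink pattern follows the standard Query Service convention for "no article in this language edition":

```python
# Sketch: find Mexican people born 1980-2000 on Wikidata who have no
# French Wikipedia article. OCCUPATION_ID is a placeholder, not a real
# Wikidata item; replace it with the actual item for "climate activist".

OCCUPATION_ID = "Q_____"  # placeholder: look up the real occupation item

QUERY = f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P31 wd:Q5;               # instance of: human
          wdt:P27 wd:Q96;              # country of citizenship: Mexico
          wdt:P106 wd:{OCCUPATION_ID}; # occupation (placeholder above)
          wdt:P569 ?born.
  FILTER("1980-01-01"^^xsd:dateTime <= ?born &&
         ?born < "2001-01-01"^^xsd:dateTime)
  MINUS {{                             # exclude people with a frwiki article
    ?frArticle schema:about ?person;
               schema:isPartOf <https://fr.wikipedia.org/>.
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "fr,en". }}
}}
LIMIT 100
"""

def run_query(query: str) -> dict:
    """POST the query to the Wikidata Query Service and return parsed JSON."""
    import json
    import urllib.parse
    import urllib.request
    url = "https://query.wikidata.org/sparql"
    data = urllib.parse.urlencode({"query": query, "format": "json"}).encode()
    req = urllib.request.Request(
        url, data=data, headers={"User-Agent": "gap-mapping-sketch/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Each result row is a notable person who exists in Wikidata but has no French Wikipedia article, which is exactly the kind of list that Listeria can then keep auto-updated on a wiki page.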
Tools for Mapping Poorly Structured Content
Poorly structured content, as we discussed in Blog Post 2, encompasses a range of issues from inaccuracy to disorganization. Identifying these nuances often relies on community input and automated analysis:
- Maintenance Templates: Wikipedians actively flag poorly structured content by adding various maintenance templates directly to articles or sections (e.g., {{citation needed}}, {{outdated}}, {{cleanup}}, {{unreferenced}}). These templates serve to signal specific quality issues.
- PetScan: PetScan’s versatility extends to sorting and filtering these maintenance templates. It can generate lists of articles that have been tagged with specific maintenance templates, allowing editors to find poorly structured content pertaining to specific issues, such as statements without citations or articles containing outdated information.
- Citation Hunt: This tool specifically identifies statements within Wikipedia articles that are flagged as needing references. It presents these statements to users in an easy-to-use interface, facilitating targeted improvements in verifiability and reducing the number of unverified claims.
- Manual article assessment and ORES scores: Wikipedia uses manual article assessment, where experienced editors rate articles on scales such as Stub, Start, C, B, Good Article (GA), and Featured Article (FA). Automated quality predictions are also generated by the Objective Revision Evaluation Service (ORES), which uses machine learning models trained on past assessments to estimate the likely quality of an article. A highly important article rated low on the quality scale by both manual and machine assessment could indicate knowledge discrepancies.
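The "highly important but low quality" heuristic in the last bullet can be sketched as a simple cross-check. The quality scale is Wikipedia's real assessment ladder; the article rows below are hand-made sample data, not real ORES output, though they mirror the kind of per-article predictions the ORES articlequality model returns:

```python
# Sketch: combine manual importance ratings with ORES-style quality
# predictions to flag possible knowledge discrepancies. SAMPLE is
# illustrative data, not real API output.

QUALITY_RANK = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}

# (title, manual importance rating, predicted quality class)
SAMPLE = [
    ("Cholera", "Top", "Stub"),
    ("History of aspirin", "Low", "FA"),
    ("Malaria", "Top", "B"),
]

def flag_gaps(rows, max_rank=1):
    """Return Top-importance articles whose quality is Stub or Start."""
    return [title for title, importance, predicted in rows
            if importance == "Top" and QUALITY_RANK[predicted] <= max_rank]

print(flag_gaps(SAMPLE))  # -> ['Cholera']
```

An article like the sample "Cholera" row, rated Top-importance but assessed as a Stub, is precisely the kind of mismatch that signals a gap worth prioritizing.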
Tools for Mapping Surplus Content
Surplus content refers to material that either doesn’t belong on Wikipedia or is inappropriately included (e.g., copyright violations, vandalism, non-notable topics). The following tools help identify and manage these unwanted additions:
- Copyvio Detector: This tool helps find content on Wikipedia that has been copied from external copyrighted sources, such as textbooks, ensuring adherence to Wikipedia’s open-licensing policies.
- Community Activity Logs: While not a “mapping” tool in the sense of finding content types, these logs surface sudden or unusual activity that often indicates mass vandalism or spam. By monitoring them, communities can discover such events in real time and initiate a rapid response to remove unwanted content.
- PetScan: PetScan can also be used to identify articles with promotional, non-notable, or otherwise undesirable content by generating lists of articles marked with specific maintenance templates (e.g., {{advert}}, {{notability}}) that indicate various types of surplus content.
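A template-based PetScan query like the one just described can also be scripted via PetScan's query-string interface. The parameter names below follow that interface as commonly documented, but treat them as assumptions to verify against the live form at petscan.wmflabs.org before relying on them:

```python
# Sketch: build a PetScan URL listing English Wikipedia articles tagged
# with the {{Advert}} maintenance template. Parameter names are assumed
# from PetScan's query-string interface; verify against the live tool.
from urllib.parse import urlencode

params = {
    "language": "en",
    "project": "wikipedia",
    "templates_yes": "Advert",  # pages transcluding Template:Advert
    "ns[0]": 1,                 # restrict results to the article namespace
    "format": "json",           # machine-readable output
    "doit": 1,                  # run the query immediately
}
url = "https://petscan.wmflabs.org/?" + urlencode(params)
print(url)
```

The resulting JSON list can feed directly into cleanup drives or dashboards tracking surplus content.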
Tools for Mapping Cataloging Gaps
Cataloging gaps occur when structural metadata, such as categories or Wikidata connections, is missing or incorrect, hindering discoverability and machine processing.
- Duplicity: Identifies Wikipedia articles that lack a corresponding item on Wikidata, highlighting discrepancies between the structured data repository and Wikipedia’s article base. Bridging these gaps enhances cross-project connectivity and machine readability.
- HotCat and Cat-a-lot: These user-friendly tools are designed to streamline the process of managing categories on Wikipedia. HotCat allows editors to quickly add, remove, or change categories directly from the article view, while Cat-a-lot facilitates mass changes to categories across multiple articles, significantly improving the efficiency of this process.
There are several other tools for mapping cataloging gaps, but these are generally used by experienced users engaged in article cleanup. Some can only be accessed by users with extended rights. For brevity, we do not list them here.
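The kind of cataloging gap Duplicity surfaces can be checked programmatically: articles linked to Wikidata carry a "wikibase_item" page property in the MediaWiki API (action=query&prop=pageprops). The payload below is a hand-made sample shaped like that API's response, not real output:

```python
# Sketch: detect articles with no linked Wikidata item. The response
# shape mirrors the MediaWiki API (action=query&prop=pageprops), where
# linked pages carry a "wikibase_item" property; SAMPLE_RESPONSE is
# illustrative data, not a real API reply.

SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "123": {"title": "Linked article",
                    "pageprops": {"wikibase_item": "Q42"}},
            "456": {"title": "Unlinked article",
                    "pageprops": {}},
        }
    }
}

def unlinked_titles(response: dict) -> list[str]:
    """Return titles whose page properties lack a wikibase_item entry."""
    pages = response["query"]["pages"].values()
    return [p["title"] for p in pages
            if "wikibase_item" not in p.get("pageprops", {})]

print(unlinked_titles(SAMPLE_RESPONSE))  # -> ['Unlinked article']
```

Running such a check across a category or WikiProject yields a worklist of articles needing Wikidata items, the same gap Duplicity maps at scale.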
External Tools Beyond Wikimedia Ecosystem
Many tools developed outside Wikimedia’s immediate ecosystem can also be used to map knowledge gaps. Data visualization tools (such as Tableau Public) can analyze Wikipedia’s content in new ways, revealing patterns that might not be obvious from raw data. External academic databases (such as the MeSH database) can be compared against the knowledge present on Wikipedia, highlighting gaps in scholarly coverage. Emerging artificial intelligence technologies, such as generative large language models (for example, ChatGPT Deep Research), can synthesize information in response to prompts designed specifically to find knowledge gaps in a given article or topic area. Search engine analysis tools (such as Google Trends) can highlight topics of public interest for which Wikipedia does not currently have robust content.
Workflows for Mapping Wikipedia’s Knowledge Discrepancies
Beyond individual tools, Wikipedia’s knowledge gaps are systematically mapped through established community workflows. WikiProjects serve as dedicated hubs where editors collaborate around specific topics or themes (e.g., WikiProject Medicine, WikiProject Women in Red). WikiProjects are crucial for identifying and addressing content gaps systematically within their areas of interest. Among these, the WikiProject Missing Encyclopedic Articles explicitly aims to identify notable topics that lack articles on Wikipedia.
Additionally, the Vital Articles List highlights a curated set of core topics essential for any encyclopedia, signaling which articles demand the most rigorous quality improvements. Tools like Programs & Events Dashboard are useful for event organizers to track the progress in the development of a group of articles over time. They allow for the creation and tracking of lists of articles needing improvement, effectively coordinating edit-a-thons and campaigns aimed at closing specific knowledge gaps.
Other methods include comparative language-edition analyses backed by Wikidata: where a topic is well covered in one language but missing or underdeveloped in another, editors in the less-developed edition can identify expansion opportunities (e.g., using vital-article comparison tools). Mapping from external databases such as academic indexes (ORCID for researchers) or cultural heritage collections (the Sum of All Paintings project for artwork) also reveals topics that are well documented externally but absent from Wikipedia. Lastly, advanced workflows can combine several of the tools mentioned above, such as visualizing data from the SPARQL interface on an external website (e.g., Humaniki).
Together, these structured tools and collaborative workflows make it possible to visualize unseen gaps, enabling the Wikimedia community to turn them into actionable tasks.
Wrapping up
In this blog post, we have aimed to introduce some popular tools and workflows for systematically identifying and visualizing knowledge gaps. While this list is by no means comprehensive, we hope it will be useful for editors, event organizers, and developers to start familiarizing themselves with the tools they need in their arsenal for daily work.
Are there additional tools or workflows you personally use to map Wikipedia’s knowledge gaps? If so, please share them with us in the comments; we are always interested in learning from your experience and expanding our understanding.
This topic was presented on 7 August during Wikimania 2025; the slides can be viewed on Commons here, and the video recording of the presentation is here.