AI for the people: How machines can help humans improve Wikipedia


Sharing Wikipedia’s 20+ years of experience and lessons learned with Artificial Intelligence (AI) and Machine Learning (ML).

An illustration of Artificial Intelligence (AI) represented as a neural network. Image by Mike MacKenzie, CC BY 2.0, via Flickr.

The rapid development, distribution, and adoption of artificial intelligence (AI) has spurred legislative and regulatory debates around the world. Many policymakers are now asking: “What should we do about AI?”

While AI has been under development for years, for most people it has suddenly become ubiquitous and unavoidable. The emergence of sophisticated chatbot applications like ChatGPT and Gemini, and subsequent releases of multimodal applications that generate text, audio, and video responses to prompts from users, have brought AI into our homes, our conversations, and our jobs.

Many governments and international organizations are seeking stakeholder feedback about how policies should be formulated in order to best serve the public interest. The Wikimedia Foundation has recently submitted comments in response to several such consultations. 

The Foundation’s comments have fallen into two categories. Some are directly relevant to the work being done by volunteer Wikipedia editors around the world, such as comments on copyright and on the openness of foundational AI models. Others apply our values and the lessons we have learned from our own AI/ML work to benefit public interest projects focused on free knowledge and the online information ecosystem—for example, decentralized community-led decision-making, privacy, stakeholder inclusion, and the internet commons. We highlight a few of these themes below.

Attribution and verifiability promote trust and sustain the online information ecosystem

In our comments to the United States Copyright Office’s notice of inquiry, we noted the critical role that attribution of sources plays in the online information ecosystem. Attribution—that is to say, citing and linking to the sources of information that support another work—is central to Wikipedia and other Wikimedia projects. Every assertion in every article must be supported by reliable, authoritative sources so that readers and volunteer editors can verify the accuracy of content. At the same time, all content is made freely available under the Creative Commons BY-SA 4.0 license, which requires anyone who reuses Wikimedia content to provide attribution. In addition to supporting the verifiability of the content, attribution also recognizes the valuable work of the volunteers who contribute to the projects. 

Our comments argue that, for the same reasons, AI systems that use Wikimedia project content should provide attribution. At a minimum, AI developers who include Wikipedia in the training data used to create large language models (LLMs) should publicly acknowledge that use and give credit to Wikipedia and the volunteer editors who created this rich source of raw material for LLMs. We also urge AI companies—whose chatbots include Wikipedia content in their generative responses (now even more common with the deployment of retrieval-augmented generation [RAG])—to provide hyperlinks to the relevant articles, both to credit the authors and to enable people who use AI systems to verify the responses to their queries. Linking to Wikipedia not only helps readers learn more about their query, but also lets them know that the information came from a trusted source. Even setting aside copyright policy and licensing terms, providing attribution in generative outputs improves the quality of those responses, helps readers verify their accuracy, and supports the sustainability of sources like Wikipedia.

Context is important for AI consultations

In our comments in response to USAID’s request for information, we encouraged the Agency to prioritize the inclusion of all stakeholders. We urged USAID and others to engage in focused consultations that clearly identify the form of AI in question and its relevant use cases. For example, we suggested that rather than frame a consultation around “AI and education,” interested stakeholders should be given more information and context, and—as an example—be consulted about “generative AI tools for translation and summarization of text for educational purposes.” We also noted the tension between enabling local communities to develop and share data sources for AI development and the risk that Indigenous culture and knowledge will merely be extracted and exploited, perpetuating historical inequalities and underrepresentation.

In AI, both open and closed models may bring risks, but open AI models bring more benefits

In his Executive Order on AI, President Biden instructed the US Department of Commerce, working through the National Telecommunications and Information Administration (NTIA), to conduct a public consultation and provide a report on the risks, benefits, and regulatory approaches to “dual use foundation models for which the model weights are widely available.” Part of the NTIA’s request for comments sought input on what this phrase should mean, but the consultation generally implied a distinction between models that are “closed” (with limited public information about how to reproduce or modify the model) and models that are more “open” (with some or all information about the model, including training data, model weights, and source code, made publicly available).

Our responses to the NTIA encouraged the agency to recommend regulatory approaches that enabled the development of open AI models. We acknowledged that the development of powerful, multipurpose AI models presents some risks and that, given the rapid pace of AI development, correctly identifying and anticipating those risks could be an additional challenge. However, we also argued that many of those risks would exist regardless of whether an AI model was more open or more closed.

Going further, we argued that making information about AI models more open and more available to the public would bring greater benefits than attempting to keep model information locked behind proprietary doors. With access to information about model weights, source code, and training data, researchers and developers can inspect, test, improve, and modify AI models. This research and development could help identify flaws and vulnerabilities, counteract biases, and improve the performance of AI tools, as well as adapt those tools to address different needs.

Access to information about AI technologies can help level the playing field for global governance conversations

In addition to participating in consultations with US executive agencies, we provided comments to the United Nations AI Advisory Body in response to its Interim Report on Governing AI for Humanity. We echoed several of the points we raised in our comments to USAID, aimed at improving conditions for a more diverse set of stakeholders in conversations about AI governance. To this end, we also suggested that the UN could leverage its connections with academic institutions and researchers around the world to improve the quality and consistency of Wikipedia articles about AI, machine learning, and computer science in the world’s major languages.

We reasoned that policymakers would be better equipped to propose and discuss approaches to regulating AI if they had access to reliable and accurate information about how AI technologies work. We noted that Wikipedia already serves as one source of such information—the article about ChatGPT was one of the most visited articles of 2023, with over 52 million page views—but that more can be done to make information about other AI technologies available in more languages.

Finally, we reminded the UN AI Advisory Body of the critical roles that people play in creating and compiling sources of high-quality information, like Wikipedia, and of the importance of ensuring that new technologies support people in this work. Specifically, we urged the UN and others to protect the sustainability of a free and open knowledge ecosystem by recognizing the valuable sources of knowledge that people create and by respecting their contributions through clear and consistent attribution.

Empowering people online to share in the sum of all human knowledge is our primary goal

The central theme underpinning all of our positions and suggestions for the development, use, and governance of AI is simple: put people first. This theme echoed through our various comments as well as our contribution and statements to the Global Digital Compact—a process led by UN Member States that aims to establish principles and commitments to help harness the immense potential benefits of digital technologies. In an open letter co-authored with Wikimedia affiliates, we called upon the international community to consider including three of our main suggestions within the Compact. The open letter asks UN Member States to ensure that AI supports and empowers, rather than replaces, people who work in the public interest.

Final thoughts

Every revolutionary technology arrives with waves of hype and panic, and these waves can sometimes lead to uninformed regulatory or legislative approaches. It is reassuring, however, when governments, international institutions, policymakers, and regulators attempt to gain a better understanding of the technology at hand and engage in consultations with stakeholders about their needs, concerns, and values. We hope that our participation in these consultations will help steer agencies toward approaches that support, promote, and respect the people who generate the world’s knowledge and emerging technologies—and continue moving together toward a better, shared digital future.

Can you help us translate this article?

In order for this article to reach as many people as possible, we would like your help. Can you translate this article to get the message out?