Dagbanli is spoken by over 3 million people in Ghana and northern Togo, yet it has no comprehensive dictionary, let alone a digital one. This is the story of why we built one, why we chose to make it digital first, and why we selected Wikidata as its backbone.
Introduction
For many underrepresented languages, a “dictionary” means a basic bilingual wordlist, a bridge to or from a dominant language like English or French. But a language is more than a translation. It is a system of sounds, grammar, and cultural context that deserves to be understood and appreciated on its own terms.
In an increasingly digital world, a language’s presence online matters more than ever. When people need information, they reach for their phones first. Imagine you speak one of the world’s roughly 7,000 languages, but when you search for a word in your own language, there’s nothing there. No dictionary app. No spell checker. No voice assistant that understands you. For speakers of Dagbanli, a major language of Northern Ghana, this is the reality.
Dagbanli has a rich oral tradition, a complex grammar, and a vibrant community of over three million speakers. Yet in the digital world, it remains nearly invisible. While English, French, and other well-resourced languages enjoy countless digital tools, Dagbanli speakers have been left behind. This isn’t just an inconvenience. It’s a threat to the language’s survival in an increasingly connected world.
Today we introduce dagbanli.info, a new kind of dictionary for the Dagbanli language. Built entirely on crowd-sourced, open data, it offers a living, growing resource that puts the language first. This is the first in a series of blog posts about the Dagbanli Dictionary, telling the story of why we built it, how it works, and why we chose an open, community-driven foundation: Wikidata.
Figure: The dictionary home page, with its search bar and “Dagbanli Dictionary” title.
The Dagbanli Language: Context and Significance
Dagbanli is a Gur language spoken by the Dagbamba people primarily in Ghana’s Northern Region. With over 3 million speakers, it is one of the country’s most widely spoken indigenous languages. Yet its digital footprint is tiny compared to its number of speakers.
The language has several features that make it both fascinating and challenging.
- Rich morphology: Dagbanli is agglutinative, meaning words are built by stringing together meaningful parts. For example, the plural “ninsalnima” is formed by adding a suffix to the singular “ninsala”.
- Tone: Like many West African languages, Dagbanli uses tone to distinguish meaning. The same sequence of sounds can mean different things depending on pitch. For example, wahu can mean “snake” or “horse” depending on tone.
- Digraphs: The alphabet includes multi-character letters such as “gb”, “kp”, and “ŋm”, each treated as a single letter. This poses unique challenges for sorting and searching.
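To make the digraph point concrete, here is a minimal sketch of digraph-aware sorting. The alphabet order below is an assumption for illustration only, not the official orthography, and the sample strings are chosen just to exercise the sort, not as dictionary entries. The function names `split_letters` and `sort_key` are hypothetical.

```python
import unicodedata

# Illustrative alphabet order (an assumption, not the official orthography).
# The digraphs "gb", "kp", and "ŋm" each count as one letter.
ALPHABET = [
    "a", "b", "d", "e", "ɛ", "f", "g", "gb", "ɣ", "h", "i", "j", "k", "kp",
    "l", "m", "n", "ŋ", "ŋm", "o", "ɔ", "p", "r", "s", "t", "u", "w", "y",
    "z", "ʒ",
]
RANK = {letter: index for index, letter in enumerate(ALPHABET)}
DIGRAPHS = {letter for letter in ALPHABET if len(letter) == 2}


def split_letters(word):
    """Split a word into alphabet letters, greedily matching digraphs."""
    word = unicodedata.normalize("NFC", word).lower()
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def sort_key(word):
    """Map each letter to its alphabet rank; unknown characters sort last."""
    return [RANK.get(letter, len(ALPHABET)) for letter in split_letters(word)]


samples = ["gba", "gu", "kpa", "ku"]
plain_order = sorted(samples)                 # code-point order splits "gb" into g + b
aware_order = sorted(samples, key=sort_key)   # "gb" and "kp" sort after all of "g" and "k"
```

With a plain sort, "gba" lands before "gu" because "b" precedes "u". With the digraph-aware key, "gu" comes first because "gb" is a separate letter that follows "g". The greedy matching is a simplification: it would misread a "g" followed by a separate "b" as the digraph.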
A note on naming: As part of the broader work of language decolonization, we’ve chosen to use “Dagbanli” throughout the platform rather than “Dagbani,” which was standardized during the colonial era. This aligns with the native system: Dagbaŋ (the land), Dagbamba (the people), and Dagbanli (the language). It’s a small but intentional step toward linguistic self-determination.
For centuries, knowledge has been passed down orally. Griots, elders, and everyday conversations keep the language alive. But today, if a language isn’t on the internet, it risks being invisible to younger generations who grow up with smartphones. Digital tools aren’t just conveniences. They’re essential for intergenerational transmission.
Moving Beyond Static Bilingual Wordlists
Dagbanli has been documented before. The lawyer and historian Ibrahim Mahama produced foundational wordlists, translating between Dagbanli and English. Researchers like Roger Blench and Tony Naden created valuable lexical checklists. But these works existed as physical books or static PDFs: closed formats that could not grow with the language, and that were often created for outsiders seeking translations rather than for native speakers seeking depth. In the digital space, most indigenous languages are still reduced to simple bilingual lists that strip away grammatical structure, usage context, and oral heritage.
This gap has real consequences. Learners, both native speakers and second-language learners, struggle to truly access the language. Writers lack spell checkers. Linguists lack easily searchable corpora. The language itself becomes harder to use in modern contexts.
But the gap also presents an opportunity. With today’s open-source tools and collaborative platforms, we can build a dictionary that is:
- Free for anyone to use,
- Searchable and accessible on any device,
- Audio-rich, so learners can hear correct pronunciations,
- Openly licensed, so the data, and even the infrastructure, can be reused by researchers, app developers, and educators.
The Dagbanli Dictionary marks an intentional shift from static archives to a truly living, growing language resource.
Why Wikidata as the Foundation
When we started, we had to choose a data source. We could build our own database from scratch, but that would mean reinventing the wheel, and the data would be locked in our system. Instead, we chose Wikidata, the free, structured knowledge base that powers Wikipedia’s infoboxes.
Wikidata offers several advantages:
- Structured lexicographical data: Since 2018, Wikidata has supported a full model for words: lexemes (the word itself), senses (meanings), and forms (grammatical variants). This maps closely to a dictionary entry.
- Open licensing: All data is CC0, public domain. Anyone can reuse it without asking permission.
- Community-editable: The Dagbanli community on Wikidata had already started adding lexemes. By building on Wikidata, every new contribution flows into the dictionary at the next harvest, with no extra work on our side. The dictionary grows as the community grows.
- No lock-in: Because the data lives in Wikidata, it’s not tied to our code. Other projects can use the same data to build their own tools.
Choosing Wikidata meant the dictionary could be a living resource, not a one-time publication. Every time a volunteer adds a new word or sense on Wikidata, it appears in the dictionary automatically after the next harvest.
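As a rough illustration of how the lexeme, form, and sense model lines up with a dictionary entry, the sketch below flattens one lexeme record into an entry. The record follows the general shape of Wikidata’s lexeme JSON (`lemmas`, `forms`, `senses`), but the ID `L0`, the gloss text, and the names `Form`, `Entry`, and `to_entry` are placeholders and assumptions, not the project’s actual harvest code. It also assumes `dag` is the language code used for Dagbanli.

```python
from dataclasses import dataclass


@dataclass
class Form:
    text: str
    features: list  # Wikidata item IDs of grammatical features


@dataclass
class Entry:
    lexeme_id: str
    lemma: str
    forms: list   # list of Form
    senses: list  # one {language code: gloss text} dict per sense


def to_entry(lexeme, lang="dag"):
    """Flatten a lexeme record into an Entry, keeping only `lang` spellings."""
    lemma = lexeme["lemmas"][lang]["value"]
    forms = [
        Form(f["representations"][lang]["value"], f.get("grammaticalFeatures", []))
        for f in lexeme.get("forms", [])
        if lang in f.get("representations", {})
    ]
    senses = [
        {code: gloss["value"] for code, gloss in s.get("glosses", {}).items()}
        for s in lexeme.get("senses", [])
    ]
    return Entry(lexeme["id"], lemma, forms, senses)


# Hypothetical record: "L0" and the gloss text are placeholders.
SAMPLE = {
    "id": "L0",
    "lemmas": {"dag": {"language": "dag", "value": "ninsala"}},
    "forms": [
        {
            "id": "L0-F1",
            "representations": {"dag": {"language": "dag", "value": "ninsala"}},
            "grammaticalFeatures": ["Q110786"],  # singular
        },
        {
            "id": "L0-F2",
            "representations": {"dag": {"language": "dag", "value": "ninsalnima"}},
            "grammaticalFeatures": ["Q146786"],  # plural
        },
    ],
    "senses": [
        {
            "id": "L0-S1",
            "glosses": {"en": {"language": "en", "value": "placeholder English gloss"}},
        }
    ],
}

entry = to_entry(SAMPLE)
```

Because a harvest step like this only reads what is on Wikidata, a new form or sense added by a volunteer would show up in the resulting entry the next time the record is fetched.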
The Vision: More Than a Word List
From the start, we wanted the dictionary to be more than a list of Dagbanli words with definitions in a secondary language. We envisioned a tool that would serve the Dagbanli community in practical ways:
- Monolingual depth: The dictionary prioritizes Dagbanli‑language definitions over English translations. This encourages users to think and read in Dagbanli as a primary language of thought.
- Audio from native speakers: Through Wikimedia Commons, we link pronunciation recordings of words, made by real people. You can hear a word spoken, not just read it.
- Offline-first: Many users in rural areas have unreliable internet. The dictionary can be fully downloaded and works offline, with all data stored locally in IndexedDB.
- Bilingual and localized: The interface itself can be toggled between English and Dagbanli. We’re working to complete the Dagbanli UI so the entire experience can be in the language it serves.
- Usage examples from Mozilla Common Voice and University of Ghana research datasets: We’ve integrated thousands of sentences with audio, so learners see words in context.
- Removing technical barriers: Standard mobile and desktop keyboards often lack Dagbanli special characters (such as ɛ, ɔ, ŋ, ɣ, and ʒ). To solve this, we integrated a custom floating keyboard that appears when the search input is focused. This ensures that speakers can type, search, and eventually contribute in their own orthography without friction.
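The “monolingual depth” goal above can be pictured as a simple fallback rule: show the Dagbanli definition when one exists, and fall back to English only when it does not. The sketch below is one way to express that rule. The function name `choose_definition`, the language codes, and the placeholder texts are assumptions for illustration, not the dictionary’s actual code.

```python
def choose_definition(glosses, preferred=("dag", "en")):
    """Return (language code, text) for the first preferred language with a gloss.

    `glosses` maps language codes to definition text. Returns None when no
    preferred language has a non-empty gloss.
    """
    for code in preferred:
        text = glosses.get(code)
        if text:
            return code, text
    return None


# Placeholder texts, not real dictionary content.
both = {"dag": "placeholder Dagbanli definition", "en": "placeholder English gloss"}
english_only = {"en": "placeholder English gloss"}

first_choice = choose_definition(both)       # Dagbanli wins when present
fallback = choose_definition(english_only)   # English is used only as a fallback
```

Keeping the preference order as a parameter would let the same rule serve a user who switches the interface language.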
This isn’t just a dictionary. It’s a platform for preserving and revitalising Dagbanli in the digital age, a new benchmark for what African language tools can achieve.
Conclusion
Dagbanli deserves the same digital language tools that English and French speakers take for granted. By building on Wikidata, we’ve created a dictionary that is open, community-driven, and perpetually improvable. It’s a small step toward closing the digital language gap, but one we hope will inspire similar efforts for other under-resourced languages.
Monolingual dictionaries, where a language is defined on its own terms, remain exceptionally rare in Africa. Outside South Africa’s advanced research infrastructure (such as the African Wordnet nodes and SADiLaR’s projects), few communities have access to comprehensive monolingual tools. We hope this project serves as a model and an invitation for others to fork and replicate it for their own languages.
So how do you actually structure a language as morphologically rich as Dagbanli on Wikidata? That’s the subject of the next post.
You can learn more about the project’s goals, priorities, and how to contribute on the official Wikidata project page. This work is supported by the Foundation for Indigenous and Oral Knowledge Archives (IOKA), which champions the preservation and promotion of oral heritage through digital innovation.
