
A dictionary without pronunciation is incomplete, especially for a tonal language like Dagbanli. Here’s how we built a pipeline from Wikimedia Commons recordings to in-browser playback, including a workaround for iOS Safari.
Introduction
Dagbanli is a tonal language where pronunciation carries meaning. The word wahu with high tones means “horse”, but with low tones means “snake”. A text-only dictionary that simply prints the spelling misses this crucial dimension. Listeners cannot know which word is intended unless they hear it.
In our previous post, we covered how we structured Dagbanli Lexemes on Wikidata using Senses, Forms, and special handling for digraphs. Now we turn to the audio pipeline that brings those words to life.
Wikidata’s lexicographical model includes the P443 property (pronunciation audio), which links a specific form of a Lexeme to an audio file stored on Wikimedia Commons. These recordings are crowdsourced contributions from native Dagbanli speakers, making them an invaluable resource for preserving authentic pronunciation.
The challenge is making these recordings play reliably across every device, operating system, and browser, including those that refuse to play the open OGG format. This post dives into how we built an audio pipeline that respects the source of truth while ensuring universal playback.
1. P443: Pronunciation Audio on Wikidata
How Audio Is Stored on Wikidata
When a contributor records a pronunciation for a Dagbanli word, they upload an audio file (typically .ogg format) to Wikimedia Commons. Then, on the corresponding Lexeme Form, they add a statement with property P443 pointing to that file.
For example, the singular Form of “kuli” might have:
```
kuli (L307875-F1)
  P443 → File: Dag-kuli.ogg
```
The file itself lives at a URL like:
https://upload.wikimedia.org/wikipedia/commons/5/5e/Dag-Kuli.ogg
Extracting Audio During Harvest
Our harvest script (running every six hours via cron) fetches all Dagbanli Lexemes from Wikidata. When processing Forms, it looks for P443 claims and extracts the filename:
```javascript
// A Form's P443 value may be a bare string or a wrapped value object,
// depending on how the claim was recorded.
const audioClaimValue = Form.claims?.P443?.[0]?.mainsnak?.datavalue?.value;
const audioFilename = typeof audioClaimValue === 'string'
  ? audioClaimValue
  : audioClaimValue?.value || audioClaimValue?.text;
// Special:FilePath always redirects to the current version of the file.
const audioUrl = audioFilename
  ? `https://commons.wikimedia.org/wiki/Special:FilePath/${encodeURIComponent(audioFilename.replace(/ /g, '_'))}`
  : undefined;
```
The resulting audioUrl is stored in the final JSON alongside the Form data, ready to be served to the frontend.
The Audio Index
Scanning every Lexeme’s Forms on each search or filter would be prohibitively slow. Instead, we build an audio index: a lightweight set of Lexeme IDs that have at least one Form with a P443 recording.
```javascript
// Collect the IDs of all Lexemes that have at least one Form with audio.
const audioIds = new Set();
for (const Lexeme of Lexemes) {
  if (Lexeme.Forms?.some(Form => Form.audioUrl)) {
    audioIds.add(Lexeme.wikidataId);
  }
}
```
This index powers the “Has Wikidata Form Pronunciation Audio” filter in the Gballi browser, enabling instant filtering across 11,000+ words without scanning the entire dataset.
2. The OGG Problem on iOS
Why OGG?
Wikimedia Commons stores audio in Ogg containers with the Vorbis codec, an open, patent-free format aligned with the mission of free knowledge. No proprietary licensing, no restrictions, exactly what an open platform should use.
Unfortunately, browser support for OGG is inconsistent:
| Browser | OGG Support |
| --- | --- |
| Chrome | Full support |
| Firefox | Full support |
| Edge | Full support |
| Safari (macOS) | Partial (may require configuration) |
| Safari (iOS) | No support at all |
iOS Safari simply refuses to play OGG files. No fallback, no codec download, nothing.
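Rather than memorizing a support table, you can also ask the browser directly. The sketch below uses the standard `canPlayType` API; it is an illustrative alternative to the user-agent sniffing shown later, not the code the app ships (the stubbed `audioEl` parameter exists only to keep the helper testable):

```javascript
// Feature detection: ask the browser whether it can decode OGG Vorbis.
// canPlayType returns '', 'maybe', or 'probably' per the HTML spec.
function canPlayOgg(audioEl = typeof Audio !== 'undefined' ? new Audio() : null) {
  if (!audioEl || typeof audioEl.canPlayType !== 'function') return false;
  const verdict = audioEl.canPlayType('audio/ogg; codecs="vorbis"');
  return verdict === 'probably' || verdict === 'maybe';
}
```

On iOS Safari this returns `false`, matching the table above.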
Our Solution: On‑the‑Fly Transcoding
We could not just store MP3 copies alongside the OGG files, because Wikimedia Commons is the source of truth. Duplicating files would break the principle of a single authoritative source. Instead, we built a transcoding proxy into our Cloudflare Worker.
When the frontend requests an audio file, it first checks whether the browser is running on iOS or is Safari on any platform (desktop Safari's OGG support is unreliable too):
```javascript
// iOS devices and Safari (on any platform) get the transcoded MP3;
// every other browser plays the original OGG directly.
const isIOS = /iPad|iPhone|iPod/.test(navigator.userAgent) && !window.MSStream;
const isSafari = /^((?!chrome|android).)*safari/i.test(navigator.userAgent);
const needsTranscode = isIOS || isSafari;
```
If transcoding is needed, the app requests the audio through our worker’s /audio-proxy endpoint:
https://dagbanli-harvest-worker.workers.dev/audio-proxy?url=https://upload.wikimedia.org/wikipedia/commons/5/5e/Dag-Kuli.ogg
The worker then:
- Fetches the original OGG file from Wikimedia Commons.
- Transcodes it to MP3 using FFmpeg (via WebAssembly or a dedicated transcoding service).
- Caches the result in R2 for future requests.
- Returns the MP3 with appropriate headers.
This approach keeps Wikimedia Commons as the single source of truth while providing seamless playback on all devices.
Why Not Just Store MP3s?
We considered running a one‑time script to download all OGG files, convert them to MP3, and upload them to R2. This would simplify the architecture significantly. However, it would break the connection to Wikimedia Commons:
- If a new recording is added, our dictionary would not see it until we re‑ran the conversion.
- If an existing recording is improved or corrected, we would have a stale copy.
- We would be responsible for storing and serving audio files indefinitely, increasing our storage costs and maintenance burden.
By keeping Wikimedia Commons as the source of truth and transcoding on‑demand, we stay aligned with the open data ecosystem. The Special:FilePath URL always points to the latest version, and our cache ensures that popular files are served quickly after the first request.
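One subtlety: because the R2 cache key is derived from the source URL alone, an improved recording uploaded under the same filename could still be served stale from our cache. A simple mitigation, sketched below as an assumption rather than what currently ships, is to record a timestamp with each cached object and re-transcode after a TTL (the constant and helper name are illustrative):

```javascript
// Hypothetical staleness check: re-transcode once a cached copy is
// older than the TTL. `cachedAt` would come from R2 customMetadata.
const AUDIO_CACHE_TTL_MS = 7 * 24 * 60 * 60 * 1000; // one week

function isCacheStale(cachedAt, now = Date.now(), ttlMs = AUDIO_CACHE_TTL_MS) {
  if (!cachedAt) return true; // no timestamp recorded: treat as stale
  return now - cachedAt > ttlMs;
}
```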
3. Audio Playback UX
One‑Tap Playback
In the word detail card, each Form with an associated audio file displays a speaker icon. Tapping the icon triggers playback immediately, with no page reload or navigation.
```jsx
<button onClick={() => playAudio(Form.audioUrl)}>
  <SpeakerIcon />
  <span>{Form.representation}</span>
</button>
```
Visual Feedback
While the audio is loading, the icon shows a spinner. While playing, it changes to a stop icon, allowing users to interrupt playback. All audio is handled through a central AudioPlayer service that ensures only one file plays at a time.
Offline Caching for Favorites
When a user favorites a word, we proactively cache its audio files in IndexedDB. This ensures that favorite words are fully usable offline, a critical feature for users in areas with unreliable internet.
```javascript
// Fetch the recording and persist the blob so the favorite works offline.
async function cacheAudioForFavorite(wordId, audioUrl) {
  const response = await fetch(audioUrl);
  const blob = await response.blob();
  await db.favoriteAudio.put({ wordId, blob, timestamp: Date.now() });
}
```
Graceful Degradation
If an audio file fails to load due to network issues, missing file, or transcoding failure, the icon remains visible but shows a tooltip explaining the problem. The dictionary remains usable even when audio is not available.
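A sketch of how such a tooltip message might be chosen. The `MediaError` codes are standard (2 = network error, 4 = source not supported), but the message strings and helper name are illustrative, not the exact production code:

```javascript
// Map a failed load to a user-facing explanation. `error` is the
// MediaError from the <audio> element's onerror handler, if any.
function audioErrorMessage(error, online = true) {
  if (!online) return 'Audio unavailable offline';
  if (error && error.code === 4) return 'Recording not found or unsupported'; // MEDIA_ERR_SRC_NOT_SUPPORTED
  if (error && error.code === 2) return 'Network error while loading audio'; // MEDIA_ERR_NETWORK
  return 'Audio could not be played';
}
```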
4. The Audio Index: Fast “Has Audio” Filtering
The Performance Problem
The Gballi browser includes a filter toggle: “Has Wikidata Form Pronunciation Audio”. Checking this box should instantly show only words that have at least one recorded pronunciation.
A naive implementation would scan every Lexeme’s Forms on each filter toggle, iterating over 11,000 words and their associated Forms, checking for audioUrl properties. This would be too slow for real‑time interaction, especially on mobile devices.
The Solution: Pre‑built Index
During the harvest process, we build an audio index: a simple array of Lexeme IDs that have at least one Form with a P443 recording.
```javascript
// audio-index.json
[ "L307875", "L308234", "L309871", ... ]
```
This file is loaded once at sync time and held in memory as a Set. When the user toggles the filter, we simply check whether the current Lexeme's ID is in that Set, an O(1) operation.
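Loading the index and turning the JSON array into a Set might look like the sketch below; the fetch wrapper and path are illustrative, not the exact production code:

```javascript
// Load the pre-built index once and keep it in memory as a Set,
// so filter checks are O(1) membership tests.
async function loadAudioIndex(fetchImpl = fetch, url = '/audio-index.json') {
  const response = await fetchImpl(url);
  const ids = await response.json(); // e.g. ["L307875", "L308234", ...]
  return new Set(ids);
}
```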
Filter Logic
```javascript
const hasAudioFilterEnabled = true;
const filteredLexemes = allLexemes.filter(Lexeme =>
  !hasAudioFilterEnabled || audioIndex.has(Lexeme.wikidataId)
);
```
This pattern appears throughout the dictionary: pre‑compute expensive lookups during the harvest, then use simple set membership checks at runtime.
5. Implementation Details
Worker Route for Audio Proxy
Here is the simplified worker route that handles audio transcoding:
```javascript
if (request.method === 'GET' && url.pathname === '/audio-proxy') {
  const originalUrl = url.searchParams.get('url');
  if (!originalUrl) return new Response('Missing url', { status: 400 });

  // Check R2 cache first
  const cacheKey = `audio-cache/${await hash(originalUrl)}.mp3`;
  const cached = await env.dict.get(cacheKey);
  if (cached) {
    return new Response(cached.body, {
      headers: { 'Content-Type': 'audio/mpeg' }
    });
  }

  // Fetch the original OGG from Wikimedia Commons
  const oggResponse = await fetch(originalUrl);
  if (!oggResponse.ok) {
    return new Response('Audio not found', { status: 404 });
  }

  // Transcode OGG to MP3 (using FFmpeg WASM or an external service)
  const mp3Buffer = await transcodeOggToMp3(await oggResponse.arrayBuffer());

  // Cache in R2 for subsequent requests
  await env.dict.put(cacheKey, mp3Buffer, {
    httpMetadata: { contentType: 'audio/mpeg' }
  });

  // Return the MP3
  return new Response(mp3Buffer, {
    headers: { 'Content-Type': 'audio/mpeg' }
  });
}
```
Audio Service on the Frontend
The frontend audio service ensures only one file plays at a time and handles iOS detection:
```javascript
class AudioPlayer {
  constructor() {
    this.current = null;
    // Same check as above: iOS devices and Safari need the MP3 proxy.
    this.needsTranscode = /iPad|iPhone|iPod/.test(navigator.userAgent) ||
      /^((?!chrome|android).)*safari/i.test(navigator.userAgent);
  }

  async play(url) {
    this.stop();
    const finalUrl = this.needsTranscode
      ? `https://dagbanli-harvest-worker.workers.dev/audio-proxy?url=${encodeURIComponent(url)}`
      : url;
    const audio = new Audio(finalUrl);
    audio.play();
    this.current = audio;
    audio.onended = () => { this.current = null; };
  }

  stop() {
    if (this.current) {
      this.current.pause();
      this.current = null;
    }
  }
}
```
Conclusion
Audio transforms the dictionary from a reference tool into an oral history archive. Every recording on Wikimedia Commons is a native speaker preserving their pronunciation for future generations. By building a pipeline that respects Wikimedia Commons as the source of truth while transcoding on‑demand for incompatible browsers, we ensure that these recordings reach the widest possible audience.
The audio index demonstrates a pattern we have repeated throughout the dictionary: pre‑compute expensive operations during harvest, store the results in lightweight lookup tables, and use them for instant filtering and search at runtime.
In the next post, we will dive into how we made the entire dictionary work offline, syncing the full dataset to IndexedDB, handling version updates, and building a resilient offline‑first experience.