Strategies to bring more Indic languages on Wikimedia projects: research findings

The project ‘Needs assessment for documentation and revitalization of Indic languages using Wikimedia projects’ concluded recently. Focused on understanding the needs of Indic language communities with regard to digitizing their language and culture, this project involved surveys, interviews, workshops, and hope! The research revealed several gaps in using Wikimedia projects as sites for the digital preservation of Indic languages, but it also showed that the prospect is very promising. Native speakers of these languages and practitioners of the cultures associated with them have to be directly involved in adding to ‘the sum of human knowledge’. How that can happen is what we looked into.

This research started with the aim of understanding the current state of Indic languages and what can be done to increase their representation on Wikimedia platforms. Before beginning the digitization of a given language, it is essential to understand the needs of its language community. In the course of the research, we analyzed the specific needs of Indic communities for the digital preservation of their languages through interviews and surveys. We share the learnings of the project and our recommendations below. The executive summary can be read here, and the complete report here.


We found that there is little awareness among native language speakers (non-Wikimedians) of any Wikimedia projects other than Wikipedia: 89% of survey respondents were unaware of them. This needs to be remedied, since we cannot expect people to contribute knowledge in formats they do not know Wikimedia projects can support.

The text-centeredness of Wikimedia was pointed out by one survey respondent and several interviewees. They cited the unexplained removal of audio-visual content from Commons and a lack of understanding of cultures in which languages are used primarily for oral communication as issues that make linguistic and cultural inclusivity difficult. The focus on Wikipedia, though explained by its brand value, downplays the importance of the other Wikimedia projects and their immense potential as sites for digital preservation.

Around 70% of non-Wikimedian survey respondents chose recording folk songs and folktales as their preferred way to contribute to their language digitally. Several interviewees talked about the need to make audio-visual recordings of folk culture. For instance, native speakers of Bodo and Braj told us that the oral culture of their languages is fast disappearing, which makes recording it urgent.

Pramod Rathor, a Braj speaker, says: “The varieties of Braj folk songs: Suddas, Languriya, Aalha, Rasiya, Malhaar, Faag etc. sung in rural areas have near to no representation on digital platforms. As people are migrating to urban areas, these forms of songs are being lost, since they are not practiced anymore.”

Linguist Bidisha Bhattacharjee in her essay ‘Role of Oral Tradition to Save Language and Cultural Endangerment’ states, “The oral tradition is a rich source of preservation of cultural heritage and it reflects through the linguistic expression and linguistic variety of people.” 


  1. Moving beyond text-centrism and using Wikimedia projects innovatively: only about 40% of the Wikimedians surveyed said that Wikimedia sister projects could be used to digitize oral culture.
  2. Promoting citizen archivists to create oral culture content: This research has established the importance of oral cultural and linguistic content; the next step is to set the creation of such content in motion. Citizen archivists from the same community or region are well placed to capture the orality of a language. The 1947 Partition Archive has successfully trained individuals and collected more than 10,000 oral histories. A training-of-trainers course similar to the one run for Reading Wikipedia in the Classroom might also be useful.
  3. Creating oral culture content relevant to specific languages: As mentioned above, oral culture such as folk songs is disappearing fast. The Oral Culture Transcription Toolkit provides guidance on using Commons and Wikisource to document oral culture content and represent it in textual form. However, major improvements to the technical infrastructure would make the process easier and reduce the need to hop between platforms.
  4. Providing support to interested individuals: Individuals and communities can be supported with internet access, equipment, and mentorship. Eddie Avila, director of Rising Voices, notes: “Even if there are not existing activists in one’s language, opportunities for cross-linguistic, cross-regional mentorship between activists from another language can guide and inspire interested individuals.” In the context of Wikimedia, he says:

“In terms of Wikimedia projects, policies might be tough to understand for those not familiar with the platform, but the mentoring model especially from those from the same language community can help remove some of these barriers to understanding. We have seen examples of how communities are adapting Wikimedia projects based on their own local context and approach to knowledge sharing.”

The complete report for this research project is available here. Click here to read the executive summary of the report. In the course of the project, we also conducted a series of workshops in which participants tested the Oral Culture Transcription Toolkit in the field. The toolkit was then improved based on the experiences and needs that participants and volunteers identified while using it. A short description of the workshops follows; you can read more about them here.

Oral Culture and Language Documentation Workshops

We conducted a series of Oral Culture and Language Documentation Workshops with participants from India and Bangladesh. During these workshops, we introduced the topic of digital preservation of languages, with an emphasis on ground-up participation from native speakers themselves. We also took this opportunity to introduce the Oral Culture Transcription Toolkit to the participants and to revise it based on their feedback and our observations. There were four workshops in total: three online and one in person. The in-person workshop was conducted in collaboration with the Central University of Himachal Pradesh.

Last but not least, we would like to thank the project advisor, Daniel Bögre Udell, for his guidance and for his assistance with the first workshop. We are grateful to Subhashish Panigrahi, who addressed the participants of the second workshop and shared his experiences and advice on the digital preservation of languages. We would also like to thank the research interviewees for taking the time for the interviews and follow-ups, as well as the movement strategy team and our supporters.
