Here are the statistics of Indic language Wikipedias for the month of October 2011. The data for this report is taken from http://stats.wikimedia.org/
I have restructured my report to make it shorter and easier to read and compare – but without losing any of the data points. I have divided it into Quality of Projects, Community Building, and Readership.
NOTE: I have used the Indian way of denoting large numbers: a crore is equal to 10 million, and a lakh is 100,000.
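For readers who are not used to these units, the conversion can be sketched in a few lines of Python. This is only an illustration: the helper name `to_number` is my own and does not come from stats.wikimedia.org.

```python
# Minimal sketch: convert figures quoted in lakh or crore to plain numbers.
# LAKH and CRORE follow the definitions in the note above; the function
# name to_number is a hypothetical helper used only for illustration.
LAKH = 100_000        # 1 lakh  = 100,000
CRORE = 10_000_000    # 1 crore = 10 million = 100 lakh

def to_number(value, unit):
    """Return `value` expressed in `unit` ("lakh" or "crore") as an integer."""
    factors = {"lakh": LAKH, "crore": CRORE}
    if unit not in factors:
        raise ValueError(f"unknown unit: {unit}")
    # round() guards against floating-point noise for inputs such as 4.3
    return round(value * factors[unit])

print(to_number(87, "lakh"))    # 87 lakh as a plain number
print(to_number(4.3, "crore"))  # 4.3 crore as a plain number
```

So 87 lakh is 8.7 million, and 4.3 crore is 43 million.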
In the table below are new users who have edited at least 10 times, existing editors with at least 5 edits in that month, and existing editors with more than 100 edits in that month. Once again, it is essential to look at all three numbers in conjunction with each other.
Something that I have been reflecting on is how, even in relatively small communities (which is what almost all Indic communities are), there is still a relatively low number of new users coming on board and only a tiny number of editors who have edited more than 100 times. The former is self-evident as a problem because it means we need to do so much more to encourage new editors. The latter is worrying because it means we also need to do much more to encourage editor retention as well as editor motivation.
Malayalam and Tamil have the healthiest position on this table – across all three parameters and looking at progress month-on-month. This is most probably because of the strong efforts at community building in both communities. It is really important that these communities continue to build on their strong foundations.
I am particularly excited about two languages in this list. Both Marathi and Bengali editor counts have increased across all parameters and that is very encouraging. They are large languages with massive potential. I am also really hopeful that the Marathi media coverage around last month’s WikiConference is going to support the community as they go about encouraging and supporting new and existing editors.
Overall, though, it must be said that the total number of new editors coming to Indic Wikipedias is low. So the focus needs to be on bringing new editors to the wikis and retaining existing users.
Quality of Projects
The following table gives data on the total number of articles, new articles per day, and the number of edits per article on each language wikipedia as of October 31, 2011. The data for September 30, 2011 is also given to provide a comparison point on progress over the past month.
These three numbers should be read in conjunction with each other. Theoretically, the number of edits per article can be taken as a surrogate for the quality of that article (based on the assumption that more edits on an article increase its quality). To that extent, it would be good to have large numbers in all three, and not only in one or two of them. For instance, if there is a large number of articles but a low number of edits, it might point to either poor article quality or high use of bots.
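To make this reading concrete, here is a rough sketch of how the edits-per-article figure could be computed and used to flag a wiki for a closer look. The function names, the thresholds (`min_articles`, `min_ratio`) and the sample figures are my own assumptions for illustration; they are not official cut-offs and are not taken from stats.wikimedia.org.

```python
# Minimal sketch, assuming we know a wiki's total article and edit counts.
# The thresholds below are hypothetical, chosen only to illustrate the idea
# of "many articles but few edits per article".

def edits_per_article(total_edits, total_articles):
    """Average number of edits per article, used as a rough quality surrogate."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    return total_edits / total_articles

def needs_closer_look(total_edits, total_articles,
                      min_articles=50_000, min_ratio=5.0):
    """True when the article count is large but edits per article are low."""
    ratio = edits_per_article(total_edits, total_articles)
    return total_articles >= min_articles and ratio < min_ratio

# Made-up example figures (not real data from any wiki):
print(edits_per_article(280_000, 70_000))   # 4.0 edits per article
print(needs_closer_look(280_000, 70_000))   # True: large but thinly edited
print(needs_closer_look(300_000, 20_000))   # False: smaller, heavily edited
```

A flag like this only says where to look. Whether the cause is bot-created stubs or something else still has to be checked by hand.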
(For context, the number of speakers of each language is provided in the first table.)
The Tamil and Sanskrit Wikipedias are showing encouraging growth in the number of articles. The Sanskrit Wikipedia’s growth is particularly laudable given the tiny number of speakers and the efforts the community is putting into increasing community strength. Assamese is a relatively tiny Wikipedia, but is growing with increasing edits per article. Malayalam has the highest consistent number of edits per article – which speaks well of the kind of collaborative editing environment that is being fostered by the Malayalam community.
Hindi, with more than 1 lakh articles, continues to be at the top of the article count list. However, article growth is relatively low and the number of edits per article is also low. One reason for this is probably the low number of active contributors (see next section). Another reason might be (and this is something that I have mentioned in various channels in the past) the relatively higher emphasis on increasing article numbers to reach the 1 lakh article milestone instead of on community building. It is also similar to what is happening on the Newari Wikipedia, which has nearly 70,000 articles but an average of just 4 edits per article and no increase in article count.
The Pali, Bishnupriya Manipuri, Newari, Bhojpuri and Sindhi communities have been inactive as far as new articles are concerned.
The following table gives a rough estimate on the number of readers for each language wikipedia.
Given the huge speaker base of Hindi, it is not surprising that the Hindi Wikipedia continues to be on top with close to 87 lakh readers. Marathi also has a huge readership of 59 lakh. News coverage during WikiConference India has helped the Marathi Wikipedia to increase its readership. The near doubling of readership for Malayalam can probably be attributed to a recent update in Google search (canonical equivalence of atomic chillus and non-atomic chillus).
The above table shows a very interesting trend. The readership on Indic language wikipedias is much higher than most people think – and is growing dramatically! Despite being involved in Indic languages for quite a while now, I was personally amazed to know that there are 4.3 crore readers across all Indic languages in October. Even more amazingly, this number grew by 1.2 crores in October over September!
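The month-on-month growth implied by these two figures can be worked out as below. This is a back-of-the-envelope sketch using only the rounded numbers quoted above (4.3 crore readers in October, an increase of 1.2 crore over September), so the result is approximate; the variable names are my own.

```python
# Minimal sketch: derive September readership and percentage growth from
# the two rounded figures quoted in the text. Values are held in lakh
# (1 crore = 100 lakh) so that the arithmetic stays in whole numbers.
LAKH = 100_000

october_lakh = 430    # 4.3 crore readers in October
increase_lakh = 120   # grew by 1.2 crore over September

september_lakh = october_lakh - increase_lakh
growth_percent = round(increase_lakh / september_lakh * 100, 1)

print(september_lakh * LAKH)   # 31000000, i.e. 3.1 crore readers in September
print(growth_percent)          # 38.7 (percent growth in one month)
```

In other words, readership grew by a little under 40% in a single month.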
This data is particularly important because Indic language Wikipedians often feel a sense of dissatisfaction, as there is a perception that there aren’t enough readers. Looking at the above statistics alone, we know that this perception is not true. So we have a large number of readers waiting to read what we write. But do we have enough community members to work on the different language wikis? The answer is not encouraging. We should be able to convert a small share of our readers into editors. For that we all need to work together.
Mail me at email@example.com if you have any queries related to this post or about Indian language wiki projects.
Consultant, Indic language Initiatives, WMF India Programs