It’s been over a decade since the Unicode standard was made available for the Odia script. Odia is a language spoken by roughly 33 million people in eastern India, and is one of the many official languages of India. Since then, it has been challenging to get more Odia content into Unicode, because many people who are used to non-Unicode standards are unwilling to switch. This created the need for a simple converter that could convert text typed in various non-Unicode fonts into Unicode. Such a converter could enrich Wikipedia and other Wikimedia projects by converting previously typed content and making it more widely available on the internet. Odia recently got such a converter, which makes it possible to convert text in two of the fonts most popular among media professionals (AkrutiOriSarala99 and AkrutiOriSarala) into Unicode.
All non-Latin scripts came under one umbrella after the rollout of Unicode. Since then, many Unicode-compliant fonts have been designed, and the open source community has worked to produce good-quality fonts. Although contributions to Unicode-compliant portals like Wikipedia increased, the publishing and printing industries in India remained stuck with the older ASCII and ISCII standards (ISCII, the Indian Script Code for Information Interchange, is an ASCII-based encoding for Indian scripts). Modified ASCII fonts that were used for typesetting newspapers, books, magazines and other printed documents are still in use in these industries. As a result, a massive amount of content is neither searchable nor reproducible, because it is not Unicode-compliant. The key difference is that a Unicode font has separate glyphs for Indic script characters alongside the Latin ones, whereas in these legacy fonts the Latin glyphs are simply replaced by Indic characters. So when someone does not have the particular legacy font installed, the text appears as gibberish (see Mojibake), whereas text typed in one Unicode font can be read using another Unicode font, even on a different operating system. Most of the ASCII fonts used for typing Indic languages are proprietary, and many individuals and organizations even use pirated software and fonts. With massive amounts of content spread across multiple standards and little content in Unicode, a large gap has opened up for many languages, including Odia. Until all of this content is converted to Unicode to make it searchable, sharable and reusable, the knowledge it contains will remain inaccessible. Fortunately, some Indic languages have more and more contributors creating Unicode content. Still, more technological work is needed to convert non-Unicode content to Unicode and open it up for people to use.
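To make the difference between legacy-font text and Unicode text concrete, here is a small Python sketch (assuming Python 3.9+ and only the standard `unicodedata` module) that inspects text character by character. Unicode Odia text consists of code points in the Oriya block (U+0B00 to U+0B7F) that carry their own identity, so any Unicode font can render them and search tools can match them. Legacy-font text, by contrast, is just Latin characters; the `legacy_text` string below is a hypothetical placeholder, not real Akruti output.

```python
import unicodedata

ODIA_BLOCK = range(0x0B00, 0x0B80)  # Unicode Oriya (Odia) block: U+0B00..U+0B7F


def is_odia(ch: str) -> bool:
    """True if the character's code point lies in the Odia block."""
    return ord(ch) in ODIA_BLOCK


def describe(text: str) -> list[tuple[str, str]]:
    """List (code point, Unicode character name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# The word "Odia" in Unicode: LETTER O, LETTER DDA, SIGN NUKTA, VOWEL SIGN I, LETTER AA.
unicode_text = "\u0B13\u0B21\u0B3C\u0B3F\u0B06"

# HYPOTHETICAL legacy-font string: plain Latin characters whose Odia shapes
# exist only inside one specific font. This is NOT real Akruti data.
legacy_text = "Hkwt"

for label, text in (("Unicode", unicode_text), ("legacy", legacy_text)):
    print(label, all(is_odia(ch) for ch in text), describe(text))
```

Only the Unicode string reports Odia character names such as `ORIYA LETTER O`; the legacy string reports ordinary Latin letters, which is why it cannot be searched as Odia and turns into gibberish without its font.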
Media and publishing houses use a few different kinds of fonts; the most popular is Akruti. Two other popular standards are LeapOffice and Shreelipi. Akruti software comes bundled with a variety of typefaces and an encoding engine that works well in Adobe Acrobat Creator, the most popular DTP software package. Industry professionals are comfortable using it because of its reputation and seamless printing. The problem of migrating content from these standards to Unicode arose when the Odia Wikimedia community started reaching out to industry professionals. Authors, government employees and other professionals were apparently more comfortable using one of the standards mentioned above. All of these people type using either a popular generic keyboard layout, Modular, or a universal one, Inscript. Fortunately, the former is now incorporated into MediaWiki’s Universal Language Selector (ULS), and the latter is in the process of being added to ULS. Once this is done, many more people will be able to contribute to Wikipedia easily.
Content that has been typed in various modified ASCII fonts includes encyclopedias that could help grow Wikisource and Wikiquote. All of it needs to be converted to Unicode. The non-profit group Srujanika first started a project to build a converter for two different Akruti fonts: AkrutiOriSarala99 and OR-TT Sarala. The former is outdated and the latter is less popular. The Rebati 1 converter, built by the Srujanika team, was no longer maintained and had effectively become an orphan project. Fellow Wikimedian Manoj Sahukar and I used parts of the “Rebati 1 converter” code to build another converter. The new “Akruti Sarala – Unicode Odia converter” can convert the more popular AkrutiOriSarala font as well as its predecessor AkrutiOriSarala99, which some people still use. Odia Wikimedian Mrutyunjaya Kar and journalist Subhransu Panda helped by reporting broken conjuncts, which allowed us to fix problems before publishing. Odia authors and journalists have already started using the font, and many of them post regularly in Odia. We hope more authors will contribute to Wikipedia by converting their work and wikifying it.
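The core of a converter like this can be sketched as table-driven, longest-match substitution followed by a reordering pass. The following is a minimal Python illustration, not the actual Akruti Sarala converter code. The mapping table `LEGACY_TO_UNICODE` is entirely hypothetical (the real AkrutiOriSarala tables are not reproduced here), and the reordering step assumes that, as in many legacy Indic fonts, the pre-base vowel sign E (େ) is typed before the consonant, whereas Unicode stores it after the consonant. Longest match matters because a multi-character sequence that renders as one conjunct must be converted as a single unit; converting it character by character is one way conjuncts end up broken.

```python
# Minimal sketch of a legacy-font -> Unicode Odia converter.
# ASSUMPTION: every table entry below is a HYPOTHETICAL placeholder,
# not the real AkrutiOriSarala or AkrutiOriSarala99 mapping.
LEGACY_TO_UNICODE: dict[str, str] = {
    "k": "\u0B15",               # hypothetical: "k" renders as KA
    "kx": "\u0B15\u0B4D\u0B37",  # hypothetical: "kx" renders as conjunct KA+VIRAMA+SSA
    "a": "\u0B3E",               # hypothetical: "a" renders as vowel sign AA
    "e": "\u0B47",               # hypothetical: "e" renders as vowel sign E
}

# ASSUMPTION: signs typed before the consonant (visual order) in the legacy
# font but stored after it (logical order) in Unicode.
PRE_BASE_SIGNS = {"\u0B47"}


def split_units(legacy: str, table: dict[str, str]) -> list[str]:
    """Greedy longest-match lookup; unmapped characters pass through unchanged."""
    max_len = max(map(len, table), default=1)
    units: list[str] = []
    i = 0
    while i < len(legacy):
        # Try the longest possible key first, then shorter ones.
        for length in range(min(max_len, len(legacy) - i), 0, -1):
            chunk = legacy[i:i + length]
            if chunk in table:
                units.append(table[chunk])
                i += length
                break
        else:
            units.append(legacy[i])  # no mapping: keep the character as-is
            i += 1
    return units


def reorder(units: list[str]) -> str:
    """Move each pre-base vowel sign after the unit that follows it."""
    result: list[str] = []
    i = 0
    while i < len(units):
        if units[i] in PRE_BASE_SIGNS and i + 1 < len(units):
            result.extend([units[i + 1], units[i]])
            i += 2
        else:
            result.append(units[i])
            i += 1
    return "".join(result)


def convert(legacy: str, table: dict[str, str] = LEGACY_TO_UNICODE) -> str:
    """Convert a legacy-font string to Unicode Odia (illustrative only)."""
    return reorder(split_units(legacy, table))


if __name__ == "__main__":
    print(convert("ekx"))
```

With this toy table, `convert("ekx")` yields KA, VIRAMA, SSA followed by vowel sign E, because the whole conjunct is matched as one unit before the sign is moved after it. A real converter would also need to handle pre-base signs in front of conjuncts that the font builds from several separate glyphs, which this sketch does not attempt.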
Recently, a beta version of another converter, this one for Shreelipi fonts, was released. It is based on initial code by Odia Wikipedian Shitikantha Dash and works with at least 85% accuracy.
Subhashish Panigrahi, Odia Wikipedian and Programme Officer, Centre for Internet and Society