Published in IEEE Advance Computing conference 2009

Framework for Web Application Internationalization and Localization Supporting Indian Languages

Jalindar Baban Karande M. L. Dhore Sandip R. Shinde

Abstract—This paper underlines the multilingual nature of India by analyzing the country's census data, and the resulting need to develop multilingual software systems. It reviews the current technology used for internationalization and localization and its limitations for Indian society, and proposes a framework for localizing web applications in Indian languages. The proposed framework is based on the characteristics of Indic scripts and Unicode.
I. INTRODUCTION
Traditionally, web sites have been developed for the English-speaking community, with only limited attempts to develop web sites in other languages. Today, much of the functionality provided by traditional desktop software is being replaced by web-based online applications, and organizations are expanding their business to the global market. Globalization of business, however, requires web applications to be localized to the language and cultural environment of their users [1]. India has 22 officially recognized languages: Assamese, Bengali, English, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Tamil, Telugu, Urdu, Bodo, Maithili, Dogri and Santhali. If a web application developed for an American or any other community is to be localized for the Indian community, it needs to be localized for at least 22 languages. Most online users of a web application communicate with each other in their own languages. In countries like America nearly all people communicate in the same language, but the situation is different in India. Fig. 1 shows the language distribution in India.
Moreover, there is no state in India where every citizen speaks the same language. For example, Fig. 2 shows the language distribution in the state of Maharashtra.
Fig. 1. Distribution of population by language in India

Fig. 2. Distribution of population by language in Maharashtra
II. EXISTING FRAMEWORKS
Stavros Kokkotos proposed an architecture for the development of internationalized software called ISDAi [2]. Its design is highly modular, but it is costly to implement because it requires building separate libraries and configuration files. Terence Parr [3] proposed an XML-based string template for localizing strings and other data types such as currency, date, and time using the locale. N. Anbarasan [4] detailed the localization process for the web in Indian languages and several issues related to it. Valentina Dagienė proposed a framework for the internationalization of open source software [5], since most open source projects are localized into different languages and cultures. Jesus Cardenosa et al. [6] proposed an approach for localizing existing software without changing its source code. Most of the existing frameworks are based on the Roman script and do not consider the case of multiple scripts.
III. INDIC SCRIPT
S. P. Mudur et al. [7] explained several properties of Indic scripts and how these properties make software development in Indian languages difficult. Fortunately, some properties of Indic scripts, such as their phonetic nature and highly logical writing system, can be used to advantage when developing software in Indian languages.
India is a multilingual country with 22 recognized official languages and over 6000 dialects. Some of these languages are of Perso-Arabic origin, with scripts and writing rules similar to Arabic. Most of the others share writing rules derived from the ancient Brahmi script, even though their scripts look totally distinct. Panini’s phonetic classification of the Indian alphabets into vowels (V) and consonants (C) serves as a common base for all Indian languages of Brahmi origin. It also provides a unique encoding for any word in the language. Their written forms differ, because different letter shapes and different shaping rules are used. The method of combining these two basic groups (C and V) to form syllables is itself a unique and scientific approach, common to all the Indian scripts.
IV. PROPERTIES OF UNICODE
In the previous section we outlined the common features of most Indian scripts. Fortunately, these common features are preserved in Unicode as well. Consonants with the same pronunciation have the same offset from the starting code point of their respective scripts, and the same applies to the vowels. The following table shows the offsets for one set of consonants. Code points are in hexadecimal; offsets are in decimal, measured from the start of each script's block (0900, 0A80 and 0C00).
Consonant   Hindi Unicode   Hindi offset   Gujarati Unicode   Gujarati offset   Telugu Unicode   Telugu offset
k           0915            21             0A95               21                0C15             21
K           0916            22             0A96               22                0C16             22
g           0917            23             0A97               23                0C17             23
G           0918            24             0A98               24                0C18             24
w           0919            25             0A99               25                0C19             25
TABLE I UNICODE AND OFFSET FOR ONE SET OF CONSONANTS
Similarly, the offsets for vowels and numerals are the same in most of the languages. The consonant offsets are not the same for the Dravidian script Tamil, but they are the same in the other Dravidian scripts, i.e., Telugu, Malayalam, and Kannada.
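The offset property can be checked directly from the code points. The following is a minimal Python sketch, not part of the original framework; the names SCRIPT_BASE and offset are illustrative assumptions, and the block start values are taken from the Unicode code charts.

```python
# Starting code points of three Indic script blocks in Unicode.
SCRIPT_BASE = {
    "Devanagari": 0x0900,  # used for Hindi and Marathi
    "Gujarati": 0x0A80,
    "Telugu": 0x0C00,
}


def offset(ch: str, script: str) -> int:
    """Return the distance of a character from the start of its script block."""
    return ord(ch) - SCRIPT_BASE[script]


# The first consonant of Table I (code points 0915, 0A95, 0C15).
for script, ch in [("Devanagari", "\u0915"), ("Gujarati", "\u0A95"), ("Telugu", "\u0C15")]:
    print(script, format(ord(ch), "04X"), offset(ch, script))
```

Each of the three lines printed reports the same offset, 21, which is the value listed for the first consonant in Table I.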
V. PROPOSED MODEL

Fig. 3. Proposed Model for Localization of web in Indian Language
Fig. 3 shows the proposed model for localizing a web application in Indian languages. The model deals with static text and dynamic text differently. Its handling of static text has some similarity with the approach of Jesus Cardenosa et al. [6], which uses the langID attribute of the localization tag to identify the language for which a resource file is localized, e.g.
In the proposed model we use a separate file for every language. The langID attribute is removed from the localization tag, and the language for which a resource is localized is identified by the file name itself. We use the convention <filename>..xml for naming resource files. For example, if the home page of a web application, “Home.aspx”, has to be localized for the Hindi and Marathi languages and Indian culture, we store all static text strings in the files Home.aspx.HNIN.xml and Home.aspx.MRIN.xml respectively. To find a localized UI string, we append HNIN.xml or MRIN.xml to the name of the file whose localized strings we are looking for, so the time required to parse a file to find the resources for a particular language is reduced. Here, static text does not mean the static part of the web application; it is text whose content is static. This includes the following:
Text displayed on the web application.
Messages displayed at runtime but having static content, e.g., error messages, prompts to the user, and guideline messages.
URLs of the images displayed on the web page, as different cultures may require different versions of some images.
Dynamic text is treated in two separate ways. Text such as the names of persons and places is not translated; it is transliterated from one script to another. In the proposed approach, transliteration from one Indic script to another is carried out using the properties of Unicode discussed in the previous section. As the offsets of most consonants and vowels are the same in most Indian scripts, we propose the following formula for transliteration.
$ch_n = ch_o - S_o + S_n$
where $ch_n$ is the Unicode code point of the character in the new script, $ch_o$ is the Unicode code point of the character in the old script, $S_n$ is the starting code point of the new script and $S_o$ is the starting code point of the old script.
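The formula subtracts the start of the old script's block from each code point and adds the start of the new script's block. A minimal Python sketch of this step is given below. The function name, the SCRIPT_BASE table and the 128-code-point block size are assumptions made for the illustration, and the sketch does not check whether the resulting code point is actually assigned in the target script.

```python
# Starting code points of the script blocks (from the Unicode code charts).
SCRIPT_BASE = {"Devanagari": 0x0900, "Gujarati": 0x0A80, "Telugu": 0x0C00}
BLOCK_SIZE = 0x80  # each of these blocks spans 128 code points


def transliterate(text: str, old: str, new: str) -> str:
    """Apply ch_n = ch_o - S_o + S_n to every character of the old script."""
    s_o, s_n = SCRIPT_BASE[old], SCRIPT_BASE[new]
    result = []
    for ch in text:
        cp = ord(ch)
        if s_o <= cp < s_o + BLOCK_SIZE:
            result.append(chr(cp - s_o + s_n))  # shift into the new block
        else:
            result.append(ch)  # spaces, digits, Latin text stay unchanged
    return "".join(result)


# A three-consonant word written in Devanagari (0915 092E 0932).
word = "\u0915\u092E\u0932"
print(transliterate(word, "Devanagari", "Gujarati"))
```

Characters outside the old script's block are passed through untouched, so mixed text such as a name followed by a Latin abbreviation keeps its non-Indic parts.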
Transliterating text from English (Roman script) to any Indic script, or vice versa, requires a different method, because Indic scripts are phonetic whereas the Roman script is not. We prefer to use a lookup table for transliteration between the Roman script and Devanagari. To transliterate text between any other Indic script and the Roman script, we use Devanagari as the intermediate script. This approach works well for most names, but not all. To ensure correct transliteration, the system should keep a database of previously transliterated strings. When a user notices an error in a transliteration and corrects it, the user-assisted transliteration is stored in the database and consulted in future transliterations.
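The combination of a lookup table and a store of user-corrected results might look like the following Python sketch. Everything here is an assumption for illustration: the three-entry table is a hypothetical fragment, not the authors' table, and an in-memory dictionary stands in for the database.

```python
# Hypothetical fragment of a Roman-to-Devanagari lookup table.
ROMAN_TO_DEVANAGARI = {"ka": "\u0915", "ma": "\u092E", "la": "\u0932"}


class Transliterator:
    def __init__(self, table):
        self.table = table
        self.corrections = {}  # stands in for the database of earlier results
        # Try longer keys first so multi-letter units win over shorter ones.
        self._keys = sorted(table, key=len, reverse=True)

    def _by_table(self, text):
        out, i = [], 0
        while i < len(text):
            for key in self._keys:
                if text.startswith(key, i):
                    out.append(self.table[key])
                    i += len(key)
                    break
            else:
                out.append(text[i])  # no table entry: keep the character
                i += 1
        return "".join(out)

    def transliterate(self, text):
        # A stored, user-approved result takes priority over the table.
        if text in self.corrections:
            return self.corrections[text]
        return self._by_table(text)

    def correct(self, text, fixed):
        """Record a transliteration completed with the user's assistance."""
        self.corrections[text] = fixed


t = Transliterator(ROMAN_TO_DEVANAGARI)
print(t.transliterate("kamala"))           # table-based result
t.correct("kamala", "\u0915\u092E\u0932\u093E")  # user supplies the intended spelling
print(t.transliterate("kamala"))           # stored correction is reused
```

For other Indic scripts, the Devanagari output of such a table could then be shifted into the target block using the offset formula given above.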
For translation of other dynamic text, a lexicon server or multilingual WordNet [8] server is used. The multilingual dictionary is prepared for words in the context of the application, which reduces the size of the dictionary, the search time, and the confusion caused by the same word having different meanings in different contexts. The multilingual dictionary can be sliced to produce a small bilingual dictionary in an XML file, which is ported to the client so that translation is performed on the client side. Transliteration is also performed on the client side, if the client has sufficient processing power, to reduce the load on the web server.
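Slicing the multilingual dictionary into a bilingual XML file could be sketched as follows. The paper does not specify the dictionary structure or the XML schema, so the in-memory layout, the element names (dictionary, entry) and the attribute names are all assumptions; the language keys reuse the HNIN and MRIN suffixes from the resource file names.

```python
import xml.etree.ElementTree as ET

# Hypothetical application-specific multilingual dictionary.
MULTILINGUAL = {
    "name": {"HNIN": "नाम", "MRIN": "नाव"},
    "address": {"HNIN": "पता", "MRIN": "पत्ता"},
}


def slice_dictionary(multilingual, lang):
    """Keep only the translations for one target language."""
    return {word: forms[lang] for word, forms in multilingual.items() if lang in forms}


def to_xml(bilingual, lang):
    """Serialize a bilingual dictionary to an XML string for the client."""
    root = ET.Element("dictionary", {"target": lang})
    for word, translation in sorted(bilingual.items()):
        entry = ET.SubElement(root, "entry", {"source": word})
        entry.text = translation
    return ET.tostring(root, encoding="unicode")


hindi = slice_dictionary(MULTILINGUAL, "HNIN")
print(to_xml(hindi, "HNIN"))
```

Only the slice for the user's language is sent, so the client receives a file whose size depends on the application vocabulary rather than on the number of supported languages.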
VI. INPUT METHOD EDITOR
The most critical problem in multilingual web application development is keyboard support for multiple languages. Keyboard codes can be embedded into a web application that supports only one language, but this becomes complicated with multiple languages. One solution is to use the operating system’s keyboard if the operating system is localized to the user’s language; if it is not, this option is of no use. Third-party IMEs (Input Method Editors) are available for most of the languages, e.g., BarahaIME 1.0 supports eleven languages, but they place an additional burden on the user, who has to download and install them.
Currently the authors are using a JavaScript-based online keyboard to support input methods for users. This keyboard can be localized to different languages with little effort. The authors are working on the design of a text box web control with a keyboard layout that supports most of the Indic scripts and can be localized to any Indic language without change in source code.
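One way such a keyboard could be localized without code changes is to store the layout as offsets within a script block rather than as characters, so that only the block start differs between scripts. The Python sketch below illustrates this idea only; it is not the authors' JavaScript keyboard, and the key assignments in KEY_OFFSETS are hypothetical.

```python
# Starting code points of the script blocks (from the Unicode code charts).
SCRIPT_BASE = {"Devanagari": 0x0900, "Gujarati": 0x0A80, "Telugu": 0x0C00}

# Hypothetical layout: key -> offset of the consonant inside its block.
KEY_OFFSETS = {"k": 0x15, "K": 0x16, "g": 0x17, "G": 0x18}


def key_to_char(key: str, script: str) -> str:
    """Map one key press to a character of the selected script."""
    offset = KEY_OFFSETS.get(key)
    if offset is None:
        return key  # keys without a mapping are passed through
    return chr(SCRIPT_BASE[script] + offset)


def type_keys(keys: str, script: str) -> str:
    """Map a sequence of key presses to text in the selected script."""
    return "".join(key_to_char(k, script) for k in keys)


print(type_keys("kg", "Devanagari"))
print(type_keys("kg", "Telugu"))
```

Switching the script argument changes the output script while the layout table stays the same, which is the property that makes a single layout reusable across the scripts that share offsets.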
VII. CONCLUSION
Localization of web applications is essential for every non-English-speaking country. Localizing web applications in India poses more problems than in other countries because of the country's multilingual culture. A few people have tried to localize web applications to Indian languages, but 100% localization is still not possible. Existing frameworks take care of static text only and do not consider multilingual software development. The proposed model will help to localize web applications with minimum human support. It also takes care of dynamic text and multilingual software development.

ACKNOWLEDGMENT
 The authors would like to thank Prof. S. G. Pukale, Prof. P. S. Dhabe and Prof. N. Z. Tarapore for their valuable suggestions and guidance during the work. The authors would like to thank the staff members of the Department of Computer Engineering of Vishwakarma Institute of Technology, Pune, India for their support and cooperation.

REFERENCES
[1] J. Hogan, C. Ho-Stuart, and B. Pham, “Key challenges in software internationalisation,” in Australian Workshop on Software Internationalisation 2004, vol. 32, 2004.
[2] S. Kokkotos and C. Spyropoulos, “An architecture for designing internationalized software,” in Software Technology and Engineering Practice, pp. 13–21, July 1997.
[3] T. Parr, “Web application internationalization and localization in action,” in 6th International Conference on Web Engineering, vol. 263, pp. 64–70, ACM New York, NY, USA, 2006.
[4] N. Anbarasan, “Software localization process and issues,” Tamil Internet, 2003.
[5] V. Dagienė and R. Laucius, “Internationalization of open source: Framework and some issues,” in IEEE 2nd International Conference on Information Technology: Research and Education ITRE, IEEE Computer Society, 2004.
[6] J. Cardenosa, C. Gallardo, and A. Martin, “Internationalization and localization after system development: A practical case,” International Journal “Information Technologies and Knowledge”, vol. 1, pp. 121–127, 2007.
[7] S. P. Mudur, N. Nayak, S. Shanbhag, and R. K. Joshi, “An architecture for the shaping of Indic texts,” Computers and Graphics, vol. 23, pp. 7–24, 1999.
[8] J. Ramanand, A. Ukey, B. K. Singh, and P. Bhattacharyya, “Mapping and structural analysis of multilingual wordnets,” in Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, IEEE Computer Society, 2000.
