Almost Everything You Need to Know About Kannada Transliteration
Wikipedia says transliteration is "a type of conversion of a text from one script to another that involves swapping letters (thus trans- + liter-) in predictable ways".
For Kannada we usually use Latin letters to represent[1] Kannada script. Most of the time this conversion is based on phonetic similarity. It sounds simple, but it isn't really. There is a lot of confusion because no one learns this formally. Most of us learn by experimenting with phonetic typing or go by intuition. There are other ways to do it too. These are the notes I made while learning about it.
There are quite a few standards or conventions involved, from naming the language, to the script, to transliteration. Here are the few that matter:
- ISO 15924 is a coding system for the representation of names of scripts. For Kannada script the code is Knda and the number is 345. Remember this is for the Kannada script[2] and not for the language.
- For representing the Kannada language we use the ISO 639 series, where the code is either kn (ISO 639-1) or kan (ISO 639-2 and 639-3).
- ISO 15919 is a standard for "Transliteration of Devanagari and related Indic scripts into Latin characters". It uses diacritics to map the much larger set of consonants and vowels in Indic scripts to the Latin script. At this point it is the most sophisticated system available, challenged only by the popularity and usage of Hunterian transliteration.
- The Hunterian transliteration system is supposed to be the "national system of romanization in India" and the one officially adopted by the Government of India. Initially created for Devanagari, it has improved over time to cover other Indian languages. Though I must say I couldn't find the table for Kannada.
- The International Alphabet of Sanskrit Transliteration (I.A.S.T.) is one of the oldest transliteration schemes for lossless romanization of Indic scripts. Today it is a subset of ISO 15919.[3] It is not a standard but a convention.
- Indian Script Code for Information Interchange (ISCII), similar to ASCII, is a coding scheme for representing various writing systems of India. It is used for encoding Indic scripts and a Roman transliteration. ISCII has now been replaced by Unicode. What is of interest is the Roman transliteration, though I don't know if anyone uses it.
- National Library at Kolkata romanisation is a transliteration scheme used in dictionaries, grammar books etc. It's also called the Library of Congress (USA) version and is very similar to ISO 15919. Here is the Kannada table.
- International Phonetic Alphabet (IPA) is an alphabetic system of phonetic notation based on Latin characters. Using IPA you can write down how a word is pronounced in any language of the world. It has a one-to-one mapping between sounds and letters. Here is the IPA chart for Kannada.
- Unicode is a standard that provides a unique number for every character on planet Earth.[4] On the web[5] UTF-8 has become the most popular way to encode Unicode characters. The Unicode Consortium develops and manages these standards.
- I am ignoring other transliteration systems like Harvard-Kyoto, ITRANS, Velthuis, SLP1 etc.
At this point, in my opinion, ISO 15919 seems to be the winner for Kannada transliteration.
TODO: As far as I know Kannada is very phonetic. So I think there should be a direct match between ISO 15919 and IPA.[6]
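To make the Unicode and UTF-8 point from the list above concrete, here is a small sketch using only Python's standard library. The helper name `describe` is my own, made up for illustration. It shows the code point, the official character name and the UTF-8 bytes for each character in the word ಕನ್ನಡ.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, UTF-8 bytes in hex) for one character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_hex = ch.encode("utf-8").hex(" ")  # bytes.hex(sep) needs Python 3.8+
    return (code_point, unicodedata.name(ch), utf8_hex)


# Each Kannada character is a single code point but three bytes in UTF-8.
for ch in "ಕನ್ನಡ":
    print(*describe(ch))
```

The first line printed should be the entry for KANNADA LETTER KA, which lives at U+0C95 inside the Kannada block (U+0C80 to U+0CFF).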
Unicode CLDR and ICU projects
Unicode CLDR, the Unicode Common Locale Data Repository, is a project by the Unicode Consortium that provides data building blocks to support languages. This makes it easy for software engineers to add internationalization and localization support, or to build sorting and transliteration features. All kinds of companies, small and big, use these libraries to add language features to their products.
ICU (International Components for Unicode) is a set of open source C/C++ and Java libraries providing Unicode and internationalization/localization support for software applications. There are wrappers for other languages, like PyICU for Python.
The Unicode CLDR provides data and guidelines for transliteration. By default it uses ISO 15919 for transliteration. Internally all Indic scripts are first converted to an internal format called Inter-Indic and then transformed into the target script. As per the guidelines, for Kannada it supports:
- Romanization, that is, transliteration into Latin script. This is the most used format.
- Transliteration rules from Kannada (Indic) to Latin are reversible, except for ZWJ and ZWNJ, which are used for rendering.
- Transliteration into other Indic scripts (excluding Urdu), for example from Kannada script to Malayalam script. In most situations these are reversible.
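To get a feel for why script-to-script conversion between Indic scripts is mostly reversible, here is a toy sketch. It is not how ICU or CLDR implement it (they use rule sets that go through Inter-Indic). It only relies on the fact that the Indic blocks in Unicode inherited a common ISCII-style layout, so the same letter usually sits at the same offset in each block. The function name `shift_script` is hypothetical, and a plain offset shift will produce wrong or unassigned characters wherever the two blocks differ.

```python
KANNADA_BASE = 0x0C80    # start of the Kannada block in Unicode
MALAYALAM_BASE = 0x0D00  # start of the Malayalam block in Unicode
BLOCK_SIZE = 0x80        # both blocks are 128 code points wide


def shift_script(text, src_base, dst_base):
    """Naively move characters from one Indic block to the same offset in another."""
    out = []
    for ch in text:
        offset = ord(ch) - src_base
        if 0 <= offset < BLOCK_SIZE:
            out.append(chr(dst_base + offset))
        else:
            out.append(ch)  # spaces, digits, punctuation etc. pass through
    return "".join(out)


malayalam = shift_script("ಕನ್ನಡ", KANNADA_BASE, MALAYALAM_BASE)
print(malayalam)
# Shifting back recovers the original, which is the "reversible" property.
print(shift_script(malayalam, MALAYALAM_BASE, KANNADA_BASE))
```

For real text, stick to the ICU transliterators, since they know about the letters and signs that do not line up between scripts.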
The Latin - Indic transliteration chart is available if you prefer to use it, and there is also a Kannada-only ISO 15919 mapping available for download in PDF format. I am embedding the same below as it's useful.
Personally I use Python, so PyICU is my preferred library for transliteration[7]. Below you can find the code examples for transliteration that I use the most: from Kannada to Latin and from Latin to Kannada.
from icu import Transliterator

# Kannada script to Latin (ISO 15919 romanization)
trans = Transliterator.createInstance("Kannada-Latin")
print(trans.transliterate("ಅಕೌಂಟೆಂಟ್ ನೇಮಕಾತಿ: ಪರೀಕ್ಷೆ ಬರೆದ 8 ಸಾವಿರ ಅಭ್ಯರ್ಥಿಗಳೆಲ್ಲ ಫೇಲ್ !"))

# Latin back to Kannada script
trans = Transliterator.createInstance("Latin-Kannada")
print(trans.transliterate("nanage kannaḍa barutte"))
Print 1# akauṇṭeṇṭ nēmakāti: parīkṣe bareda 8 sāvira abhyarthigaḷella phēl !
Print 2# ನನಗೆ ಕನ್ನಡ ಬರುತ್ತೆ
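If you don't have PyICU installed, the core idea behind the Kannada to Latin direction can still be sketched in plain Python. This is a toy of my own, not ICU's rule set. The tables below cover only the handful of letters needed for the sentence above, and the name `romanize` is made up. The logic it shows is the important part: every consonant carries an inherent "a", a vowel sign replaces it, and the virama (halant) removes it.

```python
# Tiny subset of the ISO 15919 mapping, just enough for the demo sentence.
CONSONANTS = {
    "\u0c95": "k",  # ಕ
    "\u0c97": "g",  # ಗ
    "\u0ca1": "ḍ",  # ಡ
    "\u0ca4": "t",  # ತ
    "\u0ca8": "n",  # ನ
    "\u0cac": "b",  # ಬ
    "\u0cb0": "r",  # ರ
}
VOWEL_SIGNS = {
    "\u0cbe": "ā",  # sign aa
    "\u0cbf": "i",  # sign i
    "\u0cc1": "u",  # sign u
    "\u0cc6": "e",  # sign e (short)
    "\u0cc7": "ē",  # sign ee (long)
}
VIRAMA = "\u0ccd"  # removes the inherent vowel of the preceding consonant


def romanize(text):
    """Romanize Kannada text using the small tables above."""
    out = []
    pending_a = False  # True while the last consonant still owes its inherent "a"
    for ch in text:
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in VOWEL_SIGNS and pending_a:
            out.append(VOWEL_SIGNS[ch])  # the sign replaces the inherent "a"
            pending_a = False
        elif ch == VIRAMA and pending_a:
            pending_a = False            # consonant cluster: no vowel at all
        else:
            if pending_a:
                out.append("a")
            pending_a = False
            out.append(ch)               # anything unknown passes through
    if pending_a:
        out.append("a")
    return "".join(out)


print(romanize("ನನಗೆ ಕನ್ನಡ ಬರುತ್ತೆ"))
```

A real implementation also needs independent vowels, anusvara, visarga, digits and the rest of the alphabet, which is exactly the work the CLDR rules already do.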
Transliteration in Kannada is mostly used for inputting (or typing). Given the popularity of the Latin (English) keyboard, it's probably the easiest way to do it as well. Since Kannada is phonetic[8], it's easy even for new users. Being phonetic has a benefit: we can read (pronounce) the transliterated text almost correctly[9]. You will see most sign boards using transliterated text for places, names etc. They use the most simplified format, and hence confusion sometimes arises while pronouncing.
Given a correct transliteration[10], though, the text can be pronounced correctly to a great degree. The current romanized spellings have been part of official documents[11] for a long time now, and it's not worth changing them. But we need to adopt this for newer names and boards, for example movies, songs etc., especially if you are planning to go international.
So "ಉಳಿದವರು ಕಂಡಂತೆ" becomes "uḷidavaru kaṇḍante"[12] and not "Ulidavaru Kandanthe".
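When an ASCII-only form is unavoidable (a URL, say), one way to derive it from the proper transliteration is to strip the diacritics. Here is a small sketch with the standard library, where the function name `strip_diacritics` is my own. It also shows what the simplified spelling loses: once the dots are gone, ḷ and l or ṇ and n can no longer be told apart.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose accented letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("uḷidavaru kaṇḍante"))
```

Note that the result is plain "kandante", without the extra "h" that the popular spelling adds. The conversion is one way only: you cannot get the diacritics back from the ASCII form.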
How to type Accents and Symbols
Typing the accents and symbols required for, say, maḍivāḷa has become easier than before. On Android and iOS, press and hold "a" and you will get various choices for a, including "ā", in a popup overlay to choose from. It works on all inbuilt keyboards.
On Ubuntu it's easy using a compose key. Set up your compose key using Preferences -> Keyboard.
To get ā you need to type Compose + - then a.
To get ḷ you need to type Compose + , then l, etc.
You can find the big list of compose key shortcuts here.
I have not used Windows in a long time. This Microsoft tutorial seems to say it's easy.
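If none of these input methods is handy, the same letters can be built in code from a base letter plus a combining mark, which is roughly what a compose key does for you. A minimal sketch, with the constant and function names being my own:

```python
import unicodedata

MACRON = "\u0304"     # COMBINING MACRON, the bar in ā
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, the dot in ḍ and ḷ


def compose(base, mark):
    """Combine a base letter and a combining mark into one precomposed letter (NFC)."""
    return unicodedata.normalize("NFC", base + mark)


word = (
    "ma" + compose("d", DOT_BELOW) + "iv"
    + compose("a", MACRON) + compose("l", DOT_BELOW) + "a"
)
print(word)
```

Normalizing to NFC matters when you compare or search strings, because a precomposed ā and an "a" followed by a combining macron look the same on screen but are different code point sequences.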
[1] For inputting Kannada text, sign boards etc.
[2] Kannada script is used by several languages: Kannada, Tulu, Konkani, Kodava, Sanketi, Beary etc.
[3] IAST came first, so ISO 15919 follows the IAST scheme with minor changes.
[4] At least that is the idea.
[5] Even in other places.
[6] Of course, in the words ಮಂದ್ರ and ನಂದು the character ಂ sounds different. There are a few like that. But mostly it should match.
[7] Also check the transliteration and indic-trans projects by LibIndic.
[8] Mostly phonetic.
[9] Exceptions exist: in mandra and nandu, n is pronounced differently. Try ಮಂದ್ರ and ನಂದು.
[10] I know we should use IPA for pronunciation. I will write a separate blog post about Kannada TTS.
[11] Even on OSM, where ISO 15919 is used, we stick to official documents.
[12] Difficult to type, but that's the price I think, plus the added trouble of buying an ASCII URL. Nothing is easy!