Batch transliterating names into Kannada using Google API

Sometimes work at Janaagraha throws up awesome challenges. For example, as part of the BEST project we are cleaning up the voters list. The voters list in Karnataka has names in both English and Kannada. Most of the volunteers filled in only the English names, so we were left with transliterating the names into Kannada. I was thinking about automating it. After all, transliteration is not as complex as translation, right? Wrong. It's difficult to write a transliterator, especially when there are so many spelling variations in English for the same name in Kannada.
For example, both Sreenivas and Srinivas are ಶ್ರೀನಿವಾಸ್ in Kannada. I found that Google Transliteration handles this pretty well. But it offers only JavaScript APIs for web pages and nothing for server-side code.

But Google worked, and I found a non-public Google Transliteration API that gives JSON output for a given English input. I cooked up a wrapper in PHP that cleans up the JSON and returns an array of results. The code is on GitHub, for obvious reasons.

$kn = transliterate("thejesh,ramesh,
print $kn[0];

Probable drawbacks:
1. It's a non-public API provided by Google. Not sure when they will block it.
2. As of now it can transliterate only 5 words at a time.
3. There is no information about API rate limiting, so stay on the safer side and don't send requests too quickly.
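
Drawbacks 2 and 3 can be softened by splitting a long list into batches of 5 and pausing between requests. Here is a minimal sketch of that idea. The names batch_names and transliterate_all, the 500 ms pause, and the assumption that the transliterate function takes a comma-separated string and returns an array are all mine for illustration; they are not part of the code on GitHub.

```php
<?php
// Hypothetical helper: split a comma-separated list of names into
// comma-separated batches of at most $batchSize names (5 is the limit
// mentioned in the drawbacks above).
function batch_names(string $csv, int $batchSize = 5): array
{
    // Trim each name and drop empty entries (e.g. from a trailing comma).
    $names = array_filter(
        array_map('trim', explode(',', $csv)),
        fn($name) => $name !== ''
    );

    $batches = [];
    foreach (array_chunk(array_values($names), $batchSize) as $chunk) {
        $batches[] = implode(',', $chunk);
    }
    return $batches;
}

// Hypothetical wrapper: $transliterate is any callable that accepts a
// comma-separated string and returns an array of results (assumed to be
// the shape of the transliterate() function from the post).
function transliterate_all(string $csv, callable $transliterate, int $pauseMs = 500): array
{
    $results = [];
    foreach (batch_names($csv) as $i => $batch) {
        if ($i > 0) {
            // Pause between requests, since the rate limit is unknown.
            usleep($pauseMs * 1000);
        }
        foreach ($transliterate($batch) as $item) {
            $results[] = $item;
        }
    }
    return $results;
}
```

With the transliterate function loaded, a call would look something like transliterate_all($names, 'transliterate'). Passing the function in as a callable keeps the batching logic separate from the API call, so it can be tried out with a dummy function before hitting Google.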

Let me know what you think.

3 Responses

  1. Hi Thejesh, I am a regular reader of your blog here. I wrote a python module for the same. Sharing it with you as you recently started liking python;)

  1. December 18, 2013

    […] Thejesh’s experience using the Google Transliteration API helped but the extent of changes ensured that we had to put in a lot of effort into the project. A combination of the Google Translate API run on our database helped automate the transliteration effort to an extent but we did have to mark about 50% of the transliterated text for manual verification. The manual verification was required because the same word in English script could be written in different ways in Hindi. For example, “bahar” could be बाहर or बहार; “to” could be either तो or टु. We also had to spend some time translating the static pages, literals and messages but that was a piece of cake compared to the database. […]

  2. February 13, 2014

    […] API to do that. But as far the album/song titles etc we had to do transliteration. I used my old script to that but the quality of the transliteration wasn’t great. So we kind of had to create our own […]