Data Science Lab in Pakistan makes Urdu-Hindi Dictionary
With the advent of smart keyboards, the Urdu script has arrived on mobile phones. However, typing in it remains tedious at best.
Most people resort to writing in Urdu by using the English script, which is also known as writing in Roman Urdu.
The Data Science Lab (DSL) at Lahore’s Information Technology University has made an application called the Urdu-Hindi Dictionary to help overcome this language barrier. It offers a powerful custom search that lets you find the translation of any word in Arabic, Urdu, Hindi or English.
More than 542 million people on the planet speak either Hindi or Urdu, which places the two among the most widely spoken languages.
“Due to its popularity, Roman Urdu and Roman Hindi is emerging as a language so the Urdu-Hindi Dictionary has a lot of potential,” says Dr. Faisal Kamiran, director of the Data Science Lab.
“Our research is mainly about understanding this Roman Urdu text which is used very frequently in SMS, Facebook, Tweets and other forms of social media,” says Dr. Kamiran.
“It becomes a bit more complicated because there are several variations of the same word in Roman Urdu. The same word ‘kursi’ can also be written as kursee or even kurrsi.”
The application can be used to translate from Roman Urdu to English, Roman Hindi to English, and even English to Urdu, Hindi and Arabic.
Each category provides you with the translated word along with its definition, synonyms and verbs.
The Urdu-Hindi Dictionary offers a simple user interface (UI) with more than 10 different categories for user convenience, such as fruits, vegetables, animals, professions, elements and tools.
In order to retain the user’s interest, the application offers timed quizzes to test their vocabulary.
The application is currently available for download on the Google Play Store.
It was relaunched on 11th November, 2016 and has already had 30,000 downloads on Android and 8,000 downloads on iOS.
According to the Data Science Lab, there were several challenges involved in making this application.
“One complication was that there is no standard for writing in Roman Urdu and quite a few variations of the same word existed,” says Dr. Kamiran.
The members of the Data Science Lab wrote a research paper that handled this variation and came up with an algorithm that would understand that all these different spellings still mean the same word.
“A person using this application could write the word ‘kursi’ with any slight difference in spelling and the application would still understand what the user is trying to say and give them the translation of the word in their selected language,” he added.
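To give a sense of how such spelling variation can be handled, here is a minimal sketch in Python. It is not the lab's algorithm, which the article does not describe. It assumes two illustrative rules (treating "ee" and "oo" as "i" and "u", and collapsing doubled letters), and the function names and one-word lexicon are hypothetical.

```python
import re

def normalize_roman_urdu(word: str) -> str:
    """Reduce a Roman Urdu spelling to a rough canonical key.

    The rules here are illustrative assumptions, not the Data Science Lab's method.
    """
    key = word.lower().strip()
    # Treat common long-vowel spellings as short ones: "ee" -> "i", "oo" -> "u".
    key = key.replace("ee", "i").replace("oo", "u")
    # Collapse any run of a repeated character into one: "rr" -> "r".
    key = re.sub(r"(.)\1+", r"\1", key)
    return key

# Hypothetical one-entry lexicon, keyed by the canonical form.
LEXICON = {normalize_roman_urdu("kursi"): "chair"}

def lookup(word: str):
    """Return the English translation for any known spelling variant, else None."""
    return LEXICON.get(normalize_roman_urdu(word))

if __name__ == "__main__":
    for variant in ("kursi", "kursee", "kurrsi"):
        print(variant, "->", lookup(variant))
```

With these rules, "kursi", "kursee" and "kurrsi" all reduce to the same key, so a single dictionary entry serves every variant. A real system would need far richer rules or a learned model to cope with the range of spellings people actually use.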
The Citizen Feedback Monitoring Program (CFMP), a project of the Punjab Information Technology Board, is looking closely into this application for future use.
CFMP is a popular initiative by the Punjab government to fight corruption and get feedback from citizens who are utilising public services such as property registration or getting driver’s licenses.
When this program gets feedback from citizens, the responses are often in Roman Urdu.
The CFMP team had to spend a lot of money checking the feedback and trying to figure out what was being said in each SMS.
In addition to the long processing time, CFMP was spending Rs6 million per annum to classify all the feedback that they were getting into different categories.
The amount of time and resources being spent was a major hurdle in the sustainability of the program. In 2015, CFMP had to curtail the program and look for other means of connectivity such as robocalling.
However, with the use of this application, which has an accuracy of more than 71 per cent, translation can become easier.
According to the Analyst Team at CFMP, if the Urdu-Hindi app is implemented instead of the existing infrastructure, it could possibly save CFMP Rs30m over the next five years.
An application that can translate Roman Urdu into English can serve as a teaching tool and can be used to help empower the low-literate population.
The Urdu-Hindi dictionary also has many applications in social media mining.
Text written in a language that most people are comfortable using can be analysed for purposes such as predicting election polls, sentiment analysis, reviewing emerging topics and analysing government or private projects and services.
According to Dr. Kamiran, standardising a language presents many difficulties but someone will have to start doing it and the Urdu-Hindi dictionary is a step in the right direction.
“We also need to mine Roman Urdu because otherwise you are losing the language of the people,” says Dr. Kamiran. “Currently, we have a stable lexicon with us but our aim in the future is to go even further and make a translator.”
The article originally appeared on MIT Tech Review Pakistan and has been reproduced with permission.