India turns to AI to capture its 121 languages



For a few weeks this year, villagers in the southwestern Indian state of Karnataka read dozens of sentences aloud in their native Kannada into an app, as part of a project to build the country's first AI-based chatbot for tuberculosis.

There are more than 40 million native Kannada speakers in India. Kannada is one of the country's 22 official languages and one of the 121 languages spoken by 10,000 or more people in the world's most populous nation.

But few of these languages are covered by natural language processing (NLP), the branch of artificial intelligence that enables computers to understand text and spoken words.

As a result, hundreds of millions of Indians are excluded from useful information and many economic opportunities.

The villagers in Karnataka are among thousands of speakers of different Indian languages generating speech data for tech firm Karya, which is building datasets for firms such as Microsoft and Google to use in AI models for education, healthcare and other services.

The Indian government, which aims to deliver more services digitally, is also building language datasets through Bhashini, an AI-led language translation system that is creating open-source datasets in local languages for building AI tools.

The platform includes a crowdsourcing initiative for people to contribute sentences in various languages, validate audio or text transcribed by others, translate texts and label images.

Tens of thousands of Indians have contributed to Bhashini.
