
Saving Endangered Languages with AI and the Science of Communication

1 September 2021

David Corliss

With a PhD in statistical astrophysics, David Corliss is lead, Industrial Business Analytics, and manager, Data Science Center of Excellence, Stellantis. He serves on the steering committee for the Conference on Statistical Practice and is the founder of Peace-Work, a volunteer cooperative of statisticians and data scientists providing analytic support for charitable groups and applying statistical methods in issue-driven advocacy.

One of the greatest allies in communicating better in the sciences is science itself. Research in communication, learning, neurology, and more contributes tremendously to understanding how language works and how it is understood and retained.

As a person who was completely non-verbal for more than three years and struggled to acquire language after being so long delayed, I can speak personally to the impact the science of communication has had on so many lives. Even as this research has helped some gain language, it is also being used to help others at risk of losing their language—and data, analytics, and AI are playing an increasingly important role.

In the past few years, advances in voice recognition and natural language processing have allowed verbal computers to go from visionary to kludgy to commonplace. Enabled by research in the science of communication and data science, AI now plays an important role in understanding language. Chatbots are widespread and becoming more so every day.

Getting Involved
For this month’s opportunity, it’s time to think ahead to next year’s internships. With classes back in session, the fall recruiting season is about to begin. This affects people in occupations across the analytic spectrum. Personally, I’d love to see more community service organizations take on a data science intern for the summer, for example to help find more supporters, volunteers, and financial contributors! Having a Data for Good community service project as an option for interns can go a long way toward attracting top talent.

One application of voice recognition and natural language processing is learning new languages and analyzing their words and patterns to create a digital, and therefore durable, version of the language, independent of human speakers. This allows rare languages to be digitally captured while native speakers are still available, preserving the languages. Using AI to save a threatened language from extinction can save the culture itself.

Of the almost 7,000 languages spoken in the world, as many as 90 percent could disappear in the next 80 years. UNESCO (the United Nations Educational, Scientific, and Cultural Organization) has created an interactive map of threatened and endangered languages. Many are the languages of local, native peoples, overrun by outside languages and cultures as indigenous societies disappear. Today, a number of Data for Good (D4G) organizations are creating projects and resources to help save these languages from extinction.

Google’s Woolaroo project is a leading example of how technology can be used to save endangered languages. A great example of Data for Good supported by the private commercial sector, this not-for-profit platform provides ground truthing—verified recognition of an object—using a smartphone camera. The technology combines Google Cloud Vision, based on object recognition in the human visual cortex, with AutoML classification algorithms. A photograph of an object is combined with the word for it in the language being preserved. Because it’s open source, anyone can contribute. Starting with Louisiana Creole, Woolaroo now covers 10 languages, including Yugambeh—an aboriginal language in Australia that had just one remaining native speaker.

Learn more and start contributing on their GitHub site.
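The pairing of a recognized object with its word can be sketched in a few lines of Python. This is only an illustration and not Woolaroo's actual code: the `Label` class, the `translate_labels` and `missing_words` helpers, the confidence threshold, and the placeholder lexicon entries are all assumptions made for the example. The labels would come from an image classifier, which is not shown here.

```python
from dataclasses import dataclass

# Hypothetical lexicon: English object label -> word in the language being
# preserved. The entries are placeholders, not real vocabulary.
LEXICON = {
    "tree": "<word for tree>",
    "river": "<word for river>",
}


@dataclass
class Label:
    """One object label returned by an (assumed) image classifier."""
    name: str
    score: float  # classifier confidence between 0 and 1


def translate_labels(labels, lexicon, min_score=0.6):
    """Pair each confident label with its lexicon word.

    Labels are visited from most to least confident. A word of None
    means the lexicon has no entry for that object yet.
    """
    results = []
    for label in sorted(labels, key=lambda item: item.score, reverse=True):
        if label.score < min_score:
            continue  # skip objects the classifier is unsure about
        results.append((label.name, lexicon.get(label.name.lower())))
    return results


def missing_words(results):
    """List the recognized objects that still need a word contributed."""
    return [name for name, word in results if word is None]
```

Objects that come back with no word are the useful part for volunteers: they mark the gaps a native speaker could fill.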

Another Data for Good organization saving cultures by saving languages is OBTranslate. Like many translation services, the business side of the company charges fees to translate documents between many languages. Meanwhile, the D4G side of the operation crowdsources contributions from a volunteer community of statisticians, data scientists, language experts, and native speakers. Focused on African languages, the community helps create algorithms and data sets for more than 2,000 languages. OBTranslate also partners with university research teams to develop translation AI.

Both Google’s Woolaroo and OBTranslate offer opportunities for Data for Good volunteers to get involved in saving cultures and saving languages with AI.

A central theme for this column is the many ways people work to “Solve the equation—Save the world!” Because a society’s culture is contained and preserved in its forms of communication, saving endangered languages is one of the most compelling Data for Good stories at a human level. The loss of a language really amounts to a kind of extinction—an entire culture gone from the face of the Earth.

Of course, the best time to save a language is before it becomes so endangered that the few remaining speakers may not form a representative sample. This shows another way statisticians can make a critical difference. Helping save a culture by saving its language takes more than coding. It also takes an understanding of sampling, missing data, and how to identify and mitigate bias. These core statistical skills are every bit as necessary as the ability to write algorithms, and that makes statisticians a vital part of the team.
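One simple way to mitigate this kind of bias is post-stratification, which reweights a sample so its group shares match known population shares. The sketch below is a generic illustration with made-up numbers. The age bands, counts, and the `poststratification_weights` helper are hypothetical and do not come from any of the projects described here.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per group: population share / sample share.

    sample_groups: the group of each sampled speaker, e.g. an age band.
    population_shares: dict mapping group -> share of the speaker
        population (assumed known from an outside source).
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    weights = {}
    for group, share in population_shares.items():
        if counts[group] == 0:
            # Reweighting cannot repair a group that was never sampled.
            raise ValueError(f"no sampled speakers in group {group!r}")
        weights[group] = share / (counts[group] / total)
    return weights


# Hypothetical sample: 8 elders and 2 younger adults were recorded,
# but the speaker community is assumed to be split evenly.
sample = ["elder"] * 8 + ["adult"] * 2
weights = poststratification_weights(sample, {"elder": 0.5, "adult": 0.5})
```

Here recordings from elders are weighted down and recordings from younger adults are weighted up. The error for an empty group is deliberate: when a group has no speakers in the sample at all, no weighting can stand in for the missing data.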

Our planet is a world of worlds, each unique and worth saving. With thousands of languages at risk and a growing number of organizations working to preserve them, the ways to get involved and possibly save someone’s world are almost endless.


One Comment »

  • Matt said:

    I don’t want to save languages that few people speak. Having fewer languages spoken worldwide means more people can communicate with one another.