Training AI to Understand Human Language


Supported by accelerating advancements in spoken language technologies, communication between humans and machines has never been easier. Spoken language technologies are incredibly useful in everyday life, but they also offer significant value in medical applications, a direction in which Professor Helen Meng is at the forefront of exploration. Using artificial intelligence and deep learning, she enables computers to analyse human speech and language, as well as extract key encoded information, including emotions, language proficiency and health. Her work has harnessed the power of spoken language technologies for everything from correcting mispronunciations to screening for neurocognitive disorders.

How Spoken Language Processing Can Improve Medical Outcomes

Today, smart voice assistants can help people with all sorts of things, replying to messages, arranging schedules and even recommending films, music and books. In recent years, speech and language technologies have also been used in medical examinations, and Professor Helen Meng is a pioneer in the field. In 2019, she and her team began the research project “Artificial Intelligence in Extraction and Identification of Spoken Language Biomarkers for Screening and Monitoring of Neurocognitive Disorders”, which aims to extract information about cognitive disorders hidden within spoken language.

“Spoken language data collected from cognitive assessments show that people with better cognitive functioning are able to communicate efficiently and effectively.  On the contrary, those with relatively lower cognitive functioning show reduced communication abilities. We are developing a mobile app to screen cognitive disorders by analyzing conversational features with AI technology. We have designed data collection protocols to support our efforts in collecting spoken language of older adults.”

Professor Meng began her research on the application of AI to medicine and healthcare as early as 2013, and founded the CUHK Stanley Ho Big Data Decision Analytics Research Centre the same year. In 2020, she helped establish the Centre for Perceptual and Interactive Intelligence with the support of the InnoHK Research Clusters, aiming to exploit big data for machine learning and conduct a variety of forward-looking research on AI.

She has also developed a computer model to study special features in Cantonese pronunciation and rhythm. It helps patients with dysarthria (a condition caused by stroke, cerebral palsy or other disorders) to communicate by using AI to reconstruct disordered speech as normal-sounding speech. “Dysarthric patients struggle a lot with communication. AI-powered speech reconstruction technology offers the potential to help them communicate a lot more easily.” The technology won the Open Group Championship of the SciTech Challenge 2021, a prominent competition in Hong Kong designed to nurture innovators and drive technology adoption.

Professor Meng’s team developed the mobile app CogApp, which uses speech recognition technology to collect and analyse elderly people’s speech in order to detect early signs of dementia.

Correcting Pronunciation with AI

Before the advancement of speech recognition and speech synthesis technologies, communication between people and machines was limited to using keyboards, computer mice and touchscreens to translate human language into commands or codes. For computers, speech is a digital signal, comprising the three dimensions of frequency, time and energy. When a computer “hears” speech, it examines the special features in these three dimensions.
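The idea of speech as a signal with frequency, time and energy dimensions can be illustrated with a spectrogram. The sketch below is only an illustration and is not taken from Professor Meng's systems: it assumes NumPy is available, uses a synthetic 440 Hz tone in place of recorded speech, and the function name `spectrogram` and the frame settings (25 ms frames, 10 ms hop at 16 kHz) are hypothetical choices.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=400, hop=160):
    """Return (times, freqs, energy) for a 1-D signal.

    energy[i, k] is the energy of frame i (time) at frequency bin k.
    frame_len and hop are illustrative defaults, not values from the article.
    """
    window = np.hanning(frame_len)                    # taper each frame
    n_frames = 1 + (len(signal) - frame_len) // hop   # frames that fit fully
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)            # frequency content per frame
    energy = np.abs(spectrum) ** 2                    # energy per (time, frequency)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return times, freqs, energy

# Synthetic stand-in for speech: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

times, freqs, energy = spectrogram(tone, sample_rate)
peak_hz = freqs[energy.mean(axis=0).argmax()]         # strongest frequency overall
print(energy.shape, peak_hz)
```

Each row of `energy` corresponds to a moment in time and each column to a frequency, so the array holds the three dimensions the paragraph describes. Real speech would show energy patterns that shift from frame to frame rather than a single steady peak.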

“When I was studying in the U.S., I started thinking about how to make human-machine communication as natural as human-human communication,” said Professor Meng. “It sounds like a pipe dream, but research is about overcoming challenges to realize innovative ideas. We have to undergo continuous experimentation, undeterred, before achieving a goal.” In recent years, AI and deep learning have raised the performance of many different kinds of spoken language technologies, including speech recognition, natural language processing and speech synthesis. These technologies can attain human parity in a variety of contexts and conditions.

In 1997, Professor Meng came back to Hong Kong to interview Professor Sir Charles K. Kao, former Vice-Chancellor and President of CUHK, and other scholars. She was inspired by the experience and the linguistic environment of Hong Kong, seeing it as the best place to research speech recognition technology.

After completing her Ph.D. thesis at the Massachusetts Institute of Technology (MIT), Professor Meng continued her research there as a Research Scientist. She was also invited to participate in the MIT Industrial Performance Center’s project Made By Hong Kong, which involved interviewing top industrialists and scholars on their views about the future development of innovation and technology in Hong Kong. Among the interviewees was Professor Sir Charles K. Kao, then Vice-Chancellor and President of CUHK. It was through these interviews that she identified the tremendous research opportunities that Hong Kong offers in the field of speech and language technologies. “The biliterate and trilingual society, together with dynamic accent varieties in Hong Kong English and British English, presents a wonderful context for research,” Professor Meng said.

Upon joining CUHK, Professor Meng established the Human-Computer Communications Laboratory (HCCL) in her department. HCCL supports research in novel technologies including computer speech recognition, speech synthesis and computer reading comprehension. Drawing knowledge from disciplines like psychology, linguistics and medicine, the Laboratory is collecting a large database of English speech with a Cantonese accent, and developing an AI system that can recognize, identify and correct mispronunciations.

“Our AI system can recognise common pronunciation patterns in Chinese-accented English that deviate significantly from native English productions. The system can detect and diagnose mispronunciations, and offer feedback to the learner through automatic synthesis of the correct pronunciations. Such feedback can help inform the learner during pronunciation practice,” explained Professor Meng.
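One simple way to picture the detect-and-diagnose step is to compare the phonemes a learner was expected to say with the phonemes a recogniser heard. The sketch below is a toy illustration only and does not describe how the HCCL system works: the phoneme labels, the example substitution and the function name `diagnose` are all hypothetical, and the alignment uses Python's standard `difflib` module rather than a trained model.

```python
from difflib import SequenceMatcher

def diagnose(expected, produced):
    """List the differences between two phoneme sequences.

    Each item is (kind, expected_phonemes, produced_phonemes), where kind is
    'replace', 'delete' or 'insert'. A toy alignment, not a trained model.
    """
    matcher = SequenceMatcher(a=expected, b=produced, autojunk=False)
    issues = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # phonemes that match need no feedback
        issues.append((tag, expected[i1:i2], produced[j1:j2]))
    return issues

# Hypothetical example: the word "think" with its first sound substituted.
expected = ["TH", "IH", "NG", "K"]
produced = ["F", "IH", "NG", "K"]

issues = diagnose(expected, produced)
for kind, want, got in issues:
    print(f"{kind}: expected {want}, heard {got}")
```

In a real system, the produced sequence would come from a speech recogniser, and each detected difference could then trigger feedback such as a synthesised model pronunciation.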

Powered by this technology, Professor Meng’s students founded a company, SpeechX, and developed an English learning system.  The applications developed by SpeechX can effectively support learners in acquiring English as a second language.  At present, the application is being accessed more than a million times per day.

“Human speech is like magic. There is an abundant amount of information in speech for computers to analyze, not only in semantics but also sentiments, health and even education levels. My goal is to harness AI to decipher the information for the well-being of mankind.”

— Professor Helen Meng

Patrick Huen Wing Ming Professor of Systems Engineering and Engineering Management

Research areas: Multilingual Speech and Language Processing, Multimodal Human-Computer Interactions

Major achievements:

  • First Place Award in Shared Task and Best Overall Team, DialDoc@ACL (2022)
  • Most Successful Women Award (2022)
  • Open Group Champion, SciTech Challenge (2021)
  • Outstanding Women Professional Award (2017)
  • International Speech Communication Association (ISCA) Fellow (2016)
  • Microsoft Research Outstanding Collaborator Award (one of 32 academics worldwide) (2016)
  • IBM Faculty Award (2016)
  • IEEE Fellow (2013)
  • Scientific and Technological Progress Award, Higher Education Outstanding Scientific Research Output Awards, the Ministry of Education, China (2009)