Physiological signals could be the key to “emotionally intelligent” AI

Image: A multimodal neural network predicts user sentiment from multimodal features such as text, audio, and visual data. In a new study, Japanese researchers factored physiological cues into sentiment estimation during conversations with the system, dramatically improving its performance.

Credit: Shogo Okada of JAIST.

Ishikawa, Japan — Speech and language recognition technology is a rapidly developing field, which has led to the emergence of new voice dialogue systems, such as Amazon Alexa and Siri. An important step in the development of dialogue artificial intelligence (AI) systems is the addition of emotional intelligence. A system able to recognize the emotional states of the user, in addition to understanding language, would generate a more empathetic response, leading to a more immersive experience for the user.

“Multimodal sentiment analysis” refers to a group of methods that are the gold standard for AI dialogue systems with sentiment detection. These methods automatically analyze a person’s psychological state from their speech, tone of voice, facial expression, and posture, and are crucial for human-centric AI systems. They could one day realize an emotionally intelligent AI with beyond-human capabilities, one that understands the user’s feelings and generates responses accordingly.

However, current methods for estimating emotions focus only on observable information and do not account for unobservable signals, such as physiological signals. Such cues are a potential goldmine of emotional information that could significantly improve sentiment estimation performance.
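Conceptually, a multimodal sentiment estimator fuses feature vectors from each channel before classification. The sketch below illustrates feature-level (“early”) fusion with purely synthetic numbers; the feature names, dimensions, and scorer weights are illustrative assumptions, not values from the study.

```python
import math
import random

random.seed(0)

# Hypothetical per-modality feature vectors for one dialogue exchange
# (dimensions are illustrative only, not taken from the study).
text_feat   = [random.gauss(0, 1) for _ in range(8)]   # e.g. sentence-embedding features
audio_feat  = [random.gauss(0, 1) for _ in range(4)]   # e.g. prosody / tone-of-voice features
visual_feat = [random.gauss(0, 1) for _ in range(6)]   # e.g. facial expression, posture

# Feature-level ("early") fusion: concatenate the modalities into a single
# vector that a downstream sentiment classifier can score.
fused = text_feat + audio_feat + visual_feat

# Stand-in linear scorer with hypothetical weights; a real system would learn
# these (or use a neural network) from labelled dialogue data.
weights = [random.gauss(0, 1) for _ in range(len(fused))]
score = math.tanh(sum(w * x for w, x in zip(weights, fused)))  # squashed into (-1, 1)

print(len(fused))          # 18
print(-1.0 < score < 1.0)  # True
```

Adding a physiological channel, such as skin-potential features, would simply mean concatenating one more feature vector before scoring.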

In a new study published in the journal IEEE Transactions on Affective Computing, physiological cues have been added to multimodal sentiment analysis for the first time by a collaborative team of Japanese researchers consisting of Associate Professor Shogo Okada of the Japan Advanced Institute of Science and Technology (JAIST) and Professor Kazunori Komatani of The Institute of Scientific and Industrial Research at Osaka University. “Humans are very good at hiding their feelings. A user’s internal emotional state is not always accurately reflected by the content of the dialogue, but since it is difficult for a person to consciously control biological signals such as heart rate, these can be useful for estimating their emotional state. This could create an AI with sentiment estimation capabilities that go beyond those of humans,” explains Dr. Okada.

The team analyzed 2468 exchanges with a dialogue AI, obtained from 26 participants, to estimate the level of pleasure felt by the user during the conversation. Each user was then asked to rate how enjoyable or boring they found the conversation. The team used the multimodal dialogue dataset named “Hazumi1911”, which uniquely combines voice recognition, tone-of-voice sensing, facial expression and posture detection with skin potential, a form of physiological response detection.

“Comparing all the separate information sources, we found biosignal information to be more effective than voice and facial expression. When we combined linguistic information with biosignal information to estimate the user’s self-assessed internal state while talking with the system, the AI’s performance became comparable to that of a human,” comments an excited Dr. Okada.
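The kind of comparison Dr. Okada describes, scoring each information source alone and then in combination, can be sketched on toy synthetic data. Everything below (the data generator, the threshold classifier, and the relative informativeness of each channel) is an illustrative assumption, not the study’s actual method or results.

```python
import random

random.seed(1)

# Toy synthetic data: each exchange has one linguistic and one biosignal
# feature plus a binary high/low-pleasure label. The biosignal feature is
# constructed to carry more label signal purely to illustrate the evaluation
# loop; the numbers do not come from the study.
def make_exchange():
    label = random.choice([0, 1])
    ling = [label * 1.2 + random.gauss(0, 1)]   # weakly informative channel
    bio  = [label * 2.0 + random.gauss(0, 1)]   # strongly informative (by construction)
    return {"ling": ling, "bio": bio, "label": label}

data = [make_exchange() for _ in range(2000)]

def accuracy(modalities):
    """Score a simple threshold classifier on the summed features of the chosen modalities."""
    feats = [sum(sum(ex[m]) for m in modalities) for ex in data]
    thresh = sum(feats) / len(feats)            # split at the overall mean
    preds = [1 if f > thresh else 0 for f in feats]
    return sum(p == ex["label"] for p, ex in zip(preds, data)) / len(data)

# Evaluate each channel alone, then the combination.
for subset in (["ling"], ["bio"], ["ling", "bio"]):
    print("+".join(subset), round(accuracy(subset), 3))
```

On this construction the biosignal channel alone outperforms the linguistic one, and combining both beats the linguistic channel alone, mirroring the qualitative pattern the study reports.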

These results suggest that detecting physiological cues in humans, which typically remain hidden from view, could pave the way for highly emotionally intelligent AI-based dialogue systems, enabling more natural and satisfying human-computer interactions. Additionally, emotionally intelligent AI systems could help identify and monitor mental illness by detecting changes in everyday emotional states. They could also be useful in education, where the AI could assess whether a learner is interested and excited about a topic of discussion, or bored, prompting changes in teaching strategy and more effective educational services.



Original article title: Effects of physiological cues in different types of multimodal sentiment estimation

Journal: IEEE Transactions on Affective Computing



About Japan Advanced Institute of Science and Technology, Japan

Founded in 1990 in Ishikawa Prefecture, Japan Advanced Institute of Science and Technology (JAIST) was the first independent national graduate school in Japan. Today, after 30 years of steady progress, JAIST has become one of Japan’s top universities. JAIST has several satellite campuses and strives to develop competent leaders with a state-of-the-art educational system where diversity is key; about 40% of its alumni are international students. The university has a unique style of higher education based on a carefully designed and course-oriented curriculum to ensure that its students have a solid foundation on which to conduct cutting-edge research. JAIST also works closely with local and overseas communities by promoting collaborative industry-university research.

About Associate Professor Shogo Okada of Japan Advanced Institute of Science and Technology, Japan

Shogo Okada directs a laboratory for computational modeling aimed at understanding and generating multimodal social-signal patterns, within the intelligent robotics field at the Japan Advanced Institute of Science and Technology, where he holds the position of associate professor. His research focuses on building computational models of multimodal social signals using speech signal processing, image processing, motion-sensor processing, and pattern recognition techniques, and on applying machine learning and data mining to multimodal data. He has 79 publications with over 350 citations to his credit.

