An AI model can guess emotions by analyzing the tone of our voice

A person's tone of voice can say a lot about their feelings. If it is easy for us humans to pick up on this in conversation, can AI do the same? That is what researchers in Germany set out to answer.


In the study, the experts compared how accurately three machine-learning models recognized different emotions in audio samples spoken in varying tones. The article was published in the journal Frontiers in Psychology and can be read in full here.

“We can show that machine learning can be used to recognize emotions in audio clips no longer than 1.5 seconds,” said one of the study's authors, Hans Demerling, a researcher at the Center for Lifespan Psychology at the Max Planck Institute for Human Development.

“Our models achieved human-like accuracy when classifying nonsensical, emotionally toned phrases spoken by actors,” Demerling added.

Emotions (Photo: Studio Brustock/Shutterstock)

A machine that listens to human emotions

  • In this study, researchers extracted nonsense sentences from two data sets — one Canadian and one German.
  • These samples allowed them to investigate whether machine learning can accurately recognize emotions regardless of language, cultural nuances, and semantic content.
  • Each clip was shortened to 1.5 seconds, which is roughly how long it takes humans to recognize emotion in speech (a preprocessing sketch follows this list).
  • It is also the shortest possible vocal duration at which overlapping emotions can be avoided.
  • The emotions included in the study were: joy, anger, sadness, fear, disgust, and neutral tone of voice.
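As a concrete illustration of the preprocessing step described in the list above, here is a minimal Python sketch of cutting recordings down to uniform 1.5-second clips. It assumes the librosa audio library, a 16 kHz sample rate, and hypothetical file names; the article does not describe the authors' actual pipeline.

```python
import numpy as np
import librosa  # assumed audio library; the article does not name the authors' tools

# The six classes studied: joy, anger, sadness, fear, disgust, neutral.
EMOTIONS = ["joy", "anger", "sadness", "fear", "disgust", "neutral"]
CLIP_SECONDS = 1.5    # clip length used in the study
SAMPLE_RATE = 16_000  # assumption; the article gives no sample rate

def segment_clip(path: str) -> np.ndarray:
    """Load a recording and trim or zero-pad it to exactly 1.5 seconds."""
    audio, _ = librosa.load(path, sr=SAMPLE_RATE, mono=True)
    target = int(CLIP_SECONDS * SAMPLE_RATE)
    if len(audio) >= target:
        return audio[:target]                       # trim longer takes
    return np.pad(audio, (0, target - len(audio)))  # pad shorter ones

# Hypothetical usage: one labeled training example per recording.
clip = segment_clip("actor_01_anger.wav")  # hypothetical file name
label = EMOTIONS.index("anger")
```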

The data gathered in the study allowed the researchers to train machine-learning models that worked in three ways (a combined sketch follows the list):

  • Deep Neural Networks (DNN): Complex filters analyze components of sound such as frequency or pitch — for example, when a voice is louder because the speaker is angry — to identify underlying emotions.
  • Convolutional Neural Networks (CNN): These look for patterns in the visual representation of the audio, identifying emotions from the rhythm and texture of the sound.
  • Hybrid model (C-DNN): This blends both techniques, using both the audio and its visual spectrogram to predict emotions. All three models were then tested for effectiveness on the two data sets.
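To make the three approaches more concrete, the sketch below shows one way a hybrid C-DNN of this kind could be wired up in Keras: a dense branch for acoustic feature vectors, a convolutional branch for spectrograms, and a merged classifier over the six emotions. The layer sizes, the 40-dimensional feature vector, and the 128x47 spectrogram shape are all illustrative assumptions, not the authors' published architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_EMOTIONS = 6  # joy, anger, sadness, fear, disgust, neutral

# DNN-style branch: a vector of acoustic features such as pitch and
# loudness statistics (the 40-feature size is illustrative).
feat_in = layers.Input(shape=(40,), name="acoustic_features")
x = layers.Dense(128, activation="relu")(feat_in)
x = layers.Dense(64, activation="relu")(x)

# CNN-style branch: a spectrogram treated like an image; 128x47x1
# assumes a mel spectrogram of a 1.5-second clip.
spec_in = layers.Input(shape=(128, 47, 1), name="spectrogram")
y = layers.Conv2D(16, 3, activation="relu")(spec_in)
y = layers.MaxPooling2D()(y)
y = layers.Conv2D(32, 3, activation="relu")(y)
y = layers.GlobalAveragePooling2D()(y)

# Hybrid (C-DNN-style): merge both views and classify the six emotions.
z = layers.Concatenate()([x, y])
z = layers.Dense(64, activation="relu")(z)
out = layers.Dense(NUM_EMOTIONS, activation="softmax")(z)

model = tf.keras.Model(inputs=[feat_in, spec_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```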

Despite the study's findings and advances, the researchers also pointed out some limitations. For example, since the sample sentences were spoken by actors, they may not convey the full range of genuine, spontaneous emotion.

The researchers also concluded that future work should examine audio clips both longer and shorter than 1.5 seconds, to discover the ideal duration for emotion recognition.

Artificial intelligence speaking illustration (Image: Artemisidiana/Shutterstock)