Speech-Based Emotion Recognition
Gao, Ye, Computer Science - School of Engineering and Applied Science, University of Virginia
Stankovic, John, EN-Comp Science Dept, University of Virginia
Many machine learning algorithms for speech-based emotion detection have been published. They are often trained and tested on datasets of audio clips in which the speaker emulates emotions such as anger, happiness, neutrality, and sadness. Despite the high accuracy these algorithms achieve, they are not suitable for real-life deployment for two reasons. First, the datasets are often collected in strictly controlled environments where noise is minimal and the microphone is placed very close to the speaker; this is not representative of real-life environments, where background noise is present and people cannot be expected to remain adjacent to the acoustic sensor(s) at all times. Second, each audio clip is usually uttered by an actor and labeled with the emotion the actor attempts to simulate. However, research provides no evidence that the acoustic features of acted emotion are representative of those of authentic, spontaneous emotion. As a result, algorithms trained on acted speech may not achieve the same excellent performance when deployed in real-life environments to detect emotions in people's speech. This thesis explores different approaches to the problem that high-performing machine learning classifiers for speech-based emotion recognition may not be fit for real-life deployment, and proposes an acoustic classifier for emotion detection that is fit for such deployment. The classifier is intended to be part of a smart healthcare system that monitors users' emotions.
MS (Master of Science)
Convolutional neural networks, Machine learning, Affective computing, Cyber-physical systems