Maximum Echo-State-Likelihood Networks for Emotion Recognition

Trentin E.
2010-01-01

Abstract

Emotion recognition is an important task in human-computer interaction. Several pattern recognition and machine learning techniques have been applied to assign input audio and/or video sequences to specific emotional classes. This paper introduces a novel approach to the problem that is also suitable for more generic sequence recognition tasks. The approach combines the recurrent reservoir of an echo state network with a connectionist density estimation module. The reservoir encodes the input sequences into a fixed-dimensionality pattern of neuron activations. The density estimator, a constrained radial basis function network, evaluates the likelihood of the echo state given the input. Unsupervised training is accomplished within a maximum-likelihood framework. The architecture can then be used to estimate class-conditional probabilities in order to carry out emotion classification within a Bayesian setup. Preliminary experiments in emotion recognition from speech signals from the WaSeP© dataset show that the proposed approach is effective, and that it may outperform state-of-the-art classifiers.
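
The pipeline outlined in the abstract (reservoir encoding, density estimation on the resulting state, Bayesian decision) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and it rests on several assumptions: the final reservoir state is taken as the fixed-dimensionality encoding, a single diagonal Gaussian fitted by maximum likelihood stands in for the constrained radial basis function network, and all function names and hyperparameters (`make_reservoir`, `encode`, `fit_gaussian`, `log_density`, `classify`, the spectral radius of 0.9) are hypothetical.

```python
import numpy as np


def make_reservoir(n_inputs, n_units, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (assumed setup)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def encode(sequence, w_in, w):
    """Run a (T, n_inputs) sequence through the reservoir.

    Returns the final state, used here as the fixed-size encoding.
    """
    state = np.zeros(w.shape[0])
    for frame in sequence:
        state = np.tanh(w_in @ frame + w @ state)
    return state


def fit_gaussian(states, floor=1e-3):
    """Maximum-likelihood diagonal Gaussian over reservoir states.

    A stand-in for the paper's constrained RBF density estimator.
    The floor keeps variances away from zero.
    """
    return states.mean(axis=0), states.var(axis=0) + floor


def log_density(state, mean, var):
    """Log-density of a state under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (state - mean) ** 2 / var)


def classify(sequence, w_in, w, models, log_priors):
    """Bayes decision: argmax over classes of log p(state | c) + log P(c)."""
    state = encode(sequence, w_in, w)
    scores = {
        label: log_density(state, *models[label]) + log_priors[label]
        for label in models
    }
    return max(scores, key=scores.get)
```

In this sketch, one density model is fitted per emotion class by stacking the encoded training sequences of that class and passing them to `fit_gaussian`; `classify` then selects the class with the highest posterior score.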
Year: 2010
ISBN-10: 3642121586
ISBN-13: 9783642121586
Trentin, E., Scherer, S., Schwenker, F. (2010). Maximum Echo-State-Likelihood Networks for Emotion Recognition. In Proceedings of ANNPR 2010 (Artificial Neural Networks in Pattern Recognition, Fourth IAPR Workshop) (pp. 60-71). Springer [10.1007/978-3-642-12159-3_6].

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/5164