Graziani, L., Melacci, S., Gori, M. (2019). Coherence constraints in facial expression recognition. INTELLIGENZA ARTIFICIALE, 13(1), 79-92 [10.3233/IA-180015].
Coherence constraints in facial expression recognition
Graziani L.; Melacci S.; Gori M.
2019-01-01
Abstract
This paper investigates the role of coherence constraints in recognizing facial expressions from images and video sequences. A set of constraints is introduced to bridge a pool of Convolutional Neural Networks (CNNs) during their training stage. The constraints are inspired by practical considerations on the regularity of the temporal evolution of the predictions, and by the idea of connecting the information extracted from multiple representations. We study CNNs with the aim of building a versatile recognizer of expressions in static images that can be further applied to video sequences. First, the importance of different face parts in the recognition task is studied, considering appearance and shape-related features. Then we focus on the semi-supervised learning setting, exploiting video data in which only a few frames are supervised. The unsupervised portion of the training data is used to enforce three types of coherence: temporal coherence, coherence among the predictions on the face parts, and coherence between appearance-based and shape-based representations. Our experimental analysis shows that coherence constraints improve the quality of the expression recognizer, thus offering a suitable basis to profitably exploit unsupervised video sequences, even when some portions of the input face are not visible.

File | Availability | Type | License | Size | Format |
---|---|---|---|---|---|
melacci_INTART2019.pdf | not available | Publisher's PDF | NOT PUBLIC - Private/restricted access | 1.02 MB | Adobe PDF |
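
The abstract names three coherence terms enforced on unsupervised frames but does not give their mathematical form. The following is a minimal sketch of how such terms could be combined with a supervised loss, assuming squared Euclidean penalties between class-probability vectors, an appearance prediction obtained by averaging the face-part predictions, and weights `lam_t`, `lam_p`, `lam_s`. All function names, the penalty, the averaging step and the weights are illustrative assumptions, not the formulation used in the paper.

```python
import math
from itertools import combinations


def softmax(logits):
    """Turn a list of logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(p, q):
    """Squared Euclidean distance between two probability vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def temporal_coherence(frames):
    """Mean squared change between predictions on consecutive frames."""
    if len(frames) < 2:
        return 0.0
    steps = list(zip(frames, frames[1:]))
    return sum(sq_dist(a, b) for a, b in steps) / len(steps)


def pairwise_coherence(frames_a, frames_b):
    """Mean squared disagreement of two predictors on the same frames."""
    return sum(sq_dist(a, b) for a, b in zip(frames_a, frames_b)) / len(frames_a)


def cross_entropy(frames, labels):
    """Cross-entropy over supervised frames; a label of None marks an unsupervised frame."""
    terms = [-math.log(p[y] + 1e-12) for p, y in zip(frames, labels) if y is not None]
    return sum(terms) / len(terms) if terms else 0.0


def total_loss(part_preds, shape_preds, labels, lam_t=1.0, lam_p=1.0, lam_s=1.0):
    """Supervised loss plus the three (assumed) coherence penalties.

    part_preds: dict mapping a face-part name to its per-frame predictions
    shape_preds: per-frame predictions of a shape-based predictor
    """
    predictors = list(part_preds.values()) + [shape_preds]
    supervised = sum(cross_entropy(f, labels) for f in predictors)

    # 1) temporal coherence: each predictor should change smoothly over time
    temporal = sum(temporal_coherence(f) for f in predictors)

    # 2) coherence among face parts: part predictors should agree with each other
    pairs = list(combinations(part_preds.values(), 2))
    parts = sum(pairwise_coherence(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

    # 3) appearance vs. shape: averaged part prediction should match the shape one
    n_parts = len(part_preds)
    appearance = [
        [sum(class_vals) / n_parts for class_vals in zip(*frame_preds)]
        for frame_preds in zip(*part_preds.values())
    ]
    app_shape = pairwise_coherence(appearance, shape_preds)

    return supervised + lam_t * temporal + lam_p * parts + lam_s * app_shape
```

In this sketch only the frames with a label contribute to the cross-entropy term, while every frame contributes to the coherence penalties, which is what lets unlabeled video frames influence training. Setting a weight to zero switches the corresponding constraint off.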
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11365/1082489