Robust acoustic modeling is essential in the development of automatic speech recognition systems applied to spoken human-computer interaction. To this end, traditional hidden Markov models (HMMs) may be improved by hybridizing them with artificial neural networks (ANNs). Crucially, ANNs require input values that do not compromise their numerical stability. Despite the impact that feature normalization has on the success of ANNs in real-world applications, the issue is mostly overlooked on the false premise that "any normalization technique will do". The paper proposes a gradient-ascent, maximum-likelihood algorithm for feature normalization. Relying on mixtures of logistic densities, it ensures ANN-friendly values that are distributed uniformly over the (0, 1) interval. Some useful properties of the approach are discussed. The algorithm is applied to the normalization of acoustic features for a hybrid ANN/HMM speech recognizer. Experiments on real-world continuous speech recognition tasks are presented. The hybrid system turns out to be positively affected by the proposed technique.
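
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a single scalar feature, fits a mixture of logistic densities by plain batch gradient ascent on the average log-likelihood, and then normalizes each value through the mixture's cumulative distribution function, which lies in (0, 1) and is approximately uniform when the fit is good. All function names, the quantile-based initialization, the softmax and log-scale parameterization, the learning rate, and the synthetic data are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(a):
    """Map unconstrained values to positive mixing weights that sum to 1."""
    top = max(a)
    e = [math.exp(v - top) for v in a]
    total = sum(e)
    return [v / total for v in e]


def component_terms(x, weights, centers, scales):
    """Per component: (z, sigmoid(z), weighted logistic density) at x."""
    out = []
    for w, m, s in zip(weights, centers, scales):
        z = (x - m) / s
        sg = sigmoid(z)
        out.append((z, sg, w * sg * (1.0 - sg) / s))
    return out


def avg_log_likelihood(xs, weights, centers, scales):
    total = 0.0
    for x in xs:
        p = sum(t[2] for t in component_terms(x, weights, centers, scales))
        total += math.log(p + 1e-300)
    return total / len(xs)


def init_params(xs, k):
    """Assumed initialization: equal weights, quantile centers, data-wide scale."""
    ordered = sorted(xs)
    n = len(ordered)
    mean = sum(ordered) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in ordered) / n) or 1.0
    centers = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    return [0.0] * k, centers, [math.log(std)] * k


def fit_logistic_mixture(xs, k=2, lr=0.1, epochs=300):
    """Batch gradient ascent on the average log-likelihood."""
    a, m, r = init_params(xs, k)  # a: weight logits, r: log-scales
    n = len(xs)
    for _ in range(epochs):
        w = softmax(a)
        s = [math.exp(v) for v in r]
        ga, gm, gr = [0.0] * k, [0.0] * k, [0.0] * k
        for x in xs:
            terms = component_terms(x, w, m, s)
            p = sum(t[2] for t in terms) + 1e-300
            for j, (z, sg, dens) in enumerate(terms):
                g = dens / p  # posterior responsibility of component j
                ga[j] += g - w[j]
                gm[j] += g * (2.0 * sg - 1.0) / s[j]
                gr[j] += g * ((2.0 * sg - 1.0) * z - 1.0)
        for j in range(k):
            a[j] += lr * ga[j] / n
            m[j] += lr * gm[j] / n
            r[j] += lr * gr[j] / n
    return softmax(a), m, [math.exp(v) for v in r]


def normalize(x, weights, centers, scales):
    """Mixture CDF: a weighted sum of sigmoids, bounded by 0 and 1."""
    return sum(w * sigmoid((x - m) / s)
               for w, m, s in zip(weights, centers, scales))


def demo_data(seed=0):
    """Synthetic bimodal feature values (illustrative, not acoustic data)."""
    rng = random.Random(seed)
    return ([rng.gauss(-2.0, 0.5) for _ in range(200)] +
            [rng.gauss(3.0, 1.0) for _ in range(200)])
```

A typical use would be to call `fit_logistic_mixture` once on training features and then apply `normalize` to every incoming value before it reaches the network. Because the mapping is a weighted sum of sigmoids, it is monotone and bounded regardless of how extreme the raw feature is. For multi-dimensional acoustic features, one simple assumption is to fit a separate mixture per feature dimension.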

Trentin, E. (2015). Maximum-likelihood normalization of features increases the robustness of neural-based spoken human-computer interaction. Pattern Recognition Letters, 66, 71-80 [10.1016/j.patrec.2015.07.003].

Maximum-likelihood normalization of features increases the robustness of neural-based spoken human-computer interaction

TRENTIN, EDMONDO
2015-01-01

Files in this item:
File: 19-TrentinFeatureNormalization.pdf (not available)

Description: Main article
Type: Publisher's PDF
License: NOT PUBLIC - Private/restricted access
Size: 543.2 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/982917