Maximum-likelihood normalization of features increases the robustness of neural-based spoken human-computer interaction


Robust acoustic modeling is essential in the development of automatic speech recognition systems applied to spoken human-computer interaction. To this end, traditional hidden Markov models (HMM) may be improved by hybridizing them with artificial neural networks (ANN). Crucially, ANNs require input values that do not compromise their numerical stability. Despite the importance of feature normalization to the success of ANNs in real-world applications, the issue is mostly overlooked on the false premise that "any normalization technique will do". The paper proposes a gradient-ascent, maximum-likelihood algorithm for feature normalization. Relying on mixtures of logistic densities, it ensures ANN-friendly values that are uniformly distributed over the (0, 1) interval. Several favorable properties of the approach are discussed. The algorithm is applied to the normalization of acoustic features for a hybrid ANN/HMM speech recognizer. Experiments on real-world continuous speech recognition tasks are presented. The hybrid system benefits from the proposed technique.
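The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading, not the author's implementation. It assumes a single scalar feature, fits a mixture of logistic densities by plain batch gradient ascent on the mean log-likelihood, and then normalizes each value through the fitted mixture's cumulative distribution function (a weighted sum of sigmoids). By the probability integral transform, values mapped through a well-fitted CDF are approximately uniform on (0, 1). The function names, the softmax and log-scale parametrization, the quantile-based initialization, and the learning rate are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function; also the CDF of a unit logistic density.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_logistic_mixture(x, k=2, lr=0.05, steps=2000):
    """Batch gradient ascent on the mean log-likelihood of a k-component
    mixture of logistic densities (hypothetical helper, not from the paper)."""
    x = np.asarray(x, dtype=float)
    # Assumed initialization: means at evenly spaced quantiles, one shared scale.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    b = np.full(k, np.log(x.std() / k + 1e-8))  # b = log(scale), keeps scales positive
    a = np.zeros(k)                             # softmax(a) gives the mixing weights
    for _ in range(steps):
        w = np.exp(a - a.max())
        w /= w.sum()
        s = np.exp(b)
        z = (x[:, None] - mu) / s
        sig = sigmoid(z)
        dens = w * sig * (1.0 - sig) / s        # weighted component densities
        r = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)  # responsibilities
        g = 2.0 * sig - 1.0
        # Gradients of the mean log-likelihood w.r.t. means, log-scales, logits.
        mu = mu + lr * (r * g / s).mean(axis=0)
        b = b + lr * (r * (g * z - 1.0)).mean(axis=0)
        a = a + lr * (r - w).mean(axis=0)
    w = np.exp(a - a.max())
    w /= w.sum()
    return w, mu, np.exp(b)


def normalize(x, w, mu, s):
    # Mixture CDF: a convex combination of sigmoids, so outputs lie in [0, 1].
    z = (np.asarray(x, dtype=float)[:, None] - mu) / s
    return (w * sigmoid(z)).sum(axis=1)


# Synthetic bimodal "feature" used purely for illustration.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 1000), rng.normal(3.0, 1.0, 1000)])
w, mu, s = fit_logistic_mixture(x)
u = normalize(x, w, mu, s)
print("range of normalized values:", u.min(), u.max())
print("histogram over 10 bins:", np.histogram(u, bins=10, range=(0.0, 1.0))[0])
```

In this sketch the normalized values are bounded by construction, and a roughly flat histogram indicates that the fitted mixture matches the data. A multi-dimensional acoustic feature vector could be handled by fitting one such mixture per component, which is again an assumption rather than something the abstract states.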

Trentin, E. (2015). Maximum-likelihood normalization of features increases the robustness of neural-based spoken human-computer interaction. PATTERN RECOGNITION LETTERS, 66, 71-80 [10.1016/j.patrec.2015.07.003].


Trentin, Edmondo

2015-01-01



Year: 2015
Journal: Pattern Recognition Letters
Publication type: 1.1 Journal article

Files in this record:

File	Size	Format
19-TrentinFeatureNormalization.pdf (not available; main article, publisher's PDF; license: not public, private/restricted access)	543.2 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/982917