Noise-tolerant speech recognition: the SNN-TA approach


Neural network learning theory draws a relationship between “learning with noise” and applying a regularization term in the cost function that is minimized during the training process on clean (non-noisy) data. The application of regularizers and other robust training techniques is aimed at improving the generalization capabilities of connectionist models by reducing overfitting. In spite of that, the generalization problem is usually overlooked by automatic speech recognition (ASR) practitioners who use hidden Markov models (HMM) or other standard ASR paradigms. Nonetheless, it is reasonable to expect that an adequate neural network model (due to its universal approximation property and generalization capability), along with a suitable regularizer, can exhibit good recognition performance when noise is added to the test data, even though training is accomplished on clean data. This paper presents applications of a variant of the so-called segmental neural network (SNN), introduced at BBN by Zavaliagkos et al. for rescoring the N-best hypotheses yielded by a standard continuous density HMM (CDHMM). An enhanced connectionist model, called SNN with trainable amplitude of activation functions (SNN-TA), is first used in this paper instead of the CDHMM to perform the recognition of isolated words. Viterbi-based segmentation, relying on the level-building algorithm, is then introduced; it can be combined with the SNN-TA to obtain a hybrid framework for continuous speech recognition. The proposed paradigm is applied to the recognition of isolated and connected Italian digits under several noisy conditions, outperforming the CDHMMs.
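The link between “learning with noise” and regularization mentioned in the abstract can be illustrated with a minimal sketch. It is not the model used in the paper: it assumes a plain linear model with a squared-error cost and zero-mean Gaussian noise of standard deviation sigma added independently to each input. Under those assumptions the expected cost on noisy inputs equals the clean cost plus the penalty sigma^2 * ||w||^2; for nonlinear networks the correspondence is only approximate. All names and the toy data below are hypothetical.

```python
import random

def predict(w, x):
    """Linear model output: dot product of weights and inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))

def clean_loss(w, data):
    """Mean squared error on clean (input, target) pairs."""
    return sum((predict(w, x) - t) ** 2 for x, t in data) / len(data)

def ridge_loss(w, data, sigma):
    """Clean loss plus the weight penalty sigma^2 * ||w||^2."""
    penalty = sigma ** 2 * sum(wi * wi for wi in w)
    return clean_loss(w, data) + penalty

def noisy_loss(w, data, sigma, draws=20000, seed=0):
    """Monte Carlo estimate of the mean squared error when zero-mean
    Gaussian noise (std sigma) is added to every input component."""
    rng = random.Random(seed)
    total = 0.0
    for x, t in data:
        for _ in range(draws):
            noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
            total += (predict(w, noisy_x) - t) ** 2
    return total / (len(data) * draws)

# Hypothetical toy data and weights, chosen only for illustration.
data = [([1.0, 2.0], 0.0), ([0.0, 1.0], -0.5), ([2.0, -1.0], 1.0)]
w = [0.5, -1.0]
sigma = 0.3

print("noisy-input loss (estimate):", noisy_loss(w, data, sigma))
print("clean loss + penalty:       ", ridge_loss(w, data, sigma))
```

The two printed values should agree up to Monte Carlo error, which is the sense in which training on noisy inputs acts like minimizing a regularized cost on clean data in this simplified setting.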

Trentin, E., Matassoni, M. (2003). Noise-tolerant speech recognition: the SNN-TA approach. INFORMATION SCIENCES, 156(1-2), 55-69 [10.1016/S0020-0255(03)00164-6].

Trentin, Edmondo; Matassoni, Marco

2003-01-01

Year: 2003

Journal in which the work is published: INFORMATION SCIENCES

Appears in the following types: 1.1 Journal article

Files in this item:

File	Size	Format
07-TrentinMatassoni.pdf (not available; type: publisher's PDF; license: not public, private/restricted access)	135.11 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/22262
