Trentin, E., Giuliani, D. (2001). A mixture of recurrent neural networks for speaker normalization. Neural Computing & Applications, 10(2), 120-135 [10.1007/s005210170004].
A mixture of recurrent neural networks for speaker normalization
Trentin E.; Giuliani D.
2001-01-01
Abstract
In spite of recent advances in automatic speech recognition, the performance of state-of-the-art speech recognisers fluctuates depending on the speaker. Speaker normalisation aims to reduce the differences between the acoustic space of a new speaker and the training acoustic space of a given speech recogniser, thereby improving performance. Normalisation is based on an acoustic feature transformation, to be estimated from a small amount of speech signal. This paper introduces a mixture of recurrent neural networks as an effective regression technique to approach the problem. A suitable Viterbi-based time alignment procedure is proposed for generating the adaptation set. The mixture is compared with linear regression and single-model connectionist approaches. Speaker-dependent and speaker-independent continuous speech recognition experiments with a large vocabulary, using Hidden Markov Models, are presented. Results show that the mixture improves recognition performance, yielding a 21% relative reduction of the word error rate, comparable with that obtained with model-adaptation approaches.
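The abstract describes the normalisation step as a regression problem: a mixture of recurrent networks learns a transformation from a new speaker's acoustic feature frames towards the recogniser's training acoustic space, using frame pairs obtained via Viterbi-based time alignment. The sketch below is only an illustration of that general idea, not the authors' implementation: it assumes PyTorch, a frame-level soft gating network, and dummy adaptation data standing in for the aligned frame pairs; all names, dimensions, and hyperparameters are hypothetical.

```python
# Illustrative sketch (assumed PyTorch): a gated mixture of small recurrent regressors
# mapping a new speaker's acoustic frames towards a reference (training) acoustic space.
# Not the paper's implementation; all names and settings are hypothetical.
import torch
import torch.nn as nn


class RecurrentExpert(nn.Module):
    """One expert: a small RNN followed by a linear readout to the target feature space."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.rnn = nn.RNN(feat_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(x)          # (batch, time, hidden_dim)
        return self.out(h)          # (batch, time, feat_dim)


class MixtureOfRNNs(nn.Module):
    """Mixture of recurrent experts combined by a frame-level gating network."""

    def __init__(self, feat_dim: int, hidden_dim: int = 32, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            RecurrentExpert(feat_dim, hidden_dim) for _ in range(n_experts)
        )
        self.gate = nn.Linear(feat_dim, n_experts)  # soft mixture weights per frame

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)             # (batch, time, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], -1)   # (batch, time, feat_dim, n_experts)
        return (outputs * weights.unsqueeze(2)).sum(-1)           # weighted sum over experts


# Toy adaptation loop: in the paper the (new_speaker, reference) frame pairs come from a
# Viterbi-based time alignment of the adaptation utterances; random tensors stand in here.
feat_dim = 13                                        # e.g. MFCC-like frame vectors (assumed)
model = MixtureOfRNNs(feat_dim)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
new_speaker_feats = torch.randn(8, 100, feat_dim)    # dummy adaptation batch
reference_feats = torch.randn(8, 100, feat_dim)      # dummy aligned target frames

for _ in range(10):
    optimiser.zero_grad()
    loss = nn.functional.mse_loss(model(new_speaker_feats), reference_feats)
    loss.backward()
    optimiser.step()
```

At recognition time, a transformation trained this way would be applied to each incoming feature frame before decoding with the unchanged HMM recogniser, which is what distinguishes feature-space normalisation from model-adaptation approaches.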
| File | Type | License | Size | Format | Access |
|---|---|---|---|---|---|
| 06-TrentinGiuliani.pdf | Post-print | Public, with copyright | 185.45 kB | Adobe PDF | Not available (request a copy) |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11365/11372