Robust acoustic modeling is essential in the development of automatic speech recognition systems applied to spoken human-computer interaction. To this end, traditional hidden Markov models (HMMs) may be improved by hybridizing them with artificial neural networks (ANNs). Crucially, ANNs require input values that do not compromise their numerical stability. Despite the relevance of feature normalization to the success of ANNs in real-world applications, the issue is mostly overlooked on the false premise that "any normalization technique will do". The paper proposes a gradient-ascent, maximum-likelihood algorithm for feature normalization. Relying on mixtures of logistic densities, it ensures ANN-friendly values that are uniformly distributed over the (0, 1) interval. Some nice properties of the approach are discussed. The algorithm is applied to the normalization of acoustic features for a hybrid ANN/HMM speech recognizer. Experiments on real-world continuous speech recognition tasks are presented. The hybrid system benefits from the proposed technique.
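The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the paper's actual method. It assumes a single scalar feature, fits a mixture of logistic densities by plain gradient ascent on the average log-likelihood, and then normalizes each value through the mixture's cumulative distribution function, which maps data drawn from the fitted mixture to approximately uniform values in (0, 1). The names `fit_logistic_mixture` and `normalize`, the quantile-based initialization, the learning rate, and the number of steps are all hypothetical choices made for this example.

```python
import math
import random


def sigmoid(z):
    # Logistic function written via tanh, which avoids overflow for large |z|.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def softmax(logits):
    # Maps unconstrained logits to mixing weights that are positive and sum to 1.
    m = max(logits)
    e = [math.exp(a - m) for a in logits]
    total = sum(e)
    return [v / total for v in e]


def fit_logistic_mixture(xs, k=2, lr=0.05, steps=300):
    """Gradient ascent on the average log-likelihood of a k-component
    logistic mixture (illustrative assumption, not the paper's algorithm)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    # Assumed initialization: locations at evenly spaced quantiles, unit scales.
    mu = [xs_sorted[int((j + 0.5) * n / k)] for j in range(k)]
    log_s = [0.0] * k   # scales are parametrized as exp(log_s) to stay positive
    logits = [0.0] * k  # mixing weights are parametrized through a softmax
    history = []        # average log-likelihood measured before each update
    for _ in range(steps):
        w = softmax(logits)
        s = [math.exp(v) for v in log_s]
        g_mu, g_ls, g_lg = [0.0] * k, [0.0] * k, [0.0] * k
        loglik = 0.0
        for x in xs:
            z = [(x - mu[j]) / s[j] for j in range(k)]
            sig = [sigmoid(v) for v in z]
            # Logistic density: sigma(z) * (1 - sigma(z)) / s, times its weight.
            dens = [w[j] * sig[j] * (1.0 - sig[j]) / s[j] for j in range(k)]
            p = sum(dens) + 1e-300  # tiny constant guards against log(0)
            loglik += math.log(p)
            for j in range(k):
                r = dens[j] / p          # responsibility of component j for x
                g = 2.0 * sig[j] - 1.0
                g_mu[j] += r * g / s[j]           # d log p / d mu_j
                g_ls[j] += r * (g * z[j] - 1.0)   # d log p / d log_s_j
                g_lg[j] += r - w[j]               # d log p / d logit_j
        history.append(loglik / n)
        for j in range(k):
            mu[j] += lr * g_mu[j] / n
            log_s[j] += lr * g_ls[j] / n
            logits[j] += lr * g_lg[j] / n
    return logits, mu, log_s, history


def normalize(x, logits, mu, log_s):
    # Mixture CDF: a weighted sum of sigmoids, so the output lies in [0, 1].
    w = softmax(logits)
    return sum(w[j] * sigmoid((x - mu[j]) / math.exp(log_s[j]))
               for j in range(len(mu)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic bimodal "feature", used purely for demonstration.
    data = ([rng.gauss(-2.0, 0.5) for _ in range(200)]
            + [rng.gauss(3.0, 1.0) for _ in range(200)])
    params = fit_logistic_mixture(data)
    normalized = [normalize(x, *params[:3]) for x in data]
    print(min(normalized), max(normalized))
```

Because the normalizing map is a weighted sum of sigmoids, it is smooth and monotonically increasing, so the ordering of feature values is preserved while their range is squashed into the unit interval. For multi-dimensional acoustic feature vectors, one simple option under the same assumptions would be to fit and apply such a mixture independently per feature component.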
|Title:||Maximum-likelihood normalization of features increases the robustness of neural-based spoken human-computer interaction|
|Citation:||Trentin, E. (2015). Maximum-likelihood normalization of features increases the robustness of neural-based spoken human-computer interaction. PATTERN RECOGNITION LETTERS, 66, 71-80.|
|Appears in types:||1.1 Journal article|