Combination of supervised and unsupervised learning for training the activation functions of neural networks

TRENTIN, EDMONDO
2014

Abstract

Standard feedforward neural networks benefit from the nice theoretical properties of mixtures of sigmoid activation functions, but they may fail in several practical learning tasks that would be better addressed by a more appropriate, problem-specific basis of activation functions. The paper presents a connectionist model that exploits adaptive activation functions. Each hidden unit in the network is associated with a specific pair (f(·), p(·)), where f(·) is the activation function and p(·) is the likelihood of the unit being relevant to the computation of the network output over the current input. The function f(·) is optimized in a supervised manner, while p(·) is realized via a statistical parametric model learned through unsupervised (or partially supervised) estimation. Since f(·) and p(·) influence each other's learning process, the overall machine is implicitly a co-trained coupled model and, in turn, a flexible, non-standard neural architecture. The feasibility of the approach is corroborated by empirical evidence from computer simulations involving regression and classification tasks.
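The abstract's central idea — pairing each hidden unit with an adaptive activation f(·) trained in a supervised way and a relevance likelihood p(·) fit in an unsupervised way — can be illustrated with a minimal sketch. This is not the paper's actual formulation: the choice of an adaptive-slope sigmoid for f(·), a spherical Gaussian for p(·), the normalization of relevances across units, and all names below are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

class AdaptiveUnitNet:
    """Illustrative sketch: each hidden unit i carries a pair (f_i, p_i).
    f_i is an adaptive-slope sigmoid (slope would be trained supervised);
    p_i is a spherical Gaussian likelihood (fit unsupervised)."""

    def __init__(self, n_in, n_hidden):
        self.W = rng.normal(scale=0.5, size=(n_hidden, n_in))  # input weights
        self.b = np.zeros(n_hidden)                            # biases
        self.a = np.ones(n_hidden)       # adaptive sigmoid slopes (supervised part of f_i)
        self.v = rng.normal(scale=0.5, size=n_hidden)          # output weights
        self.mu = rng.normal(size=(n_hidden, n_in))  # Gaussian means (unsupervised part of p_i)
        self.sigma = np.ones(n_hidden)               # Gaussian widths

    def relevance(self, x):
        # p_i(x): spherical Gaussian likelihood that unit i is relevant to input x,
        # normalized here so the units share credit for the current input
        d2 = ((x - self.mu) ** 2).sum(axis=1)
        p = np.exp(-0.5 * d2 / self.sigma ** 2)
        return p / p.sum()

    def forward(self, x):
        # each unit's activation f_i is weighted by its relevance p_i(x)
        z = self.W @ x + self.b
        f = 1.0 / (1.0 + np.exp(-self.a * z))  # adaptive-slope sigmoid f_i
        return float(self.v @ (self.relevance(x) * f))

    def fit_relevance(self, X):
        # one unsupervised update: move each unit's Gaussian mean toward the
        # inputs for which that unit currently claims the most relevance
        resp = np.array([self.relevance(x) for x in X])      # (N, H)
        w = resp / resp.sum(axis=0, keepdims=True)
        self.mu = w.T @ X
```

In this sketch the coupling the abstract describes is visible directly: `forward` mixes f_i and p_i, so a supervised gradient through `forward` depends on the current p_i, while `fit_relevance` reshapes each p_i from the data distribution, changing which units f-training emphasizes next.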
Castelli, I., & Trentin, E. (2014). Combination of supervised and unsupervised learning for training the activation functions of neural networks. Pattern Recognition Letters, 37(1), 178-191. doi:10.1016/j.patrec.2013.06.013
Files in this record:
14-CastelliTrentin.pdf — Post-print; Adobe PDF; 737.03 kB; license: Public with Copyright (not openly available; copy on request)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11365/47083