Castelli, I., Trentin, E. (2014). Combination of supervised and unsupervised learning for training the activation functions of neural networks. Pattern Recognition Letters, 37(1), 178-191 [10.1016/j.patrec.2013.06.013].
Combination of supervised and unsupervised learning for training the activation functions of neural networks
TRENTIN, EDMONDO
2014-01-01
Abstract
Standard feedforward neural networks benefit from the nice theoretical properties of mixtures of sigmoid activation functions, but they may fail in several practical learning tasks. Such tasks would be better addressed by relying on a more appropriate, problem-specific basis of activation functions. This paper presents a connectionist model that exploits adaptive activation functions. Each hidden unit in the network is associated with a specific pair (f(·), p(·)), where f(·) is the activation function and p(·) is the likelihood that the unit is relevant to the computation of the network output on the current input. The function f(·) is optimized in a supervised manner, while p(·) is realized via a statistical parametric model learned through unsupervised (or partially supervised) estimation. Since f(·) and p(·) influence each other's learning process, the overall machine is implicitly a co-trained coupled model and, in turn, a flexible, non-standard neural architecture. The feasibility of the approach is corroborated by empirical evidence from computer simulations involving regression and classification tasks.
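To make the coupling between f(·) and p(·) concrete, here is a minimal sketch (not the paper's actual code or formulation) of a hidden layer in which each unit pairs a trainable activation with a Gaussian relevance model. The tanh-with-trainable-gain activation, the isotropic Gaussian for p(·), and the relevance normalization are all illustrative assumptions.

```python
# Hedged sketch of a hidden layer of (f, p) unit pairs, assuming:
# f = tanh with a trainable gain, p = isotropic Gaussian over inputs.
import numpy as np

class AdaptiveUnit:
    """One hidden unit with an associated pair (f, p)."""
    def __init__(self, dim, rng):
        self.w = rng.normal(scale=0.5, size=dim)  # input weights (supervised)
        self.b = 0.0                              # bias (supervised)
        self.gain = 1.0                           # adaptive parameter of f (supervised)
        self.mu = rng.normal(size=dim)            # Gaussian mean of p (unsupervised)
        self.var = 1.0                            # Gaussian variance of p (unsupervised)

    def f(self, z):
        # Adaptive activation: one simple trainable family.
        return np.tanh(self.gain * z)

    def p(self, x):
        # Unnormalized Gaussian likelihood that this unit is
        # relevant to the current input x.
        d2 = np.sum((x - self.mu) ** 2)
        return np.exp(-0.5 * d2 / self.var)

def hidden_layer(units, x):
    # Each unit's response is gated by its normalized relevance,
    # so f and p jointly shape the layer output.
    rel = np.array([u.p(x) for u in units])
    rel = rel / (rel.sum() + 1e-12)
    return np.array([r * u.f(u.w @ x + u.b) for r, u in zip(rel, units)])

rng = np.random.default_rng(0)
units = [AdaptiveUnit(dim=3, rng=rng) for _ in range(4)]
print(hidden_layer(units, rng.normal(size=3)))
```

In a training loop matching the abstract's description, the parameters of f(·) (here `w`, `b`, `gain`) would follow supervised gradients of the task loss, while the parameters of p(·) (here `mu`, `var`) would follow unsupervised maximum-likelihood updates over the inputs; since the relevance gates the unit outputs, the two estimation processes influence each other, yielding the co-trained coupled model.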
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 14-CastelliTrentin.pdf | Not available (copy on request) | Post-print | Public with copyright | 737.03 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11365/47083
Warning: the data displayed here have not been validated by the university.