Betti, A., Ciravegna, G., Gori, M., Melacci, S., Mottin, K., Precioso, F. (2023). Toward Novel Optimizers: A Moreau-Yosida View of Gradient-Based Learning. In AIxIA 2023 – Advances in Artificial Intelligence: XXIInd International Conference of the Italian Association for Artificial Intelligence, AIxIA 2023 (pp. 218–230). Cham: Springer [10.1007/978-3-031-47546-7_15].
Toward Novel Optimizers: A Moreau-Yosida View of Gradient-Based Learning
Gori, M.; Melacci, S.
2023-01-01
Abstract
Machine Learning (ML) strongly relies on optimization procedures based on gradient descent. Several gradient-based update schemes, especially in the context of neural networks, have been proposed in the scientific literature and have become common optimizers in ML software libraries. In this paper, we re-frame gradient-based update strategies under the unifying lens of a Moreau-Yosida (MY) approximation of the loss function. By means of a first-order Taylor expansion, we make the MY approximation concretely exploitable to generalize the model update. In turn, this makes it easy to evaluate and compare the regularization properties that underlie the most common optimizers, such as gradient descent with momentum, ADAGRAD, RMSprop, and ADAM. The MY-based unifying view opens up the possibility of designing novel update schemes with customizable regularization properties. As a case study, we propose to use the network outputs to deform the notion of closeness in the parameter space.
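As a rough illustration of the construction the abstract refers to, the following is a minimal sketch of the standard Moreau-Yosida (proximal-point) view of a loss and of its first-order linearization, which recovers plain gradient descent and, with a non-Euclidean proximity term, preconditioned (adaptive) updates. The symbols (loss $L$, parameters $w_t$, step size $\gamma$, metric $M$) are illustrative assumptions and not necessarily the notation used in the paper.

```latex
% Moreau-Yosida / proximal-point update: minimize the loss plus a
% proximity term that keeps the new parameters close to the current w_t.
\[
  w_{t+1} \;=\; \arg\min_{z}\; L(z) \;+\; \frac{1}{2\gamma}\,\lVert z - w_t \rVert^2 .
\]
% Linearizing L around w_t with a first-order Taylor expansion,
%   L(z) \approx L(w_t) + \nabla L(w_t)^\top (z - w_t),
% makes the minimization explicit and yields plain gradient descent:
\[
  w_{t+1} \;=\; w_t \;-\; \gamma\,\nabla L(w_t).
\]
% Replacing the Euclidean norm with a metric M, i.e.
%   \lVert z - w_t \rVert_M^2 = (z - w_t)^\top M (z - w_t),
% gives a preconditioned update of the kind used by adaptive optimizers:
\[
  w_{t+1} \;=\; w_t \;-\; \gamma\, M^{-1} \nabla L(w_t).
\]
```

In this reading, different choices of the proximity term (momentum-like terms, diagonal metrics built from past gradients, or output-dependent metrics as in the paper's case study) correspond to different regularization properties of the resulting optimizer.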
| File | Type | License | Size | Format |
|---|---|---|---|---|
| melacci_AIXIA2023.pdf | Publisher's PDF (not available for download) | NON-PUBLIC - Private/restricted access (request a copy) | 286.62 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11365/1252641