Toward Novel Optimizers: A Moreau-Yosida View of Gradient-Based Learning

Gori, M.; Melacci, S.
2023-01-01

Abstract

Machine Learning (ML) strongly relies on optimization procedures based on gradient descent. Several gradient-based update schemes have been proposed in the scientific literature, especially in the context of neural networks, and have become common optimizers in ML software libraries. In this paper, we re-frame gradient-based update strategies under the unifying lens of a Moreau-Yosida (MY) approximation of the loss function. By means of a first-order Taylor expansion, we make the MY approximation concretely exploitable to generalize the model update. In turn, this makes it easy to evaluate and compare the regularization properties that underlie the most common optimizers, such as gradient descent with momentum, ADAGRAD, RMSprop, and ADAM. The MY-based unifying view opens up the possibility of designing novel update schemes with customizable regularization properties. As a case study, we propose to use the network outputs to deform the notion of closeness in the parameter space.
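
To make the abstract's idea concrete, here is a minimal, hedged sketch of the general mechanism it describes: a proximal (Moreau-Yosida-style) step in which the loss is replaced by its first-order Taylor expansion, so that the update has a closed form. With the plain Euclidean metric this recovers gradient descent; with a diagonal metric it yields a per-coordinate-scaled update of the kind used by ADAGRAD/RMSprop-style methods. This is not the paper's actual formulation; the function name `my_linearized_update`, the step size, and the RMSprop-like preconditioner in the toy loop are illustrative assumptions.

```python
import numpy as np

def my_linearized_update(w, grad, gamma=0.1, precond=None):
    """One linearized proximal (MY-style) step.

    The proximal-point step solves
        w_next = argmin_u  L(u) + (1 / (2 * gamma)) * ||u - w||_M^2,
    where ||.||_M is a (possibly preconditioned) squared norm.
    Replacing L(u) with its first-order Taylor expansion around w,
        L(u) ~= L(w) + grad^T (u - w),
    gives the closed-form updates below.
    """
    if precond is None:
        # Euclidean metric: plain gradient descent.
        return w - gamma * grad
    # Diagonal metric M = diag(precond): per-coordinate scaling,
    # as in ADAGRAD / RMSprop-style optimizers.
    return w - gamma * grad / precond

# Toy usage on the quadratic loss L(w) = 0.5 * ||w||^2 (so grad = w),
# with an RMSprop-like running average of squared gradients.
w = np.array([1.0, -2.0])
state = np.zeros_like(w)
beta, eps = 0.9, 1e-8
for _ in range(100):
    grad = w
    state = beta * state + (1.0 - beta) * grad ** 2
    w = my_linearized_update(w, grad, gamma=0.1,
                             precond=np.sqrt(state) + eps)
print(w)  # moves toward the minimizer at the origin
```

In this reading, what distinguishes the different optimizers is the metric that measures closeness to the previous weights, and this metric is the knob the abstract proposes to customize, e.g., by deforming it through the network outputs.
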
2023
ISBN 978-3-031-47545-0
ISBN 978-3-031-47546-7
Betti, A., Ciravegna, G., Gori, M., Melacci, S., Mottin, K., Precioso, F. (2023). Toward Novel Optimizers: A Moreau-Yosida View of Gradient-Based Learning. In AIxIA 2023 – Advances in Artificial Intelligence: XXIInd International Conference of the Italian Association for Artificial Intelligence, AIxIA 2023 (pp. 218-230). Cham: Springer. [10.1007/978-3-031-47546-7_15]
Files in this product:
melacci_AIXIA2023.pdf (not available for download)
Type: publisher's PDF
Licence: NON-PUBLIC - private/restricted access
Size: 286.62 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1252641