dc.date.accessioned | 2024-02-19T16:02:08Z | |
dc.date.available | 2024-02-19T16:02:08Z | |
dc.date.created | 2024-01-20T16:07:34Z | |
dc.date.issued | 2023 | |
dc.identifier.citation | Compagnoni, Enea; Biggio, Luca; Orvieto, Antonio; Proske, Frank Norbert; Kersting, Hans; Lucchi, Aurelien. An SDE for Modeling SAM: Theory and Insights. Proceedings of Machine Learning Research (PMLR). 2023, 202 | |
dc.identifier.uri | http://hdl.handle.net/10852/108262 | |
dc.description.abstract | We study the SAM (Sharpness-Aware Minimization) optimizer which has recently attracted a lot of interest due to its increased performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, both for the full-batch and mini-batch settings. We demonstrate that these SDEs are rigorous approximations of the real discrete-time algorithms (in a weak sense, scaling linearly with the learning rate). Using these models, we then offer an explanation of why SAM prefers flat minima over sharp ones – by showing that it minimizes an implicitly regularized loss with a Hessian-dependent noise structure. Finally, we prove that SAM is attracted to saddle points under some realistic conditions. Our theoretical results are supported by detailed experiments. | |
dc.language | EN | |
dc.publisher | JMLR | |
dc.title | An SDE for Modeling SAM: Theory and Insights | |
dc.title.alternative | An SDE for Modeling SAM: Theory and Insights (language: EN, English) | |
dc.type | Journal article | |
dc.creator.author | Compagnoni, Enea | |
dc.creator.author | Biggio, Luca | |
dc.creator.author | Orvieto, Antonio | |
dc.creator.author | Proske, Frank Norbert | |
dc.creator.author | Kersting, Hans | |
dc.creator.author | Lucchi, Aurelien | |
cristin.unitcode | 185,15,13,35 | |
cristin.unitname | Risiko og stokastikk (Risk and Stochastics) | |
cristin.ispublished | true | |
cristin.fulltext | original | |
cristin.qualitycode | 1 | |
dc.identifier.cristin | 2231220 | |
dc.identifier.bibliographiccitation | info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Proceedings of Machine Learning Research (PMLR)&rft.volume=202&rft.spage=&rft.date=2023 | |
dc.identifier.jtitle | Proceedings of Machine Learning Research (PMLR) | |
dc.identifier.volume | 202 | |
dc.type.document | Journal article | |
dc.type.peerreviewed | Peer reviewed | |
dc.source.issn | 2640-3498 | |
dc.type.version | PublishedVersion | |