dc.date.accessioned: 2024-02-19T16:02:08Z
dc.date.available: 2024-02-19T16:02:08Z
dc.date.created: 2024-01-20T16:07:34Z
dc.date.issued: 2023
dc.identifier.citation: Compagnoni, Enea; Biggio, Luca; Orvieto, Antonio; Proske, Frank Norbert; Kersting, Hans; Lucchi, Aurelien. An SDE for Modeling SAM: Theory and Insights. Proceedings of Machine Learning Research (PMLR). 2023, 202
dc.identifier.uri: http://hdl.handle.net/10852/108262
dc.description.abstract: We study the SAM (Sharpness-Aware Minimization) optimizer which has recently attracted a lot of interest due to its increased performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, both for the full-batch and mini-batch settings. We demonstrate that these SDEs are rigorous approximations of the real discrete-time algorithms (in a weak sense, scaling linearly with the learning rate). Using these models, we then offer an explanation of why SAM prefers flat minima over sharp ones – by showing that it minimizes an implicitly regularized loss with a Hessian-dependent noise structure. Finally, we prove that SAM is attracted to saddle points under some realistic conditions. Our theoretical results are supported by detailed experiments.
dc.language: EN
dc.publisher: JMLR
dc.title: An SDE for Modeling SAM: Theory and Insights
dc.title.alternative: [EN, English] An SDE for Modeling SAM: Theory and Insights
dc.type: Journal article
dc.creator.author: Compagnoni, Enea
dc.creator.author: Biggio, Luca
dc.creator.author: Orvieto, Antonio
dc.creator.author: Proske, Frank Norbert
dc.creator.author: Kersting, Hans
dc.creator.author: Lucchi, Aurelien
cristin.unitcode: 185,15,13,35
cristin.unitname: Risiko og stokastikk
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1
dc.identifier.cristin: 2231220
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Proceedings of Machine Learning Research (PMLR)&rft.volume=202&rft.spage=&rft.date=2023
dc.identifier.jtitle: Proceedings of Machine Learning Research (PMLR)
dc.identifier.volume: 202
dc.type.document: Journal article
dc.type.peerreviewed: Peer reviewed
dc.source.issn: 2640-3498
dc.type.version: PublishedVersion

