
dc.date.accessioned: 2020-09-10T17:55:32Z
dc.date.available: 2020-09-10T17:55:32Z
dc.date.created: 2020-09-06T09:01:00Z
dc.date.issued: 2020
dc.identifier.citation: Truong, Tuyen Trung; Nguyen, Hang-Tuan. Backtracking Gradient Descent Method and Some Applications in Large Scale Optimisation. Part 2: Algorithms and Experiments. Applied Mathematics and Optimization. 2020
dc.identifier.uri: http://hdl.handle.net/10852/79322
dc.description.abstract: In this paper, we provide new results and algorithms (including backtracking versions of Nesterov accelerated gradient and Momentum) which are more applicable to large scale optimisation as in Deep Neural Networks. We also demonstrate that Backtracking Gradient Descent (Backtracking GD) can obtain good upper bound estimates for local Lipschitz constants for the gradient, and that the convergence rate of Backtracking GD is similar to that in the classical work of Armijo. Experiments with the datasets CIFAR10 and CIFAR100 on various popular architectures verify a heuristic argument that Backtracking GD stabilises to a finite union of sequences constructed from Standard GD in the mini-batch practice, and show that our new algorithms (while automatically fine-tuning learning rates) perform better than current state-of-the-art methods such as Adam, Adagrad, Adadelta, RMSProp, Momentum and Nesterov accelerated gradient. To help readers avoid confusion between heuristics and more rigorously justified algorithms, we also provide a review of the current state of convergence results for gradient descent methods. Accompanying source code is available on GitHub.
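The Backtracking GD method named in the abstract builds on Armijo's classical backtracking line search: at each step the learning rate is shrunk until a sufficient-decrease condition holds. The following minimal Python sketch illustrates that general technique; the function names, parameter choices, and the quadratic example are illustrative assumptions, not the authors' actual implementation (which, per the abstract, is available on GitHub).

```python
import numpy as np

def backtracking_gd(f, grad_f, x0, delta0=1.0, alpha=0.5, beta=0.5,
                    tol=1e-8, max_iter=1000):
    """Gradient descent with Armijo backtracking line search (sketch).

    At each iteration the learning rate delta is reduced by the factor
    beta until the Armijo sufficient-decrease condition
        f(x - delta*g) <= f(x) - alpha*delta*||g||^2
    is satisfied, then a descent step is taken.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:  # gradient small enough: stop
            break
        delta = delta0
        # Backtrack: shrink delta until sufficient decrease holds.
        while f(x - delta * g) > f(x) - alpha * delta * np.dot(g, g):
            delta *= beta
        x = x - delta * g
    return x

# Illustrative use on a simple quadratic f(x) = ||x||^2 / 2,
# whose unique minimiser is the origin.
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
x_min = backtracking_gd(f, grad_f, np.array([3.0, -4.0]))
```

The accepted delta can also serve as a rough local estimate related to the Lipschitz constant of the gradient, which is the kind of upper-bound estimate the abstract refers to.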
dc.language: EN
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Backtracking Gradient Descent Method and Some Applications in Large Scale Optimisation. Part 2: Algorithms and Experiments
dc.type: Journal article
dc.creator.author: Truong, Tuyen Trung
dc.creator.author: Nguyen, Hang-Tuan
cristin.unitcode: 185,15,13,65
cristin.unitname: Flere komplekse variable, logikk og operatoralgebraer (Several Complex Variables, Logic and Operator Algebras)
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 2
dc.identifier.cristin: 1827552
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Applied Mathematics and Optimization&rft.volume=&rft.spage=&rft.date=2020
dc.identifier.jtitle: Applied Mathematics and Optimization
dc.identifier.doi: https://doi.org/10.1007/s00245-020-09718-8
dc.identifier.urn: URN:NBN:no-82431
dc.type.document: Tidsskriftartikkel (journal article)
dc.type.peerreviewed: Peer reviewed
dc.source.issn: 0095-4616
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/79322/1/Truong-Nguyen2020_Article_BacktrackingGradientDescentMet.pdf
dc.type.version: PublishedVersion

