dc.date.accessioned: 2019-06-20T05:25:14Z
dc.date.available: 2019-06-20T05:25:14Z
dc.date.created: 2019-01-15T11:57:49Z
dc.date.issued: 2018
dc.identifier.citation: Cameron, David Gordon; Elmsheuser, J.; Heinrich, L; Lavrijsen, W; Nilsson, P.; Tsulaia, V; Vogel, M. Leveraging the checkpoint-restart technique for optimizing CPU efficiency of ATLAS production applications on opportunistic platforms. Journal of Physics, Conference Series. 2018, 1085(3), 1-6
dc.identifier.uri: http://hdl.handle.net/10852/68436
dc.description.abstract: Data processing applications of the ATLAS experiment, such as event simulation and reconstruction, spend a considerable amount of time in the initialization phase. This phase includes loading a large number of shared libraries, reading detector geometry and condition data from external databases, building a transient representation of the detector geometry, and initializing various algorithms and services. In some cases the initialization step can take as long as 10-15 minutes. Such slow initialization has a significant negative impact on the overall CPU efficiency of the production job, especially when the job is executed on opportunistic, often short-lived, resources such as commercial clouds or volunteer computing. To improve this situation, we can take advantage of the fact that ATLAS runs large numbers of production jobs with similar configuration parameters (e.g. jobs within the same production task). This allows us to checkpoint one job at the end of its configuration step and then use the generated checkpoint image for rapid startup of thousands of production jobs. By applying this technique, we can bring the initialization time of a job from tens of minutes down to just a few seconds. In addition, we can leverage container technology for restarting checkpointed applications on a variety of computing platforms, in particular on platforms different from the one on which the checkpoint image was created. We will describe the mechanism of creating checkpoint images of Geant4 simulation jobs with AthenaMP (the multi-process version of the ATLAS data simulation, reconstruction and analysis framework Athena) and the usage of these images for running ATLAS Simulation production jobs on volunteer computing resources (ATLAS@Home) and on supercomputers. [en_US]
dc.language: EN
dc.publisher: IOP Publishing
dc.rights: Attribution 3.0 Unported
dc.rights.uri: https://creativecommons.org/licenses/by/3.0/
dc.title: Leveraging the checkpoint-restart technique for optimizing CPU efficiency of ATLAS production applications on opportunistic platforms [en_US]
dc.type: Journal article [en_US]
dc.creator.author: Cameron, David Gordon
dc.creator.author: Elmsheuser, J.
dc.creator.author: Heinrich, L
dc.creator.author: Lavrijsen, W
dc.creator.author: Nilsson, P.
dc.creator.author: Tsulaia, V
dc.creator.author: Vogel, M
cristin.unitcode: 185,15,4,60
cristin.unitname: Høyenergifysikk
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1
dc.identifier.cristin: 1657064
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Journal of Physics, Conference Series&rft.volume=1085&rft.spage=1&rft.date=2018
dc.identifier.jtitle: Journal of Physics, Conference Series
dc.identifier.volume: 1085
dc.identifier.issue: 3
dc.identifier.startpage: 1
dc.identifier.endpage: 6
dc.identifier.doi: http://dx.doi.org/10.1088/1742-6596/1085/3/032028
dc.identifier.urn: URN:NBN:no-71606
dc.type.document: Tidsskriftartikkel (Journal article) [en_US]
dc.type.peerreviewed: Peer reviewed
dc.source.issn: 1742-6588
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/68436/1/Cameron_2018_J._Phys.%25253A_Conf._Ser._1085_032028.pdf
dc.type.version: PublishedVersion
cristin.articleid: 032028

This item's license is: Attribution 3.0 Unported