
dc.contributor.author: Cubedo, Sivert Andresen
dc.date.accessioned: 2021-09-25T22:03:32Z
dc.date.available: 2021-09-25T22:03:32Z
dc.date.issued: 2021
dc.identifier.citation: Cubedo, Sivert Andresen. Fast Multi-GPU communication over PCI Express. Master thesis, University of Oslo, 2021
dc.identifier.uri: http://hdl.handle.net/10852/88547
dc.description.abstract: Today the demand for large-scale Machine Learning (ML) models is increasing, and training such models requires ever more hardware resources. Distributing ML training is a way to reduce training time; however, it depends on the ability of machines to work together. In this thesis, we have developed a proof-of-concept plugin for the NVIDIA Collective Communication Library (NCCL), a state-of-the-art collective-operations library for NVIDIA GPUs, enabling inter-machine PCIe communication. Our plugin is implemented using Dolphin NTB adapters, which allow PCIe communication between machines. We are able to show that network interconnects do affect distributed ML training time, and that our plugin makes the collective-operation time insignificant compared to the computation time when training ML models.
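The abstract centers on collective operations, of which all-reduce (every rank ends up with the element-wise sum of all ranks' buffers) is the one most used in distributed ML training; NCCL implements it with ring algorithms over GPU interconnects. As a purely illustrative sketch, and not the thesis's plugin code, the following simulates a ring all-reduce over in-memory "ranks" using plain Python lists:

```python
def ring_allreduce(buffers):
    """Sum-all-reduce over n equal-length per-rank buffers, in place.

    Simulates the ring algorithm: each buffer is split into n chunks,
    and chunks circulate around the ring in 2*(n-1) steps
    (n-1 reduce-scatter steps, then n-1 all-gather steps).
    """
    n = len(buffers)
    size = len(buffers[0])
    assert all(len(b) == size for b in buffers)
    # Chunk boundaries: chunk c covers indices bounds[c]..bounds[c+1].
    bounds = [(k * size) // n for k in range(n + 1)]

    # Reduce-scatter: at step s, rank r passes its partially reduced
    # chunk (r - s) % n to rank (r + 1) % n, which adds it in.
    for s in range(n - 1):
        for r in range(n):
            c = (r - s) % n
            dst = (r + 1) % n
            lo, hi = bounds[c], bounds[c + 1]
            for i in range(lo, hi):
                buffers[dst][i] += buffers[r][i]
    # Now rank r holds the fully reduced chunk (r + 1) % n.

    # All-gather: circulate the completed chunks around the ring,
    # overwriting each receiver's stale copy.
    for s in range(n - 1):
        for r in range(n):
            c = (r + 1 - s) % n
            dst = (r + 1) % n
            lo, hi = bounds[c], bounds[c + 1]
            buffers[dst][lo:hi] = buffers[r][lo:hi]
    return buffers
```

After the call, every buffer equals the element-wise sum of the inputs. The ring structure is why the interconnect matters so much in practice: each step is a neighbor-to-neighbor transfer, so total collective time is bounded by per-link bandwidth and latency, which is exactly what a faster fabric such as PCIe over NTB improves.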
dc.language.iso: eng
dc.subject: distributed training
dc.subject: PCIe
dc.subject: NCCL
dc.subject: collective operations
dc.subject: machine learning
dc.title: Fast Multi-GPU communication over PCI Express
dc.type: Master thesis
dc.date.updated: 2021-09-26T22:01:33Z
dc.creator.author: Cubedo, Sivert Andresen
dc.identifier.urn: URN:NBN:no-91133
dc.type.document: Masteroppgave
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/88547/1/sivertac-thesis.pdf

