
dc.contributor.author: Cubedo, Sivert Andresen
dc.date.accessioned: 2021-09-25T22:03:32Z
dc.date.available: 2021-09-25T22:03:32Z
dc.date.issued: 2021
dc.identifier.citation: Cubedo, Sivert Andresen. Fast Multi-GPU communication over PCI Express. Master thesis, University of Oslo, 2021
dc.identifier.uri: http://hdl.handle.net/10852/88547
dc.description.abstract: Today the demand for large-scale Machine Learning (ML) models is increasing, and training such models requires ever more hardware resources. Distributing ML training is a way to reduce training time; however, it depends on the ability of machines to work together. In this thesis, we have developed a proof-of-concept plugin for the NVIDIA Collective Communication Library (NCCL), a state-of-the-art collective-operations library for NVIDIA GPUs, enabling inter-machine PCIe communication. Our plugin is implemented using Dolphin NTB adapters, which allow PCIe communication between machines. We are able to show that network interconnects do affect distributed ML training time, and that our plugin makes the collective-operation time insignificant compared to the computation time when training ML models.
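The abstract centers on collective operations, of which all-reduce (every rank ends up with the element-wise sum of all ranks' buffers) is the one most used in distributed ML training; NCCL implements it with ring algorithms over GPU interconnects. As a purely illustrative sketch, and not the thesis's plugin code, the following simulates a ring all-reduce over in-memory "ranks" using plain Python lists:

```python
def ring_allreduce(buffers):
    """Sum-all-reduce over n equal-length per-rank buffers, in place.

    Simulates the ring algorithm: each buffer is split into n chunks,
    and chunks circulate around the ring in 2*(n-1) steps
    (n-1 reduce-scatter steps, then n-1 all-gather steps).
    """
    n = len(buffers)
    size = len(buffers[0])
    assert all(len(b) == size for b in buffers)
    # Chunk boundaries: chunk c covers indices bounds[c]..bounds[c+1].
    bounds = [(k * size) // n for k in range(n + 1)]

    # Reduce-scatter: at step s, rank r passes its partially reduced
    # chunk (r - s) % n to rank (r + 1) % n, which adds it in.
    for s in range(n - 1):
        for r in range(n):
            c = (r - s) % n
            dst = (r + 1) % n
            lo, hi = bounds[c], bounds[c + 1]
            for i in range(lo, hi):
                buffers[dst][i] += buffers[r][i]
    # Now rank r holds the fully reduced chunk (r + 1) % n.

    # All-gather: circulate the completed chunks around the ring,
    # overwriting each receiver's stale copy.
    for s in range(n - 1):
        for r in range(n):
            c = (r + 1 - s) % n
            dst = (r + 1) % n
            lo, hi = bounds[c], bounds[c + 1]
            buffers[dst][lo:hi] = buffers[r][lo:hi]
    return buffers
```

After the call, every buffer equals the element-wise sum of the inputs. The ring structure is why the interconnect matters so much in practice: each step is a neighbor-to-neighbor transfer, so total collective time is bounded by per-link bandwidth and latency, which is exactly what a faster fabric such as PCIe over NTB improves.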
dc.language.iso: eng
dc.subject: distributed training
dc.subject: PCIe
dc.subject: NCCL
dc.subject: collective operations
dc.subject: machine learning
dc.title: Fast Multi-GPU communication over PCI Express
dc.type: Master thesis
dc.date.updated: 2021-09-26T22:01:33Z
dc.creator.author: Cubedo, Sivert Andresen
dc.identifier.urn: URN:NBN:no-91133
dc.type.document: Masteroppgave
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/88547/1/sivertac-thesis.pdf

