dc.date.accessioned: 2024-03-23T17:44:33Z
dc.date.available: 2024-03-23T17:44:33Z
dc.date.created: 2023-11-17T13:59:17Z
dc.date.issued: 2023
dc.identifier.citation: Kanduri, Chakravarthi; Scheffer, Lonneke; Pavlović, Milena; Rand, Knut Dagestad; Chernigovskaia, Maria; Pirvandy, Oz; Yaari, Gur; Greiff, Victor; Sandve, Geir Kjetil Ferkingstad. simAIRR: simulation of adaptive immune repertoires with realistic receptor sequence sharing for benchmarking of immune state prediction methods. GigaScience. 2023, 12, 1-16
dc.identifier.uri: http://hdl.handle.net/10852/110070
dc.description.abstract: Background: Machine learning (ML) has gained significant attention for classifying immune states in adaptive immune receptor repertoires (AIRRs) to support the advancement of immunodiagnostics and therapeutics. Simulated data are crucial for the rigorous benchmarking of AIRR-ML methods. Existing approaches to generating synthetic benchmarking datasets result in the generation of naive repertoires missing the key feature of many shared receptor sequences (selected for common antigens) found in antigen-experienced repertoires. Results: We demonstrate that a common approach to generating simulated AIRR benchmark datasets can introduce biases, which may be exploited for undesired shortcut learning by certain ML methods. To mitigate undesirable access to true signals in simulated AIRR datasets, we devised a simulation strategy (simAIRR) that constructs antigen-experienced-like repertoires with a realistic overlap of receptor sequences. simAIRR can be used for constructing AIRR-level benchmarks based on a range of assumptions (or experimental data sources) for what constitutes receptor-level immune signals. This includes the possibility of making or not making any prior assumptions regarding the similarity or commonality of immune state–associated sequences that will be used as true signals. We demonstrate the real-world realism of our proposed simulation approach by showing that basic ML strategies perform similarly on simAIRR-generated and real-world experimental AIRR datasets. Conclusions: This study sheds light on the potential shortcut learning opportunities for ML methods that can arise with the state-of-the-art way of simulating AIRR datasets. simAIRR is available as a Python package: https://github.com/KanduriC/simAIRR.
dc.language: EN
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: simAIRR: simulation of adaptive immune repertoires with realistic receptor sequence sharing for benchmarking of immune state prediction methods
dc.title.alternative (English): simAIRR: simulation of adaptive immune repertoires with realistic receptor sequence sharing for benchmarking of immune state prediction methods
dc.type: Journal article
dc.creator.author: Kanduri, Chakravarthi
dc.creator.author: Scheffer, Lonneke
dc.creator.author: Pavlović, Milena
dc.creator.author: Rand, Knut Dagestad
dc.creator.author: Chernigovskaia, Maria
dc.creator.author: Pirvandy, Oz
dc.creator.author: Yaari, Gur
dc.creator.author: Greiff, Victor
dc.creator.author: Sandve, Geir Kjetil Ferkingstad
cristin.unitcode: 185,15,31,0
cristin.unitname: Senter for bioinformatikk
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1
dc.identifier.cristin: 2198166
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=GigaScience&rft.volume=12&rft.spage=1&rft.date=2023
dc.identifier.jtitle: GigaScience
dc.identifier.volume: 12
dc.identifier.startpage: 1
dc.identifier.endpage: 16
dc.identifier.doi: https://doi.org/10.1093/gigascience/giad074
dc.type.document: Tidsskriftartikkel (journal article)
dc.type.peerreviewed: Peer reviewed
dc.source.issn: 2047-217X
dc.type.version: PublishedVersion
dc.relation.project: NFR/311341
dc.relation.project: KF/215817
dc.relation.project: NFR/331890
dc.relation.project: NFR/300740
dc.relation.project: SIGMA2/NN9603K
dc.relation.project: EC/H2020/825821

