dc.date.accessioned: 2024-02-16T17:57:54Z
dc.date.available: 2024-02-16T17:57:54Z
dc.date.created: 2023-05-08T12:49:43Z
dc.date.issued: 2023
dc.identifier.citation: al Hajj, Ghadi; Pensar, Johan; Sandve, Geir Kjetil Ferkingstad. DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation. PLOS ONE. 2023, 18(4)
dc.identifier.uri: http://hdl.handle.net/10852/108142
dc.description.abstract: Data simulation is fundamental for machine learning and causal inference, as it allows exploration of scenarios and assessment of methods in settings with full control of ground truth. Directed acyclic graphs (DAGs) are well established for encoding the dependence structure over a collection of variables in both inference and simulation settings. However, while modern machine learning is applied to data of an increasingly complex nature, DAG-based simulation frameworks are still confined to settings with relatively simple variable types and functional forms. We here present DagSim, a Python-based framework for DAG-based data simulation without any constraints on variable types or functional relations. A succinct YAML format for defining the simulation model structure promotes transparency, while separate user-provided functions for generating each variable based on its parents ensure simulation code modularization. We illustrate the capabilities of DagSim through use cases where metadata variables control shapes in an image and patterns in bio-sequences. DagSim is available as a Python package at PyPI. Source code and documentation are available at: https://github.com/uio-bmi/dagsim
dc.language: EN
dc.publisher: PLOS
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation
dc.title.alternative: (EN) DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation
dc.type: Journal article
dc.creator.author: al Hajj, Ghadi
dc.creator.author: Pensar, Johan
dc.creator.author: Sandve, Geir Kjetil Ferkingstad
cristin.unitcode: 185,15,5,43
cristin.unitname: Vitenskapelige beregninger og maskinlæring (Scientific Computing and Machine Learning)
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1
dc.identifier.cristin: 2146141
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=PLOS ONE&rft.volume=18&rft.spage=&rft.date=2023
dc.identifier.jtitle: PLOS ONE
dc.identifier.volume: 18
dc.identifier.issue: 4
dc.identifier.doi: https://doi.org/10.1371/journal.pone.0284443
dc.type.document: Tidsskriftartikkel (Journal article)
dc.type.peerreviewed: Peer reviewed
dc.source.issn: 1932-6203
dc.type.version: PublishedVersion
cristin.articleid: e0284443
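
Illustrative note on the abstract: the idea of simulating from a DAG in which each variable has its own generating function, called on the values of its parents, can be sketched in a few lines of plain Python. This sketch does not use DagSim and does not reproduce its API or its YAML format; the names MODEL, simulate, draw_radius, draw_shape and draw_label are hypothetical and chosen only for illustration.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical per-variable functions: each receives a random generator
# plus the already simulated values of its parents, in declared order.
def draw_radius(rng):
    return rng.uniform(1.0, 5.0)


def draw_shape(rng, radius):
    # Values are not restricted to numbers: here a node yields a dict.
    return {"kind": "circle" if radius < 3.0 else "square", "size": radius}


def draw_label(rng, shape):
    return shape["kind"].upper()


# Hypothetical model description: node name -> parents and function.
MODEL = {
    "radius": {"parents": [], "function": draw_radius},
    "shape": {"parents": ["radius"], "function": draw_shape},
    "label": {"parents": ["shape"], "function": draw_label},
}


def simulate(model, num_samples, seed=0):
    """Draw num_samples joint samples by visiting nodes parents-first."""
    rng = random.Random(seed)
    # TopologicalSorter expects node -> predecessors; it raises CycleError
    # if the declared structure is not acyclic.
    graph = {name: spec["parents"] for name, spec in model.items()}
    order = list(TopologicalSorter(graph).static_order())
    samples = []
    for _ in range(num_samples):
        values = {}
        for name in order:
            spec = model[name]
            parent_values = [values[p] for p in spec["parents"]]
            values[name] = spec["function"](rng, *parent_values)
        samples.append(values)
    return samples


samples = simulate(MODEL, num_samples=3)
```

Keeping the structure (MODEL) separate from the generating functions mirrors the separation the abstract describes between a declarative model definition and user-provided functions; the actual DagSim interface is documented at the repository linked in the abstract.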

This item's license is: Attribution 4.0 International