
dc.contributor.author: Volkmann, Alexander
dc.contributor.author: De Bin, Riccardo
dc.contributor.author: Sauerbrei, Willi
dc.contributor.author: Boulesteix, Anne-Laure
dc.date.accessioned: 2019-07-30T05:15:15Z
dc.date.available: 2019-07-30T05:15:15Z
dc.date.issued: 2019
dc.identifier.citation: BMC Medical Research Methodology. 2019 Jul 24;19(1):162
dc.identifier.uri: http://hdl.handle.net/10852/68771
dc.description.abstract:
Background: Omics data can be very informative in survival analysis and may improve the prognostic ability of classical models based on clinical risk factors for various diseases, for example breast cancer. Recent research has focused on integrating omics and clinical data, yet has often ignored the need for appropriate model building for clinical variables. Medical literature on classical prognostic scores, as well as biostatistical literature on appropriate model selection strategies for low dimensional (clinical) data, are often ignored in the context of omics research. The goal of this paper is to fill this methodological gap by investigating the added predictive value of gene expression data for models using varying amounts of clinical information.
Methods: We analyze two data sets from the field of survival prognosis of breast cancer patients. First, we construct several proportional hazards prediction models using varying amounts of clinical information based on established medical knowledge. These models are then used as a starting point (i.e. included as a clinical offset) for identifying informative gene expression variables using resampling procedures and penalized regression approaches (model based boosting and the LASSO). In order to assess the added predictive value of the gene signatures, measures of prediction accuracy and separation are examined on a validation data set for the clinical models and the models that combine the two sources of information.
Results: For one data set, we do not find any substantial added predictive value of the omics data when compared to clinical models. On the second data set, we identify a noticeable added predictive value, however only for scenarios where little or no clinical information is included in the modeling process. We find that including more clinical information can lead to a smaller number of selected omics predictors.
Conclusions: New research using omics data should include all available established medical knowledge in order to allow an adequate evaluation of the added predictive value of omics data. Including all relevant clinical information in the analysis might also lead to more parsimonious models. The developed procedure to assess the predictive value of the omics data can be readily applied to other scenarios.
Keywords: Data integration, Cox regression, Model building
dc.language.iso: eng
dc.rights: The Author(s)
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: A plea for taking all available clinical information into account when assessing the predictive value of omics data
dc.type: Journal article
dc.date.updated: 2019-07-30T05:15:16Z
dc.creator.author: Volkmann, Alexander
dc.creator.author: De Bin, Riccardo
dc.creator.author: Sauerbrei, Willi
dc.creator.author: Boulesteix, Anne-Laure
dc.identifier.cristin: 1712872
dc.identifier.doi: https://doi.org/10.1186/s12874-019-0802-0
dc.identifier.urn: URN:NBN:no-71922
dc.type.document: Tidsskriftartikkel
dc.type.peerreviewed: Peer reviewed
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/68771/1/12874_2019_Article_802.pdf
dc.type.version: PublishedVersion
cristin.articleid: 162



This item's license is: Attribution 4.0 International