dc.contributor.author: Lund, Kine Veronica
dc.date.accessioned: 2014-03-15T22:09:56Z
dc.date.available: 2014-03-15T22:09:56Z
dc.date.issued: 2013
dc.identifier.citation: Lund, Kine Veronica. The Instability of Cross-Validated Lasso. Master thesis, University of Oslo, 2013
dc.identifier.uri: http://hdl.handle.net/10852/38874
dc.description.abstract: In a situation where the number of available covariates greatly exceeds the number of observations, fitting a regression model to explain the connection between the response and the explanatory variables can be a challenging task. The problem can be compared to a set of equations with more unknowns than equations, and it requires a regularisation method to yield a useful solution. There are several such methods, with different properties. This thesis focuses on one of them: the Lasso in combination with cross-validation (CV) to determine the level of regularisation. Specifically, we consider the method applied to survival data where the covariates are thousands of gene expression levels. The combination of Lasso and CV proves to be unstable in the sense that repeated application of the standard R implementation often gives varying results. The main focus of this study is to investigate the possible causes of this instability. Data were simulated to map the factors that affect the stability. The properties of the simulated data sets are easy to control, and the effects on the regularisation results are easily observed. The tests show that the CV process causes marked instability (varying results) when the division into training and test sets involves test sets of size larger than one. Moreover, the stability of the regularisation depends on the properties of the data set. A unique prediction result is preferable in order to easily choose a prognostic gene signature. However, a range of signatures from repeated regularisations can be utilised to indicate the accuracy of the suggested signature. This thesis maps several factors that affect the stability of Lasso and CV, and will hopefully serve as a warning flag, encouraging caution when utilising the Lasso method to find a prognostic model. [eng]
dc.language.iso: eng
dc.subject: biostatistics
dc.subject: lasso
dc.subject: cross
dc.subject: validation
dc.title: The Instability of Cross-Validated Lasso [eng]
dc.type: Master thesis
dc.date.updated: 2014-03-15T22:09:56Z
dc.creator.author: Lund, Kine Veronica
dc.identifier.urn: URN:NBN:no-42481
dc.type.document: Masteroppgave
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/38874/1/KineVeronicaLund_MasterThesis.pdf

