Abstract
Many methods exist for survival prediction from high-dimensional genomic data. Most of them combine the Cox proportional hazards model with a dimension-reduction technique such as partial least squares (PLS). It is not obvious how PLS should be applied to the Cox model, and different approaches have been suggested. Perhaps the most reasonable one, that of Park et al. (2002), reformulates the Cox likelihood as a Poisson-type likelihood, thereby enabling estimation by iteratively reweighted partial least squares for generalized linear models. We present a modified version of the method of Park et al. (2002) that estimates the baseline hazard and the gene effects in separate steps. Our approach has three advantages: it leads to PLS directions with more reasonable biological interpretations, it provides estimates of survival probabilities for new patients, and it enables a faster and less memory-demanding estimation procedure. In addition, our method allows lower-dimensional non-genomic variables, such as disease grade and tumor thickness, to be incorporated. Applying our method to two microarray gene expression breast cancer data sets, one of which has additional non-genomic covariates, shows that it predicts at least as well as the method of Park et al. (2002), and that there is much to be gained by including gene expressions together with the clinical variables.
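As a rough illustration of why a Poisson-type reformulation is possible, the standard factorization of the likelihood for right-censored data under proportional hazards can be written as below. The notation is assumed here for illustration and is not taken from the abstract: $t_i$ is the observed time, $d_i \in \{0,1\}$ the event indicator, $x_i$ the covariate vector, $h_0$ the baseline hazard and $H_0$ the cumulative baseline hazard. This is a sketch of the general idea only, and not necessarily the exact formulation used by Park et al. (2002) or in the modified method.

```latex
% Assumed notation (not from the abstract): t_i observed time,
% d_i in {0,1} event indicator, x_i covariates,
% h_0 baseline hazard, H_0 cumulative baseline hazard.
\begin{align*}
L(\beta, h_0)
  &= \prod_{i=1}^{n} \bigl[ h_0(t_i)\, e^{x_i^\top \beta} \bigr]^{d_i}
     \exp\!\bigl\{ -H_0(t_i)\, e^{x_i^\top \beta} \bigr\} \\
  &= \prod_{i=1}^{n} \Bigl[ \frac{h_0(t_i)}{H_0(t_i)} \Bigr]^{d_i}
     \times \prod_{i=1}^{n} \mu_i^{d_i}\, e^{-\mu_i},
  \qquad \mu_i = H_0(t_i)\, e^{x_i^\top \beta}.
\end{align*}
```

Because $d_i! = 1$ for $d_i \in \{0,1\}$, the second product has the form of a Poisson likelihood for $d_i$ with mean $\mu_i$, and the first product does not depend on $\beta$. With $\log \mu_i = \log H_0(t_i) + x_i^\top \beta$, the coefficients can therefore, for a fixed $H_0$, be fitted as a log-linear Poisson model with offset $\log H_0(t_i)$, which is consistent with estimating the baseline hazard and the covariate effects in separate steps.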