Abstract
Nested case-control (NCC) studies reduce the cost of large cohort studies but are statistically less efficient, since complete information is available only for cases and controls. In particular, in a competing risks situation, the traditional partial likelihood estimator for NCC cannot use controls sampled for cases of one disease as controls for another disease. This may be especially problematic if one outcome is common but the other is rare. However, methods based on inverse probability weighting (IPW) have been developed that allow controls (and cases) to be reused. Maximum likelihood estimation (MLE) methods for NCC have also been developed.
Furthermore, in addition to the information collected on cases and controls, some information, such as gender and age, is usually known for the entire cohort. This information can be utilized to obtain more accurate estimates with both the IPW and MLE approaches.
We compare such methods, both on simulated data and on data from the Norwegian Medical Birth Registry, with death from cancer as the rare endpoint and death from all other causes as the common endpoint. In simulations with only one covariate, the IPW and MLE methods performed similarly. With two covariates, one of which was known for the entire cohort, methods that utilized this information gave efficiency improvements, in particular for the fully observed covariate. If the two covariates were dependent, we also found an improvement for the covariate known only for cases and controls. Analysis of the Norwegian Medical Birth Registry data showed results similar to those of the simulations, but due to the high number of controls, at least for the cancer endpoint, the improvements were less pronounced.