Model-based estimation of transcript concentrations from spotted microarray data

Frigessi, Arnoldo; van de Wiel, Mark A.; Holden, Marit; Glad, Ingrid K.; Lyng, Heidi

Research report

Open

stat-res-06-04.pdf (2.580Mb)

06-04supplement.pdf (205.2Kb)

Year

2004

Abstract

Much data from spotted microarrays remain unused because they were obtained with different protocols, platforms or designs, which makes comparisons across experiments impossible. We have developed a model-based method that provides absolute transcript levels. Transcript levels are universal and can be combined in further analyses with similar estimates obtained with different techniques in other laboratories. This is a first step towards both genuine meta-analyses, including comparisons across different organisms, and the building of databases of transcript levels in cells. Our method is based on statistical modelling that incorporates all available information about the experiment, from target preparation to image analysis, and coherently propagates uncertainties from data to estimates. It requires that some genes be spotted in replicate; the number of replicated genes needed depends on the levels of the experimental factors included in the model, not on the total number of spotted genes. The method introduces no additional uncertainty from decimated data sets, indirect comparisons, normalisation or imputation of missing values, leading to a far more precise analysis of microarray data than conventional methods provide. Using a flexible Bayesian technique, we estimate the highly multivariate joint posterior distribution of all transcripts, which enables extended exploitation of the data. In the present work we apply our method to cervical cancer data. We show that the estimated transcript concentrations are accurate and reproducible, and we demonstrate improved statistical tools for selecting genes based on their concentration in highly unbalanced experimental settings.