Abstract
Fusion genes have key roles in the development and progression of many human cancers. Detection of oncogenic fusion genes is already part of routine diagnostics, and fusion proteins serve as targets for molecularly tailored treatment. High-throughput screening methods show promise for the detection of novel fusion events in neoplastic diseases. However, most novel fusion genes detected in deep sequencing studies are found in the cancer cells of only a single patient and may therefore be passenger events resulting from random genetic changes. Recurrent fusion genes, on the other hand, are often important driver events in carcinogenesis. The aim of the project was to establish a database infrastructure for genome-scale, exon-level expression microarray data from cancer samples, and to develop automated methods for searching these data for novel cancer-specific transcript patterns. This approach is powerful because exon microarray data are available for much larger patient cohorts than RNA-sequencing data are. Particular emphasis was placed on re-identifying, in the microarray data, expression changes initially detected in RNA-sequencing experiments, enabling searches for RNA changes that are common to multiple individual samples. Proof of principle was demonstrated by detecting known fusion genes in gene expression data from a set of 51 prostate cancer samples. The pipeline was then applied to colorectal cancer, a disease for which no highly recurrent fusion gene is known. The correlation between RNA-sequencing results from six colorectal cancer cell lines and exon-level microarray data from two series of colorectal tumour samples was analyzed with respect to fusion genes. Fusion transcripts nominated from paired-end RNA-sequencing data served as input to these analyses. Analysis of the exon microarray data for differential expression revealed six candidate recurrent fusion events.
These candidates will be subjected to further analysis by reverse transcriptase polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE) to investigate whether they are expressed as fusion gene transcripts.
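The abstract does not describe how exon-level differential expression is scored. One commonly used idea: when a 3' partner gene is driven by the promoter of a strongly expressed 5' partner (as with ERG in TMPRSS2-ERG-positive prostate cancer), exons downstream of the breakpoint are expressed at higher levels than upstream exons. The following is a minimal sketch under stated assumptions, not the project's pipeline. It assumes per-exon log2 expression values ordered 5' to 3' and a breakpoint exon index taken from an RNA-seq-nominated fusion. The function names (`breakpoint_shift`, `flag_samples`, `is_recurrent`), the median baseline, the 1 log2-unit threshold and the toy cohort are all hypothetical illustrations.

```python
from statistics import mean, median

# Hypothetical scoring: mean log2 expression of exons 3' of the breakpoint
# minus mean log2 expression of exons 5' of it, for one gene in one sample.
def breakpoint_shift(exon_log2, breakpoint):
    """exon_log2: per-exon log2 values ordered 5'->3'.
    breakpoint: index of the first exon assumed retained in the fusion."""
    five_prime = exon_log2[:breakpoint]
    three_prime = exon_log2[breakpoint:]
    if not five_prime or not three_prime:
        raise ValueError("breakpoint must leave at least one exon on each side")
    return mean(three_prime) - mean(five_prime)


# Flag samples whose shift exceeds the cohort median shift by at least
# min_delta log2 units (assumed threshold; 1.0 corresponds to a 2-fold change).
# The median baseline tolerates a minority of fusion-positive samples.
def flag_samples(cohort, breakpoint, min_delta=1.0):
    shifts = {sid: breakpoint_shift(vals, breakpoint) for sid, vals in cohort.items()}
    baseline = median(shifts.values())
    return sorted(sid for sid, s in shifts.items() if s - baseline >= min_delta)


# A candidate is treated as recurrent if it is flagged in at least
# min_samples samples (assumed cut-off).
def is_recurrent(flagged, min_samples=2):
    return len(flagged) >= min_samples


# Synthetic toy data (not from the study): five exons per sample,
# breakpoint before exon index 2; T3 and T5 show a 3' expression jump.
TOY_COHORT = {
    "T1": [8.0, 8.1, 7.9, 8.0, 8.2],
    "T2": [8.1, 8.0, 8.0, 7.9, 8.1],
    "T3": [6.0, 6.1, 9.5, 9.7, 9.6],
    "T4": [7.9, 8.0, 8.1, 8.0, 8.0],
    "T5": [6.2, 6.0, 9.4, 9.8, 9.5],
}

if __name__ == "__main__":
    flagged = flag_samples(TOY_COHORT, breakpoint=2)
    print(flagged, is_recurrent(flagged))  # ['T3', 'T5'] True
```

Using the median rather than the mean as the baseline keeps a handful of fusion-positive samples from masking one another, which matters when the goal is to find events shared by several tumours. Any flagged candidate would still require experimental validation, for example by RT-PCR and RACE.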