Rapid and sensitive methods for protein sequence comparison and database searching

Rognes, Torbjørn

Doctoral thesis

Year

2001

Abstract

The efforts of the international genome sequencing projects have resulted in huge and exponentially growing public databases of DNA and protein sequence information. The complete genome sequences of many organisms have already been published, and at the time of writing even the human genome has passed the sequencing phase.

A detailed analysis of these genomes, genes, and gene products is nevertheless necessary to reach a better understanding of their function in the cells of the organism. Most of this analysis requires experimental biology and biochemistry, but much information can also be obtained by sequence analysis using computational methods.

Fundamental tasks in this analysis are the comparison of two sequences and the searching of databases of amino acid and nucleotide sequences for similar sequences. Such searches often reveal valuable information about the possible structure and function of a protein. Several programs exist for performing them, with varying sensitivity and speed. Accurate database searches may require large computational resources, and as the databases grow, more time is required to search them. In addition, more sensitive tools are required to identify less obvious relationships between proteins. The aim of this work was therefore to develop novel algorithms for database searching with increased sensitivity and speed.
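
As a rough illustration of the two tasks named above (pairwise comparison and database searching), the following minimal Python sketch scores local alignments with the classic Smith-Waterman recurrence and ranks a toy database by score. It is not the thesis's method: the function names, the simple match/mismatch scoring, the linear gap penalty, and the example sequences are all assumptions made for illustration. Real protein searches typically use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Assumed scoring (illustrative only): fixed match/mismatch values
    and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query, database):
    """Score the query against every entry and return hits, best first.

    `database` is a hypothetical dict mapping sequence names to sequences.
    """
    hits = [(name, smith_waterman_score(query, seq))
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    toy_db = {"seq1": "HEAGAWGHEE", "seq2": "PAWHEAE", "seq3": "KKKKKK"}
    for name, score in search_database("HEAGAWGHEE", toy_db):
        print(name, score)
```

Every database entry is compared in full against the query, so the running time grows with both the query length and the total size of the database. This is why exhaustive searches of this kind become costly as databases grow.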

This work presents three new methods for performing database searches that are both sensitive and rapid. Two of the methods gain speed by taking advantage of the 8-way parallel processing technology now available in common computers. Using some of these tools, a new family of proteins has also been identified.

List of papers

Paper I SALSA: improved protein database searching by a new algorithm for assembly of sequence fragments into gapped alignments. Torbjørn Rognes, Erling Seeberg. Bioinformatics, Volume 14, Issue 10, 1 January 1998, Pages 839–845. The paper is not available in DUO due to publisher restrictions. The published version is available at: https://doi.org/10.1093/bioinformatics/14.10.839
Paper II Six-fold speed-up of Smith–Waterman sequence database searches using parallel processing on common microprocessors. Torbjørn Rognes, Erling Seeberg. Bioinformatics, Volume 16, Issue 8, 1 August 2000, Pages 699–706. The paper is not available in DUO due to publisher restrictions. The published version is available at: https://doi.org/10.1093/bioinformatics/16.8.699
Paper III ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches. Torbjørn Rognes. Nucleic Acids Research, Volume 29, Issue 7, 1 April 2001, Pages 1647–1652. The paper is not available in DUO due to publisher restrictions. The published version is available at: https://doi.org/10.1093/nar/29.7.1647
Paper IV Identification of a human member of a new family of DNA repair proteins with homology to E. coli Exonuclease III. Luisa Luna, Torbjørn Rognes, Ann-Christin Eikså, Marit Otterlei, Erling Seeberg. Manuscript in preparation.