Abstract
The international genome sequencing projects have produced huge, exponentially growing public databases of DNA and protein sequence information. The complete genome sequences of many organisms have already been published, and at the time of writing even the human genome has passed the sequencing phase.
However, a detailed analysis of these genomes, genes, and gene products is necessary in order to reach a better understanding of their function in the cells of the organism. The major part of this analysis requires experimental biology and biochemistry; however, much information can be obtained by sequence analysis using computational methods.
Fundamental tasks in this analysis are the comparison of two sequences and the searching of databases of amino acid and nucleotide sequences for similar sequences. Such searches will often reveal valuable information about the possible structure and function of a protein. Several programs exist for performing such searches with varying sensitivity and speed. Accurate database searches may require large computational resources, and as the databases grow, more time is required to search them. In addition, more sensitive tools are required in order to identify less obvious relationships between proteins. The aim of this work was hence to develop novel algorithms for database searching with increased sensitivity and speed.
This work presents three new methods for performing both sensitive and rapid database searches. Two of the methods gain speed by taking advantage of the 8-way parallel processing technology now available in common computers. With the use of some of these tools, a new family of proteins has also been identified.