
dc.contributor.author: Vestaberg, Sindre Ask
dc.date.accessioned: 2023-08-23T22:05:03Z
dc.date.available: 2023-08-23T22:05:03Z
dc.date.issued: 2023
dc.identifier.citation: Vestaberg, Sindre Ask. KIVS - Graph K-mer Indexer & Variant Signature Finder: Improving the performance of index creation for alignment-free genotyping. Master thesis, University of Oslo, 2023
dc.identifier.uri: http://hdl.handle.net/10852/103874
dc.description.abstract: Genotyping is the process of determining which genotypes (DNA sequences) an individual has at specific locations in the genome. The traditional approach to determining these genotypes is variant calling. However, variant calling is computationally intensive, as it requires the individual's genome to be aligned to a reference genome, which is an expensive process. Thus, alignment-free alternatives were developed that, while less accurate, are significantly faster than alignment-based methods by skipping the variant calling step. These alignment-free methods rely on identifying important k-mers (strings of k bases) for a species and then looking for these in individual genomes. These important k-mers are referred to as variant signatures, as they signify the presence of a variant. Finding these variant signatures requires computationally intensive preprocessing of data on known genetic variation for the species. For the human genome, the 1000 Genomes Project provides such a knowledge base on genetic variation, to the great benefit of alignment-free genotyping. KAGE is a recent and competitive alignment-free genotyper, both in terms of accuracy and speed. Compared to other existing solutions, such as Malva and PanGenie, KAGE is able to genotype both faster and more accurately. However, while KAGE has impressive performance when genotyping, this is not the case for the preprocessing of k-mers and variant signatures. Analyzing the vast amount of variant data to find and index all relevant k-mers is a time-consuming process, which makes it impractical to construct new indexes or update existing ones. As such, efficient solutions to these preprocessing steps would significantly improve the practicality of alignment-free solutions such as KAGE. This thesis explores performance improvements for these preprocessing steps, resulting in KIVS, a high-performance Python module for k-mer and variant signature analysis.
KIVS achieves high performance and usability by being implemented in C++ and wrapped in an easy-to-use Python interface. The genome and its possible variations are also represented by an optimized graph using 2-bit encoding to further improve performance. While made with KAGE integration in mind, KIVS is a standalone module that can be used by other genotyping implementations as well.
dc.language.iso: eng
dc.subject: python
dc.subject: c++
dc.subject: biology
dc.subject: cython
dc.subject: bioinformatics
dc.subject: genotyping
dc.subject: KAGE
dc.subject: informatics
dc.title: KIVS - Graph K-mer Indexer & Variant Signature Finder: Improving the performance of index creation for alignment-free genotyping
dc.type: Master thesis
dc.date.updated: 2023-08-24T22:02:00Z
dc.creator.author: Vestaberg, Sindre Ask
dc.type.document: Masteroppgave
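
The abstract mentions k-mers (strings of k bases) and a 2-bit encoding of the genome. As a rough illustration of that general idea only, the sketch below packs each base into two bits and slides a window over a sequence to enumerate encoded k-mers. The function names, the A/C/G/T to 0/1/2/3 mapping, and the rolling update are assumptions made for this example; they are not taken from KIVS or KAGE and say nothing about how the thesis implements its graph.

```python
# Illustrative sketch only: hypothetical helpers, not the KIVS API.
# Assumed mapping: A=0, C=1, G=2, T=3 (two bits per base).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_kmer(kmer):
    """Pack a k-mer into an integer, two bits per base, first base most significant."""
    value = 0
    for base in kmer:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def decode_kmer(value, k):
    """Unpack an integer produced by encode_kmer back into a k-base string."""
    bases = []
    for shift in range(2 * (k - 1), -1, -2):
        bases.append(BITS_TO_BASE[(value >> shift) & 3])
    return "".join(bases)


def iter_kmers(sequence, k):
    """Yield (position, encoded k-mer) for every k-mer in the sequence.

    Each step shifts in one new base and masks off the oldest one,
    so the whole k-mer is not re-encoded at every position.
    """
    if k <= 0 or len(sequence) < k:
        return
    mask = (1 << (2 * k)) - 1
    value = encode_kmer(sequence[:k])
    yield 0, value
    for i in range(k, len(sequence)):
        value = ((value << 2) | BASE_TO_BITS[sequence[i]]) & mask
        yield i - k + 1, value
```

With this layout a k-mer of up to 32 bases fits in a single 64-bit word, which is one common reason for choosing a 2-bit representation; ambiguous bases such as N are not handled in this sketch.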

