
dc.contributor.author: Aarag, Miriam Tamara Grødeland
dc.date.accessioned: 2022-03-07T23:00:23Z
dc.date.available: 2022-03-07T23:00:23Z
dc.date.issued: 2021
dc.identifier.citation: Aarag, Miriam Tamara Grødeland. Comparing Artificial Neural Networks and Microbial Genome Representation Methods for Taxonomic Classification. Master thesis, University of Oslo, 2021
dc.identifier.uri: http://hdl.handle.net/10852/92084
dc.description.abstract: Taxonomic classification of microorganisms is useful because microorganisms play an intricate role in health. However, microorganisms are difficult to classify because of both their diversity and their unstable gene pools. Researchers are attempting to solve this issue by using machine learning to handle the ever-growing amount of genomic data. While several tools have been and are being developed for this purpose, there is little public research directly comparing the underlying methods used by these tools. Research on how to compare different ways of representing a genome numerically, for example, is limited. Most of the research and tools are also developed for and tested on marker gene analysis, while other types of analysis, such as metagenomic and metatranscriptomic analysis, are less common. This master thesis explores taxonomic classification of whole-genome data by directly comparing different ways of representing a genome through k-mers and testing them on different types of neural networks. Training, validation, and test sets were built from the GTDB database, which covers a wide range of bacterial and archaeal whole genomes. These genomes were transformed into k-mer representation vectors using the following methods: MinHash sketching, frequencies of random k-mers, presence of random k-mers, and discriminative k-mers. Each of these methods was tested on three different artificial neural networks: a standard neural network, a multilayer perceptron network, and a convolutional neural network. All models were measured for accuracy and precision on the test set to determine which combination of representation method and model would be most suitable for taxonomic classification of microorganisms. The findings indicated that a MinHash representation combined with a multilayer perceptron network was the most promising.
The findings also indicated that k-mer counting gives better performance than k-mer presence when the representation vectors are of equal length. For discriminative k-mers, the results were negative but inconclusive, as alterations to the implementation could potentially give a very different result. Overall, more research is necessary to form comprehensive guidelines for future classification tools. (eng)
dc.language.iso: eng
dc.subject:
dc.title: Comparing Artificial Neural Networks and Microbial Genome Representation Methods for Taxonomic Classification (eng)
dc.type: Master thesis
dc.date.updated: 2022-03-07T23:00:23Z
dc.creator.author: Aarag, Miriam Tamara Grødeland
dc.identifier.urn: URN:NBN:no-94668
dc.type.document: Masteroppgave
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/92084/1/master_thesis_mtaarag.pdf

