dc.contributor.author: Ring, Kristoffer Hofaker
dc.date.accessioned: 2015-08-10T22:01:05Z
dc.date.available: 2016-05-04T22:31:00Z
dc.date.issued: 2015
dc.identifier.citation: Ring, Kristoffer Hofaker. PyBayenv: A framework for interpreting, testing and optimizing Bayenv analyses. Master thesis, University of Oslo, 2015
dc.identifier.uri: http://hdl.handle.net/10852/44752
dc.description.abstract: Loci involved in local adaptation can potentially be identified by the correlation between population allele frequencies and environmental variables. Several statistical methods have been developed for this purpose, and a relatively new method known as BAYENV has become popular and has consequently received considerable attention. By using a set of presumed neutral SNPs as a null model, BAYENV attempts to control for the effects of population structure when testing for correlation with environmental variables. BAYENV has been shown to perform well against the alternatives in studies evaluating differential-based methods. However, the BAYENV method poses several challenges. Its use of Markov Chain Monte Carlo (MCMC) algorithms to evaluate complex statistical models makes the method vulnerable to high run-to-run variability. Hence, it is advisable to compare the results from several independent runs of the algorithm before drawing conclusions. Moreover, the method presents its results in the form of a Bayes factor, whose interpretation is not as well known as that of its frequentist counterpart, the p-value, especially in the context of multiple hypothesis testing. Additionally, the extensive use of MCMC algorithms, together with a multi-step procedure for carrying out the analysis, makes BAYENV both time-intensive and cumbersome to use. Here we address several of the issues regarding the use of BAYENV and the interpretation of its results. We propose an automated method for assigning a significance level to an empirical distribution of Bayes factors. The method, named the Second Difference Method (SDM), makes use of the second difference to detect where the distribution has a significant change in the positive direction. By applying SDM to the results from two SNP datasets, we find the method to be more reliable in terms of FDR than conventional methods such as a percentage or static cutoff.
As a measure to reduce the overall time consumption of BAYENV, we suggest a method in which SNPs with a low allele frequency difference between populations are excluded from the test phase of BAYENV. This method showed promising results when tested on a dataset containing SNP data from Atlantic cod (Gadus morhua L.). To make the BAYENV analysis more user-friendly and to test our hypotheses, we developed a wrapper program for BAYENV named PyBAYENV. Among other features, PyBAYENV implements a mode in which several instances of BAYENV are allowed to run in parallel. By parallelizing the process, we were able to greatly reduce the time spent when performing multiple BAYENV analyses. [eng]
dc.language.iso: eng
dc.subject: Bayesian
dc.subject: analysis
dc.subject: SNPs
dc.subject: population
dc.subject: genomics
dc.subject: genomic
dc.subject: adaptation
dc.title: PyBayenv: A framework for interpreting, testing and optimizing Bayenv analyses [eng]
dc.type: Master thesis
dc.date.updated: 2015-08-10T22:01:05Z
dc.creator.author: Ring, Kristoffer Hofaker
dc.identifier.urn: URN:NBN:no-49028
dc.type.document: Masteroppgave
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/44752/1/master_thesis-Kristoffer-H-Ring-final.pdf
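
The abstract describes the Second Difference Method only in outline: it uses the second difference to find where the distribution of Bayes factors changes sharply in the positive direction. The following is a minimal sketch of one plausible reading of that idea, not the thesis's implementation. It assumes the Bayes factors are sorted in ascending order and that the cutoff is placed at the point with the largest positive second difference; the function names `second_difference_cutoff` and `candidate_snps` are hypothetical, and the actual SDM rule in the thesis may differ.

```python
def second_difference_cutoff(bayes_factors):
    """Return a cutoff at the sharpest upward bend of the sorted Bayes factors.

    Assumption for illustration: the "significant change in the positive
    direction" is taken to be the largest second difference of the sorted values.
    """
    bf = sorted(float(x) for x in bayes_factors)
    if len(bf) < 3:
        raise ValueError("need at least three Bayes factors")
    # Second difference of the sorted values: bf[i+2] - 2*bf[i+1] + bf[i]
    d2 = [bf[i + 2] - 2 * bf[i + 1] + bf[i] for i in range(len(bf) - 2)]
    # d2[i] is centred on bf[i+1], so shift the index by one
    knee = max(range(len(d2)), key=d2.__getitem__) + 1
    return bf[knee]


def candidate_snps(bayes_factors):
    """Indices (in input order) of SNPs whose Bayes factor exceeds the cutoff."""
    cutoff = second_difference_cutoff(bayes_factors)
    return [i for i, x in enumerate(bayes_factors) if x > cutoff]
```

Calling `candidate_snps` on the per-SNP Bayes factors from one run returns the indices that lie above the bend. Unlike a fixed percentage or static threshold, the cutoff here adapts to the shape of each empirical distribution, which is the property the abstract attributes to SDM.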

