PyBayenv: A framework for interpreting, testing and optimizing Bayenv analyses

dc.contributor.author	Ring, Kristoffer Hofaker
dc.date.accessioned	2015-08-10T22:01:05Z
dc.date.available	2016-05-04T22:31:00Z
dc.date.issued	2015
dc.identifier.citation	Ring, Kristoffer Hofaker. PyBayenv: A framework for interpreting, testing and optimizing Bayenv analyses. Master thesis, University of Oslo, 2015
dc.identifier.uri	http://hdl.handle.net/10852/44752
dc.description.abstract	Loci involved in local adaptation may potentially be identified by the correlation between population allele frequencies and environmental variables. Several statistical methods for this purpose have been developed, and a relatively new method known as BAYENV has become popular and is consequently receiving a lot of attention. By using a set of presumed neutral SNPs as a null model, BAYENV attempts to control for the effects of population structure when testing for correlation to environmental variables. BAYENV has proven to perform well when compared to the alternatives in studies evaluating differential-based methods. However, there are several challenges associated with the BAYENV method. The use of Markov chain Monte Carlo (MCMC) algorithms to evaluate complex statistical models makes the method vulnerable to high run-to-run variability. Hence, it is advisable to compare the results from several independent runs of the algorithm before drawing conclusions. Moreover, the method presents its results in the form of a Bayes factor, whose interpretation is not as well known as that of its frequentist counterpart, the p-value, especially in the context of multiple hypothesis testing. Additionally, the extensive use of MCMC algorithms, as well as a multi-step procedure for carrying out the analysis, makes BAYENV both time-intensive and cumbersome to use. Here we address several of the issues regarding the use of BAYENV and the interpretation of its results. We propose an automated method to assign a significance level for an empirical distribution of Bayes factors. The method, named the Second Difference Method (SDM), makes use of the second difference to detect where the distribution has a significant change in the positive direction. By using SDM on the results from two SNP datasets, we find the method to be more reliable than conventional methods such as a percentage or static cutoff in terms of false discovery rate (FDR).
As a measure to reduce the overall time consumption of BAYENV, we suggest a method where SNPs with a low allele frequency difference between populations are excluded from the test phase of BAYENV. This method showed promising results when tested on a dataset containing SNP data from Atlantic cod (Gadus morhua L.). To make the BAYENV analysis more user-friendly and to test our hypotheses, we developed a wrapper program for BAYENV named PyBAYENV. Among other features in PyBAYENV, we implemented a mode where several instances of BAYENV are allowed to run in parallel. By parallelizing the process we were able to greatly reduce the time spent when performing multiple BAYENV analyses.	eng
dc.language.iso	eng
dc.subject	Bayesian
dc.subject	analysis
dc.subject	SNPs
dc.subject	population
dc.subject	genomics
dc.subject	genomic
dc.subject	adaptation
dc.title	PyBayenv: A framework for interpreting, testing and optimizing Bayenv analyses	eng
dc.type	Master thesis
dc.date.updated	2015-08-10T22:01:05Z
dc.creator.author	Ring, Kristoffer Hofaker
dc.identifier.urn	URN:NBN:no-49028
dc.type.document	Masteroppgave
dc.identifier.fulltext	Fulltext https://www.duo.uio.no/bitstream/handle/10852/44752/1/master_thesis-Kristoffer-H-Ring-final.pdf
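
The abstract describes the Second Difference Method (SDM) only in outline: the second difference is used to find where the empirical distribution of Bayes factors turns sharply upward. The following is a minimal sketch of one possible reading of that idea, not the thesis's actual procedure. It assumes that the Bayes factors are sorted in ascending order and that the cutoff is placed at the point where the discrete second difference is largest. The function names `second_differences` and `sdm_cutoff` are hypothetical and are not taken from PyBAYENV.

```python
from typing import List, Sequence, Tuple


def second_differences(values: Sequence[float]) -> List[float]:
    """Discrete second difference: d2[i] = v[i+2] - 2*v[i+1] + v[i]."""
    return [values[i + 2] - 2 * values[i + 1] + values[i]
            for i in range(len(values) - 2)]


def sdm_cutoff(bayes_factors: Sequence[float]) -> Tuple[float, List[int]]:
    """Return (cutoff, indices of SNPs whose Bayes factor exceeds it).

    Assumption (not from the thesis): the cutoff is the sorted value at
    which the second difference is largest, i.e. where the ranked curve
    bends upward most sharply.
    """
    if len(bayes_factors) < 3:
        raise ValueError("need at least three Bayes factors")

    ranked = sorted(bayes_factors)
    d2 = second_differences(ranked)

    # Index of the largest second difference; d2[i] is centred on ranked[i + 1].
    bend = max(range(len(d2)), key=d2.__getitem__)
    cutoff = ranked[bend + 1]

    # Report positions in the original (unsorted) input order.
    flagged = [i for i, bf in enumerate(bayes_factors) if bf > cutoff]
    return cutoff, flagged


if __name__ == "__main__":
    # Made-up illustrative values, not results from the thesis.
    example = [0.9, 1.2, 1.0, 1.1, 15.0, 1.3, 40.0]
    cutoff, flagged = sdm_cutoff(example)
    print(f"cutoff = {cutoff}, flagged SNP indices = {flagged}")
```

With a heavy upper tail, the largest second difference tends to sit at the very top of the ranking, so a practical variant might work on log-transformed Bayes factors or require the second difference to exceed a threshold. Both of these are further assumptions that the abstract does not specify.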

Files in this item

Name: master_thesis-Kristoffer-H-Rin ...
Size: 17.10Mb
Format: application/

Appears in the following Collection

Institutt for informatikk [4956]
