Distribution modelling by MaxEnt: from black box to flexible toolbox

Mazzoni, Sabrina

Doctoral thesis

Year

2016

Abstract

Easier access to increasingly powerful computational approaches and tools in the field of distribution modelling has contributed to a proliferation of data, applications, practitioners, guidelines, and novel theoretical understandings. Recognising the dynamic link in how these elements influence one another is critical as the discipline and its practices develop. The challenge of how to implement the statistically and computationally complex theory behind the MaxEnt modelling method has been overcome by the practical simplicity of the powerful, platform-independent and free Java™ tool, maxent.jar. Lowering this computational and accessibility threshold has led to the increased use and further development of relevant digital ecological data, such as biodiversity/occurrence records held in natural history collections worldwide (GBIF, the Global Biodiversity Information Facility) and GIS layers of spatio-temporal environmental background data being developed across a diverse range of fields.

However, the computational advantages of the fixed options offered by the software have come at the expense of a full exploration of the potential of this statistical method. Over time, the popularity of the practical shortcuts has resulted in an uncritical acceptance of the defaults, a conflation of the statistical method with the software’s black box approach, and a disconnection between the theoretical and practical implications of the modelling process. A more flexible and explicit integration of theory and practice facilitates a much-needed comparison between, and testing of, these theoretical and practical defaults, options and settings.

The aim of this thesis is to reduce the gap between how practitioners can work with these practical tools and their understanding of the body of distribution modelling (DM) theory, and of MaxEnt in particular. PAPER 1 lays out the theoretical description of a novel interpretation of MaxEnt, with new settings and options, such as new model selection and model assessment criteria, and improved user control of the variable selection process. To test this new theory in a practical way, new informatics-driven approaches and tools were developed. PAPER 2 provides their detailed description and presents them as a modular toolbox in the form of a set of flexible R scripts and functions. This new MaxEnt modelling approach and toolbox are used in PAPER 3, which looks specifically at how to identify and tackle the potential effects of sampling bias in presence-only (PO) data obtained from museum collections. The application value of this alternative MaxEnt modelling procedure (aMp) is further explored and tested in PAPERS 4 and 5, where conservation management issues are addressed, as well as model purpose, model fitting and properties of the data. PAPER 4 explores how distribution modelling can be combined with phylogeographic analysis to address spatio-temporal conservation issues. PAPER 5 makes use of fine-grained remotely sensed LiDAR data to explore issues related both to data properties (accuracy, spatial autocorrelation) and model complexity (variable and model selection, and model improvement). All MaxEnt models are evaluated against an independently collected field dataset, and theoretical and practical implications are discussed. PAPER 6 makes full use of this new theoretical approach and practical toolbox, and addresses MaxEnt model selection strategy by testing eight different combinations of model complexity and data properties. Finally, the paper discusses the additional benefits of these tool enhancements for MaxEnt model performance and ecological interpretability.

In modelling, there is no single or best approach that works for everyone. There are always alternative approaches, owing to our individual differences as practitioners and not solely to the modelling tools or purposes. This thesis makes explicit use of both Ecological and Informatics approaches to perform a broad-scoped assessment of the relative performance of different combinations of MaxEnt options and their settings for DM with different modelling purposes, including the specific properties of the data. By adding a flexible and traceable way to tackle this both theoretically and practically, I have attempted to reduce the gap between how practitioners can work with the tools and the body of theory.

List of papers

Paper 1 Rune Halvorsen, Sabrina Mazzoni, Anders Bryn and Vegar Bakkestuen. Opportunities for improved distribution modelling practice via a strict maximum likelihood interpretation of MaxEnt. Ecography 38.2 (2015): 172-183. The paper is removed from the thesis in DUO due to publisher restrictions. The published version is available at: https://doi.org/10.1111/ecog.00565
Paper 2 Sabrina Mazzoni, Rune Halvorsen and Vegar Bakkestuen. MIAT: Modular R-wrappers for flexible implementation of MaxEnt distribution modelling. Ecological Informatics 30 (2015): 215-221. Published with an Attribution-NonCommercial-NoDerivatives 4.0 International License. https://doi.org/10.1016/j.ecoinf.2015.07.001
Paper 3 Bente Støa, Rune Halvorsen, Sabrina Mazzoni and Vladimir I. Gusarov. Sampling bias in presence-only data used for species distribution modelling: assessment and effects on models. To be published. The paper is removed from the thesis in DUO awaiting publishing.
Paper 4 Mika Bendiksby, Sabrina Mazzoni, Marte H. Jørgensen, Rune Halvorsen and Håkon Holien. Combining genetic analyses of archived specimens with distribution modelling to explain the anomalous distribution of the rare lichen Staurolemma omphalarioides: long-distance dispersal or vicariance? Journal of Biogeography 41.11 (2014): 2020-2031. The paper is removed from the thesis in DUO due to publisher restrictions. The published version is available at: https://doi.org/10.1111/jbi.12347
Paper 5 Rune Halvorsen, Sabrina Mazzoni, John Wirkola Dirksen, Erik Næsset, Terje Gobakken and Mikael Ohlson. How important are choice of model selection method and spatial autocorrelation of presence data for distribution modelling by MaxEnt? Ecological Modelling 328 (2016): 108-118. The paper is removed from the thesis in DUO due to publisher restrictions. The published version is available at: https://doi.org/10.1016/j.ecolmodel.2016.02.021
Paper 6 Sabrina Mazzoni, Rune Halvorsen, Vegar Bakkestuen, Inger Auestad, Trine Bekkby, Johannes Breidenbach, Desalegn Chala, John Wirkola Dirksen, Anette Edvardsen, Dag Endresen, Lars Erikstad, Hege Gundersen, Einar Heegaard, Knut Anders Hovstad, Eli Rinde and Anders Kvalvåg Wollan. Optimal MaxEnt model selection procedure evaluated by use of independent test data. To be published. The paper is removed from the thesis in DUO awaiting publishing.