Using the KDE method to model ecological niches: A response to Blonder et al. (2017)
2017; Wiley; Volume: 26; Issue: 9; Language: English
10.1111/geb.12610
ISSN 1466-8238
Authors: Huijie Qiao, Luis E. Escobar, Erin E. Saupe, Liqiang Ji, Jorge Soberón
Topic(s): Ecology and Vegetation Dynamics Studies
Abstract: Recently, we noted (Qiao, Escobar, Saupe, Ji, & Soberón, 2016) that multivariate kernel density estimation (KDE) may not outperform other methods for estimating hypervolume geometries and, moreover, under certain circumstances the algorithm would not detect ‘holes’ in environmental space, as Blonder, Lamanna, Violle, and Enquist (2014) had proposed. In our original note (Qiao et al., 2016), we explained that KDE (a) is sensitive to both sample size and environmental dimensionality, (b) may overestimate niche volumes in low dimensions and constrict niche volume estimates in high dimensions, and (c) is useful only to the extent that the realized niche is sought and not the fundamental niche. Here, we also note that bandwidth is a crucial parameter for KDE, and its selection needs to be evaluated rigorously and all assumptions stated clearly.

In their response to our comments, Blonder et al. (2017) indicated that (a) KDE output depends in useful ways on dataset size and bias, (b) other species distribution modelling methods make equally stringent but different assumptions about dataset bias, (c) we made an incorrect data transformation in our original experiments that may result in unfair comparisons, and (d) hypervolume methods are more general than KDE and have other benefits for niche modelling. We address these points below, which we divide into two main categories: methodological concerns and theoretical concerns.

Blonder et al. (2017) criticized our transformation of units during the modelling process. We had log-transformed the data when constructing KDE hypervolumes following the log-transformed data framework in the ‘hypervolume’ R package demo code (see https://cran.r-project.org/web/packages/hypervolume/index.html), developed by Blonder et al. (2014). However, we agree with Blonder et al. (2017) that this transformation is not necessary if hypervolumes are subsequently delineated and plotted in untransformed space.
Consequently, we re-ran our analyses using untransformed units and two bandwidth configurations to explore their influence on model results (Supporting Information Figure S1): the default bandwidth estimated using the ‘estimate_bandwidth()’ function in the ‘hypervolume’ package from Blonder et al. (2014), and that default value divided by two, which yields a model that fits the data more closely. Our results remained consistent using the default bandwidth from estimate_bandwidth(). That is to say, our previous conclusions (Qiao et al., 2016) hold regardless of whether we use transformed or untransformed data: KDE overestimated niche volumes in low environmental dimensions, underestimated niche volumes in high dimensions, was unable to detect holes in environmental space with low sample size (Supporting Information Figures S2c and S3c) and was plagued by decreased sensitivity (Supporting Information Figure S4). The other evaluation metrics, including specificity, hypervolumes and the Jaccard similarity index (Supporting Information Figures S5–S7), show patterns similar to those reported in our previous analyses (Qiao et al., 2016).

Our re-analysis using a smaller bandwidth detected holes (Supporting Information Figures S2.3c and S3.3c), but at the cost of increased type II error (Supporting Information Figures S1.3a, S2.3a and S3.3a). This is an important intervention in the model parameterization that should not be overlooked; a pragmatic a posteriori parameterization was necessary for KDE to reconstruct the ‘holey’ niche of the virtual species effectively. Of course, when dealing with data from real species with unknown niche shapes, bandwidth selection would be more complex. As we noted in our original manuscript, the bandwidth is a crucial parameter for KDE, and bandwidth selection deserves further research.
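The trade-off described above (a narrower bandwidth recovers the hole but omits more of the true niche) can be illustrated with a minimal sketch. It is neither the ‘hypervolume’ package's algorithm nor the authors' experiment: it assumes a two-dimensional virtual niche (a unit square with a central square hole), a simple box-kernel presence rule, and hypothetical function names, hole bounds and bandwidth values chosen only for illustration.

```python
import random

# Hypothetical hole bounds inside the unit-square virtual niche.
HOLE_LO, HOLE_HI = 0.35, 0.65


def in_hole(x, y):
    """True if (x, y) falls in the central square hole."""
    return HOLE_LO < x < HOLE_HI and HOLE_LO < y < HOLE_HI


def in_true_niche(x, y):
    """Virtual fundamental niche: the unit square minus the hole."""
    return 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 and not in_hole(x, y)


def sample_occurrences(n, rng):
    """Draw n unbiased occurrences from the virtual niche by rejection sampling."""
    points = []
    while len(points) < n:
        x, y = rng.random(), rng.random()
        if in_true_niche(x, y):
            points.append((x, y))
    return points


def predicted_present(x, y, points, bandwidth):
    """Box-kernel rule: present if any occurrence lies within `bandwidth` on both axes."""
    return any(abs(x - px) <= bandwidth and abs(y - py) <= bandwidth
               for px, py in points)


def evaluate(points, bandwidth, steps=60, lo=-0.25, hi=1.25):
    """Compare the predicted niche with the virtual niche on a regular grid."""
    cell = (hi - lo) / steps
    niche_cells = niche_missed = 0
    absent_cells = absent_predicted = 0
    hole_cells = hole_filled = 0
    for i in range(steps):
        for j in range(steps):
            x = lo + (i + 0.5) * cell
            y = lo + (j + 0.5) * cell
            pred = predicted_present(x, y, points, bandwidth)
            if in_true_niche(x, y):
                niche_cells += 1
                niche_missed += not pred        # omission (type II)
            else:
                absent_cells += 1
                absent_predicted += pred        # commission (type I)
                if in_hole(x, y):
                    hole_cells += 1
                    hole_filled += pred         # hole not detected here
    return {
        "type_I": absent_predicted / absent_cells,
        "type_II": niche_missed / niche_cells,
        "hole_filled": hole_filled / hole_cells,
    }


if __name__ == "__main__":
    occurrences = sample_occurrences(100, random.Random(0))
    for h in (0.10, 0.05):  # a 'default' bandwidth and half of it (illustrative values)
        print(h, evaluate(occurrences, h))
```

In this sketch the region predicted with the halved bandwidth is always contained in the region predicted with the wider one, so halving the bandwidth can only reduce the filled fraction of the hole and can only increase omission. How strongly each effect appears depends on sample size, which is why reporting results for several bandwidths is informative.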
We recommend that researchers using KDE explain their assumptions during bandwidth selection, explore a series of bandwidth configurations and present the results of these models for more informed conclusions.

The remaining points made by Blonder et al. (2017) are primarily conceptual in nature. Blonder et al. (2014, 2017) argue that fundamental niches can have holes and complex shapes in higher dimensions. Although we argue that this is still far from certain, what is most relevant for the discussion here is that niches in high dimensions may be highly clustered in the central regions of environmental space (Drake, 2015), such that any holes in niche ‘hyperspace’ are, once again, difficult to identify and determine. Blonder et al. (2017) cite several references to suggest that fundamental niches may have complex forms. However, our own perusal of these references indicates that these estimates (obtained either from first-principle models or experimental data) either have simple convex shapes or rest on only a few data points, making the estimation of ‘complex shapes’ a doubtful exercise at best. This point, however, may be moot. KDE can be a useful method to fit shapes in multivariate spaces; interpretation of the meaning of the shapes may be best left to the researchers. We note that an overfitted KDE may be no more informative than using the original species occurrence records to identify the occupied environments, and a complex model with high fit (e.g., narrow KDE bandwidth) would be redundant. Comparison of KDE with other, more physiologically realistic methods (e.g., range bagging; Drake, 2015) is warranted.

We agree that KDE is a promising method and should be included in the toolbox of ecological niche modellers. KDE has both pros and cons that may not be present in other algorithms, making it complementary and not opposed to other methods.
Indeed, our original cautionary note (Qiao et al., 2016) was inspired by our interest in preventing the adoption of single algorithms as ‘silver bullets’ for characterizing fundamental and/or realized niches for any given species. The algorithm of choice depends on the nature of the research question, as Blonder et al. (2017) also note, and on the nature of the research data.

The authors thank Pedro Peres-Neto and Ben Blonder for a productive discussion about applications of hypervolume methods in modern ecology.

The authors declare that there is no conflict of interest regarding the publication of this manuscript.

Additional Supporting Information may be found online in the supporting information tab for this article.

FIGURE S1 Re-analysis of type I and II error estimated using the multivariate kernel density estimation (KDE) method with sample sizes of m = 1,000 for different sampling configurations and bandwidth methods. We compared the following three study designs: (1) the original experiment proposed in our cautionary note (Fig. 2, Qiao et al., 2016); (2) untransformed data using the ‘estimate_bandwidth()’ function in the ‘hypervolume’ package from Blonder et al. (2014); and (3) untransformed data with the smaller bandwidth configuration used in Blonder et al. (2017). For each group, the red rectangle denotes a virtual fundamental niche (FN), while the blue points represent (a) unbiased, (b) biased, (c) ‘holey’, as indicated by the black rectangle, and (d) two-clustered observations of the virtual FN. The green polygons are the estimated niche from the KDE method based on the blue observations. The overlap (pink) of the virtual FN and the estimated niche is the portion of virtual FN correctly predicted by the KDE method. The yellow area outside of the virtual FN denotes type I error resulting from the KDE method.
The white area with cross-shading denotes type II error resulting from the KDE method.

FIGURE S2 Re-analysis of type I and II error estimated using the multivariate kernel density estimation (KDE) method using sample sizes of m = 100 for different sampling configurations and bandwidth methods. Legend as in Fig. S1 using a different number of occurrences (i.e., 100). Note that with 100 occurrences, KDE was unable to identify the hole in (1) and (2).

FIGURE S3 Re-analysis of type I and II error estimated using the multivariate kernel density estimation (KDE) method using sample sizes of m = 10 for different sampling configurations and bandwidth methods. Legend as in Fig. S1 using a different number of occurrences (i.e., 10). Note that with only 10 occurrences, KDE was unable to identify the hole in (1) and (2).

FIGURE S4 Re-analysis of sensitivity for each modelling method based on the different fundamental niche shapes. We followed Qiao et al. (2016) and Blonder et al. (2017) to compare model performance using different niche configurations during analysis, as follows: (1) transformed data using the ‘estimate_bandwidth()’ function in the ‘hypervolume’ package from Blonder et al. (2014); (2) untransformed data using the ‘estimate_bandwidth()’ function; and (3) untransformed data with a smaller bandwidth (‘default’ bandwidth divided by two). Sensitivity of each modelling method was based on three virtual fundamental niche shapes: (a) range boxes, (b) convex-hulls, and (c) ellipsoids. The ‘true’ fundamental niches were estimated using four modelling methods: range box (RB; green), convex hull (CH; blue), minimum-volume ellipsoid (MVE; yellow), and multivariate kernel density estimation (KDE; red). Each boxplot represents the sensitivity of the models according to 10 independent subsamples of observations (m = 10, 100, 1,000) collected randomly in a two- to eight-dimensional dataset (e). Boxes closer to the top indicate better predictions (n) in the form of high sensitivity.
Note that results are similar to those reported previously by Qiao et al. (2016).

FIGURE S5 Re-analysis of specificity for each modelling method based on the different fundamental niche shapes. Legend as in Fig. S4. Boxes closer to the top indicate better niche predictions (n) in the form of high specificity.

FIGURE S6 Re-analysis of volume for each modelling method based on the different fundamental niche shapes. Legend as in Fig. S4. Boxes closer to the black dot indicate highest similarity between the volume of the niche predictions (n) and the true shape (FN).

FIGURE S7 Re-analysis of the Jaccard similarity index for each modelling method based on the different fundamental niche shapes. Legend as in Fig. S4. Boxes closer to the top indicate highest similarity between niche predictions (n) and the true shape (FN).

Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.
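The evaluation metrics named in the legends of Figures S4, S5 and S7 can be written down compactly. The sketch below assumes the standard confusion-matrix definitions, applied to a set of evaluation points in environmental space that are labelled as inside or outside the true niche (FN) and the predicted niche (n). The exact computation used in the original analyses is not given in this text, and the function name `niche_metrics` is hypothetical.

```python
def niche_metrics(truth, predicted):
    """Sensitivity, specificity and Jaccard index from paired boolean labels.

    truth[i]     -- True if evaluation point i lies inside the true niche (FN)
    predicted[i] -- True if the model places point i inside the estimated niche (n)
    """
    tp = fp = fn = tn = 0
    for t, p in zip(truth, predicted):
        if t and p:
            tp += 1      # correctly predicted niche
        elif p:
            fp += 1      # commission (type I error)
        elif t:
            fn += 1      # omission (type II error)
        else:
            tn += 1      # correctly predicted absence
    nan = float("nan")
    return {
        # Share of the true niche that the model recovers.
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        # Share of the unsuitable space that the model excludes.
        "specificity": tn / (tn + fp) if tn + fp else nan,
        # Intersection over union of the true and predicted niches.
        "jaccard": tp / (tp + fp + fn) if tp + fp + fn else nan,
    }


if __name__ == "__main__":
    truth = [True, True, True, False, False, False, False, False]
    predicted = [True, True, False, True, False, False, False, False]
    print(niche_metrics(truth, predicted))
```

Under these definitions a model that overpredicts loses specificity and Jaccard similarity but keeps high sensitivity, whereas a tightly fitted model does the reverse, which is why the three metrics are read together.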
Reference(s)