Inter-Research > Prepress Abstract

ESR prepress abstract - DOI: https://doi.org/10.3354/esr01060

Ensemble Random Forests as a tool for modeling rare occurrences

Zachary A. Siders*, Nicholas D. Ducharme-Barth, Felipe Carvalho, Donald Kobayashi, Summer Martin, Jennifer Raynor, T. Todd Jones, Robert N. M. Ahrens

*Corresponding author:

ABSTRACT: Relative to target species, priority conservation species occur rarely in fishery interactions, resulting in imbalanced, overdispersed data. We present Ensemble Random Forests (ERFs) as an intuitive extension of the Random Forest algorithm to handle rare-event bias. Each Random Forest in the ensemble receives its own stratified random training/test split and down-samples the majority class for each decision tree; predictions are then averaged across the Random Forests to generate an ensemble prediction. Through simulation, we show that ERFs outperform Random Forests with and without down-sampling, as well as the synthetic minority over-sampling technique (SMOTE), across datasets ranging from highly class-imbalanced to balanced. Both simulation and case studies show that spatial covariance strongly affects the perceived performance of ERFs. In case studies from the Hawaii deep-set longline fishery, giant manta ray (Mobula birostris syn. Manta birostris) and scalloped hammerhead (Sphyrna lewini) had high spatial covariance in their presences and high model test performance, whereas false killer whale (Pseudorca crassidens) had low spatial covariance and low model test performance. Overall, we find that ERFs have 4 advantages: (1) reduced successive partitioning effects; (2) propagation of prediction uncertainty; (3) better accounting for interacting covariates through balancing; and (4) minimization of false positives, as the majority of Random Forests within the ensemble vote correctly. Because ERFs can readily mitigate rare-event bias without requiring large presence sample sizes or imparting considerable balancing bias, they are likely to be valuable tools for bycatch and species distribution modeling, as well as for spatial conservation planning, especially for protected species for which presences can be rare events.
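The ensemble structure described in the abstract (a separate stratified training/test split per Random Forest, majority-class down-sampling for every tree, and averaging across forests) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny CART-style tree, its depth limit, the roughly sqrt(p) candidate features per split, the 0.5 voting threshold, the use of the across-forest standard deviation as an uncertainty summary, the simulated data, and every function name (`fit_erf`, `predict_erf`, `held_out_scores`, and so on) are hypothetical choices made only for illustration.

```python
# Hypothetical sketch of an Ensemble Random Forest (ERF); not the authors' implementation.
import random
from statistics import mean, pstdev

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(labels) / len(labels) if labels else 0.0
    return 2 * p * (1 - p)

def build_tree(X, y, rng, max_depth=4, min_leaf=2, n_feats=None):
    """Small CART-style tree; each leaf stores the share of presences (class 1)."""
    if max_depth == 0 or len(set(y)) == 1 or len(y) < 2 * min_leaf:
        return sum(y) / len(y)
    n_feats = n_feats or max(1, round(len(X[0]) ** 0.5))  # assumed: ~sqrt(p) features per split
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = len(left) * gini([y[i] for i in left]) + len(right) * gini([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no admissible split: make a leaf
        return sum(y) / len(y)
    _, f, t, left, right = best
    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], rng, max_depth - 1, min_leaf, n_feats)
    return (f, t, grow(left), grow(right))

def predict_tree(node, row):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_downsampled_forest(X, y, rng, n_trees):
    """Each tree gets a bootstrap in which every class is drawn n_min times,
    i.e. the majority class is down-sampled to the minority-class size."""
    by_class = [[i for i, v in enumerate(y) if v == c] for c in (0, 1)]
    n_min = min(len(members) for members in by_class)
    forest = []
    for _ in range(n_trees):
        idx = [rng.choice(members) for members in by_class for _ in range(n_min)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def stratified_split(y, test_frac, rng):
    """Shuffle each class separately so train and test keep the class ratio."""
    train, test = [], []
    for c in (0, 1):
        members = [i for i, v in enumerate(y) if v == c]
        rng.shuffle(members)
        k = max(1, round(len(members) * test_frac))
        test += members[:k]
        train += members[k:]
    return train, test

def fit_erf(X, y, n_forests=5, n_trees=15, test_frac=0.25, seed=0):
    """Each forest is trained on its own stratified split; its test indices are kept."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_forests):
        train, test = stratified_split(y, test_frac, rng)
        forest = fit_downsampled_forest([X[i] for i in train], [y[i] for i in train], rng, n_trees)
        ensemble.append((forest, test))
    return ensemble

def forest_prob(forest, row):
    return mean(predict_tree(tree, row) for tree in forest)

def predict_erf(ensemble, row):
    """Ensemble mean, spread across forests, and share of forests voting 'presence' (> 0.5)."""
    probs = [forest_prob(forest, row) for forest, _ in ensemble]
    return mean(probs), pstdev(probs), sum(p > 0.5 for p in probs) / len(probs)

def held_out_scores(ensemble, X, y):
    """(sensitivity, specificity) of each forest on its own held-out rows at a 0.5 threshold."""
    scores = []
    for forest, test in ensemble:
        pred = {i: forest_prob(forest, X[i]) > 0.5 for i in test}
        pos = [i for i in test if y[i] == 1]
        neg = [i for i in test if y[i] == 0]
        scores.append((sum(pred[i] for i in pos) / len(pos), sum(not pred[i] for i in neg) / len(neg)))
    return scores

if __name__ == "__main__":
    data_rng = random.Random(42)
    X = [[data_rng.random() for _ in range(3)] for _ in range(300)]  # column 2 is pure noise
    y = [int(r[0] > 0.8 and r[1] > 0.5) for r in X]  # simulated rare presences (~10%)
    erf = fit_erf(X, y)
    print(predict_erf(erf, [0.95, 0.90, 0.5]))  # inside the simulated presence region
    print(predict_erf(erf, [0.10, 0.10, 0.5]))  # far outside it
    print(held_out_scores(erf, X, y))
```

The hand-rolled tree only keeps the sketch dependency-free. In a real analysis each forest would more likely come from an established library; for example, imbalanced-learn's `BalancedRandomForestClassifier` also down-samples the majority class for each tree, and an ERF-style ensemble could wrap several of those, each fitted on its own stratified split.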