Ensemble Random Forests as a tool for modeling rare occurrences

Zachary A. Siders; Nicholas D. Ducharme-Barth; Felipe Carvalho; Donald Kobayashi; Summer Martin; Jennifer Raynor; T. Todd Jones; Robert N. M. Ahrens

doi:10.3354/esr01060

ESR

Endangered Species Research

ESR 43:183-197 (2020) - DOI: https://doi.org/10.3354/esr01060

Zachary A. Siders¹,*, Nicholas D. Ducharme-Barth², Felipe Carvalho³, Donald Kobayashi³, Summer Martin³, Jennifer Raynor⁴, T. Todd Jones³, Robert N. M. Ahrens³

¹UF/IFAS SFRC Fisheries and Aquatic Sciences Program, University of Florida, Gainesville, FL 32611, USA
²Oceanic Fisheries Programme, Pacific Community, Nouméa 98800, New Caledonia
³NOAA Fisheries, Pacific Islands Fisheries Science Center, Honolulu, HI 96818, USA
⁴Department of Economics, Wesleyan University, Middletown, CT 06457, USA

*Corresponding author: zsiders@ufl.edu

ABSTRACT: Relative to target species, priority conservation species occur rarely in fishery interactions, resulting in imbalanced, overdispersed data. We present Ensemble Random Forests (ERFs) as an intuitive extension of the Random Forest algorithm to handle rare event bias. Each Random Forest receives individual stratified randomly sampled training/test sets, then down-samples the majority class for each decision tree. Results are averaged across Random Forests to generate an ensemble prediction. Through simulation, we show that ERFs outperform Random Forest with and without down-sampling, as well as with the synthetic minority over-sampling technique, for highly class imbalanced to balanced datasets. Spatial covariance greatly impacts ERFs’ perceived performance, as shown through simulation and case studies. In case studies from the Hawaii deep-set longline fishery, giant manta ray Mobula birostris syn. Manta birostris and scalloped hammerhead Sphyrna lewini presence had high spatial covariance and high model test performance, while false killer whale Pseudorca crassidens had low spatial covariance and low model test performance. Overall, we find ERFs have 4 advantages: (1) reduced successive partitioning effects; (2) prediction uncertainty propagation; (3) better accounting for interacting covariates through balancing; and (4) minimization of false positives, as the majority of Random Forests within the ensemble vote correctly. As ERFs can readily mitigate rare event bias without requiring large presence sample sizes or imparting considerable balancing bias, they are likely to be a valuable tool in bycatch and species distribution modeling, as well as spatial conservation planning, especially for protected species where presence can be rare.

KEY WORDS: Rare event bias · Species distribution modeling · Protected species · Bycatch · Machine learning · Random Forest
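
The procedure outlined in the abstract (a separate stratified training/test split for each Random Forest, down-sampling of the majority class for each decision tree, and averaging across forests) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every function name, the tree depth, the number of trees and forests, the 10% test fraction, the choice to bootstrap presences, and the synthetic data are assumptions made only for illustration.

```python
import random
from statistics import mean

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(rows, labels, rng, depth=0, max_depth=3):
    """Grow a small tree; a leaf stores the fraction of presences (label 1)."""
    if depth == max_depth or len(set(labels)) == 1:
        return sum(labels) / len(labels)
    n_feat = len(rows[0])
    # Random feature subset at each split, as in a Random Forest.
    feats = rng.sample(range(n_feat), max(1, int(n_feat ** 0.5)))
    best = None
    for f in feats:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no usable split: fall back to a leaf
        return sum(labels) / len(labels)
    _, f, t = best

    def grow(keep_left):
        idx = [i for i, r in enumerate(rows) if (r[f] <= t) == keep_left]
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          rng, depth + 1, max_depth)

    return (f, t, grow(True), grow(False))

def tree_predict(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, rng, n_trees=25):
    """One forest; each tree sees a class-balanced sample of the training set."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    trees = []
    for _ in range(n_trees):
        # Assumption: bootstrap the presences, then down-sample the absences
        # (majority class) to the same count.
        idx = rng.choices(pos, k=len(pos)) + rng.sample(neg, min(len(neg), len(pos)))
        trees.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return trees

def stratified_split(labels, rng, test_frac=0.1):
    """Hold out the same fraction of each class (hypothetical 10% default)."""
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = max(1, round(test_frac * len(idx)))
        test += idx[:k]
        train += idx[k:]
    return train, test

def fit_erf(rows, labels, n_forests=10, seed=0):
    """Ensemble of forests, each trained on its own stratified training set."""
    rng = random.Random(seed)
    forests = []
    for _ in range(n_forests):
        train, _test = stratified_split(labels, rng)  # _test is kept for evaluation
        forests.append(fit_forest([rows[i] for i in train],
                                  [labels[i] for i in train], rng))
    return forests

def erf_predict(forests, row):
    """Average tree votes within each forest, then average across forests."""
    return mean(mean(tree_predict(t, row) for t in trees) for trees in forests)

# Synthetic, imbalanced example: 400 absences, 20 presences.
# Presence depends on the first covariate; the second is noise.
data_rng = random.Random(42)
rows = [(data_rng.uniform(0.0, 0.7), data_rng.random()) for _ in range(400)]
rows += [(data_rng.uniform(0.8, 1.0), data_rng.random()) for _ in range(20)]
labels = [0] * 400 + [1] * 20

forests = fit_erf(rows, labels)
p_hi = erf_predict(forests, (0.9, 0.5))  # presence-like covariates
p_lo = erf_predict(forests, (0.1, 0.5))  # absence-like covariates
```

Here `p_hi` and `p_lo` are ensemble-averaged presence probabilities. Keeping the per-forest predictions instead of only their mean would give a simple measure of spread across the ensemble.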

Cite this article as: Siders ZA, Ducharme-Barth ND, Carvalho F, Kobayashi D and others (2020) Ensemble Random Forests as a tool for modeling rare occurrences. Endang Species Res 43:183-197. https://doi.org/10.3354/esr01060
