DOI: https://doi.org/10.3354/meps13689
Using machine learning to link spatiotemporal information to biological processes in the ocean: a case study for North Sea cod recruitment
ABSTRACT:
Marine organisms are subject to environmental variability on various temporal and spatial scales, which affects processes related to growth and mortality of different life stages. Marine scientists are often faced with the challenge of identifying environmental variables that best explain these processes, which, given the complexity of the interactions, can be like searching for a needle in the proverbial haystack. Even after initial hypothesis-based variable selection, a large number of potential candidate variables can remain if different lagged and seasonal influences are considered. To tackle this problem, we propose a machine learning framework that incorporates important steps in model building, ranging from environmental signal extraction to automated variable selection and model validation. Its modular structure allows for the inclusion of both parametric and machine learning models, like random forest. Unsupervised feature extractions via empirical orthogonal functions (EOFs) or self-organising maps (SOMs) are demonstrated as a way to summarize spatiotemporal fields for inclusion in predictive models. The proposed framework offers a robust way to reduce model complexity through a multi-objective genetic algorithm (NSGA-II) combined with rigorous cross-validation. We applied the framework to recruitment of the North Sea cod stock and investigated the effects of sea surface temperature (SST), salinity and currents on the stock via a modified version of random forest. The best model (5-fold CV r² = 0.69) incorporated spawning stock biomass and EOF-derived time series of SST and salinity anomalies acting through different seasons, likely relating to differing environmental effects on specific life-history stages during the recruitment year.
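As a rough illustration of how EOFs condense a spatiotemporal field into a few time series, the following minimal sketch computes EOFs through a singular value decomposition of the anomaly matrix. It is not the authors' implementation: the function name eof_decompose, the synthetic "SST" field and all parameter values are assumptions made for this example, and area weighting and missing-value handling are left out.

```python
import numpy as np


def eof_decompose(field, n_modes=2):
    """Decompose a (n_time, n_space) field into EOFs and principal components.

    Hypothetical helper for illustration only.
    Returns (eofs, pcs, explained), where eofs has shape (n_modes, n_space),
    pcs has shape (n_time, n_modes) and explained holds the fraction of
    total variance captured by each retained mode.
    """
    # Anomalies: remove the time mean at every grid cell.
    anomalies = field - field.mean(axis=0)
    # SVD of the anomaly matrix: rows of vt are the spatial patterns (EOFs).
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_modes]
    # Principal components: time series of each pattern's amplitude.
    pcs = u[:, :n_modes] * s[:n_modes]
    explained = (s ** 2) / np.sum(s ** 2)
    return eofs, pcs, explained[:n_modes]


# Synthetic field (invented data): one oscillating spatial pattern plus noise.
rng = np.random.default_rng(0)
n_years, n_cells = 40, 25
signal = np.sin(np.linspace(0.0, 6.0 * np.pi, n_years))
pattern = np.linspace(-1.0, 1.0, n_cells)
sst = np.outer(signal, pattern) + 0.1 * rng.standard_normal((n_years, n_cells))

eofs, pcs, explained = eof_decompose(sst, n_modes=2)
# pcs[:, 0] is the kind of time series that could enter a recruitment model
# as a candidate predictor alongside spawning stock biomass.
```

The sign of each EOF and its principal component is arbitrary, so only their product is meaningful when interpreting a mode.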

Linking spatiotemporal information to North Sea cod recruitment via machine learning.
Image: B. Kühn, M. Taylor; Gears from https://commons.wikimedia.org/wiki/File:Gear_7.svg (CC-BY-SA license)
Environmental processes on different temporal and spatial scales shape the life cycle of many marine organisms. Given the complexity of the interactions, identifying environmental variables that best explain biological processes can be like searching for a needle in a haystack. Kühn and co-authors propose a regression-type machine learning framework to extract information from spatiotemporal environmental data and link it to biological data via dimension reduction, a multi-objective genetic algorithm and cross-validation procedures. When applied to the case study of North Sea cod recruitment, the algorithm identified spawning stock biomass, sea surface temperature and salinity as important factors in different seasons, likely relating to specific life-history stages during the recruitment year.
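The multi-objective idea behind the variable selection can be sketched with a plain Pareto-dominance check. This is only the dominance concept that NSGA-II builds on, not the genetic algorithm itself. The two objectives used here (cross-validated error and number of variables), the variable names and all error values are assumptions invented for illustration and are not results from the study.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """A candidate model: its predictor set and a cross-validated error."""
    variables: Tuple[str, ...]
    cv_error: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and better in one.

    Both objectives are minimised: CV error and number of variables.
    """
    a_obj = (a.cv_error, len(a.variables))
    b_obj = (b.cv_error, len(b.variables))
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical candidates with made-up error values.
candidates = [
    Candidate(("ssb",), 0.60),
    Candidate(("ssb", "sst_pc1"), 0.45),
    Candidate(("ssb", "sal_pc1"), 0.50),
    Candidate(("ssb", "sst_pc1", "sal_pc1"), 0.35),
    Candidate(("ssb", "sst_pc1", "sal_pc1", "current_pc1"), 0.36),
]

front = pareto_front(candidates)
for c in front:
    print(len(c.variables), c.cv_error, c.variables)
```

Models on the front represent different trade-offs between predictive error and complexity. A full NSGA-II run would additionally evolve the candidate set and use crowding distance to keep the front diverse.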
Bernhard Kühn (Corresponding Author)
bernhard.kuehn@thuenen.de
Marc H. Taylor (Co-author)
Alexander Kempf (Co-author)
