First open access ensemble climate envelope predictions of Assamese macaque Macaca assamensis in Asia: a new role model and assessment of endangered species

Species distribution models are a key component for understanding a species’ potential occurrence, specifically in vastly undersampled landscapes. The current species distribution data for the Assamese macaque Macaca assamensis are outdated, but suggest a patchy distribution in moist broadleaved forests in South and Southeast Asia. Therefore, in this study, we used a species distribution model to explore the potential climatic niche of this species and assess its distribution and potential barriers in 12 South and Southeast Asian countries. We combined primary and secondary species occurrence records from different countries. We applied Classification and Regression Tree (CART), TreeNet (boosting), RandomForest (bagging) and Maximum Entropy (MaxEnt) machine-learning algorithms with elevation as well as 19 bioclimatic variables for the first ensemble predictions ever completed for this species. Our results suggested that the predicted distribution of the Assamese macaque is strongly associated with precipitation of warmest quarter (BIO18), temperature annual range (BIO7) and temperature seasonality (BIO4). Our prediction shows a continuous potential climatic niche of the species from east of the Kaligandaki River in Nepal to Lao People’s Democratic Republic. There are also potential niche patches in Bhutan, Southeast China, Thailand and Cambodia, while Pakistan and Afghanistan have no potential niche for the species. We believe that our workflow presents a new conservationoriented open access research template to progress empirical primate conservation worldwide.


INTRODUCTION
In addition to basic knowledge about diversity and life history, conservation of the world's primates requires data on many more basic metrics, including distribution (Cushman & Huettmann 2010, Drew et al. 2011).However, there are many elusive species and it is often difficult to obtain information on the characteristics of their distribution, such as geography and habitat requirements.Ecological niche modeling techniques -often referred to as species distribution models (SDMs) (Guisan et al. 2006) -are used to quantify and map the distribution of elusive species on a large spatial scale using known occurrence records (Huettmann & Diamond 2001, Guisan et al. 2006).SDMs aid rapid assessment and can be used as a quantification tool for estimating the predicted range and potential distribution of and barriers to the species occurrences in areas not previously surveyed (Thorn et al. 2009), which may be difficult to access (see Kandel et al. 2015 for an example).Wildlife management is often based on the information embodied in maps generated using SDMs.Such maps may also help to identify barriers and provide predicted population estimates in space and time (Drew et al. 2011).Even when 'just' based on climate envelopes, they provide a baseline for predicting a species' response to human-induced threats and impacts, such as landscape alterations and climate change, and for the identification and prioritization of key conservation sites (Thorn et al. 2009, Vidal-García & Serio-Silva 2011).
This study aimed to assess the distribution of the Assamese macaque Macaca assamensis (taxonomic serial number TSN 573018; see Supplement 1 at www. int-res.com/ articles/ suppl/ n036 p149 _ supp.pdf for more taxonomic details) using the latest research methods and data.This is important because the IUCN has already classified the Assa mese macaque as a N ear Threatened species (Boonratana et al. 2008).The limited empirical data available suggest that this species is patchily distributed in South and Southeast Asian countries (Wada 2005).However, a coherent, empirical and useful distribution map across its range is still unavailable, and barriers are not well described or explained on a landscape level.So far, this species has been re por ted from the lower and middle ranges of the Hima layas reaching from central N epal eastward through Bhutan, northeastern India, northern and eastern Myanmar, southern China, northern and western Thailand, Lao People's Democratic Republic (PDR) and northern Vietnam.While the species is recorded between 500 and 3500 m above sea level, an isolated record is available from the Sundarbans coastal swamps of southwestern Bangladesh (Fooden 1982).
The lack of meaningful and available distribution data, or in lieu, ecological niche models and computationally intense data-mining methods, prevents us from addressing conservation management questions with high and modern rigor.This is an almost global situation for behavior-dominated animal research (Jochum & Huettmann 2010; but see Yang et al. 2015 for an application), which hampers effective conservation management for those species, while climate change progresses and habitats decay further.It is inherently clear that the Assamese macaque and conservation management must be seen in a wider and more global landscape context using the best-available tools possible.Furthermore, a better and more relevant understanding of this taxon is needed given the massive habitat changes in Asian countries (Huettmann 2012 and related chapters within).We believe that the provision of a quantitative distribution formula and use of the latest computational tools and GIS data would help primatologists, biogeographers, evolutionary biologists and conservation managers to identify geographically suitable habitat areas for this species for the first time; this should be done in a consistent fashion using the latest open access data and global assessment tools, which are essential for anyone interested in such questions (sensu Nemitz et al. 2012, Kandel et al. 2015).Although this study is designed at the macro level, we posit that it can further our understanding of the historic macaque evolutionary pattern and that it can help in minimizing anthropogenic pressure on the species' predicted potential habitat.

Study area
Our study area consists of landscapes from 12 Asian nations: Afghanistan, Pakistan, China, Nepal, India, Bhutan, Bangladesh, Myanmar, Thailand, Vietnam, Lao PDR and Cambodia (Fig. 1).Of these, the first eight are part of Hindu-Kush Hima laya (HKH see www.icimod.orgfor details); the others are inherently connected to HKH through climate, watersheds and culture.

Study species
The Assamese macaque Macaca assamensis inhabits the foothills of the Central and Eastern Himalayas and the adjoining mountain chains of South and Southeast Asia (Fooden 1980, 1982, Molur et al. 2002, Boonratana et al. 2008).Fooden (1982) reported for this species that forest type (e.g.broadleaf evergreen forests) rather than the temperature per se determines the distribution of Assamese macaque.For more details on this species see also the IUCN (www.iucnredlist.org/details/12549/0)and associated information, e.g. the Integrated Taxonomic Information System (ITIS; www.itis.gov/servlet/Single Rpt/ Single Rpt?search_topic=TSN&search_value=573018#null).
The species -along with M. sinica, M. radiata, M. thibetana, M. munzala (Sinha et al. 2005) and the recently discovered M. leucogenys (Li et al. 2015) -is assigned to the sinica group of macaques.This is a wider species group that is defined by shared and derived characters of male and female reproductive anatomy (Fooden 1980(Fooden , 1982)), phylogenetic relationship and evolutionary legacy (Chakraborty et al. 2007).There are 2 clearly recognized subspecies of M. assamensis on the basis of relative tail length in adult males: the Southeast Asian M. assamensis assamensis (TSN 945194, commonly known as the eastern Assamese macaque) has a shorter tail in relation to the sub-Himalayan M. assamensis pelops (TSN 945195, commonly known as western Assa mese macaque; see Supplement 1 for more taxono mic species details).The Brahmaputra River of northeastern India is the zoogeographical barrier separating these 2 subspecies (Fooden 1982).Assa mese macaques live sympatrically with congeneric macaques i.e.M. mulatta, M. thibetana, M. munzala, M. leonina, M. arctoides and possibly with the newly discovered M. leucogenys in Southeast China; in many areas, 3 or 4 macaque species inhabit the same forest (Sharma et al. 2012).

Collating 'presence-only' data of Assamese macaque
Our methods are outlined as a workflow in Fig. 2. We compiled our own field survey (primary) pres-ence locations as well as published (secondary) occurrences of the species that were accessible to us.All relevant data are given in Supplement 2 (see www. intres.com/articles/suppl/n036p149_supp 2+ 4.zip) and are publicly available there.First, we collected the data from our own fieldwork in N epal (2005− 2013), India (2009India ( − 2012) ) and Lao PDR (2011−2013).Second, we compiled occurrence data for Assamese macaque from the Glo bal Biodiversity Information Facility website (www.gbif.org).GBIF had 17 occurrence records we could use, all of which came from museum records.Third, a few presence locations were provided by researchers who had previously conducted primate surveys in Nepal, India, Thailand and Vietnam within the Assamese ma caque range.Finally, we extracted some presence locations of this species from known locations and from various published sources (Chalise & Ghimire 1998, Chalise 1999, Molur et al. 2002, Wada 2005, Regmi & Kandel 2008).All data are confirmed presences, and data can be viewed in detail by any investigators interested; we have made these data publicly available with ISOcompliant metadata in the public domain via dSPACE of the library of the University of Alaska Fairbanks (UAF).This work resulted in a total of 296 'presenceonly' locations of Assamese ma caque (Fig. 1).

Predictors used for climate envelope modeling
As outlined in Breiman (2001) and Drew et al. (2011), our models are correlative and predictive; therefore, we employed many environmental proxies in our machine-learning algorithms.We used 20 different predictor variables (see Table 1 for details) to estimate the ecological niche.These variables in cluded temperature and precipitation as well as their derivatives and elevation data.We used open access data for 19 bioclimatic variables from WorldClim (www.worldclim.org).These data are well described by Hijmans et al. (2005).The 90 m Shuttle Radar Topography Mission digital elevation model (DEM) is described in detail by US Geological Survey (http:// gdex.cr.usgs.gov/gdex/;see Herrick et al. 2013 andKandel et al. 2015).Because we rely on tree-based models, which are non-parametric, there is no relevant need to deal with correlation issues in the dataset per se.

Modeling process
Our data input and models are open access (Supplement 2 and www.worldclim.org).We generally fol lowed Breiman (2001) to extract ma jor signals from the data and for inference from predictions.Our model methods are robust when using op portunistic data locations; see also Coudrat & N ekaris (2013), Kandel et al. (2015) and Moyes et al. (2016) for similar ap pli cations.Before preparing the mo del, we first prepared a dataset including pseudo-absence (random) points and lattice points within the study area using the Geospatial Mo delling Environment (GME) software (www.spatial ecology.com/ gme/).The random points were generated with a minimum distance of 800 m (n = 140 280 point locations), and a lattice points grid (total 154 810) file was created with regular spaced points with a neighboring distance of 10 km.The specific spacings (random and lattice) were made to obtain a more representative and equally distributed sample size for the vast study area.Then the values from all 20 predictor variables were extrac ted to presence, random and lattice points using GME software.The random points were combined with the presence points into one binomial data table and analyzed further.
We used 2 different platforms for data mining and predictive modeling, namely, SPM7 by Salford Systems (www.salford-systems.com/en/;see Popp et al. 2007 for an example) and MaxEnt (Phillips et al. 2006).In SPM7, we used 3 different algorithms, namely, Classification and Regression Tree (CART), TreeN et (TN ; boosting) and RandomForest (RF; bagging).We employed their default settings because we know these tend to perform the best for data problems such as those faced here.From all these algorithms, we made predictions (relative index of occurrence (RIO)) on the lattice points from the presence−random data model.
Maximum Entropy (MaxEnt; Phillips et al. 2006) was run with default settings and logistic output, and the result obtained in ASCII format was converted to a raster format in ArcGIS 10 (ESRI).The raster RIO value for each point in the lattice was then extracted in GME again.This allowed us to compare predicted results from all algorithms.
The actual RIO value ranges of the different models were not numerically consistent; therefore, to make a better comparison, the RIO values of all the algorithms were rescaled from a 0 (less suitable) to 1 (suitable) range.Then we prepared an ensemble RIO by taking the mean of the rescaled RIOs from all 4 models.In many cases, the RIO continuous maps can make it difficult to decide whether an area is suitable or not (thresholding problem).Hence, we decided to prepare a binary (suitable or unsuitable) map using a threshold.We used expert knowledge to determine the threshold.The list of 6 internal and 3 external macaque experts is given in Supplement 3. We prepared maps of every RIO 0 to 1 value at 0.1 intervals (e.g.0, 0.1, 0.2, … 0.9 and

Model selection
All models in the ensemble showed high area under the curve (AUC) metrics (Table 2).Nearly all their AUC metrics were over 0.95.This is sufficient for rapid assessment studies and our large scale, allowing for robust inference (sensu Breiman 2001).The resulting map is shown in Fig. 3.However, using pseudo-absences and AUC within the model algo-

Variable importance
Tables 3 & 4 show a summary of the predictors and their ranking, according to the model algorithms in the ensemble.We find that predictors BIO4 (temperature seasonality), BIO7 (temperature annual range), BIO18 (precipitation of warmest quarter), elevation and BIO6 (mean temperature of coldest month) are the top 5 predictors for Assamese macaque.A 2dimensional plot of the 2 top predictor combinations can be seen in Fig. 4, exemplifying where this species occurs in the 2-dimensional predictor niche space (Fig. 5 shows that the TreeN et model is optimized and stable).

Partial dependence plots
The individual partial dependence plots in Figs. 6 to 10 show how Assamese macaques link functionally with the identified top 5 predictors.These can be generalized and interpreted approximately as resource selection functions (RSFs) (Manly et al. 2002).The y-axis shows the RIO (presence or random) and the x-axes show the specific predictors.

Predicted Assamese macaque hotspots
We have predicted here the suitable geographic areas (hotspots) for Assamese macaques using a multi variate set of bioclimatic and elevational predictors (Fig. 2).However we have not dealt with genetics, interspecific competition, predators, diseases, humaninduced survival threats, natural calamities and geographic barriers in our analysis.These predicted hot-and cold spots are completely based on the suitable climatic niche and the few known presence locations, all expressed in a quantitative fashion for further applications.Machine learning and the ensemble model were very well able to extract major signals from the complex data cube we assembled.
Our ensemble model, which we derived from averaging all models (CART, TreeN et, RandomForests and Maxent), shows that when using an expert threshold, a total area of about 534 200 km 2 is potential habitat for Assamese macaque in South and Southeast Asia.India has the largest suitable area (141 300 km 2 ), followed by My anmar (124 100 km 2 ), Laos (82 000 km 2 ), Vietnam (72000 km 2 ), China (41700 km 2 ) and then Nepal (34 700 km 2 ) (details in Supplement 5).As ex pected, our results show that there are no suitable areas for Assamese macaques in Pakistan and Afghanistan.Although the en semble model predicted a 200 km 2 geographic area in Cambodia, it seems that this area is not enough to support viable macaque populations in this country.However, since this small predicted area adjoins the Laos and Vietnam regions, overall, this area could be supportive for the adjoining population of Assamese macaques.This deserves more research, because to date, As samese macaques have not been reported from Cambodia.The predicted areas and the As samese ma caque hot-and coldspots maps are available online as Supplements 6-11.

DISCUSSION
The data, models and their outputs produced here are the first of their kind for predicting the climate envelope distribution of Assamese macaque Macaca assamensis using an open access data-sharing concept.Our models performed well, with AUCs > 0.95.Our models consistently selected BIO4, BIO7, BIO18, elevation and BIO6 as a multivariate 'predictor package' important for determining the macaque distribution; our results support that animals as complex as monkeys cannot be studied well with reductionistic single-predictor approaches and parsimony.These predictors we identified serve as pro xies, but are the key determinants of the vegetation structure and distribution pattern for this complex tro pical study area.Our models also confirmed that there are no geographically suitable areas in Afghani stan and Pakistan, possibly due to the absence there of any suitable habitat for Assamese macaque.
As typically found in such models, we used climate envelope predictions in place of forest type in our analysis, and this provided a rather strong prediction.Hence, our ecological niche models revealed only the potential geographic distribution of the As sa mese macaque based on the bioclimatic and altitude variables.Nevertheless, our study provides the first ever database (see Supplements 2 & 4 at www. int-res.com/ articles/suppl/n036p149_ supp 2+ 4.zip), which can be further refined with additional data collected subsequently on the distribution of Assamese ma ca que.Socio-economic data come to mind as the next choice for insight and management.It should be noted that, when computing distribution, these models do not yet fully account for the possible interactions with other sympatric macaques, predators, diseases and dispersal (Elith et al. 2006); hence, the generated distribution map represents the potential climatic niche of the species and not the realized niche.Further research is required to develop better models in space and time which will take into consideration the realized niche and a pixel scale analysis of different niche types.
The pioneering thoughts about why species are where they are, and why they are absent, ultimately lead to the concept of comparing the environmental factors inside and outside a species' known distribution range (Hutchinson 1957in Cushman & Huettmann 2010).Our predictive mo dels have identified mid-elevation moist broadleaf evergreen forest as an important habitat (which is ex pressed through proxies of that vegetation zone).The areas where our models predicted unsuitable habitat, for instance, are areas west of the Kaligandaki River in Nepal, a region lacking broadleaf evergreen forests (Fooden 1980, 1982, Polunin & Stainton 1999).Temperature and precipitation are 2 of the important factors that determine the occurrence of different types of forest in the Himalayas (Polunin & Stainton 1984).More attention should be given to the Kaligandaki River boundary zone and its features for Assamese macaque in the future, as it appears to be a major factor influencing the occurrence of the species.
This statement is further supported by our model results, as most of our models selected BIO4 (temperature seasonality), BIO7 (temperature annual range) and BIO18 (precipitation of warmest quarter) as the top climatic predictors.Fooden (1982) suggested that forest type rather than temperature determines the distribution of Assamese macaque; however, this is not based on quantitative analysis or modern methodology and does not include the latest data.In addition, vegetation is largely driven by the 'temperature niche' and thus these factors cannot really be separated from each other but come together as a multivariate package.The complete predicted ab sence of geographically suitable areas for the species in Pakistan and Afghanistan in our models supports this wider notion, as mid-elevation broadleaf evergreen forests are widely absent in these countries (Polunin & Stainton 1999).Al though a few troops of Assamese macaque have been re corded in forests dominated by deciduous trees and bamboo as well as in agricultural lands (Fooden 1982, Regmi & Kandel 2008, Regmi et al. 2013), these are still situated in the imme diate vicinity of broadleaf evergreen forest ecosystems with rock outcrops along rivers and streams.
Traditional biogeography theory already suggests that species do not inhabit all areas that meet their niche requirements; rather, barriers to dispersal will often restrict species to a subset of these areas (Peterson 2006, Lomolino et al. 2010).
Asia is already known for some major biogeographical barriers, e.g.Wallace Line (Lomolino et al. 2010) and several plant species in the Himalayas (Miehe & Pendry 2016).The distribution of Indian fish shows a barrier that is explained by the Satpura hypothesis (Hora 1944(Hora , 1949)).Here we add our prediction for a Kaligandaki River barrier based on the Assamese macaque (Fooden 1982 identified the Brahmaputra River in northeastern India as the zoogeographical barrier separating the 2 subspecies).
Our ecological niche models have predicted some geographically suitable areas in the Western Ghats of India for Assamese macaques, which are outside the known distribution range of the species (Boon-ratana et al. 2008).However, there are occurrences of 3 species of macaques in the Western Ghatsrhesus macaque M. mulatta, bonnet macaque M. radiate and lion-tailed macaque M. silenus.In ecological niche models, some geographically suitable areas are expected to be uninhabited by the species (Peterson 2006), possibly due to biotic interactions, predators, historic mass extinction because of diseases and natural catastrophes, niche shifts and niche evolution due to climate change (over thousands and millions of years) and dispersal barriers.Such niche spaces can remain empty, or are occupied by sister taxa, for instance.Although some predicted regions (e.g.Western Ghats in India) are not used by Assamese macaques, such model predictions remain useful for determining species' ranges, analyzing their evolution and informing conservation, as ecological niche models are known to offer considerable predictive ability for understanding the distribution of a species, and possibly also a related (sister) species that shares a similar ecological niche (Peterson 2006).
Similarly, ecological niche models may not predict all the areas where species were historically present.In our models, our prediction did not capture habitat for Assamese macaque in the Sundarbans areas of Bangladesh where there are historic records of the species (Fooden 1982, Wada 2005, Kawamoto et al. 2006, Boonratana et al. 2008).The isolated Sundarbans record of M. assamensis in Bangladesh is peculiar, but apparently valid (Fooden 1982); however, the places outside of the known distribution range of M. assamensis (Boonratana et al. 2008) are usually a result of misidentification of specimens of rhesus macaque Macaca mulatta, which is superficially similar to M. assamensis (see Fooden 1982, p. 2 for details).Here, niche models and our maps can be used to provide a taxonomic quality check for the historical species data provided for such locations.
Our models have covered all the known Assamese macaque locations mentioned by Fooden (1982) from east and central Nepal through Bhutan, northeastern India (Sikkim, northern West Bengal, N agaland, Arunachal Pradesh), northern and eastern Myanmar, southern China (Xizang, Yunnan, Guangxi), northern and western Thailand, Lao PDR and northern Vietnam.The models also predicted a few highly suitable areas in Uttarakhand in northwestern India.Interestingly, a disjunct population of Assamese macaque was recently discovered from this area (www.sanctuaryasia.com/component/content/article/7880-kala-bandar-the-mystery-macaque-of-uttara khand.html),well beyond the western distribution (i.e.Kaligandaki River in N epal) of the macaque.It should be noted here that we were, however, unable to obtain presence location data for the Assamese macaque from Uttarakhand.Similarly, the sympatric and sister macaque species -the white-cheeked macaque M. leucogenys -has recently been described from our predicted highly suitable areas in Southeast China (Li et al. 2015).We think this exemplifies the power, elegance and applicability of our methods and models for discovering new populations of the known species and new sympatric species as well as for conservation.
In Bhutan, broadleaf forests and Assamese macaques are confined up to 2900 m above sea level throughout the country (Wangchuk et al. 2004).Our model has predicted the potential habitats of Assamese macaque in these areas (along highways between Thimpu to Phuntsholing, Punakha, Trongsa, Zhemgang and Trashigang) where the macaques have frequently been reported (Wangchuk et al. 2004).Assamese macaques are sympatric with golden langur Trachypithecus geei in Zhemgang and Trongsa in central Bhutan, capped langur Trachypithecus pileatus in eastern Bhutan and hanuman langur Semnopithecus entellus in western Bhutan (Wangchuk et al. 2004).Thus, it seems that conserving broadleaf forest in Bhutan and elsewhere would help protect the Assamese macaques and other sympatric primates.
The isolated distribution of the Assamese macaque would appear to make it unlikely that a viable population of the species can be maintained in N epal in the long term (Wada 2005), and this may also be the case in other areas of their distribution range.There have been only a few studies to determine the minimal viable population size necessary for the conservation not only of Assamese monkeys, but also of Macaca in general (Wada 2005).A species' viability must be measured by evaluating population dynamics and environmental effects (Fa & Lind 1996, Nielsen et al. 2008).Considering these facts, we suggest as next research goals using the data we provided for carrying out dispersal and spatial population viability assessments (Nielsen et al. 2008) based on future distributions and habitat shift assessments of Assamese monkeys under scenarios of future climate change, increased human consumption, habitat transitions and human population.This should follow a threshold analysis of minimum population and habitat sizes for direct management of such species, all of which are currently lacking.We emphasize here that information on these subjects is sparse and difficult to find.

CONCLUSIONS
Our data, model and findings present the bestavailable distribution and barrier information for Assamese macaque Macaca assamensis.This study further offers a new research template for primates and has direct relevance for conservation.Primatologists and conservation managers could, for instance, plan future surveys of macaques in the highly suitable areas where data are not shared or available, and where surveys have never taken place; new macaque populations might be discovered in such potential sites.The model outcomes can also be applied to identify fragmented habitat, to reduce anthropogenic pressure on suitable habitat, and for effective habitat management and restoration, e.g. for translocation of macaques to suitable sites and for managing and mitigating human−macaque conflict, all of which are major challenges for wildlife managers throughout South and Southeast Asia.

Fig. 1 .
Fig. 1.Study area map with occurrence points for Assamese macaque.The map also depicts the countries in the study area

Fig. 6 .
Fig. 6.Partial dependence plot of presence and pseudo-absence (random) in TreeN et for BIO4 showing a relative index of occurrence (RIO) threshold (presence) at a BIO4 of 4500 units

Table 1 .
List and description of predictor variables.DEM: digital elevation model 1.0) to evaluate the expert cut-off value for each model.These maps were evaluated by different experts (internal and external; see Supplement 3 for details) in the distribution and ecology of the species.Based on the expert-suggested threshold RIO values, geographically suitable area maps for Assamese macaque presence or absence were prepared and further assessed for validity.

Table 2 .
Area under curve (AUC) and kappa values of the contributing models.N ote that these models are binomial, presence and random.Models were run in default mode with 'balanced' setting

Table 3 .
Variable importance for top 5 predictors of the contributing ensemble models Fig. 3. Ensemble model prediction (relative index of occurrence, RIO) maps showing the geographically suitable areas for Assamese macaquerithm (e.g.subsampled training data) will likely result in somewhat inflated metrics.Other model assessments, such as expert opinion and model prediction tests, are meant to be used, for instance, in areas of known absences.

Table 4 .
Summary from Table 3 for the top 5 predictors of the contributing ensemble models