Validation of a sea lice dispersal model: principles from ecological agent-based models applied to aquatic epidemiology

ABSTRACT: Sea lice are one of the most economically costly and ecologically concerning problems facing the salmon farming industry. Here, we validated a coupled biological and physical model that simulated sea lice larvae dispersal from salmon farms in the Broughton Archipelago (BA), British Columbia, Canada. We employed a concept from ecological agent-based modeling known as ‘pattern matching’, which identifies similar emergent properties in both the simulated and observed data to confirm that the simulation contained sufficient complexity to recreate the emergent properties of the system. One emergent property from the biophysical simulations was the existence of sub-networks of farms. These were also identified in the observed sea lice count data in this study using a space−time scan statistic (SaTScan) to identify significant spatio-temporal clusters of farms. Despite finding support for our simulation in the observed data, which consisted of over a decade’s worth of monthly sea lice abundance counts from salmon farms in the BA, the validation was not entirely straightforward. The complexities associated with validating this biophysical dispersal simulation highlight the need to further develop validation techniques for agent-based models in general, and biophysical simulations in particular, which often result in patchiness in their dispersal fields. The methods utilised in this validation could be adopted as a template for other epidemiological dispersal models, particularly those related to aquaculture, which typically have robust disease monitoring data collection plans in place.


INTRODUCTION
Agent-based models (ABMs) refer to simulations where individual agents are given the ability to interact with each other as well as with their environment (Grimm et al. 2005, Railsback & Grimm 2010). ABMs have existed as a concept since the 1940s, but the computational power necessary to execute them did not exist until much later. Consequently, ABMs have become increasingly widespread since the 1990s (Niazi & Hussain 2011), including in the life sciences, with application to fields such as ecology, wildlife management, and epidemiology. The main appeal of using an ABM is that emergent behaviours in complex systems can be modeled from the many single interactions between agents, or between agents and their environment (McLane et al. 2011).
One area in which ABMs have only recently been utilised effectively is in marine epidemiology, where disease particles (bacteria, viruses, protists, parasites, etc.) act as individual agents, interacting with each other and hosts as well as with ocean currents and other physical parameters of the waters they inhabit, such as temperature and salinity (Asplin et al. 2004, Salama et al. 2013, Skarðhamar et al. 2018). These types of simulations are able to create an ABM in the ocean by combining a particle-tracking model with biological characteristics assigned to the particles, which then allows them to interact with the physical environment (including underlying circulation) and sometimes with each other. A coupled biological and physical model (also referred to as a 'biophysical model') that can reflect both the life cycle and behaviours of the disease-causing particle of interest as well as its movement within the area of interest is needed in order to capture how a pathogen disperses throughout a specific population of hosts and/or geographical areas. The characteristics assigned to the particles reflect the specific biology of the simulated organism (Cantrell et al. 2020a). This type of biophysical model has been used to simulate dispersal of a wide variety of waterborne particles, from neonate sea turtles (Robson et al. 2017), to the causative agents of cholera (Augustijn et al. 2016), and to salmonid pathogens such as infectious hematopoietic necrosis virus or infectious salmon anaemia virus (Foreman et al. 2015, Gautam et al. 2018).
These biophysical models have also been used to model the dispersal of the copepod Lepeophtheirus salmonis, a common marine salmonid ectoparasite in many salmon farming regions in the Northern Hemisphere (Adams et al. 2015, Salama et al. 2016, Cantrell et al. 2018, 2020a). This ectoparasite, commonly known as the sea louse, is one of the most persistent and costly pests of farmed salmon (Costello 2009, Jansen et al. 2012). Sea lice have a free-living larval stage that can be dispersed 100s of km, depending on local currents (Kragesteen et al. 2018). While in the larval stage, they undergo 2 nauplii stages before becoming infective copepods, which are then able to attach to a salmonid host (Hamre et al. 2013). Once adults, they eat the mucus, blood, and scales of their hosts, creating lesions which make the fish susceptible to secondary infections and sometimes acting as a vector for bacterial infections (Brauner et al. 2012, Novak et al. 2016).
The interactions of the sea lice larvae with their hosts, the physical environment, and each other are complex enough to result in some emergent behaviour for the larval population. Emergent behaviour refers to the fact that 'a system can have qualities that are not analytically tractable from the attributes of its internal components' (Baas & Emmeche 1997); in other words, the whole is greater than the sum of its parts. Examples of such behaviour in sea lice populations include density-dependent dispersal between farms (Jansen et al. 2012), complex connectivity networks that can span entire coastlines (Samsing et al. 2019), or strong seasonality to dispersal patterns (Samsing et al. 2017). Understanding emergent properties is often the driving motivation in constructing an ABM. Due to the complexity of quantifying emergent behaviour as well as the difficulties associated with obtaining suitable data, and despite the fact that agent-based modeling has been widely used across multiple disciplines (ecology, economics, socio-biology, epidemiology, etc.) for more than a decade, the problem of validating an ABM remains a challenge (Manson 2003, Grimm et al. 2005), and there is no universal consensus on how best to approach validation. In ecological modeling, a validation concept known as 'pattern matching' has emerged, in which spatial patterns from the model are compared to idealised yet realistic characteristics of the natural system (Manson 2003). Grimm et al. (1996) defined a 'pattern' as a characteristic, clearly identifiable structure in data extracted from nature. Consequently, a pattern goes beyond random variation and thus indicates an underlying process that generates recognizable structure (Jeltsch et al. 1999, Wiegand et al. 2003). The following are examples of such patterns: distribution of dispersal distances, the spatial pattern of species occurrence in fragmented landscapes (Hanski 1994), wave-like patterns in the spread of rabies (Jeltsch et al. 1997), the spatial pattern of savannah trees (Jeltsch et al. 1999), and the size-class distribution of acacia trees (Jeltsch et al. 1999). If the ABM captures such complexity, then there is evidence that the assumptions placed in the model are adequate to create the emergent behaviours expected from the system. Pattern-matching validation illustrates the importance of blending quantitative and qualitative methods in the validation of multi-agent systems (Manson 2003).
Despite some uncertainties around validating biophysical models, they have provided important and useful information for managers and ecologists. Setting up model assumptions to simulate 'worst case scenarios' can allow managers to plan using a precautionary approach to avoid such outcomes (Samsing et al. 2019), as well as to develop surveillance or monitoring programs that reflect epidemiological realities (Pande et al. 2015). Simulation of different conditions to assess outcomes most similar to observations allows for hypothesis testing involving mechanisms of dispersal (Kough et al. 2015) or environmental drivers of disease outbreaks (Aalto et al. 2020). Additionally, as climate change leads to warming oceans, the geographic ranges may change for both hosts and infectious agents. Biophysical models will be key for identifying susceptible host populations, predicting disease transmission pathways across larger areas, exploring the impacts of future climatic scenarios on transmission processes, and developing intervention strategies (Cantrell et al. 2020b).
We have previously published in-depth explorations of a biophysical model to simulate sea lice dispersal from salmon farms in the Broughton Archipelago (BA), British Columbia (BC), Canada (Cantrell et al. 2018). We investigated both the spatial (Cantrell et al. 2018) and temporal (Cantrell et al. 2020a) patterns in the output as well as the environmental drivers of the dispersal patterns (Cantrell et al. 2020a). The physical model that underlies the ABM has been validated (Foreman et al. 2009). Here, we validated emergent properties of the combined biophysical model using the pattern-matching approach. We used quantitative methods to identify clusters in the observed sea lice monitoring data and sub-networks in the simulated data. We then qualitatively compared the clustering and the sub-networks identified in each of the 2 data sets. To detect significant spatio-temporal clusters in the observed sea lice count data set, we utilised space−time cluster analysis (SaTScan software; https://www.satscan.org), and we used temporal scanning analysis to identify seasonal or annual variation (Kulldorff 1997, 2015, Kulldorff et al. 2009).

Study area
The BA is a group of islands off the northeastern tip of Vancouver Island, BC. The model domain is within the BA (Fig. 1) and includes 20 farm sites as well as ecologically important juvenile salmon outmigration routes. There are 5 species of salmonids in the area, with wild salmon runs in BC that routinely exceed 10⁷ tonnage of fish for certain rivers (Ye et al. 2015). There is widespread interest in protecting the wild salmon of the entire BC coast as well as the BA in particular, as returns over the past few decades have shown a general trend of decline in abundance of spawning adults for several salmonid species (Miller et al. 2014, Price et al. 2017). Consequently, farms in the BA treat for sea lice every late winter/early spring to suppress counts before the wild salmon migration period over the spring and summer months (specific migration windows vary between years and species) (Carr-Harris et al. 2018). Adult salmon return to the rivers to spawn between May and October, again with variability between species and year (Puget Sound Indian Tribes and Washington State Department of Fish and Wildlife 2017). The 7 large river mouths (all locations of active salmon runs) and annual large spring freshet event contribute to a complex and temporally variable circulation pattern. There is a validated physical circulation model for the BA (Foreman et al. 2009), which was used to inform the biophysical model in Cantrell et al. (2020a).
Fig. 1. [...] Cantrell et al. (2018, 2020a) have been given matching colours. Sub-network 1 includes Farms 1-5; Sub-network 2 includes Farms 10, 11, and 15; Sub-network 3 includes Farms 6, 7, 16, 17, and 18. Sub-network 4 represents a group of farms which exhibited low connectivity in the biophysical simulation, and includes Farms 8, 9, 12, 13, 14, 19, and 20
In BC, only one treatment type was used during the study period (January 2005-December 2017): an in-feed chemical treatment with emamectin benzoate (EMB; commercially available as SLICE®). Unlike in most of the North Atlantic, there has been little evidence of resistance to SLICE in BC over this period (with the possible exception of a few years in Klemtu, an area isolated to the north of the BA). It is thought that this lack of resistance is mainly due to (1) low frequency of use, as many sites only require a single application of EMB over the whole production cycle; and (2) the large populations of wild salmonids on the Pacific coast, which act as adequate 'refugia' that re-introduce naive parasites. This dilutes any genes conferring resistance which may have developed in the population (McEwan et al. 2015). The efficacy of SLICE is approximately 90% for at least 35 d after treatment (Stone et al. 2000).
In Cantrell et al. (2020a), we identified an emergent behaviour of 3 sub-networks of farms in the BA that are more highly connected to each other than to the rest of the farms in the area. Thus, they could also be thought of as farm clusters. We also identified a fourth group of farms that were not highly connected to any others, referred to as Sub-network 4 for ease of discussion (Fig. 1). The coherence of sub-networks is the emergent behaviour used throughout in the context of the pattern-matching validation techniques (i.e. comparisons of clustering in SaTScan analysis of the observed data to sub-networks identified in the simulated data).

Biophysical simulation
The simulated data used in this paper come from the ABM published in Cantrell et al. (2018), which provides a detailed description of the model.
A Finite Volume Community Ocean Model (FVCOM) with an unstructured triangular grid was created to span the entire BA region. Unstructured grids allow for varying model resolution, with a coarser grid in open parts of the BA and a finer grid in the more complex areas of the model domain. Daily freshwater river discharge values (see Fig. 1 for river locations) and M2, N2, S2, K1, P1, and O1 hourly tidal constituents were prescribed as forcing at the model boundaries. Hourly wind forcing data were captured with 9 weather stations deployed across the BA region, with winds interpolated between and extrapolated beyond station locations to all grid elements (Foreman et al. 2009). The FVCOM physical circulation model outputs (i.e. wind, temperature and salinity fields, and resulting currents) were validated in Foreman et al. (2009).
Hourly output from the hydrodynamic model was used by an offline particle-tracking model, in which each simulated particle represented an individual in a cohort of sea lice larvae. Each of these particles was coupled to a biological model that dictated the maturation and survivorship of the particle based on the salinity and temperature encountered (Stucchi et al. 2011, Cantrell et al. 2018). The details of the equations governing the biology of the sea lice larvae can be found in Cantrell et al. (2018). In short, particles were released as pre-infectious nauplii, which matured into infectious copepods at a temperature-dependent rate, with lower temperatures resulting in slower maturation rates. Salinity affects nauplii particle survival, with salinity below 30 psu resulting in decreased survival, whereas mature larvae have a constant reduction in survival of −0.31 d⁻¹.
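As a rough illustration of the constant mortality term, the sketch below treats the 0.31 d⁻¹ value as an exponential decay rate acting on mature larvae. The exponential form, the function name, and the chosen time points are assumptions made here for illustration only; the governing equations actually used are those given in Cantrell et al. (2018).

```python
import math

# Assumption: the constant reduction in survival of 0.31 per day acts as an
# exponential mortality rate on mature (infective) larvae.
MORTALITY_PER_DAY = 0.31


def surviving_fraction(days_since_maturation: float) -> float:
    """Fraction of a mature cohort still alive after the given number of days."""
    if days_since_maturation < 0:
        raise ValueError("days_since_maturation must be non-negative")
    return math.exp(-MORTALITY_PER_DAY * days_since_maturation)


if __name__ == "__main__":
    # Illustrative time points only; they are not taken from the simulation.
    for day in (0, 1, 5, 11):
        print(f"day {day:>2}: {surviving_fraction(day):.3f} of cohort alive")
```

Under this assumed form, only a few percent of a mature cohort would remain after 11 d, which is in keeping with tracking each particle for a limited lifespan.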
The offline particle-tracking model simulated the release of 50 particles from each farm (n = 20) every hour, for the duration of the simulation from 11 March until 20 July 2009, resulting in 129 d of total simulation time. The location and status of each particle was tracked for 11 d in order to simulate the hypothetical lifespan of a sea lice copepod in temperature conditions typical of the BA region (Stucchi et al. 2011), with particle locations recorded at 20 min time steps and an internal time step of 60 s. The position of the particles from the biophysical simulation was assumed to be a measure of the infectious pressure each farm exerts on each other farm as well as on itself. In this simulation, infectious pressure was measured in particles km⁻².
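The scale of the release schedule can be worked out directly from the numbers above. The short calculation below is a back-of-envelope sketch; it assumes that a release occurred in every hour of all 129 d, which the text implies but does not state explicitly, and the variable names are illustrative.

```python
# Release schedule as stated in the text.
PARTICLES_PER_RELEASE = 50
N_FARMS = 20
RELEASES_PER_DAY = 24      # one release every hour
SIMULATION_DAYS = 129

# Tracking settings as stated in the text.
TRACKING_DAYS = 11
OUTPUT_STEP_MIN = 20
INTERNAL_STEP_S = 60

# Assumption: releases occur in every hour of all 129 days.
total_particles = (PARTICLES_PER_RELEASE * N_FARMS
                   * RELEASES_PER_DAY * SIMULATION_DAYS)

# Recorded 20 min steps over an 11 d tracking period.
recorded_steps_per_particle = TRACKING_DAYS * 24 * 60 // OUTPUT_STEP_MIN

# Internal 60 s steps taken between two recorded positions.
internal_steps_per_output = OUTPUT_STEP_MIN * 60 // INTERNAL_STEP_S

print(total_particles)              # 3096000
print(recorded_steps_per_particle)  # 792
print(internal_steps_per_output)    # 20
```

Roughly 3 million particles, each followed over several hundred recorded steps, gives a sense of why extending a simulation of this resolution to multiple years is computationally demanding.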

Observed data (sea lice counts and treatments)
From January 2005 to December 2017, veterinarians at BA salmon farms recorded approximately monthly sea lice counts, including chalimus, mobile and gravid female stages for Lepeophtheirus salmonis, and combined unidentified Caligus species counts. The number of fish sampled farm⁻¹ yr⁻¹ ranged from 6−540, with 95% CIs of 58.2−61.9, and with the mean number of fish sampled during any sea lice counting event being around 60 (i.e. typically 3 pens with 20 fish from each). Lice abundance used in all analyses here combined mobile lice and gravid females into a total motile lice abundance. This was then divided by the number of fish sampled, resulting in a motile abundance value fish⁻¹ for farm sampling events during the period 2005−2017. These data were collected from all aquaculture companies and curated into a single data set as part of the BC Salmon Farmers Association (BCSFA) Marine Environmental Research Program (Project MERP_17A). The present validation study includes data from January 2005 to December 2017, totalling 2569 observations. The sea lice counts did not occur as exactly 1 observation site⁻¹ mo⁻¹, as some sites would have been harvesting, fallowing, or re-stocking over this period (Table 1).
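The abundance calculation described above can be sketched as follows. The record layout, field names, and example numbers are hypothetical and are not taken from the BCSFA data set; only the arithmetic (mobile lice plus gravid females, divided by fish sampled) follows the text.

```python
from dataclasses import dataclass


@dataclass
class SamplingEvent:
    """One sea lice counting event on one farm (hypothetical record layout)."""
    farm: int
    year: int
    month: int
    mobile_lice: int      # mobile-stage L. salmonis counted
    gravid_females: int   # gravid female L. salmonis counted
    fish_sampled: int


def motile_abundance(event: SamplingEvent) -> float:
    """Total motile lice (mobile + gravid females) per fish sampled."""
    if event.fish_sampled <= 0:
        raise ValueError("fish_sampled must be positive")
    return (event.mobile_lice + event.gravid_females) / event.fish_sampled


# Made-up example: 3 pens x 20 fish, with 45 mobile lice and 15 gravid females.
example = SamplingEvent(farm=1, year=2009, month=4,
                        mobile_lice=45, gravid_females=15, fish_sampled=60)
print(motile_abundance(example))  # 1.0
```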
All treatments administered in the BA region during this time period were recorded. The BA is unique in the salmon farming industry in that despite widespread resistance to SLICE in other regions of the world (Sutherland et al. 2015), it remains efficacious in the BA (Saksida et al. 2010), likely due to the large influx of 'naïve' sea lice regularly introduced to salmon farms by large wild salmon migrations up the rivers every year (Saksida et al. 2011, McEwan et al. 2015, Kreitzman et al. 2018).

SaTScan spatio-temporal analysis
SaTScan is used to detect clusters in spatio-temporal data. This task is accomplished by systematically and gradually scanning a window across space and/or time, noting the number of observed and expected observations inside the window compared to outside the window at each location and time step. In the SaTScan software, the scanning window is a cylinder with a circular base (in space−time analyses) and varying height representing the steps in time. The maximum window sizes in space and time are predefined by the user, and it is generally recommended to carry out a sensitivity analysis spanning biologically meaningful values (Pfeiffer et al. 2008). The window with the maximum likelihood is the most likely cluster, that is, the cluster least likely to be due to chance. A p-value is assigned to this cluster using Monte Carlo methods (Kulldorff 2015). Additionally, SaTScan classifies clusters as 'high' or 'low' risk, with high-risk clusters having higher lice abundance inside the cluster than outside, and conversely, low-risk clusters having smaller lice abundance inside than outside the cluster. Fig. 2 is a schematic illustration of the analysis.
For SaTScan analyses, the motile lice abundance data were log transformed and used in a normal distribution model as described by Kulldorff et al. (2009). Distances between sites were defined in a neighbour file rather than computed as Euclidean distances. Sensitivity analysis was carried out to define the upper limit of spatial cluster sizes, ranging from 15% (representing 3 farms) to 50% (representing 10 farms). Results from Cantrell et al. (2020a), which quantitatively identified sub-networks, were qualitatively compared to those from the SaTScan analyses to identify matching patterns in the emergent properties (i.e. clustering/sub-networks).
Fig. 2. [...] Illustration is an adaptation of a figure from Ahmadkhani et al. (2018)
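The logic of a normal-model scan statistic can be sketched in a few lines. The code below is a simplified, time-only illustration and is not SaTScan's implementation: it assumes a shared variance inside and outside the window, scans contiguous windows only, and obtains a p-value by permuting the observed values. All function names and the example data are hypothetical.

```python
import math
import random


def normal_llr(values, inside):
    """Log-likelihood ratio for a window under a normal model with a shared
    variance: separate means inside/outside versus one common mean."""
    n = len(values)
    inside = set(inside)
    x_in = [v for i, v in enumerate(values) if i in inside]
    x_out = [v for i, v in enumerate(values) if i not in inside]
    if not x_in or not x_out:
        return 0.0
    mean_all = sum(values) / n
    mean_in = sum(x_in) / len(x_in)
    mean_out = sum(x_out) / len(x_out)
    ss_null = sum((v - mean_all) ** 2 for v in values)
    ss_alt = (sum((v - mean_in) ** 2 for v in x_in)
              + sum((v - mean_out) ** 2 for v in x_out))
    if ss_null == 0.0:
        return 0.0        # all values identical: nothing to detect
    if ss_alt == 0.0:
        return math.inf   # perfect separation into two constant groups
    return max(0.0, 0.5 * n * math.log(ss_null / ss_alt))


def contiguous_windows(n, max_len):
    """Index tuples for every run of 1..max_len consecutive observations."""
    return [tuple(range(start, start + length))
            for start in range(n)
            for length in range(1, max_len + 1)
            if start + length <= n]


def most_likely_cluster(values, windows):
    """Return (llr, window) for the candidate window with the largest LLR."""
    return max(((normal_llr(values, w), w) for w in windows),
               key=lambda pair: pair[0])


def monte_carlo_p(values, windows, n_sim=999, seed=0):
    """Permutation p-value: rank of the observed maximum LLR among the
    maxima obtained after randomly shuffling the values."""
    rng = random.Random(seed)
    observed = most_likely_cluster(values, windows)[0]
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if most_likely_cluster(shuffled, windows)[0] >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_sim + 1)


# Made-up log abundances: 24 time steps with elevated values at steps 8-11.
values = [0.10 + 0.01 * (i % 5) for i in range(24)]
for i in range(8, 12):
    values[i] = 2.0 + 0.05 * (i - 8)
windows = contiguous_windows(len(values), max_len=6)
llr, window = most_likely_cluster(values, windows)
print(window, round(llr, 1), monte_carlo_p(values, windows, n_sim=199))
```

Whether the detected window is 'high' or 'low' risk would follow from comparing the mean inside the window with the mean outside it. A space−time version would replace the contiguous runs with cylinders defined over neighbouring farms and time steps.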

SaTScan purely temporal analysis (seasonal and annual clusters)
SaTScan is also able to conduct purely temporal scan statistics to identify clusters in seasonal or annual variation. For the seasonal analysis, data were analysed on a connecting loop, ignoring the year in which observations were made and only considering the month. For the annual variation cluster analysis, data were aggregated up to year and analysed across the yearly values. Both can classify clusters as 'high' or 'low' risk. For both of these analyses, the log-transformed sea lice count data were again utilised for the normal distribution model, as in the spatio-temporal analysis above. The minimum and maximum temporal lengths were specified, and a sensitivity analysis was conducted ranging from 15−50% of the study period. In the case of the seasonal analysis, this means ranging from about 2−6 mo. For the yearly analysis, this translated into possible maximum cluster sizes ranging from 2−7 yr.
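The conversion from percentage caps to window sizes, and the 'connecting loop' used for the seasonal analysis, can be sketched as below. Rounding up is an assumption made here because it reproduces the ranges quoted in the text (2−6 mo, 2−7 yr, 3−10 farms); how SaTScan itself rounds is not stated in this paper. Function names are illustrative.

```python
def max_window_size(percent: int, n_units: int) -> int:
    """Largest window size for a percentage cap, rounded up (assumed rounding)."""
    return -((-percent * n_units) // 100)  # integer ceiling division


def circular_month_windows(max_len: int):
    """All runs of consecutive calendar months up to max_len,
    wrapping from December (12) back to January (1)."""
    windows = []
    for start in range(1, 13):
        for length in range(1, max_len + 1):
            windows.append(tuple((start - 1 + k) % 12 + 1
                                 for k in range(length)))
    return windows


print(max_window_size(15, 12), max_window_size(50, 12))  # 2 6   (months)
print(max_window_size(15, 13), max_window_size(50, 13))  # 2 7   (years)
print(max_window_size(15, 20), max_window_size(50, 20))  # 3 10  (farms)

# A window spanning the turn of the year is a valid candidate on the loop.
print((10, 11, 12, 1, 2) in circular_month_windows(6))   # True
```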

Descriptive analysis of the observed data (sea lice counts and treatment data)
Overall, mean sea lice abundance during the period from March−June has declined since 2005, with an atypical rise in abundance in 2015 (Fig. 3a). The monthly data (summed across years) show abundance steeply declining by Month 3 (March), and remaining low until the end of summer (Month 8), before increasing for the remainder of the year (Fig. 3b). This is largely due to the focussed application of sea lice treatments at the start of each year (Fig. 3d) prior to the period of wild smolt out-migration.
There were 152 treatments on the 20 farms between 2005 and 2017. Typically, sites administered at most one treatment in a given year. The number of treatments yr⁻¹ site⁻¹ varied between 0.45 and 0.8 (Fig. 3c), with a slightly higher number of treatments administered in 2005−2006 and 2013−2014 compared to other years in the study.
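As a quick consistency check, the overall treatment totals can be converted into a crude average rate. This sketch assumes every farm is counted in every year regardless of whether it was active, so it is not necessarily the denominator used for Fig. 3c.

```python
N_TREATMENTS = 152
N_FARMS = 20
N_YEARS = 2017 - 2005 + 1  # 13 calendar years, 2005 to 2017 inclusive

# Assumption: every farm is counted in every year, whether active or fallow.
mean_rate = N_TREATMENTS / (N_FARMS * N_YEARS)
print(round(mean_rate, 2))  # 0.58
```

This crude average falls within the 0.45−0.8 range of yearly values reported above.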

Sub-networks in simulated data detected in observed data
Observed sea lice abundance within the study domain
Boxplots for motile sea lice abundance farm⁻¹ are presented in Fig. A1 in the Appendix. The observed sea lice abundance did not have obvious patterns of similarity that reflected sub-networks previously defined in the simulated data. Because this type of summary visualisation ultimately was not adequate to identify the similarities (or otherwise) among farms in a putative cluster, a more complex and sensitive method of spatio-temporal cluster analysis (SaTScan) was utilised.

SaTScan spatio-temporal analysis
SaTScan analyses identified between 3 and 6 significant clusters of lice abundance over the study period, depending on the specified maximum spatial window size (Table 2). When limiting the spatial window to a maximum of 3 farms (15% of the farms in the study), 5 clusters were identified; conversely, fewer clusters were identified when the maximum number of farms per cluster was increased to include more than 35% of all farms (maximum 7−10 farms cluster⁻¹). 'Clusters' may be as small as one farm, which indicates that the mean lice count of this farm is significantly different from the mean lice count of the farms outside of the spatio-temporal window that one farm occupies for that time period. In such a case, the emphasis is placed on the temporal aspect of the detected cluster within that farm.
With smaller spatial windows (15 and 20%), the significant clusters were fully nested within the simulated sub-networks, i.e. all farms in each significant cluster belonged to the same sub-network for a given period of time (Table 2). As the spatial window increased from 25 to 50% of the farms, the smaller significant clusters tended to aggregate and expand outwardly into neighbouring simulated sub-networks (Table 2). There were both high- and low-risk areas identified throughout the study and across the various spatial windows.
Figs. 4 & 5 are visual representations of Table 2. Fig. 4 summarises the significant clusters, identified with a spatial window that included up to 20% of the farms, to illustrate how smaller clusters were nested within the simulated sub-networks. The figure is divided into 4 sections to illustrate the progression of the significant clusters over the study period. Fig. 5 summarises the output for a larger cluster size, up to 30% of farms, to illustrate the effects of expanding the maximum cluster size. Fig. 5b illustrates the results from years 2007−2012. This is the time frame that encompasses the year of the simulation study, which simulated the environmental conditions from the year 2009 (Cantrell et al. 2018). In both Figs. 4 & 5, the black, semi-translucent 'X's placed on farms indicate the farm was not active for the entire duration of the time period represented in the panel. Light grey 'X's indicate the farm was not active for some duration of the time period represented in the panel (for at least 1 mo of the time period shown; though farms typically are fallowed for 6 or more months in total, this can be split between 2 years).
Farms 11, 19, and 20 were never identified as part of a cluster. Farms 19 and 20 were part of the low connectivity group of farms identified in the simulation study and were frequently fallowed during the study period (Figs. 4 & 5). Starting with a maximum cluster size of 4 and up, Farms 1, 2, and 3 (all part of Sub-network 1 in the simulation) were always clustered together. In addition, irrespective of the exact members of their shared cluster, this group was always characterised as low risk. Farms 7, 16, 17, and 18 were a highly significant cluster starting at a cluster size of 4, and all were part of Sub-network 3 in the simulation. When the maximum cluster sizes were set to 5−10, the clusters persisted over most of the time period of the data set, meaning the ephemeral clusters which only lasted for 5 or fewer months (not shown for ease of interpretation) joined other clusters.
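The 'nestedness' comparison between observed clusters and simulated sub-networks amounts to a set-membership check. The sketch below uses the sub-network membership listed in the Fig. 1 caption; the function names are illustrative, and the mixed cluster in the last example is made up to show the negative case.

```python
# Sub-network membership as listed in the Fig. 1 caption.
SUB_NETWORKS = {
    1: {1, 2, 3, 4, 5},
    2: {10, 11, 15},
    3: {6, 7, 16, 17, 18},
    4: {8, 9, 12, 13, 14, 19, 20},  # low-connectivity group
}


def sub_networks_spanned(cluster):
    """Sub-network labels touched by a set of farm numbers."""
    farms_in_cluster = set(cluster)
    return {label for label, farms in SUB_NETWORKS.items()
            if farms & farms_in_cluster}


def is_nested(cluster):
    """True if every farm in the cluster belongs to a single sub-network."""
    return len(sub_networks_spanned(cluster)) == 1


print(is_nested({1, 2, 3}))           # True  (all in Sub-network 1)
print(is_nested({7, 16, 17, 18}))     # True  (all in Sub-network 3)
print(sub_networks_spanned({5, 6}))   # {1, 3}: hypothetical mixed cluster
```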

SaTScan purely temporal analysis (seasonal and annual clusters)
Varying the maximum cluster size for the yearly analysis from 15−50% of all years in the data set did not result in substantially different cluster results (Table 3). The years 2004−2006 remained the only significant cluster once the maximum cluster size was large enough to include all 3 years. This cluster was identified as high risk, indicating the sea lice counts were significantly higher in these years than in the following years.
The seasonal analysis identified a significantly higher risk during the winter months. The months included in this high-risk cluster grew as the maximum cluster size increased, from only Month 12, when the maximum cluster size was set to be 15% of all months in the data set, to Months 10−2, when the maximum cluster size was set to be 50% of all months in the data set.

DISCUSSION
Despite a long history of use in many fields, validating the emergent behaviours of ABMs remains difficult and often relies on qualitative comparisons between simulated outcomes and known patterns in the observed data. Here, we attempted to validate the clusters of fish farms identified in a previous ABM with outcomes from an observed sea lice count data set. We used a SaTScan analysis to identify space−time clusters in the observed sea lice data set to qualitatively compare to the clustering determined in the simulated data.
While there were complications in interpreting output from this validation analysis, there remains support for the simulation in the observed sea lice count time series. The SaTScan analyses identified clusters in the observed data that were consistent, even when the maximum cluster size was varied, and these clusters were similar to those identified from the biophysical simulation. SaTScan thus provides evidence that the clustering identified in the biophysical model is also present in the observed data. This pattern matching gives us confidence that the model we constructed was complex enough to capture the emergent behaviour of the system, and lends support for the conclusions from the model. However, some key differences between clusters in the observed data and sub-networks in the simulated data remain. Some of the farms that were identified as part of 'Sub-network' 4, which was actually a group of farms with low connectivity to any other farm, were identified in this study as being in a consistent cluster. Farms 8, 9, 13, and 14 were part of Sub-network 4 in the simulation, and in the SaTScan analysis were clustered together at multiple cluster sizes. Here, the difference between a sub-network and a cluster becomes important. A sub-network means a group of sites that are connected to each other, whereas a cluster, in this context, indicates a group of farms whose sea lice abundance is similar to each other over a specified period of time. Therefore, it is perhaps not at all surprising that a group of farms with low connectivity would have more similarity in their monthly sea lice abundance than farms that have high connectivity to other farms in the area.
Table 2. Summary of the SaTScan analyses indicating the maximum cluster size set in each analysis, the salmon farms that were defined as a cluster, the time window over which each cluster existed, and whether the cluster is high or low risk. Only significant clusters are shown, and all had p-values < 0.002. For ease of interpretation, the farms have been colour coordinated to match the sub-networks to which they belonged in the simulation study (see Fig. 1): pink: Sub-network 1; green: Sub-network 2; teal: Sub-network 3; purple: the unconnected farms referred to as Sub-network 4
Despite the long time series data set of the observed sea lice counts from the farms (>10 yr) available to validate the biophysical simulation model, the validation remains uncertain, although there is support for the simulated sub-networks in the observed data clustering as well as for the reduction in viable sea lice larvae during the spring freshet event seen in the simulation. Another difficulty in this validation exercise is that the simulation only covered a portion of the observed data time frame. At present, a simulation with the high spatial and temporal resolution of the one described here would be prohibitively costly in computational time to run over multiyear time scales.
Accounting for the impact of the treatments is difficult. If the treatments were effective and controlled sea lice levels, the sea lice abundance will not reflect the original infestation. Thus, it is likely that treatments will break up connectivity among farms, thereby obscuring true clusters. The BA also has extremely large migrations of wild salmon (Ye et al. 2015) which, when returning to their natal rivers to spawn, typically carry attached sea lice (Gottesfeld et al. 2009). The introduction of additional infective sea lice to farms could further complicate our ability to detect farms that are hydrodynamically connected by inflating sea lice abundance on some farms. Therefore, a farm with high abundance compared to the rest of the farms in its sub-network could be interpreted either as evidence against the integrity of a given sub-network, or as a farm highly impacted by wild salmon migration or by some other unknown farming practice. Additionally, sea lice abundance is itself a derivative of larval infestation pressure. The relationship between larval infestation pressure and adult sea lice abundance on fish is not well understood (Frenzl 2014) and may not be linear. Therefore, it is possible that motile sea lice levels on farmed salmon may not even be the appropriate variable for comparison to infestation pressure because there are too many potential confounders and unknown steps between the larval supply and observed infestation.
In the previous exploration of the biophysical model simulated data set, we discovered that the freshet event in the spring suppressed sea lice devel-   Table 3. Sensitivity analysis for the purely temporal SaT -Scan analysis to identify annual and seasonal variation in sea lice infestation of salmon farms opment and acted as a mechanism to keep connectivity low between the farms. This suppression occurs because sea lice growth is hindered by cold and fresh water (Bricknell et al. 2006, Groner et al. 2016, Samsing et al. 2016, and during this time there is a large freshet event in the area which has been shown to suppress sea lice larval growth (Groner et al. 2016, Samsing et al. 2016, Cantrell et al. 2020a. Though there were sea lice treatments in the beginning of each year, the efficacy of SLICE is approximately 90% for at least 35 d after treatment (Stone et al. 2000). The fact the sea lice abundance remains low beyond this point and does not increase again throughout the summer provides evidence of the freshet event helping to suppress sea lice abundance for the duration of the summer months. Furthermore, the identification of the winter months as a high-risk cluster supports our finding of the freshet event in the spring and summer suppressing sea lice larval development. So, while the decrease in lice abundance is likely due to the treatments, the fact it is kept low for the entire summer without needing additional treatments is possibly due to the freshet. While the return timing of wild Pacific salmon depends on species and can begin as early as May, runs typically peak in late summer to autumn (August− October) (Puget Sound Indian Tribes and Washington State Department of Fish and Wildlife 2017).
There is no large influx of wild salmon bringing new sea lice to the area until the end of the summer months. Some of the challenges in parameterising sea lice biophysical models include unknowns in the sea lice biology, such as attachment rates, the infestation pressure needed to initiate an infestation, and interactions between wild and farmed salmon, where sea lice are undoubtedly exchanged in both directions as adult salmon return to their natal rivers with existing sea lice infestations (Beamish et al. 2005). Both bath and in-feed treatments are commercially available, by veterinary prescription, to control sea lice infestations. The impacts of these treatments on sea lice larvae dispersion remain difficult to parameterise. Additionally, the population size of sea lice as an initial condition in many models is unknown (Bellocchi et al. 2010).
An additional complication to this validation attempt is the resolution and scale mismatch between the observed and simulated data. In previous papers utilising the simulated data, we found high temporal variation in the connectivity strength depending on physical conditions from day to day (Cantrell et al. 2018, 2020a). The simulation used a high frequency release of particles to analyse connectivity in a specific time period (March−July 2009). However, the observed data utilised here is coarse in temporal resolution (monthly sampling events), but large in scale (13 yr of data). It is possible that the high temporal variation identified in the simulation also exists in nature. With sea lice counts only being reported monthly, higher frequency temporal variation could be obscured.
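The way in which monthly sampling can hide day-to-day variation can be illustrated with a minimal sketch. It is not part of the analysis and uses no observed or simulated data from this study: the function `daily_connectivity`, the 5 d and 150 d periods, and the amplitudes are all hypothetical values, chosen deliberately so that the fast component is invisible at a 30 d sampling interval.

```python
import math

def daily_connectivity(n_days):
    """Hypothetical daily connectivity strength (illustrative only):
    a slow 150 d component plus a fast 5 d component."""
    return [
        1.0
        + 0.2 * math.sin(2 * math.pi * d / 150)  # slow, seasonal-scale variation
        + 0.8 * math.sin(2 * math.pi * d / 5)    # fast, day-to-day variation
        for d in range(n_days)
    ]

def sample_every(series, step):
    """Keep one value every `step` days, mimicking periodic sampling events."""
    return series[::step]

def value_range(values):
    """Spread between the largest and smallest value in a series."""
    return max(values) - min(values)

daily = daily_connectivity(150)      # 150 daily values
monthly = sample_every(daily, 30)    # 5 'monthly' values

# The monthly samples only capture the slow component, so their
# spread is much smaller than the spread of the daily series.
print(len(daily), len(monthly))
print(value_range(daily) > value_range(monthly))
```

In this contrived case the 30 d sampling interval is an exact multiple of the 5 d period, so the fast component contributes essentially nothing to any monthly sample. Real connectivity would not be this regular, but the sketch shows why the absence of high-frequency variation in monthly counts cannot be read as its absence in nature.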
The observed data set may not exhibit patterns identical to those detected in the simulated data set because not all farms were active at all times (Table 1, Fig. 4). Having fallowed farms will disrupt the connectivity via larvae dispersing among sites and could disrupt farm clusters that would otherwise be present. This is often the entire purpose of fallowing farms. By the end of the observed data set in 2017, only 14 farms remained active (Fig. 4). This number of fallowed farms would undoubtedly disrupt connectivity. In light of the treatments and fallowed farms, the fact that the clusters can still be detected at all is perhaps rather strong evidence that the clusters identified in the simulation are valid.
The fact that a 13 yr data set with moderate temporal resolution offers neither clear validation nor clear refutation of the simulation highlights the difficulty in collecting observations that are able to ground-truth the emergent properties of biophysical models, and in particular, those simulating sea lice larval dispersion. All salmon farming regions will have farm treatments and fallowing regimes that complicate using their data for validation. However, the BA has fewer treatments and only used one type of treatment (EMB) during the study period (Saksida et al. 2011) compared to other regions where chemical treatments are rotated or used in tandem with non-medicinal treatments, such as cleaner fish and cage snorkel barriers, in order to protect the efficacy of existing sea lice medications (Aaen et al. 2015, Jackson et al. 2018). BC has not had the same issues with resistance to chemical treatments found in other salmon farming regions, so responses to the treatments are fairly consistent (McEwan et al. 2015). The more homogenous methods in the BA likely mean that clustering patterns are less impacted by treatments than in other regions. In an ideal validation data set there would be no treatment impacts, all farms would be active, and higher temporal resolution would exist for lice counts. As this ideal data set does not exist and would be difficult or impossible to collect anywhere, the currently utilised data set represents perhaps the 'best case scenario' in terms of using farm level sea lice abundance for validation purposes.
Other techniques have been utilised to validate biophysical models of sea lice dispersion, including plankton tows in the area to sample sea lice larvae, as well as setting up sentinel cages near farms to estimate infestation pressure on the farms (Adams et al. 2012, Pert et al. 2014, Sandvik et al. 2016). However, these methods have their own problems. Plankton tow sampling for sea lice larvae rarely yields sufficient samples for statistical analysis (Salama et al. 2011). For example, Adams et al. (2012) found that only 12 of 126 sampling events yielded nonzero samples for sea lice larvae. Sentinel cages have been shown to underestimate sea lice abundance on the farms themselves (Ulgenes 2018), though this is possibly due in part to self-infestation on the farms. Sentinel cages also poorly reflect the reality of patchiness in lice larvae distributions through the water column, though Sandvik and colleagues developed an improved method to 'shift' spatial correlations between observed sentinel cage counts and simulated larval distribution data. This task was accomplished by comparing observed lice counts from sentinel cages to predictions of lice larval dispersion in large grids in the simulation, rather than traditional point-to-point comparisons (Sandvik et al. 2016, 2020). This leaves researchers with imperfect options with which to validate the emergent properties of biophysical models. The analysis conducted here, therefore, may offer one of the 'best' options currently available. Despite the difficulties in validating the emergent properties of the simulated study, the central question is whether the model is useful for management; as George Box's often-quoted aphorism puts it, 'all models are wrong, but some are useful'. The simulation identified clusters of farms which could be treated together in order to reduce the likelihood of outbreak scenarios.
It identified areas that may be preferred from a disease management perspective for future siting of additional farms, such as near the 'unconnected farms' (e.g. in Sub-network 4), or in areas that showed a cluster with a low risk for sea lice. These clusters must be considered as guidelines, as the observed data set illustrated that some farms have elevated sea lice abundance not predicted in the simulation, due to some factor not included in the ABM, and should receive additional surveillance. Future simulations could explore known wild salmon migrations as an additional source of infestation pressure on the farms and expand the time frame of the simulation to include time periods when sea lice abundance levels are at their highest (in BC, this would be the autumn and winter months).
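The contrast between point-to-point and grid-based comparison of sentinel cage counts with simulated larval fields, discussed above, can be made concrete with a minimal sketch. This is not the procedure of Sandvik et al., whose details are not reproduced here. It is a generic neighbourhood comparison under stated assumptions: the 5 × 5 field, the patch location, the cage location, the helper `neighbourhood_max`, and the choice of the maximum as the summary statistic are all hypothetical.

```python
def neighbourhood_max(field, row, col, radius):
    """Largest simulated value within `radius` grid cells of (row, col).
    The window is clipped at the edges of the field."""
    rows, cols = len(field), len(field[0])
    r0, r1 = max(0, row - radius), min(rows, row + radius + 1)
    c0, c1 = max(0, col - radius), min(cols, col + radius + 1)
    return max(field[r][c] for r in range(r0, r1) for c in range(c0, c1))

# Hypothetical simulated larval density field with one isolated patch.
simulated = [[0.0] * 5 for _ in range(5)]
simulated[2][3] = 4.0

# Hypothetical sentinel cage located one cell away from the patch.
cage_row, cage_col = 2, 2

# Point-to-point: only the cell containing the cage is compared.
point_value = simulated[cage_row][cage_col]

# Grid-based: the cage is compared with a larger block of cells around it.
window_value = neighbourhood_max(simulated, cage_row, cage_col, radius=1)

print(point_value, window_value)
```

When the simulated field is patchy, a patch displaced by a single cell is missed entirely by the point-to-point comparison but is captured once the comparison area is enlarged. The cost is spatial precision: the larger the window, the less the comparison says about where exactly the larvae are predicted to be.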

CONCLUSIONS
Validating sea lice dispersal simulations is a difficult task for which no 'gold standard' has yet been established in this field. Pattern matching between observed data and simulation outcomes offers one method to validate emergent properties from the simulated data, though the observed data often include complicating features (such as treatments) that make validation less than straightforward. We adopted qualitative comparisons of quantitative analyses, utilising spatio-temporal cluster analysis (SaTScan) to identify clusters in the observed sea lice abundance data and qualitatively comparing them to the clustering previously identified in the simulated data. We found empirical support for many patterns identified in a previously published ABM that simulated sea lice larval dispersion in the BA, BC.