3. Sample size considerations for detecting species co-occurrence with multispecies occupancy models

Amber Cowans

Multispecies occupancy models have become a popular framework for jointly modelling species distributions while accounting for environmental factors and imperfect detection. However, these models are known to perform poorly at smaller, but realistic, sample sizes because of convergence and estimation issues. This has prompted the implementation of penalised likelihood approaches, which introduce a small amount of bias in exchange for improved predictive ability. Despite wide application, there is currently no formal evaluation of how multispecies occupancy model performance is influenced by sample size and configuration, nor of how the benefit of penalised likelihood varies across sample size scenarios. To investigate this, we conducted an extensive simulation study testing the models' ability to recover known co-occurrence patterns. We fitted co-occurrence models to simulated datasets under varying sample size scenarios, while iteratively increasing model complexity in two dimensions: the number of covariates and the number of interacting species. For every scenario, we fitted the models with both standard and penalised likelihood to compare the inference-prediction trade-off explicitly. We demonstrate that the ability of co-occurrence models to uncover abiotic and biotic interactions is sensitive to both sample size and model complexity. Even under the simplest parameterisation, we observe high bias and low coverage in natural parameters (those used for inference) for sample sizes below 200 sites. Penalised likelihood generally outperforms standard (unpenalised) likelihood at sample sizes below 200-300 sites. For smaller sample sizes, biases in the general parameters or derived quantities (those of predictive interest) are lower than for natural parameters, suggesting that while systematic biases compromise ecological inference, predictive ability is less affected.
We conclude by providing clear user guidelines for model interpretation and the suitability of alternative model fitting frameworks with respect to sample size and model complexity, increasing the utility of multispecies occupancy models for ecologists exploring species co-occurrence.