
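Calculate the "Bayesian Spatial Scan Statistic" by Neill et al. (2006), adapted to a spatio-temporal setting. The scan statistic assumes that, given the relative risk, the data follow a Poisson distribution. The relative risk is in turn assigned a Gamma distribution prior, yielding a negative binomial marginal distribution for the counts under the null hypothesis. Under the alternative hypothesis, the relative risk of the counts in the anomalous space-time window is assigned a Gamma prior with parameters alpha_alt and beta_alt, and the mean (and variance) of each anomalous count is increased by a factor whose possible values and prior probabilities are given by inc_values and inc_probs.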
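The negative binomial marginal mentioned above can be checked numerically. The base-R sketch below assumes that a count y with baseline b satisfies y | q ~ Poisson(q * b) with q ~ Gamma(shape = alpha, scale = theta), treating beta as a scale parameter as the argument descriptions state. The helper names marginal_by_integration and marginal_negbin are hypothetical and not part of the package.

```r
# Marginal probability of a count y with baseline b, obtained by integrating
# the Poisson likelihood against a Gamma(shape = alpha, scale = theta) prior.
marginal_by_integration <- function(y, b, alpha, theta) {
  integrand <- function(q) dpois(y, q * b) * dgamma(q, shape = alpha, scale = theta)
  integrate(integrand, lower = 0, upper = Inf)$value
}

# Closed form of the same marginal: negative binomial with
# size = alpha and prob = 1 / (1 + b * theta), i.e. mean alpha * theta * b.
marginal_negbin <- function(y, b, alpha, theta) {
  dnbinom(y, size = alpha, prob = 1 / (1 + b * theta))
}

b <- 5       # baseline (expected count)
alpha <- 1   # Gamma shape, as alpha_null's default
theta <- 1   # Gamma scale, as beta_null's default
y <- 0:10

numeric_probs <- sapply(y, marginal_by_integration, b = b, alpha = alpha, theta = theta)
closed_probs <- marginal_negbin(y, b, alpha, theta)
print(round(cbind(y, numeric_probs, closed_probs), 5))
```

If the implementation parameterizes the Gamma by a rate instead, replace theta with 1 / theta in both helpers.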

Usage

scan_bayes_negbin(
  counts,
  zones,
  baselines = NULL,
  population = NULL,
  outbreak_prob = 0.05,
  alpha_null = 1,
  beta_null = 1,
  alpha_alt = alpha_null,
  beta_alt = beta_null,
  inc_values = seq(1, 3, by = 0.1),
  inc_probs = 1
)

Arguments

counts

Either:

  • A matrix of observed counts. Rows indicate time and are ordered from least recent (row 1) to most recent (row nrow(counts)). Columns indicate locations, numbered from 1 and up. If counts is a matrix, the optional matrix argument baselines should also be specified.

  • A data frame with columns "time", "location", "count", "baseline". Alternatively, the column "baseline" can be replaced by a column "population". The baselines are the expected values of the counts.
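To illustrate how the two input formats correspond, the following base-R sketch flattens a counts matrix and a baselines matrix into a data frame with the columns "time", "location", "count" and "baseline", taking the row index as the time and the column index as the location. The toy values are made up, and the sketch does not call scan_bayes_negbin itself.

```r
# 2 time points (rows, least recent first) x 3 locations (columns)
counts_mat <- matrix(c(3, 0, 5, 2, 4, 1), nrow = 2, ncol = 3)
baselines_mat <- matrix(c(2.5, 1.0, 4.0, 2.0, 3.5, 1.5), nrow = 2, ncol = 3)

# row() and col() give each cell's row and column index; as.vector() reads
# all matrices in the same column-major order, so the columns stay aligned.
counts_df <- data.frame(
  time = as.vector(row(counts_mat)),
  location = as.vector(col(counts_mat)),
  count = as.vector(counts_mat),
  baseline = as.vector(baselines_mat)
)
print(counts_df)
```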

zones

A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone.
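A space-time window pairs one zone with a duration. Assuming, as the injected outbreak in the example suggests, that a window of duration d covers the d most recent rows of counts, the hypothetical helper window_total below sums a matrix (counts or baselines) over such a window.

```r
# Hypothetical helper: total of `mat` over the window formed by zone `zone`
# and the `duration` most recent time points (the last rows of the matrix).
window_total <- function(mat, zones, zone, duration) {
  rows <- seq(nrow(mat) - duration + 1, nrow(mat))  # most recent rows
  cols <- zones[[zone]]                              # locations in the zone
  sum(mat[rows, cols, drop = FALSE])
}

# Four locations and three zones: two single locations and one pair
zones <- list(1L, 2L, c(2L, 4L))
counts <- matrix(1:12, nrow = 3, ncol = 4)  # 3 time points x 4 locations

window_total(counts, zones, zone = 3, duration = 2)  # cells [2:3, c(2, 4)]
```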

baselines

Optional. A matrix of the same dimensions as counts. Not needed if counts is a data frame. Holds the Poisson mean parameter for each observed count. Will be estimated if not supplied (requires the population argument). These parameters are typically estimated from past data using e.g. Poisson (GLM) regression.
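One way to obtain baselines, in the spirit of the Poisson (GLM) regression mentioned above, is to fit a Poisson GLM to past counts and predict the expected counts for the period being scanned. The sketch below uses simulated past data and a location-only model; both are assumptions made for illustration, and real applications would typically add covariates such as trends or seasonality.

```r
set.seed(1)
n_times <- 20
n_locs <- 4

# Simulated past data in long format, one row per time-location pair
past <- data.frame(time = rep(seq_len(n_times), times = n_locs),
                   location = rep(seq_len(n_locs), each = n_times))
true_means <- c(2, 5, 10, 20)
past$count <- rpois(nrow(past), true_means[past$location])

# Poisson regression with a separate mean per location
fit <- glm(count ~ factor(location), family = poisson, data = past)

# Expected counts per location, then a baselines matrix for 3 future
# time points with the same layout as counts (rows = time, columns = location)
baseline_est <- predict(fit, newdata = data.frame(location = seq_len(n_locs)),
                        type = "response")
baselines_future <- matrix(baseline_est, nrow = 3, ncol = n_locs, byrow = TRUE)
print(baselines_future)
```

With a location-only model the fitted baselines are just the per-location sample means of the past counts.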

population

Optional. A matrix or vector of populations for each location. Not needed if counts is a data frame. If counts is a matrix, population is only needed if baselines are to be estimated and you want to account for the different populations in each location (and time). If a matrix, should be of the same dimensions as counts. If a vector, should be of the same length as the number of columns in counts.
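If only populations are available, a simple assumption is that every location shares a common rate, so each baseline is that rate times the population. The sketch below expands a population vector (one entry per column of counts) to a matrix with the same dimensions as counts and applies this constant-rate rule; it is not necessarily how scan_bayes_negbin estimates baselines internally.

```r
counts_mat <- matrix(c(4, 6, 10, 12, 1, 3), nrow = 2)  # 2 time points x 3 locations
pop_vec <- c(1000, 2500, 400)                          # one population per location

# Repeat the population vector on every row so it matches counts_mat
pop_mat <- matrix(pop_vec, nrow = nrow(counts_mat), ncol = ncol(counts_mat),
                  byrow = TRUE)

# Constant-rate assumption: one overall rate shared by all cells
rate <- sum(counts_mat) / sum(pop_mat)
baselines_mat <- rate * pop_mat
print(baselines_mat)
```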

outbreak_prob

A scalar; the probability of an outbreak (at any time, any place). Defaults to 0.05.

alpha_null

A scalar; the shape parameter for the gamma distribution under the null hypothesis of no anomaly. Defaults to 1.

beta_null

A scalar; the scale parameter for the gamma distribution under the null hypothesis of no anomaly. Defaults to 1.

alpha_alt

A scalar; the shape parameter for the gamma distribution under the alternative hypothesis of an anomaly. Defaults to the same value as alpha_null.

beta_alt

A scalar; the scale parameter for the gamma distribution under the alternative hypothesis of an anomaly. Defaults to the same value as beta_null.

inc_values

A vector of possible values for the increase in the mean (and variance) of an anomalous count. Defaults to evenly spaced values between 1 and 3, with a difference of 0.1 between consecutive values.

inc_probs

A vector of the prior probabilities of each value in inc_values. Defaults to 1, implying a discrete uniform distribution.
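The following sketch shows how a Bayes factor for a single space-time window could be averaged over the increase factors in inc_values, weighted by inc_probs after recycling and normalizing them (so the default of 1 becomes a discrete uniform prior). It assumes, following Neill et al. (2006) only loosely, that counts and baselines are aggregated over the window, that the increase factor multiplies the aggregated baseline, and that counts outside the window have the same likelihood under both hypotheses so they cancel. window_loglik and window_log_bf are hypothetical names.

```r
# Log marginal likelihood of an aggregated window count C with aggregated
# baseline B when C | q ~ Poisson(q * inc * B), q ~ Gamma(shape = alpha, scale = theta).
window_loglik <- function(C, B, alpha, theta, inc = 1) {
  dnbinom(C, size = alpha, prob = 1 / (1 + inc * B * theta), log = TRUE)
}

# Log Bayes factor (alternative vs null) for one window, averaged over inc_values
window_log_bf <- function(C, B, alpha_null = 1, beta_null = 1,
                          alpha_alt = alpha_null, beta_alt = beta_null,
                          inc_values = seq(1, 3, by = 0.1), inc_probs = 1) {
  inc_probs <- rep_len(inc_probs, length(inc_values))
  inc_probs <- inc_probs / sum(inc_probs)          # normalized prior over increases
  log_null <- window_loglik(C, B, alpha_null, beta_null)
  log_alt <- window_loglik(C, B, alpha_alt, beta_alt, inc = inc_values)
  # log( sum_k p_k * exp(log_alt_k - log_null) ), computed with log-sum-exp
  terms <- log(inc_probs) + log_alt - log_null
  m <- max(terms)
  m + log(sum(exp(terms - m)))
}

window_log_bf(C = 30, B = 10)  # window count well above its baseline
```

With inc_values = 1 and identical null and alternative priors, the two hypotheses coincide and the log Bayes factor is 0.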

Value

A list which, in addition to the information about the type of scan statistic, has the following components: priors (list), posteriors (list), MLC (list) and marginal_data_prob (scalar). The list MLC has elements

zone

The number of the spatial zone of the most likely cluster (MLC).

duration

The most likely event duration.

log_posterior

The log posterior probability that an event is ongoing in the MLC.

log_bayes_factor

The logarithm of the Bayes factor for the MLC.

posterior

The posterior probability that an event is ongoing in the MLC.

locations

The locations involved in the MLC.
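The MLC components correspond to the row of the window posteriors table with the largest log posterior. The sketch below extracts them from a small table of made-up values, assuming that posterior is simply exp(log_posterior) and that locations are looked up in zones.

```r
zones <- list(1L, 2L, c(1L, 2L))

# Made-up window posteriors, laid out like posteriors$window_posteriors
window_posteriors <- data.frame(
  zone = c(1L, 1L, 2L, 2L, 3L, 3L),
  duration = c(1L, 2L, 1L, 2L, 1L, 2L),
  log_posterior = log(c(0.01, 0.02, 0.05, 0.60, 0.03, 0.04)),
  log_bayes_factor = c(0.1, 0.8, 1.7, 4.3, 1.2, 1.5)  # arbitrary values
)

best <- which.max(window_posteriors$log_posterior)
mlc <- list(
  zone = window_posteriors$zone[best],
  duration = window_posteriors$duration[best],
  log_posterior = window_posteriors$log_posterior[best],
  log_bayes_factor = window_posteriors$log_bayes_factor[best],
  posterior = exp(window_posteriors$log_posterior[best]),
  locations = zones[[window_posteriors$zone[best]]]
)
str(mlc)
```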

The list priors has elements

null_prior

The prior probability of no anomaly.

alt_prior

The prior probability of an anomaly.

inc_prior

A vector of prior probabilities of each value in the argument inc_values.

window_prior

The prior probability of an outbreak in any of the space-time windows.
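A minimal sketch of these priors, assuming the outbreak probability is split evenly across all zone-duration combinations (one reading of window_prior; the package may divide it differently):

```r
outbreak_prob <- 0.05
n_zones <- 3       # length(zones)
max_duration <- 2  # number of durations scanned

null_prior <- 1 - outbreak_prob  # no anomaly anywhere
alt_prior <- outbreak_prob       # an anomaly in some window
n_windows <- n_zones * max_duration
window_prior <- alt_prior / n_windows  # assumed even split over windows
c(null_prior = null_prior, alt_prior = alt_prior, window_prior = window_prior)
```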

The list posteriors has elements

null_posterior

The posterior probability of no anomaly.

alt_posterior

The posterior probability of an anomaly.

inc_posterior

A data frame with columns inc_values and inc_posterior.

window_posteriors

A data frame with columns zone, duration, log_posterior and log_bayes_factor, each row corresponding to a space-time window.
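Given the log Bayes factors of all windows, Bayes' theorem gives each window's posterior. The sketch below assumes at most one anomalous window and an even split of outbreak_prob across windows, so that P(H1(S) | D) = window_prior * BF(S) / (null_prior + sum over S' of window_prior * BF(S')). It works on the log scale with the log-sum-exp trick to avoid overflow; window_log_posteriors is a hypothetical name.

```r
window_log_posteriors <- function(log_bf, outbreak_prob = 0.05) {
  null_prior <- 1 - outbreak_prob
  window_prior <- outbreak_prob / length(log_bf)  # assumed even split
  log_terms <- c(log(null_prior), log(window_prior) + log_bf)
  m <- max(log_terms)
  log_marginal <- m + log(sum(exp(log_terms - m)))  # log( P(D) / P(D | H0) )
  list(null_log_posterior = log(null_prior) - log_marginal,
       window_log_posteriors = log(window_prior) + log_bf - log_marginal)
}

log_bf <- c(0.1, 0.8, 1.7, 4.3, 1.2, 1.5)  # one log Bayes factor per window
post <- window_log_posteriors(log_bf)
round(exp(post$window_log_posteriors), 4)
```

When every Bayes factor equals 1 the data carry no evidence, and the posteriors reduce to the priors (0.95 for the null with the default outbreak_prob).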

space_time_posteriors

A matrix with the posterior anomaly probability of each location-time combination.

location_posteriors

A vector with the posterior probability of an anomaly at each location.
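Under the one-window assumption, the windows are mutually exclusive, so the posterior anomaly probability of a location-time cell (or of a location) is the sum of the posteriors of all windows that contain it. The sketch below builds both quantities from a small made-up window table, assuming each window of duration d covers the d most recent time points.

```r
zones <- list(1L, 2L, c(2L, 3L))
windows <- data.frame(zone = c(1L, 2L, 3L, 3L),
                      duration = c(1L, 1L, 1L, 2L),
                      posterior = c(0.02, 0.05, 0.10, 0.40))  # made-up values
n_times <- 3
n_locs <- 3

# Space-time posteriors: add each window's posterior to every cell it covers
space_time <- matrix(0, n_times, n_locs)
for (k in seq_len(nrow(windows))) {
  rows <- seq(n_times - windows$duration[k] + 1, n_times)
  cols <- zones[[windows$zone[k]]]
  space_time[rows, cols] <- space_time[rows, cols] + windows$posterior[k]
}

# Location posteriors: sum over the windows whose zone contains the location
location_post <- vapply(seq_len(n_locs), function(i) {
  in_window <- vapply(zones[windows$zone], function(z) i %in% z, logical(1))
  sum(windows$posterior[in_window])
}, numeric(1))
location_post
```

Because every window in this sketch ends at the most recent time point, the location posteriors coincide with the last row of the space-time matrix.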

References

Neill, D. B., Moore, A. W., Cooper, G. F. (2006). A Bayesian Spatial Scan Statistic. Advances in Neural Information Processing Systems 18.

Examples

if (FALSE) {
library(scanstatistics)
set.seed(1)
# Create location coordinates, calculate nearest neighbors, and create zones
n_locs <- 50
max_duration <- 5
n_total <- n_locs * max_duration
geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
knn_mat <- coords_to_knn(geo, 15)
zones <- knn_zones(knn_mat)

# Simulate data
baselines <- matrix(rexp(n_total, 1/5), max_duration, n_locs)
counts <- matrix(rpois(n_total, as.vector(baselines)), max_duration, n_locs)

# Inject outbreak/event/anomaly
ob_dur <- 3
ob_cols <- zones[[10]]
ob_rows <- max_duration + 1 - seq_len(ob_dur)
counts[ob_rows, ob_cols] <- matrix(
  rpois(ob_dur * length(ob_cols), 2 * baselines[ob_rows, ob_cols]), 
  length(ob_rows), length(ob_cols))
res <- scan_bayes_negbin(counts = counts,
                         zones = zones,
                         baselines = baselines)
}