Calculate the negative binomial Bayesian scan statistic.
Source: R/scan_bayes_negbin.R
Calculate the "Bayesian Spatial Scan Statistic" by Neill et al. (2006), adapted to a spatio-temporal setting. The scan statistic assumes that, given the relative risk, the data follows a Poisson distribution. The relative risk is in turn assigned a Gamma distribution prior, yielding a negative binomial marginal distribution for the counts under the null hypothesis. Under the alternative hypothesis, the
Usage
scan_bayes_negbin(
  counts,
  zones,
  baselines = NULL,
  population = NULL,
  outbreak_prob = 0.05,
  alpha_null = 1,
  beta_null = 1,
  alpha_alt = alpha_null,
  beta_alt = beta_null,
  inc_values = seq(1, 3, by = 0.1),
  inc_probs = 1
)
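The defaults for inc_values and inc_probs can be inspected in plain R. The sketch below expands the default grid and turns the scalar inc_probs = 1 into a discrete uniform prior. The recycling and normalisation step is an assumption about how a scalar could be interpreted, not a statement about the package internals.

```r
# Default grid of candidate increases, copied from the usage above
inc_values <- seq(1, 3, by = 0.1)

# Default prior weight; a single value is recycled over the grid (assumption)
inc_probs <- 1
inc_prior <- rep(inc_probs, length.out = length(inc_values))

# Normalise so that the weights sum to one (assumption)
inc_prior <- inc_prior / sum(inc_prior)

length(inc_values)  # number of candidate increase values
range(inc_prior)    # identical minimum and maximum for a uniform prior
```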
Arguments
- counts
Either:
A matrix of observed counts. Rows indicate time and are ordered from least recent (row 1) to most recent (row nrow(counts)). Columns indicate locations, numbered from 1 and up. If counts is a matrix, the optional matrix argument baselines should also be specified.
A data frame with columns "time", "location", "count", "baseline". Alternatively, the column "baseline" can be replaced by a column "population". The baselines are the expected values of the counts.
- zones
A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone.
- baselines
Optional. A matrix of the same dimensions as counts. Not needed if counts is a data frame. Holds the Poisson mean parameter for each observed count. Will be estimated if not supplied (requires the population argument). These parameters are typically estimated from past data using e.g. Poisson (GLM) regression.
- population
Optional. A matrix or vector of populations for each location. Not needed if counts is a data frame. If counts is a matrix, population is only needed if baselines are to be estimated and you want to account for the different populations in each location (and time). If a matrix, should be of the same dimensions as counts. If a vector, should be of the same length as the number of columns in counts.
- outbreak_prob
A scalar; the probability of an outbreak (at any time, any place). Defaults to 0.05.
- alpha_null
A scalar; the shape parameter for the gamma distribution under the null hypothesis of no anomaly. Defaults to 1.
- beta_null
A scalar; the scale parameter for the gamma distribution under the null hypothesis of no anomaly. Defaults to 1.
- alpha_alt
A scalar; the shape parameter for the gamma distribution under the alternative hypothesis of an anomaly. Defaults to the same value as alpha_null.
- beta_alt
A scalar; the scale parameter for the gamma distribution under the alternative hypothesis of an anomaly. Defaults to the same value as beta_null.
- inc_values
A vector of possible values for the increase in the mean (and variance) of an anomalous count. Defaults to evenly spaced values between 1 and 3, with a difference of 0.1 between consecutive values.
- inc_probs
A vector of the prior probabilities of each value in inc_values. Defaults to 1, implying a discrete uniform distribution.
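The gamma-Poisson construction behind alpha_null and beta_null can be checked numerically in base R. The sketch below assumes that the relative risk q has a Gamma prior with shape alpha and rate beta (the parametrisation used by Neill et al., 2006) and that a count with baseline b is Poisson with mean q * b. The text above calls beta a scale parameter; with the default value of 1 the two readings coincide, but for other values check which one the package uses. The names alpha, beta, b and counts_grid are illustrative only.

```r
alpha <- 1         # shape of the Gamma prior (default alpha_null)
beta <- 1          # rate of the Gamma prior (assumed; default beta_null)
b <- 5             # hypothetical baseline (expected count)
counts_grid <- 0:10

# Marginal probability of each count, integrating the relative risk q out
marginal_numeric <- sapply(counts_grid, function(k) {
  integrate(
    function(q) dpois(k, q * b) * dgamma(q, shape = alpha, rate = beta),
    lower = 0, upper = Inf
  )$value
})

# Closed form: negative binomial with size alpha and prob beta / (beta + b)
marginal_negbin <- dnbinom(counts_grid, size = alpha, prob = beta / (beta + b))

# TRUE when the two agree within the stated tolerance
isTRUE(all.equal(marginal_numeric, marginal_negbin, tolerance = 1e-4))
```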
Value
A list which, in addition to the information about the type of scan statistic, has the following components: priors (list), posteriors (list), MLC (list) and marginal_data_prob (scalar). The list MLC has elements
- zone
The number of the spatial zone of the most likely cluster (MLC).
- duration
The most likely event duration.
- log_posterior
The posterior log probability that an event is ongoing in the MLC.
- log_bayes_factor
The logarithm of the Bayes factor for the MLC.
- posterior
The posterior probability that an event is ongoing in the MLC.
- locations
The locations involved in the MLC.
The list priors has elements
- null_prior
The prior probability of no anomaly.
- alt_prior
The prior probability of an anomaly.
- inc_prior
A vector of prior probabilities of each value in the argument inc_values.
- window_prior
The prior probability of an outbreak in any of the space-time windows.
The list posteriors has elements
- null_posterior
The posterior probability of no anomaly.
- alt_posterior
The posterior probability of an anomaly.
- inc_posterior
A data frame with columns inc_values and inc_posterior.
- window_posteriors
A data frame with columns zone, duration, log_posterior and log_bayes_factor, each row corresponding to a space-time window.
- space_time_posteriors
A matrix with the posterior anomaly probability of each location-time combination.
- location_posteriors
A vector with the posterior probability of an anomaly at each location.
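The window_posteriors data frame described above can be ranked to list the most probable space-time windows. The sketch below uses a mock object, res_mock, that only imitates the documented structure; its values are invented for illustration and are not output from scan_bayes_negbin.

```r
# Mock result with the documented layout (all numbers are made up)
res_mock <- list(
  posteriors = list(
    window_posteriors = data.frame(
      zone = c(1L, 2L, 2L, 3L),
      duration = c(1L, 3L, 1L, 2L),
      log_posterior = log(c(0.05, 0.40, 0.10, 0.02)),
      log_bayes_factor = c(0.5, 2.6, 1.2, -0.4)
    )
  )
)

wp <- res_mock$posteriors$window_posteriors

# Back-transform the log posterior to the probability scale
wp$posterior <- exp(wp$log_posterior)

# Sort windows from most to least probable and show the top two
top_windows <- wp[order(wp$log_posterior, decreasing = TRUE), ]
head(top_windows, 2)
```

With a real result, the first row of such a ranking would be expected to match the zone and duration reported in MLC.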
References
Neill, D. B., Moore, A. W., Cooper, G. F. (2006). A Bayesian Spatial Scan Statistic. Advances in Neural Information Processing Systems 18.
Examples
if (FALSE) {
  set.seed(1)
  # Create location coordinates, calculate nearest neighbors, and create zones
  n_locs <- 50
  max_duration <- 5
  n_total <- n_locs * max_duration
  geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
  knn_mat <- coords_to_knn(geo, 15)
  zones <- knn_zones(knn_mat)

  # Simulate data
  baselines <- matrix(rexp(n_total, 1/5), max_duration, n_locs)
  counts <- matrix(rpois(n_total, as.vector(baselines)), max_duration, n_locs)

  # Inject outbreak/event/anomaly
  ob_dur <- 3
  ob_cols <- zones[[10]]
  ob_rows <- max_duration + 1 - seq_len(ob_dur)
  counts[ob_rows, ob_cols] <- matrix(
    rpois(ob_dur * length(ob_cols), 2 * baselines[ob_rows, ob_cols]),
    length(ob_rows), length(ob_cols))
  res <- scan_bayes_negbin(counts = counts,
                           zones = zones,
                           baselines = baselines)
}
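The counts argument also accepts a data frame with columns "time", "location", "count" and "baseline". A minimal sketch of reshaping matrix inputs into that long format with base R follows. The small matrices counts_mat and baselines_mat are illustrative, and numbering time and location from 1 by row and column index is an assumption based on the matrix convention described under Arguments.

```r
# Small illustrative matrices: rows are time points, columns are locations
counts_mat <- matrix(c(3, 1, 4, 1, 5, 9), nrow = 2, ncol = 3)
baselines_mat <- matrix(c(2.5, 1.5, 3.0, 2.0, 4.5, 6.0), nrow = 2, ncol = 3)

# One row per time-location pair; row() and col() give the matrix indices
counts_df <- data.frame(
  time = as.vector(row(counts_mat)),
  location = as.vector(col(counts_mat)),
  count = as.vector(counts_mat),
  baseline = as.vector(baselines_mat)
)

counts_df
```

A data frame built this way could then be passed as counts in place of the separate counts and baselines matrices.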