Calculate the space-time permutation scan statistic.

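Calculate the space-time permutation scan statistic devised by Kulldorff et al. (2005). Because expected counts are estimated from the observed counts themselves, population data are not required.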
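The key idea of the method is that expected counts come from the margins of the count table. For time point t and location j, the baseline is the total count at time t times the total count at location j, divided by the grand total. The base-R sketch below computes these baselines for a small made-up table. `perm_baselines` is a hypothetical helper written for illustration, not a function exported by the package, and the package may additionally adjust the baselines for population when one is supplied.

```r
# Hypothetical helper: expected counts under the space-time permutation null
# hypothesis, built only from the margins of the observed count table.
# counts: matrix with time points in rows and locations in columns.
perm_baselines <- function(counts) {
  # (time total) x (location total) / (grand total), for every cell
  outer(rowSums(counts), colSums(counts)) / sum(counts)
}

counts <- matrix(c(2, 3, 1,
                   2, 2, 2,
                   1, 3, 9), nrow = 3, byrow = TRUE)
mu <- perm_baselines(counts)
mu[3, 3]   # 13 * 12 / 25 = 6.24
```

Because these baselines reproduce both the time totals and the location totals of counts, an excess inside a space-time cylinder points to space-time interaction rather than to a purely temporal or purely spatial trend.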

Usage

scan_permutation(
  counts,
  zones,
  population = NULL,
  n_mcsim = 0,
  gumbel = FALSE,
  max_only = FALSE
)

Arguments

counts

Either:

A matrix of observed counts. Rows indicate time and are ordered from least recent (row 1) to most recent (row nrow(counts)). Columns indicate locations, numbered from 1 upward. If counts is a matrix, the optional argument population should also be specified.
A data frame with columns "time", "location", "count", "population".
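If your data start out as matrices, the long data-frame format can be produced with base R as sketched below. `matrix_to_long` is a hypothetical helper; this page does not say whether the package expects a particular row order or integer-typed columns, so check that before relying on it.

```r
# Hypothetical helper: reshape count and population matrices (time in rows,
# locations in columns) into a data frame with columns
# "time", "location", "count", "population".
matrix_to_long <- function(counts, population) {
  stopifnot(identical(dim(counts), dim(population)))
  n_times <- nrow(counts)
  n_locs <- ncol(counts)
  data.frame(
    # as.vector() walks down the columns, so time varies fastest
    time = rep(seq_len(n_times), times = n_locs),
    location = rep(seq_len(n_locs), each = n_times),
    count = as.vector(counts),
    population = as.vector(population)
  )
}

counts <- matrix(1:6, nrow = 2)                 # 2 time points, 3 locations
population <- matrix(100, nrow = 2, ncol = 3)
long <- matrix_to_long(counts, population)
long
```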

zones

A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone.
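A zones list can be written by hand, e.g. `list(1L, 2L, c(1L, 2L))`, or generated from coordinates. The package example uses `coords_to_knn()` and `knn_zones()` for this; the base-R sketch below only illustrates the general idea of k-nearest-neighbour zones and is not the package's implementation. `knn_zones_sketch` is a hypothetical name.

```r
# Hypothetical helper: for each location, form nested zones made of that
# location and its k - 1 nearest neighbours (k = 1, ..., max_k),
# then drop duplicate zones.
knn_zones_sketch <- function(coords, max_k) {
  d <- as.matrix(dist(coords))      # Euclidean distances between locations
  zones <- list()
  for (i in seq_len(nrow(coords))) {
    nn <- order(d[i, ])             # location i itself comes first (distance 0)
    for (k in seq_len(max_k)) {
      zones[[length(zones) + 1]] <- sort(nn[seq_len(k)])
    }
  }
  unique(zones)
}

coords <- matrix(c(0, 0,
                   1, 0,
                   5, 5), ncol = 2, byrow = TRUE)
zones <- knn_zones_sketch(coords, max_k = 2)
str(zones)
```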

population

Optional. A matrix of populations for each time point and location, or a vector of populations for each location. Only needed if you want the estimated baselines to account for differing populations across locations (and time points). If a matrix, it should have the same dimensions as counts. If a vector, it should have one element per location, i.e. its length should equal the number of columns of counts.
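One natural reading of the vector form is that each location's population stays constant over time. Under that assumption (not stated on this page), the sketch below expands a population vector into a matrix with the same shape as counts. `expand_population` is a hypothetical helper.

```r
# Assumption: a population vector (one value per location) is constant over
# time, so it is repeated across every time point to form a
# time x location matrix.
expand_population <- function(pop_vec, n_times) {
  matrix(pop_vec, nrow = n_times, ncol = length(pop_vec), byrow = TRUE)
}

pop_mat <- expand_population(c(100, 250, 80), n_times = 4)
dim(pop_mat)   # 4 rows (time points), 3 columns (locations)
```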

n_mcsim

A non-negative integer; the number of replicate scan statistics to generate in order to calculate a P-value.
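A Monte Carlo P-value is usually computed by ranking the observed scan statistic among the replicates, counting the observed value as one of the n_mcsim + 1 values. The sketch below assumes that convention; `mc_pvalue` is hypothetical and the replicate values are made up.

```r
# Monte Carlo P-value under the usual (1 + #exceedances) / (1 + n) convention.
mc_pvalue <- function(observed, replicates) {
  (1 + sum(replicates >= observed)) / (1 + length(replicates))
}

replicates <- c(3.1, 4.7, 2.2, 5.9, 1.8)   # made-up replicate statistics
mc_pvalue(observed = 5.0, replicates)      # (1 + 1) / (1 + 5) = 1/3
```

Under this convention the smallest attainable P-value is 1 / (n_mcsim + 1), so n_mcsim = 99 cannot give a value below 0.01.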

gumbel

Logical: should a Gumbel P-value be calculated? Default is FALSE.
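A Gumbel P-value can resolve probabilities smaller than 1 / (n_mcsim + 1) by fitting a Gumbel distribution to the replicates and reading off its upper tail. The sketch below fits the distribution by the method of moments; that fitting method is an assumption, and the package may estimate the parameters differently. `gumbel_pvalue_mom` is hypothetical.

```r
# Gumbel P-value from a method-of-moments fit to the replicate statistics.
gumbel_pvalue_mom <- function(observed, replicates) {
  euler_gamma <- 0.5772156649
  scale <- sd(replicates) * sqrt(6) / pi          # Gumbel sd = scale * pi / sqrt(6)
  loc <- mean(replicates) - euler_gamma * scale   # Gumbel mean = loc + gamma * scale
  # Upper tail 1 - exp(-exp(-z)), computed with expm1() so tiny
  # P-values are not lost to rounding
  -expm1(-exp(-(observed - loc) / scale))
}

set.seed(1)
replicates <- rnorm(99, mean = 5, sd = 1)
gumbel_pvalue_mom(observed = 9, replicates)
```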

max_only

Logical. If FALSE (default), the log-likelihood ratio statistic for each zone and duration is returned. If TRUE, only the largest such statistic (i.e. the scan statistic) is returned, along with the corresponding zone and duration.
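To illustrate what max_only changes, the sketch below scores every combination of zone and duration with a Poisson log-likelihood ratio against margin-based permutation baselines, treating a duration of d as the d most recent time points. It then returns either the whole table sorted by score or only its top row. `score_table` is hypothetical, and its scoring is an assumption modelled on Kulldorff et al. (2005) rather than a copy of the package's code.

```r
# Hypothetical sketch: score each zone/duration pair, then return the full
# table (max_only = FALSE) or only the highest-scoring row (max_only = TRUE).
score_table <- function(counts, zones, max_only = FALSE) {
  total <- sum(counts)
  mu <- outer(rowSums(counts), colSums(counts)) / total  # permutation baselines
  n_times <- nrow(counts)
  out <- expand.grid(zone = seq_along(zones), duration = seq_len(n_times))
  out$score <- mapply(function(z, d) {
    rows <- seq(n_times - d + 1, n_times)  # the d most recent time points
    c_in <- sum(counts[rows, zones[[z]]])
    mu_in <- sum(mu[rows, zones[[z]]])
    if (c_in <= mu_in) return(0)           # only excesses are scored
    llr <- c_in * log(c_in / mu_in)
    c_out <- total - c_in
    if (c_out > 0) llr <- llr + c_out * log(c_out / (total - mu_in))
    llr
  }, out$zone, out$duration)
  out <- out[order(out$score, decreasing = TRUE), ]
  rownames(out) <- NULL
  if (max_only) out[1, ] else out
}

counts <- matrix(c(2, 3, 1,
                   2, 2, 2,
                   1, 3, 9), nrow = 3, byrow = TRUE)
zones <- list(1L, 2L, 3L, c(2L, 3L))
score_table(counts, zones, max_only = TRUE)
```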

Value

A list which, in addition to the information about the type of scan statistic, has the following components:

MLC: A list containing the number of the zone of the most likely cluster (MLC), the locations in that zone, the duration of the MLC, the calculated score, and the relative risk inside and outside the cluster. In order, the elements of this list are named zone_number, locations, duration, score, relrisk_in, relrisk_out.
observed: A data frame containing, for each combination of zone and duration investigated, the zone number, duration, score, and relative risks inside and outside the cluster. The table is sorted by score in decreasing order, so the top-scoring combination of zone and duration comes first. If max_only = TRUE, it contains only a single row, corresponding to the MLC.
replicates: A data frame of the Monte Carlo replicates of the scan statistic (if any), and the corresponding zones and durations.
MC_pvalue: The Monte Carlo P-value.
Gumbel_pvalue: A P-value obtained by fitting a Gumbel distribution to the replicate scan statistics.
n_zones: The number of zones scanned.
n_locations: The number of locations.
max_duration: The maximum duration considered.
n_mcsim: The number of Monte Carlo replicates made.
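The relative risks relrisk_in and relrisk_out are not defined on this page. A common definition, assumed in the sketch below, is the ratio of observed to expected counts inside the cluster and outside it. `relrisks` is a hypothetical helper, and the inputs reuse a made-up cluster with 9 observed cases, 6.24 expected cases and 25 cases in total.

```r
# Assumed definitions: relative risk = observed / expected, computed
# separately inside and outside the cluster.
relrisks <- function(c_in, mu_in, total) {
  c(relrisk_in = c_in / mu_in,
    relrisk_out = (total - c_in) / (total - mu_in))
}

rr <- relrisks(c_in = 9, mu_in = 6.24, total = 25)
rr
```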

References

Kulldorff, M., Heffernan, R., Hartman, J., Assunção, R. M., Mostashari, F. (2005). A space-time permutation scan statistic for disease outbreak detection. PLoS Medicine, 2(3), 0216-0224.

Examples

if (FALSE) {
library(scanstatistics)
set.seed(1)
# Create location coordinates, calculate nearest neighbors, and create zones
n_locs <- 50
max_duration <- 5
n_total <- n_locs * max_duration
geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
knn_mat <- coords_to_knn(geo, 15)
zones <- knn_zones(knn_mat)

# Simulate data: time points in rows, locations in columns
population <- matrix(rnorm(n_total, 100, 10), max_duration, n_locs)
counts <- matrix(rpois(n_total, as.vector(population) / 20),
                 max_duration, n_locs)

# Inject outbreak/event/anomaly at twice the baseline rate
# in the ob_dur most recent time points of zone 10
ob_dur <- 3
ob_cols <- zones[[10]]
ob_rows <- max_duration + 1 - seq_len(ob_dur)
counts[ob_rows, ob_cols] <- matrix(
  rpois(ob_dur * length(ob_cols), 2 * population[ob_rows, ob_cols] / 20),
  length(ob_rows), length(ob_cols))
res <- scan_permutation(counts = counts,
                        zones = zones,
                        population = population,
                        n_mcsim = 99,
                        max_only = FALSE)
}