Runs a model across all combinations of analytical choices (model specification, sample restrictions, functional forms, etc.) and visualizes the resulting distribution of estimates. This reveals how sensitive conclusions are to reasonable but arbitrary researcher decisions.

Usage

multiverse_analysis(
  data,
  outcome,
  treatment,
  choices = list(),
  family = "gaussian",
  alpha = 0.05,
  sort_by = c("estimate", "p_value"),
  plot = TRUE,
  parallel = FALSE
)

Arguments

data

A data frame.

outcome

Character. Name of the outcome variable.

treatment

Character. Name of the treatment variable.

choices

A named list. Each element holds the alternatives for one analytical dimension (a character vector, or a list of character vectors for controls). Supported keys:

controls

List of covariate sets to add. Each element is a character vector of variable names, e.g. list(c(), c("age"), c("age","female")).

sample_filters

Named character vector of filter expressions (as strings), e.g. c(full="TRUE", adults="age>=18").

outcome_transforms

Named character vector of transformations applied to the outcome, e.g. c(levels="y", log="log(y+1)").

se_types

Character vector of standard-error types: "OLS", "HC1", "HC3", "cluster".

cluster_var

Character. Variable to cluster on (used when se_types includes "cluster").

family

Character. Model family: "gaussian" (default), "binomial", or "poisson".

alpha

Numeric. Significance level for highlighting. Default 0.05.

sort_by

Character. Sort specifications by "estimate" (default) or "p_value".

plot

Logical. Produce the multiverse plot. Default TRUE.

parallel

Logical. Run specifications in parallel via parallel::mclapply. Default FALSE.
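
The number of specifications is the product of the number of alternatives in each dimension of choices. The following is a minimal base-R sketch of how such a grid could be enumerated. It illustrates the combinatorics only and is not necessarily how multiverse_analysis builds its grid internally; the object names choices and grid are used here for illustration.

```r
# Illustrative only: enumerate the combinations implied by a `choices` list.
choices <- list(
  controls           = list(character(0), "age", c("age", "female")),
  sample_filters     = c(full = "TRUE", adults = "age >= 18"),
  outcome_transforms = c(levels = "y", log = "log(y + 1)"),
  se_types           = c("OLS", "HC1")
)

# Index the alternatives within each dimension, then take the
# Cartesian product of those indices: one row per specification.
grid <- expand.grid(lapply(choices, seq_along), KEEP.OUT.ATTRS = FALSE)

nrow(grid)  # 3 * 2 * 2 * 2 = 24 specifications
```

Because the count grows multiplicatively, adding one more two-level dimension doubles the number of models to fit.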

Value

A list with:

results

Data frame of all specification results: spec_id, estimate, std_error, t_stat, p_value, ci_lo, ci_hi, significant, plus one column per analytical dimension.

summary

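One-row data frame of summary statistics across all specifications: number of specifications, median estimate, IQR of the estimates, % positive, % significant, and the minimum and maximum estimate.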

plot

A ggplot2 object (if plot = TRUE).
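
As a rough guide to how summary relates to results, the sketch below recomputes the same kind of statistics from a results-like data frame. The helper summarise_specs and the toy data frame are hypothetical and exist only for this illustration. The column names mirror those printed by mv$summary in the example, and the exact definitions used by the package (for instance, how the IQR is computed) are assumed rather than confirmed.

```r
# Hypothetical helper: summarise a results-like data frame.
# Expects a numeric `estimate` column and a logical `significant` column.
summarise_specs <- function(results) {
  data.frame(
    n_specs         = nrow(results),
    median_est      = median(results$estimate),
    iqr_est         = IQR(results$estimate),          # width of the middle 50%
    pct_positive    = 100 * mean(results$estimate > 0),
    pct_significant = 100 * mean(results$significant),
    min_est         = min(results$estimate),
    max_est         = max(results$estimate)
  )
}

# Toy input, made up purely to exercise the helper.
toy <- data.frame(
  estimate    = c(-0.1, 0.2, 0.3, 0.4),
  significant = c(FALSE, FALSE, TRUE, TRUE)
)
toy_summary <- summarise_specs(toy)
toy_summary
```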

References

Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4, 1208-1214.

Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702-712.

Examples

set.seed(42)
n <- 300
df <- data.frame(
  y      = rnorm(n),
  treat  = rbinom(n, 1, 0.5),
  age    = runif(n, 18, 65),
  female = rbinom(n, 1, 0.5),
  income = rnorm(n)
)

mv <- multiverse_analysis(
  data      = df,
  outcome   = "y",
  treatment = "treat",
  choices   = list(
    controls           = list(c(), c("age"), c("age", "female"),
                              c("age", "female", "income")),
    sample_filters     = c(full = "TRUE", age30plus = "age >= 30"),
    outcome_transforms = c(levels = "y")
  )
)
mv$summary
#>   n_specs median_est   iqr_est pct_positive pct_significant   min_est   max_est
#> 1       8  0.1891403 0.0610175          100               0 0.1542434 0.2336293
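
To make concrete what one of the 8 specifications represents, the sketch below fits a single combination by hand with lm(): controls age and female, the age30plus filter, and the outcome in levels. It assumes that a "gaussian" specification corresponds to an ordinary least-squares fit with conventional standard errors. That is an assumption about the package, so the numbers need not match any particular row of mv$results exactly.

```r
# Recreate the example data so this block runs on its own.
set.seed(42)
n <- 300
df <- data.frame(
  y      = rnorm(n),
  treat  = rbinom(n, 1, 0.5),
  age    = runif(n, 18, 65),
  female = rbinom(n, 1, 0.5),
  income = rnorm(n)
)

# One hand-built specification (assumed equivalent, not confirmed):
# filter "age >= 30", controls c("age", "female"), outcome "y" in levels.
spec_data <- subset(df, age >= 30)
spec_formula <- reformulate(c("treat", "age", "female"), response = "y")
fit <- lm(spec_formula, data = spec_data)

# Estimate, standard error, t statistic, and p-value for the treatment.
coef(summary(fit))["treat", ]
```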