Monte Carlo Power Simulation for Causal Designs

Estimates statistical power (or sample requirements) for a variety of causal inference designs via Monte Carlo simulation. Unlike closed-form formulas, this approach handles non-normal outcomes, clustering, and complex treatment assignment mechanisms.

Usage

power_sim(
  dgp = NULL,
  estimator = NULL,
  n_seq = c(50, 100, 200, 500),
  effect_seq = c(0.1, 0.2, 0.3, 0.5),
  n_sims = 1000,
  alpha = 0.05,
  seed = 1,
  n_cores = 1,
  plot = TRUE
)

Arguments

dgp: A function function(n, effect_size, ...) that generates one simulated dataset as a data frame with columns Y (outcome) and W (treatment). If NULL, a default DGP is used (linear model with Gaussian noise).
estimator: A function function(data) that takes the simulated data frame and returns a named list with at least estimate (numeric) and p_value (numeric). If NULL, a simple OLS t-test on W is used.
n_seq: Integer vector. Sample sizes to evaluate. Default c(50, 100, 200, 500).
effect_seq: Numeric vector. Effect sizes (standardized) to evaluate. Default c(0.1, 0.2, 0.3, 0.5).
n_sims: Integer. Number of Monte Carlo replications per cell. Default 1000.
alpha: Numeric. Significance level. Default 0.05.
seed: Integer. Random seed. Default 1.
n_cores: Integer. Number of cores for parallel simulation. Default 1.
plot: Logical. Produce power curves. Default TRUE.

Value

A named list with two elements: results (a data frame with columns n, effect_size, power, mean_estimate, bias, rmse, and coverage) and plot (a ggplot2 power curve, or NULL when plot = FALSE).

Examples

if (FALSE) { # \dontrun{
# Default DGP: linear Gaussian model
ps <- power_sim(
  n_seq      = c(100, 200, 400),
  effect_seq = c(0.2, 0.3, 0.5),
  n_sims     = 500
)
ps$results
ps$plot

# Custom DGP: clustered data
cluster_dgp <- function(n, effect_size) {
  n_clusters  <- max(10, n %/% 20)
  cluster_ids <- rep(seq_len(n_clusters), each = ceiling(n / n_clusters))[seq_len(n)]
  cluster_re  <- rnorm(n_clusters, 0, 0.5)
  W  <- as.integer(cluster_ids <= n_clusters / 2)
  Y  <- effect_size * W + cluster_re[cluster_ids] + rnorm(n)
  data.frame(Y = Y, W = W, cluster = cluster_ids)
}
cluster_est <- function(data) {
  fit <- lm(Y ~ W, data = data)
  sm  <- summary(fit)$coefficients["W", ]
  list(estimate = sm[1], p_value = sm[4])
}
ps_cl <- power_sim(dgp = cluster_dgp, estimator = cluster_est,
                   n_seq = c(200, 500), n_sims = 500)
} # }