
Monte Carlo Power Simulation for Causal Designs
power_sim.RdEstimates statistical power (or sample requirements) for a variety of causal inference designs via Monte Carlo simulation. Unlike closed-form formulas, this approach handles non-normal outcomes, clustering, and complex treatment assignment mechanisms.
Arguments
- dgp
A function
function(n, effect_size, ...)that generates one simulated dataset as a data frame with columnsY(outcome) andW(treatment). IfNULL, a default DGP is used (linear model with Gaussian noise).- estimator
A function
function(data)that takes the simulated data frame and returns a named list with at leastestimate(numeric) andp_value(numeric). IfNULL, a simple OLS t-test on W is used.- n_seq
Integer vector. Sample sizes to evaluate. Default
c(50, 100, 200, 500).- effect_seq
Numeric vector. Effect sizes (standardized) to evaluate. Default
c(0.1, 0.2, 0.3, 0.5).- n_sims
Integer. Number of Monte Carlo replications per cell. Default
1000.- alpha
Numeric. Significance level. Default
0.05.- seed
Integer. Random seed. Default
1.- n_cores
Integer. Number of cores for parallel simulation. Default
1.- plot
Logical. Produce power curves. Default
TRUE.
Value
A named list with two elements: results (a data frame with
columns n, effect_size, power, mean_estimate,
bias, rmse, and coverage) and plot (a ggplot2
power curve, or NULL when plot = FALSE).
Examples
if (FALSE) { # \dontrun{
# Default DGP: linear Gaussian model
ps <- power_sim(
n_seq = c(100, 200, 400),
effect_seq = c(0.2, 0.3, 0.5),
n_sims = 500
)
ps$results
ps$plot
# Custom DGP: clustered data
cluster_dgp <- function(n, effect_size) {
n_clusters <- max(10, n %/% 20)
cluster_ids <- rep(seq_len(n_clusters), each = ceiling(n / n_clusters))[seq_len(n)]
cluster_re <- rnorm(n_clusters, 0, 0.5)
W <- as.integer(cluster_ids <= n_clusters / 2)
Y <- effect_size * W + cluster_re[cluster_ids] + rnorm(n)
data.frame(Y = Y, W = W, cluster = cluster_ids)
}
cluster_est <- function(data) {
fit <- lm(Y ~ W, data = data)
sm <- summary(fit)$coefficients["W", ]
list(estimate = sm[1], p_value = sm[4])
}
ps_cl <- power_sim(dgp = cluster_dgp, estimator = cluster_est,
n_seq = c(200, 500), n_sims = 500)
} # }