Specification Curve Analysis — spec

Fits a model under multiple specifications and produces a specification curve plot following Simonsohn, Simmons, and Nelson (2020). Use it to check whether a result is robust across reasonable analytical choices.

Usage

spec_curve(
  data,
  y,
  x,
  controls,
  fixed_effects = NULL,
  cluster_var = NULL,
  family = "gaussian"
)

Arguments

data: A data frame containing all variables.
y: Character string. The dependent variable name.
x: Character string. The treatment/main independent variable name.
controls: A list of character vectors, where each vector is a set of control variables for one specification. For example, list(c("age"), c("age", "income"), c("age", "income", "education")).
fixed_effects: A character vector of fixed-effect variable names. The function cycles through them, using each one individually rather than in combination. Default is NULL.
cluster_var: Optional character string. Variable name for clustered standard errors. Default is NULL.
family: Character string. Either "gaussian" (default) for linear regression or "binomial" for logistic regression.
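
To make the shape of the controls argument concrete, here is a minimal base-R sketch of how one model per control set could be fitted and collected into a table. It is not the package's internal code: fit_one() is a hypothetical helper, plain lm() stands in for whatever estimator spec_curve() actually uses, and fixed effects and clustered standard errors are left out.

```r
# Illustrative sketch only; not how spec_curve() is implemented.
data(mtcars)

y <- "mpg"
x <- "am"
controls <- list(c("wt"), c("wt", "hp"), c("wt", "hp", "disp"))

# fit_one() is a hypothetical helper: fit one specification with lm()
# and return a one-row data frame for the main predictor x.
fit_one <- function(ctrl, spec_id) {
  f <- reformulate(c(x, ctrl), response = y)  # e.g. mpg ~ am + wt
  fit <- lm(f, data = mtcars)
  est <- coef(summary(fit))[x, ]              # coefficient row for x
  ci <- confint(fit, x, level = 0.95)         # 95% confidence interval
  data.frame(
    spec_id   = spec_id,
    estimate  = est[["Estimate"]],
    std_error = est[["Std. Error"]],
    ci_lower  = ci[1, 1],
    ci_upper  = ci[1, 2],
    p_value   = est[["Pr(>|t|)"]],
    n_obs     = nobs(fit),
    controls  = paste(ctrl, collapse = " + ")
  )
}

# One row per element of controls, i.e. one row per specification.
results <- do.call(rbind, Map(fit_one, controls, seq_along(controls)))
results
```

Each element of the list yields exactly one fitted model, so a list of three control sets gives three rows.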

Value

A list with components:

results: A data frame with columns: spec_id, estimate, std_error, ci_lower, ci_upper, p_value, n_obs, controls, fe.
plot: A ggplot2 object showing the specification curve.
median_estimate: The median coefficient across all specifications.
pct_significant: Percentage of specifications with p < 0.05.
pct_positive: Percentage of specifications with positive estimates.
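
The three summary components can be read as simple functions of the results data frame. The sketch below shows one plausible way to compute them. The numbers are placeholder values chosen for illustration, not output from spec_curve(), and the 0 to 100 percentage scale is an assumption.

```r
# Placeholder values for illustration; not output from spec_curve().
results <- data.frame(
  spec_id  = 1:4,
  estimate = c(-0.5, 0.2, 1.1, 2.4),
  p_value  = c(0.40, 0.70, 0.03, 0.001)
)

# Median coefficient across specifications.
median_estimate <- median(results$estimate)

# Share of specifications with p < 0.05, on an assumed 0-100 scale.
pct_significant <- 100 * mean(results$p_value < 0.05)

# Share of specifications with a positive estimate, same scale.
pct_positive <- 100 * mean(results$estimate > 0)
```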

References

Simonsohn, U., Simmons, J. P., and Nelson, L. D. (2020). "Specification Curve Analysis." Nature Human Behaviour, 4(11), 1208-1214.

Examples

if (FALSE) { # \dontrun{
data(mtcars)
result <- spec_curve(
  data = mtcars,
  y = "mpg",
  x = "am",
  controls = list(
    c("wt"),
    c("wt", "hp"),
    c("wt", "hp", "disp"),
    c("wt", "hp", "disp", "drat")
  )
)
result$plot
} # }
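
For readers who want to see what a specification curve looks like without installing the package, the following sketch draws a basic one with ggplot2. It is an approximation under stated assumptions: the models are plain lm() fits on mtcars, and the layout (estimates sorted by size, with 95% confidence intervals and a zero reference line) is a common convention, not necessarily what result$plot contains.

```r
library(ggplot2)

data(mtcars)
control_sets <- list("wt", c("wt", "hp"), c("wt", "hp", "disp"))

# Estimate and 95% CI for "am" under each control set (plain lm fits).
rows <- lapply(control_sets, function(ctrl) {
  fit <- lm(reformulate(c("am", ctrl), response = "mpg"), data = mtcars)
  ci <- confint(fit, "am")
  data.frame(
    estimate = coef(fit)[["am"]],
    ci_lower = ci[1, 1],
    ci_upper = ci[1, 2]
  )
})
df <- do.call(rbind, rows)

# Sort specifications by estimate so the points form a curve.
df <- df[order(df$estimate), ]
df$rank <- seq_len(nrow(df))

p <- ggplot(df, aes(x = rank, y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_pointrange(aes(ymin = ci_lower, ymax = ci_upper)) +
  labs(x = "Specification (sorted by estimate)", y = "Estimate for am")
p
```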