
Runs multiple model specifications and produces a specification curve plot following Simonsohn, Simmons, and Nelson (2020). A specification curve shows whether an estimated effect holds up across reasonable analytical choices, such as which control variables or fixed effects to include.
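The core idea can be sketched in a few lines of base R: fit one model per set of analytical choices, keep the coefficient on the treatment, and sort the estimates so they form a curve. This is a minimal illustration using the `mtcars` data from the Examples below, not `spec_curve()`'s own implementation; the variable names `control_sets` and `curve` are hypothetical.

```r
# Minimal sketch of a specification curve (illustrative only):
# one lm() per control set, keeping the coefficient on the treatment "am".
data(mtcars)

control_sets <- list(
  c("wt"),
  c("wt", "hp"),
  c("wt", "hp", "disp")
)

estimates <- vapply(control_sets, function(cs) {
  fit <- lm(reformulate(c("am", cs), response = "mpg"), data = mtcars)
  coef(fit)[["am"]]
}, numeric(1))

# Sorting gives each specification its rank on the curve's x-axis; the
# `controls` label records which choices produced each point.
ord <- order(estimates)
curve <- data.frame(
  rank = seq_along(ord),
  estimate = estimates[ord],
  controls = vapply(control_sets[ord], paste, character(1), collapse = " + ")
)
print(curve)
```

A specification curve plot typically draws these sorted estimates in an upper panel and marks which choices were active for each rank in a lower panel.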

Usage

spec_curve(
  data,
  y,
  x,
  controls,
  fixed_effects = NULL,
  cluster_var = NULL,
  family = "gaussian"
)

Arguments

data

A data frame containing all variables.

y

Character string. The dependent variable name.

x

Character string. The treatment/main independent variable name.

controls

A list of character vectors, where each vector is a set of control variables for one specification. For example, list(c("age"), c("age", "income"), c("age", "income", "education")).

fixed_effects

A character vector of fixed-effect variable names to cycle through; each is used on its own rather than in combination with the others. Default is NULL.

cluster_var

Optional character string. Variable name for clustered standard errors. Default is NULL.

family

Character string. Either "gaussian" (default) for linear regression or "binomial" for logistic regression.
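To show how these arguments could combine within a single specification, here is a hypothetical helper, `fit_one_spec()`, that is not part of the package. It assumes the model is fit with `glm()`, that a fixed effect enters the formula as `factor(<var>)`, that `x` is numeric (so its coefficient is named `x`), and that clustering uses a CR0 sandwich estimator with no small-sample correction. `spec_curve()` may do any of these differently.

```r
# Hypothetical per-specification fitter (assumptions noted above).
fit_one_spec <- function(data, y, x, controls, fe = NULL,
                         cluster_var = NULL, family = "gaussian") {
  rhs <- c(x, controls)
  if (!is.null(fe)) rhs <- c(rhs, sprintf("factor(%s)", fe))  # one FE at a time
  fam <- switch(family,
                gaussian = gaussian(),
                binomial = binomial(),
                stop("family must be 'gaussian' or 'binomial'"))
  fit <- glm(reformulate(rhs, response = y), data = data, family = fam)
  est <- coef(fit)[[x]]

  if (is.null(cluster_var)) {
    se <- sqrt(vcov(fit)[x, x])
    p <- coef(summary(fit))[x, 4]      # t test (gaussian) or z test (binomial)
  } else {
    # CR0 sandwich: bread = (X'WX)^-1, meat = sum over clusters of s_g s_g'
    X <- model.matrix(fit)
    w <- fit$weights                   # IRLS working weights
    r <- residuals(fit, type = "working")
    scores <- X * (w * r)              # row i scaled by w_i * r_i
    cl <- data[rownames(X), cluster_var]
    S <- rowsum(scores, cl)            # per-cluster score sums
    bread <- solve(crossprod(X, X * w))
    V <- bread %*% crossprod(S) %*% bread
    se <- sqrt(V[x, x])
    p <- 2 * pnorm(-abs(est / se))     # normal approximation
  }
  c(estimate = est, std_error = se, p_value = p, n_obs = nobs(fit))
}

data(mtcars)
# Linear model with a gear fixed effect, clustered by cylinder count
# (only three clusters here, so this is purely illustrative).
fit_one_spec(mtcars, "mpg", "am", c("wt", "hp"), fe = "gear", cluster_var = "cyl")
# Logistic model: the estimate is a log-odds ratio, not a mean difference.
fit_one_spec(mtcars, "vs", "am", c("wt"), family = "binomial")
```

Because the binomial estimate is on the log-odds scale, estimates from the two families are not directly comparable on one curve.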

Value

A list with components:

results

A data frame with columns: spec_id, estimate, std_error, ci_lower, ci_upper, p_value, n_obs, controls, fe.

plot

A ggplot2 object showing the specification curve.

median_estimate

The median estimate of the coefficient on x across all specifications.

pct_significant

Percentage of specifications with p < 0.05.

pct_positive

Percentage of specifications with positive estimates.
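The following sketch reconstructs a `results`-style data frame and the three summary values for the control sets used in the Examples. The column names follow the Value section above; the t-based 95% interval from `confint()`, the `NA` value in `fe` when no fixed effect is used, and the 0 to 100 percentage scale are assumptions about how `spec_curve()` fills them in.

```r
# Hypothetical reconstruction of `results` and its summaries (base R only).
data(mtcars)
control_sets <- list(
  c("wt"),
  c("wt", "hp"),
  c("wt", "hp", "disp"),
  c("wt", "hp", "disp", "drat")
)

results <- do.call(rbind, lapply(seq_along(control_sets), function(i) {
  cs <- control_sets[[i]]
  fit <- lm(reformulate(c("am", cs), response = "mpg"), data = mtcars)
  ct <- coef(summary(fit))
  ci <- confint(fit, "am", level = 0.95)   # t-based 95% interval
  data.frame(
    spec_id = i,
    estimate = ct["am", "Estimate"],
    std_error = ct["am", "Std. Error"],
    ci_lower = ci[1, 1],
    ci_upper = ci[1, 2],
    p_value = ct["am", "Pr(>|t|)"],
    n_obs = nobs(fit),
    controls = paste(cs, collapse = " + "),
    fe = NA_character_                     # no fixed effects in this sketch
  )
}))

# Summary values, expressed as percentages on a 0-100 scale
median_estimate <- median(results$estimate)
pct_significant <- 100 * mean(results$p_value < 0.05)
pct_positive <- 100 * mean(results$estimate > 0)
```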

References

Simonsohn, U., Simmons, J. P., and Nelson, L. D. (2020). "Specification Curve Analysis." Nature Human Behaviour, 4(11), 1208-1214.

Examples

if (FALSE) { # \dontrun{
data(mtcars)
result <- spec_curve(
  data = mtcars,
  y = "mpg",
  x = "am",
  controls = list(
    c("wt"),
    c("wt", "hp"),
    c("wt", "hp", "disp"),
    c("wt", "hp", "disp", "drat")
  )
)
result$plot
} # }