stack_data prepares datasets for staggered Difference-in-Differences (DiD) designs. Staggered DiD designs arise when different units (e.g., firms, regions, countries) are treated in different time periods. The function builds cohorts from the supplied treatment period variable and stacks them into a single long-format dataset suitable for staggered DiD analyses.

stack_data(
  treated_period_var,
  time_var,
  pre_window,
  post_window,
  data,
  control_type = c("both", "never-treated", "not-yet-treated")
)

Arguments

treated_period_var

A character string indicating the column name of the treatment period variable.

time_var

A character string indicating the column name for time.

pre_window

An integer indicating the number of periods before the treatment to consider (i.e., leads).

post_window

An integer indicating the number of periods after the treatment to consider (i.e., lags).

data

A data frame containing the dataset to be processed.

control_type

A character string indicating which control type to use. One of "both", "never-treated", or "not-yet-treated".
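
To make the three control_type options concrete, here is a minimal base R sketch of how control rows for a single cohort could be selected. It assumes the usual stacked-DiD convention that "not-yet-treated" means units first treated after the end of the cohort's window. The helper select_controls and the toy panel are hypothetical and are not taken from the package source.

```r
# Hypothetical helper: pick control rows for one cohort under each control_type.
select_controls <- function(df, cohort_period, pre_window, post_window,
                            control_type = c("both", "never-treated", "not-yet-treated")) {
  control_type <- match.arg(control_type)
  # Keep only observations inside the event window around the cohort's treatment period.
  in_window <- df$year >= cohort_period - pre_window &
    df$year <= cohort_period + post_window
  # Never-treated units carry the sentinel value 10000.
  never <- df$year_treated == 10000
  # Assumption: not-yet-treated units are first treated after the window closes.
  not_yet <- !never & df$year_treated > cohort_period + post_window
  keep <- switch(control_type,
    "both" = never | not_yet,
    "never-treated" = never,
    "not-yet-treated" = not_yet
  )
  df[in_window & keep, , drop = FALSE]
}

# Toy panel: unit 1 treated in 2003, unit 2 in 2009, unit 3 never treated.
panel <- expand.grid(id = 1:3, year = 2000:2010)
panel$year_treated <- c(2003, 2009, 10000)[panel$id]

controls_both <- select_controls(panel, 2003, 2, 2, "both")
controls_never <- select_controls(panel, 2003, 2, 2, "never-treated")
```

For the 2003 cohort with a window of two periods on each side, "both" keeps units 2 and 3 over 2001 to 2005, while "never-treated" keeps only unit 3.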

Value

A data frame with the stacked data, augmented with relative period dummy variables, suitable for staggered DiD analysis.
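
As a rough illustration of what "relative period dummy variables" means, the following base R sketch builds such columns by hand for a toy cohort. The column names follow the rel_period_<k> pattern used in the Examples section. The toy data and the helper add_rel_period_dummies are hypothetical, and the sketch assumes the dummies are switched on only for treated units; it is not the package's implementation.

```r
# Toy panel: unit 1 is treated in period 5, unit 2 is never treated (coded 10000).
toy <- data.frame(
  id = rep(1:2, each = 5),
  year = rep(3:7, times = 2),
  year_treated = rep(c(5, 10000), each = 5)
)

# Hypothetical helper: add one 0/1 column per relative period in the window.
add_rel_period_dummies <- function(df, cohort_period, pre_window, post_window) {
  treated <- df$year_treated == cohort_period
  rel <- df$year - cohort_period  # periods since (or until) treatment
  for (k in seq(-pre_window, post_window)) {
    df[[paste0("rel_period_", k)]] <- as.integer(treated & rel == k)
  }
  df
}

toy_dummies <- add_rel_period_dummies(toy, cohort_period = 5,
                                      pre_window = 2, post_window = 2)
```

Because names such as rel_period_-2 are not syntactic R names, they need backticks when used in a formula, which is why the Examples section wraps them that way.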

Details

The input data must contain a control group. Control units should be coded with the value 10000 in the treated_period_var column of the provided dataset. The output data is augmented with relative period dummy variables for ease of subsequent analysis.
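
If a dataset marks control units some other way, they can be recoded before calling stack_data. The sketch below assumes a hypothetical data frame raw in which never-treated units have NA in the treatment period column; adapt the condition to whatever coding your data actually uses.

```r
# Hypothetical raw data: never-treated units have NA as their treatment period.
raw <- data.frame(
  id = 1:4,
  year_treated = c(2005, NA, 2008, NA)
)

# Recode never-treated units to the sentinel value 10000.
raw$year_treated[is.na(raw$year_treated)] <- 10000

# Sanity check: at least one control unit must be present.
stopifnot(any(raw$year_treated == 10000))
```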

Examples

if (FALSE) {
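  library(did)
  library(tidyverse)
  library(fixest)

  # base_stagg is the simulated staggered-treatment panel shipped with fixest;
  # never-treated units have year_treated == 10000.
  data(base_stagg)

  # Stack cohorts using a window of 3 periods before and 3 periods after treatment.
  stacked_data <- stack_data("year_treated", "year", 3, 3, base_stagg, control_type = "both")

  # Event-study regression on the stacked data. Relative period -1 is left out
  # as the reference period. The fixed effects interact unit and year with `df`,
  # assumed here to be the stack identifier column added by stack_data().
  feols_result <- feols(as.formula(paste0(
    "y ~ ",
    paste(paste0("`rel_period_", c(-3:-2, 0:3), "`"), collapse = " + "),
    " | id ^ df + year ^ df"
  )), data = stacked_data)
  print(feols_result)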
}