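use "https://dss.princeton.edu/training/Panel101.dta", clear
* Post-treatment period dummy: 1 for 1994 onward
gen time = 0
replace time = 1 if year>=1994
* Treatment group dummy: 1 for countries coded above 4
gen treated = 0
replace treated = 1 if country>4
* Interaction term; its coefficient is the DID estimate
gen did = time*treated
reg y time treated did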
Stata will give us the following output table:
The coefficient on ‘did’ is the difference-in-differences estimate, which identifies the average treatment effect on the treated under the parallel trend assumption. The effect is significant at the 10% level, and the treatment has a negative effect.
Use the following code to generate the graph. Note that we use the lgraph package, installed from SSC, to draw it.
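For reference, the regression above can be written out as follows. This is the standard two-group, two-period DID specification; the subscripts i (country) and t (year) and the group-mean notation are added here for illustration and are not part of the original commands.

```latex
\begin{align*}
y_{it} &= \beta_0 + \beta_1\,\mathit{time}_t + \beta_2\,\mathit{treated}_i
          + \beta_3\,(\mathit{time}_t \times \mathit{treated}_i) + \varepsilon_{it} \\
\hat{\beta}_3 &= \left(\bar{y}_{\mathit{treated},\,\mathit{post}} - \bar{y}_{\mathit{treated},\,\mathit{pre}}\right)
               - \left(\bar{y}_{\mathit{control},\,\mathit{post}} - \bar{y}_{\mathit{control},\,\mathit{pre}}\right)
\end{align*}
```

Here the coefficient on the interaction term corresponds to the coefficient on 'did', while the coefficients on 'time' and 'treated' capture the common time shift and the baseline group difference.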
ssc install lgraph
preserve
collapse (mean) y, by(year treated)
lgraph y year, by(treated) xline(1993)
restore
Stata will generate the following did graph:
Given that we used hypothetical data for this example, the graph does not show a clear parallel trend in outcome for treatment and control groups before the policy intervention.
reg y time##treated
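The coefficient on "time#treated" is the difference-in-differences estimate of the average treatment effect on the treated, and it matches the 'did' coefficient from the basic method. The effect is significant at the 10% level, and the treatment has a negative effect.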
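As a sanity check, the interaction coefficient can be reproduced by hand from the four group-by-period means of 'y'. The following is a minimal sketch, assuming the 'time' and 'treated' dummies have already been created as in the basic method; the scalar names are hypothetical and chosen only for this illustration.

```stata
* Mean of y in each treatment-group-by-period cell
quietly summarize y if treated == 1 & time == 1
scalar mean_t1_p1 = r(mean)
quietly summarize y if treated == 1 & time == 0
scalar mean_t1_p0 = r(mean)
quietly summarize y if treated == 0 & time == 1
scalar mean_t0_p1 = r(mean)
quietly summarize y if treated == 0 & time == 0
scalar mean_t0_p0 = r(mean)

* Change in the treated group minus change in the control group
display "DID estimate: " (scalar(mean_t1_p1) - scalar(mean_t1_p0)) - (scalar(mean_t0_p1) - scalar(mean_t0_p0))
```

Because the regression is fully saturated in the two dummies, the displayed value should equal the interaction coefficient from the regression.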
ssc install diff
diff y, t(treated) p(time)
Note: "treated" and "time" in parentheses are the treatment and time-period dummies created in the "basic" method.
Stata will give us the following outputs:
The coefficient on ‘Diff-in-Diff’ is the difference-in-differences estimate of the average treatment effect on the treated. The effect is significant at the 10% level, and the treatment has a negative effect.
Type help diff for more details and options.
Example 1: Using Hypothetical DSS Data
use "https://dss.princeton.edu/training/Panel101.dta", clear
gen time = 0
replace time = 1 if year>=1994
gen treated = 0
replace treated = 1 if country>4
gen did = time*treated
didregress (y) (did), group(country) time(year)
In the code above, the first set of parentheses contains the dependent variable ('y' for our dataset), and the second set contains the treatment interaction variable ('did', which we created in the previous step). The group() and time() options take the group ID variable ('country' for our dataset) and the time variable ('year' for this dataset), respectively.
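To see what didregress estimates here, it can help to compare it with an ordinary regression that includes group and time fixed effects. The following is a rough sketch, assuming 'did', 'country', and 'year' exist as created above; it is offered as a comparison under those assumptions, not as a description of how didregress is implemented.

```stata
* Two-way fixed effects regression: country and year dummies plus the
* treatment indicator, with standard errors clustered by country
regress y did i.country i.year, vce(cluster country)
```

The coefficient on 'did' should match the didregress point estimate, while the standard errors may differ slightly depending on the small-sample adjustments each command applies.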
Stata will give us the following output table:
The table indicates that the average treatment effect on the treated is -2.52e+09, with a p-value of 0.07. The effect is therefore significant at the 10% level but not at the conventional 5% level.
To test the parallel trend assumption graphically, type:
estat trendplots
Stata will give us the following graph:
Given that we used hypothetical data for this example, the graph does not show a clear parallel trend in outcome for treatment and control groups before the policy intervention.
We can also test the parallel trend assumption using the following command:
estat ptrends
Stata will give us the following results:
As the p-value is not below 0.05, we fail to reject the null hypothesis that the pre-treatment linear trends are parallel. The test therefore gives no evidence against the parallel trend assumption, although failing to reject does not prove that the assumption holds.
Notice that even though the graph did not show a clear parallel trend before the policy intervention, the formal test does not reject the parallel trend assumption.
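A related postestimation check is a Granger-type test for anticipatory effects, that is, whether the outcome responds before the treatment actually starts. A minimal sketch, assuming the didregress results from above are still the active estimation results:

```stata
* Run directly after didregress, in the same way as estat ptrends
estat granger
```

A large p-value here means the test finds no evidence that the treatment was anticipated, which complements the parallel trend test above.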
For more on didregress postestimation commands type:
help didregress_postestimation
didregress (y x2 x3) (did), group(country) time(year) aeq
Note: we added 'x2' and 'x3' as covariates.
Stata will give us the following results:
The above table includes coefficients for the covariates 'x2' and 'x3' and for the time variable ('year').
Notice that the average treatment effect on the treated (-2.71e+09) is now statistically significant at the 5% level, as the p-value (0.013) is less than 0.05.
Example 2: Using Stata Data
use https://www.stata-press.com/data/r17/hospdd.dta, clear
In the hospital admission dataset, the 'procedure' variable records the policy change: it takes the value 1 (labeled as 'New') if the hospital used the new admission procedure and 0 (labeled as 'old') otherwise.
didregress (satis) (procedure), group(hospital) time(month)
In the code above, the first set of parentheses contains the dependent variable, and the second set contains the DID/policy change variable. The group() and time() options take the group ID and time variables, respectively.
Stata will give us the following output table:
The table indicates that the average treatment effect on the treated is 0.8479879 and is statistically significant at the 1% level. That is, the change in hospital admission procedures increased satisfaction by about 0.85 units among the treated hospitals.
To test the parallel trend assumption graphically, type:
estat trendplots
Stata will give us the following graph:
The graph indicates that the treatment and control groups followed parallel trends in satisfaction prior to the policy change.
We can also test the parallel trend assumption using the following command:
estat ptrends
Stata will give us the following results:
As the p-value is not below 0.05, we fail to reject the null hypothesis that the pre-treatment linear trends are parallel, so the test gives no evidence against the parallel trend assumption.
For more on didregress postestimation commands type:
help didregress_postestimation
didregress (satis frequency) (procedure), group(hospital) time(month) aeq
Note: we added 'frequency' as a covariate.
Stata will give us the following results:
The above table includes coefficients for the covariate 'frequency' and for the time variable ('month').
If you have questions or comments about this guide or method, please email data@Princeton.edu.