# Differences-in-Differences in Stata: A Step-by-Step Guide

Basic differences-in-differences estimation using Stata

## 1. Using the "basic" method

• Getting sample data.

use "https://dss.princeton.edu/training/Panel101.dta", clear

• Create a dummy variable to indicate the time when the treatment started. Let's assume that the treatment started in 1994. In this case, years before 1994 will have a value of 0, and years from 1994 onward a 1.

gen time = 0
replace time = 1 if year>=1994

• Create a dummy variable to identify the group exposed to the treatment. In this example, let's assume that countries with codes 5, 6, and 7 were treated (=1). Countries 1-4 were not treated (=0).

gen treated = 0
replace treated = 1 if country>4

• Create an interaction between time and treated. We will call this interaction ‘did’.

gen did = time*treated
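Before estimating anything, it can help to confirm that the dummies were built as intended. The following is a minimal sketch that repeats the steps above and then checks them; the `tabulate` and `assert` lines are additions for illustration and are not part of the original walkthrough.

```stata
* Rebuild the variables exactly as in the steps above
use "https://dss.princeton.edu/training/Panel101.dta", clear
gen time = 0
replace time = 1 if year>=1994
gen treated = 0
replace treated = 1 if country>4
gen did = time*treated

* Cross-tabulate the two dummies: all four group-by-period cells
* should contain observations
tabulate treated time

* did must equal 1 only for treated countries in 1994 or later;
* assert stops with an error if any observation violates this
assert did == (treated == 1 & time == 1)
```

If `assert` returns silently, every observation satisfies the condition.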

• Estimate the DID model

reg y time treated did

Stata will give us the following output table:

. reg y time treated did

    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            time |   2.29e+09   9.53e+08     2.40   0.019     3.87e+08    4.19e+09
         treated |   1.78e+09   1.13e+09     1.58   0.120    -4.75e+08    4.03e+09
             did |  -2.52e+09   1.46e+09    -1.73   0.088    -5.43e+09    3.87e+08
           _cons |   3.58e+08   7.38e+08     0.49   0.629    -1.12e+09    1.83e+09
    ------------------------------------------------------------------------------

The coefficient on ‘did’ is the differences-in-differences estimate of the average treatment effect on the treated: -2.52e+09. The effect is negative and significant at the 10% level (p = 0.088), but not at the 5% level.
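To see why the interaction coefficient is the differences-in-differences estimate, it helps to write the regression out and read the four group means off the coefficients. The sketch below uses the rounded coefficients from the table above; the subscripts i (country) and t (year) are notation introduced here for illustration.

```latex
\begin{aligned}
y_{it} &= \beta_0 + \beta_1\,\mathrm{time}_t + \beta_2\,\mathrm{treated}_i
        + \beta_3\,(\mathrm{time}_t \times \mathrm{treated}_i) + \varepsilon_{it} \\[4pt]
\text{control, before:}\quad & \beta_0 = 0.358 \times 10^{9} \\
\text{control, after:}\quad  & \beta_0 + \beta_1 = 2.648 \times 10^{9} \\
\text{treated, before:}\quad & \beta_0 + \beta_2 = 2.138 \times 10^{9} \\
\text{treated, after:}\quad  & \beta_0 + \beta_1 + \beta_2 + \beta_3 = 1.908 \times 10^{9} \\[4pt]
\beta_3 &= (1.908 - 2.138)\times 10^{9} - (2.648 - 0.358)\times 10^{9}
         = -2.52 \times 10^{9}
\end{aligned}
```

The change in the treated group (-0.23e+09) minus the change in the control group (2.29e+09) reproduces the ‘did’ coefficient.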

• Testing the parallel trend assumption graphically

Use the following code to generate the graph. It relies on the user-written lgraph package, which is installed from SSC.

ssc install lgraph
preserve
collapse (mean) y, by(year treated)
lgraph y year, by(treated) xline(1993)
restore

Stata will generate the following did graph:

Given that we used hypothetical data for this example, the graph does not show a clear parallel trend in outcome for treatment and control groups before the policy intervention.
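If installing lgraph is not an option, a similar plot can be drawn with Stata's built-in `twoway` command. This is a sketch under the same assumptions as above (treatment starting in 1994, countries 5 to 7 treated); the line pattern, titles, and legend labels are illustrative choices.

```stata
* Load the data and rebuild the treatment-group dummy
use "https://dss.princeton.edu/training/Panel101.dta", clear
gen treated = 0
replace treated = 1 if country>4

* Plot the yearly mean of y for each group; preserve/restore keeps
* the original data intact after collapse
* (the /// line continuations require running this from a do-file)
preserve
collapse (mean) y, by(year treated)
twoway (line y year if treated == 0) ///
       (line y year if treated == 1, lpattern(dash)), ///
       xline(1993) ///
       legend(order(1 "Control" 2 "Treated")) ///
       ytitle("Mean of y") xtitle("Year")
restore
```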

## 2. Using the "hashtag" method

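• With the hashtag (factor-variable) method there is no need to generate the interaction by hand: the ## operator adds both variables and their interaction to the model. Estimate using the following command: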

reg y time##treated

    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          1.time |   2.29e+09   9.53e+08     2.40   0.019     3.87e+08    4.19e+09
       1.treated |   1.78e+09   1.13e+09     1.58   0.120    -4.75e+08    4.03e+09
                 |
    time#treated |
            1 1  |  -2.52e+09   1.46e+09    -1.73   0.088    -5.43e+09    3.87e+08
                 |
           _cons |   3.58e+08   7.38e+08     0.49   0.629    -1.12e+09    1.83e+09
    ------------------------------------------------------------------------------

The coefficient on "time#treated" is the average treatment effect on the treated. It is identical to the coefficient on ‘did’ in the "basic" method (-2.52e+09): the effect is negative and significant at the 10% level (p = 0.088).
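With factor-variable notation the interaction coefficient no longer has a simple name like ‘did’, so it is worth knowing how to refer to it after estimation. The sketch below shows one way, assuming the same variable definitions as in the "basic" method.

```stata
* Rebuild the dummies as in the "basic" method
use "https://dss.princeton.edu/training/Panel101.dta", clear
gen time = 0
replace time = 1 if year>=1994
gen treated = 0
replace treated = 1 if country>4

reg y time##treated

* The interaction coefficient is stored under its factor-variable name
display _b[1.time#1.treated]

* lincom reports the same estimate with its standard error,
* test statistic, and confidence interval
lincom 1.time#1.treated

* Longer but equivalent way of writing the same model
reg y i.time i.treated i.time#i.treated
```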

## 3. Using the "diff" command

• The diff command is user-written and is not part of official Stata. To install it from SSC, type

ssc install diff

• Estimate using the diff command

diff y, t(treated) p(time)

Note: "treated" and "time" in parentheses are dummies for treatment and time; see the "basic" method

Stata will give us the following output:

. diff y, t(treated) p(time)

    DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
    Number of observations in the DIFF-IN-DIFF: 70
                Before         After
       Control: 16             24          40
       Treated: 12             18          30
                28             42
    --------------------------------------------------------
     Outcome var.   | y       | S. Err. |   |t|   |  P>|t|
    ----------------+---------+---------+---------+---------
    Before          |         |         |         |
       Control      |  3.6e+08|         |         |
       Treated      |  2.1e+09|         |         |
       Diff (T-C)   |  1.8e+09|  1.1e+09| 1.58    | 0.120
    After           |         |         |         |
       Control      |  2.6e+09|         |         |
       Treated      |  1.9e+09|         |         |
       Diff (T-C)   | -7.4e+08|  9.2e+08| 0.81    | 0.422
                    |         |         |         |
    Diff-in-Diff    | -2.5e+09|  1.5e+09| 1.73    | 0.088*
    --------------------------------------------------------
    R-square:    0.08
    * Means and Standard Errors are estimated by linear regression
    **Inference: *** p<0.01; ** p<0.05; * p<0.1

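The ‘Diff-in-Diff’ row gives the average treatment effect on the treated: -2.5e+09, the same estimate as in the two regressions above. The effect is negative and significant at the 10% level (p = 0.088). Note that diff reports the absolute value of the t statistic, so the sign of the effect is read from the estimate itself.

Type help diff for more details and options.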
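The group means in the diff table can also be reproduced by hand, which makes the "difference of differences" explicit. The following is a sketch using only built-in commands; the scalar names (`c_pre`, `c_post`, `t_pre`, `t_post`) are hypothetical names chosen for this illustration.

```stata
* Rebuild the dummies as in the "basic" method
use "https://dss.princeton.edu/training/Panel101.dta", clear
gen time = 0
replace time = 1 if year>=1994
gen treated = 0
replace treated = 1 if country>4

* Mean of y in each of the four group-by-period cells
tabulate treated time, summarize(y) means

* Store the four means and take the difference of the two differences
quietly summarize y if treated == 0 & time == 0
scalar c_pre = r(mean)
quietly summarize y if treated == 0 & time == 1
scalar c_post = r(mean)
quietly summarize y if treated == 1 & time == 0
scalar t_pre = r(mean)
quietly summarize y if treated == 1 & time == 1
scalar t_post = r(mean)

display "DID = " (t_post - t_pre) - (c_post - c_pre)
```

Because the regression with both dummies and their interaction fits the four cell means exactly, the displayed value should agree with the Diff-in-Diff row (about -2.5e+09).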

## 4. Using Stata 17 and later versions

Example 1: Using Hypothetical DSS Data

• Getting sample data.

use "https://dss.princeton.edu/training/Panel101.dta", clear

• Create a dummy variable to indicate the time when the treatment started. Let's assume that the treatment started in 1994. In this case, years before 1994 will have a value of 0, and years from 1994 onward a 1.

gen time = 0
replace time = 1 if year>=1994

• Create a dummy variable to identify the group exposed to the treatment. In this example, let's assume that countries with codes 5, 6, and 7 were treated (=1). Countries 1-4 were not treated (=0).

gen treated = 0
replace treated = 1 if country>4

• Create an interaction between time and treated. We will call this interaction ‘did’.

gen did = time*treated

• To run the DID model, type:

didregress (y) (did), group(country) time(year)

In the above command, the first set of parentheses contains the dependent variable ('y' in our dataset), and the second contains the treatment variable ('did', the interaction we created in the previous step). The group() and time() options take the group ID variable ('country') and the time variable ('year'), respectively.

Stata will give us the following output table:

                                    (Std. err. adjusted for 7 clusters in country)
    ------------------------------------------------------------------------------
                 |               Robust
               y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    ATET         |
             did |
       (1 vs 0)  |  -2.52e+09   1.15e+09    -2.20   0.070    -5.32e+09    2.85e+08
    ------------------------------------------------------------------------------
    Note: ATET estimate adjusted for group effects and time effects.

The table indicates that the average treatment effect on the treated is -2.52e+09. With a p-value of 0.070, the effect is statistically significant at the 10% level but not at the conventional 5% level. Note that the standard errors are clustered by country, and there are only 7 clusters.
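The note under the table says the estimate is adjusted for group effects and time effects. One way to see what that means is to fit the corresponding two-way fixed-effects regression by hand. The sketch below is an illustration, not a description of didregress internals: the point estimate on ‘did’ should match the ATET above, while the standard errors may differ slightly because of different small-sample adjustments.

```stata
* Rebuild the variables as in the steps above
use "https://dss.princeton.edu/training/Panel101.dta", clear
gen time = 0
replace time = 1 if year>=1994
gen treated = 0
replace treated = 1 if country>4
gen did = time*treated

* Country and year dummies absorb the group and time effects;
* standard errors are clustered by country
regress y did i.country i.year, vce(cluster country)
```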

• Testing the parallel trend assumption

To test the parallel trend assumption graphically, type:

estat trendplots

Stata will give us the following graph:

Given that we used hypothetical data for this example, the graph does not show a clear parallel trend in outcome for treatment and control groups before the policy intervention.

We can also test the parallel trend assumption using the following command:

estat ptrends

Stata will give us the following results:

    F(1, 6) =   0.19
    Prob > F = 0.6810

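As the p-value (0.6810) is not below 0.05, we fail to reject the null hypothesis that the pre-treatment linear trends are parallel. The test therefore finds no evidence against the parallel trend assumption, although failing to reject does not prove that the assumption holds.

Notice that even though the graph did not show a clear parallel trend before the policy intervention, the formal test does not reject parallel trends. With only 7 clusters the test may have little power to detect a violation, so both pieces of evidence should be weighed.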

For more on didregress postestimation commands type:

help didregress_postestimation
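One of the other postestimation commands listed in that help file is `estat granger`, which tests whether effects appear before the treatment starts (anticipation effects). A minimal sketch of running both tests after the model, assuming the same variable definitions as above:

```stata
* Rebuild the variables as in the steps above
use "https://dss.princeton.edu/training/Panel101.dta", clear
gen time = 0
replace time = 1 if year>=1994
gen treated = 0
replace treated = 1 if country>4
gen did = time*treated

didregress (y) (did), group(country) time(year)

* Test of parallel pre-treatment linear trends
estat ptrends

* Granger-type test for effects in anticipation of the treatment
estat granger
```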

• To add covariates in the DID model, type:

didregress (y x2 x3) (did), group(country) time(year) aeq

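Note: we added 'x2' and 'x3' as covariates. The aeq option displays the coefficients on the covariates and the time effects in addition to the ATET.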

Stata will give us the following results:

                                    (Std. err. adjusted for 7 clusters in country)
    ------------------------------------------------------------------------------
                 |               Robust
               y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    ATET         |
             did |
       (1 vs 0)  |  -2.71e+09   7.75e+08    -3.50   0.013    -4.61e+09   -8.17e+08
    -------------+----------------------------------------------------------------
    Controls     |
              x2 |   1.44e+09   1.62e+09     0.89   0.408    -2.52e+09    5.39e+09
              x3 |   5.38e+08   3.42e+08     1.58   0.166    -2.98e+08    1.38e+09
                 |
            year |
           1991  |   4.68e+08   1.45e+09     0.32   0.757    -3.07e+09    4.00e+09
           1992  |   6.51e+08   1.02e+09     0.64   0.546    -1.84e+09    3.14e+09
           1993  |   3.37e+09   1.86e+09     1.81   0.120    -1.19e+09    7.93e+09
           1994  |   4.72e+09   2.05e+09     2.30   0.061    -3.02e+08    9.74e+09
           1995  |   2.69e+09   2.08e+09     1.29   0.244    -2.40e+09    7.78e+09
           1996  |   3.42e+09   1.34e+09     2.56   0.043     1.51e+08    6.69e+09
           1997  |   4.80e+09   1.43e+09     3.36   0.015     1.30e+09    8.30e+09
           1998  |   1.83e+09   1.26e+09     1.45   0.197    -1.26e+09    4.93e+09
           1999  |   2.13e+09   2.21e+09     0.96   0.373    -3.28e+09    7.54e+09
                 |
           _cons |  -4.68e+08   1.07e+09    -0.44   0.676    -3.08e+09    2.14e+09
    ------------------------------------------------------------------------------
    Note: ATET estimate adjusted for covariates, group effects, and time effects.

The above table includes coefficients for covariates 'x2', 'x3' and for the time variable ('year').

Notice that the average treatment effect on the treated (-2.71e+09) is now statistically significant at the 5% level, as the p-value (0.013) is less than 0.05.
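Both tables above report standard errors adjusted for only 7 clusters, and cluster-robust inference can be unreliable when the number of clusters is this small (see Bertrand, Duflo, and Mullainathan in the resources below). A possible robustness check is the wild cluster bootstrap. The sketch below assumes the `wildbootstrap()` option described in `help didregress`; the seed value is arbitrary and only makes the results reproducible.

```stata
* Rebuild the variables as in the steps above
use "https://dss.princeton.edu/training/Panel101.dta", clear
gen time = 0
replace time = 1 if year>=1994
gen treated = 0
replace treated = 1 if country>4
gen did = time*treated

* Same model with covariates, but with p-values and confidence
* intervals from a wild cluster bootstrap (seed chosen arbitrarily)
didregress (y x2 x3) (did), group(country) time(year) ///
    wildbootstrap(rseed(12345))
```

If the bootstrap p-value is far from the one in the table, the conventional clustered result should be treated with caution.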

Example 2: Using Stata Data

• Getting sample data.

use https://www.stata-press.com/data/r17/hospdd.dta, clear

In the hospital admission dataset, the 'procedure' variable records the policy change: it takes the value 1 (labeled 'New') if the hospital used the new admission procedure and 0 (labeled 'Old') otherwise.
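Before running the model, it is worth looking at how the dataset is laid out. A short sketch using built-in commands, with the variable names that appear in the commands and output of this example:

```stata
use https://www.stata-press.com/data/r17/hospdd.dta, clear

* Variable types and labels for the variables used in this example
describe satis procedure hospital month frequency

* Which months contain observations under the new procedure
tabulate month procedure

* Which hospitals ever adopted the new procedure
tabulate hospital procedure
```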

• To run the DID model, type:

didregress (satis) (procedure), group(hospital) time(month)

In the above command, the first set of parentheses contains the dependent variable, and the second contains the DID/policy change variable. The group() and time() options take the group ID and time variables, respectively.

Stata will give us the following output table:

                                   (Std. err. adjusted for 46 clusters in hospital)
    -------------------------------------------------------------------------------
                  |               Robust
            satis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    --------------+----------------------------------------------------------------
    ATET          |
        procedure |
    (New vs Old)  |   .8479879   .0321121    26.41   0.000     .7833108     .912665
    -------------------------------------------------------------------------------
    Note: ATET estimate adjusted for group effects and time effects.

The table indicates that the average treatment effect on the treated is 0.8479879 and is statistically significant at the 1% level. That is, among hospitals that adopted it, the new admission procedure increased satisfaction by about 0.85 units on average.

• Testing the parallel trend assumption

To test the parallel trend assumption graphically, type:

estat trendplots

Stata will give us the following graph:

The graph indicates that satisfaction in the treatment and control groups followed parallel paths prior to the policy change.

We can also test the parallel trend assumption using the following command:

estat ptrends

Stata will give us the following results:

    F(1, 45) =   0.55
    Prob > F = 0.4615

As the p-value (0.4615) is not below 0.05, we fail to reject the null hypothesis that the pre-treatment linear trends are parallel. The test finds no evidence against the parallel trend assumption, which agrees with the graph.

For more on didregress postestimation commands type:

help didregress_postestimation

• To add covariates in the DID model, type:

didregress (satis frequency) (procedure), group(hospital) time(month) aeq

Note: we added 'frequency' as a covariate.

Stata will give us the following results:

                                   (Std. err. adjusted for 46 clusters in hospital)
    -------------------------------------------------------------------------------
                  |               Robust
            satis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    --------------+----------------------------------------------------------------
    ATET          |
        procedure |
    (New vs Old)  |   .8479879   .0321143    26.41   0.000     .7833063    .9126694
    --------------+----------------------------------------------------------------
    Controls      |
        frequency |   .0537506    .018952     2.84   0.007     .0155794    .0919218
                  |
            month |
        February  |  -.0096077    .018433    -0.52   0.605    -.0467336    .0275183
           March  |   .0219686   .0182522     1.20   0.235    -.0147933    .0587304
           April  |  -.0032839   .0221044    -0.15   0.883    -.0478043    .0412366
             May  |  -.0094027   .0232415    -0.40   0.688    -.0562135     .037408
            June  |  -.0038375   .0190647    -0.20   0.841    -.0422358    .0345607
            July  |  -.0111941   .0230045    -0.49   0.629    -.0575276    .0351393
                  |
            _cons |   3.311728   .0498646    66.41   0.000     3.211296    3.412161
    -------------------------------------------------------------------------------
    Note: ATET estimate adjusted for covariates, group effects, and time effects.

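The above table includes coefficients for the covariate 'frequency' and for the time variable ('month'). The estimated average treatment effect on the treated (0.848) is unchanged to the displayed precision after adding the covariate.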
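To compare the two specifications side by side rather than scrolling between tables, the results can be stored and tabulated. This is a sketch using Stata's built-in `estimates` commands; the names `base` and `withcov` are hypothetical labels chosen for this illustration.

```stata
use https://www.stata-press.com/data/r17/hospdd.dta, clear

* Model without covariates
quietly didregress (satis) (procedure), group(hospital) time(month)
estimates store base

* Model with frequency as a covariate
quietly didregress (satis frequency) (procedure), group(hospital) time(month)
estimates store withcov

* Coefficients and standard errors from both models in one table
estimates table base withcov, b(%9.4f) se(%9.4f)
```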

## Useful Resources

Angrist, J. D., & Pischke, J. S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.

Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119(1), 249-275.

Card, D. (1990). The impact of the Mariel boatlift on the Miami labor market. ILR Review, 43(2), 245-257.

Card, D., & Krueger, A. B. (1994). Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. The American Economic Review, 84(4), 772-793.

DSS Data Analysis Guides: Available at https://library.princeton.edu/dss/training

Imbens, G. W., & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47(1), 5-86.

Mailman School of Public Health, Columbia University. Difference-in-Differences Estimation. Available at: https://www.publichealth.columbia.edu/research/population-health-methods/difference-difference-estimation

Naqvi, A. (2020-2024). Difference-in-differences. Available at https://asjadnaqvi.github.io/DiD/

Roth, J., Sant'Anna, P. H., Bilinski, A., & Poe, J. (2022). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature. Available at https://www.jonathandroth.com/assets/files/DiD_Review_Paper.pdf

Waldinger, F. (n.d.). Lecture 3: Differences-in-Differences. Available at: https://silo.tips/download/lecture-3-differences-in-differences, accessed August 10, 2022.

Wooldridge, J. (2007). What’s new in econometrics? Lecture 10: Difference-in-differences estimation. NBER Summer Institute. Available at: https://www.nber.org/sites/default/files/2021-03/slides_10_diffindiffs.pdf, accessed August 8, 2022.
