Research Guides: Difference-in-Differences in Stata: A Step-by-Step Guide

1. Using the "basic" method

Getting sample data.

use "https://dss.princeton.edu/training/Panel101.dta", clear

Create a dummy variable to indicate the time when the treatment started. Let's assume that the treatment started in 1994. In this case, years before 1994 will have a value of 0, and years from 1994 onward a 1.

gen time = 0
replace time = 1 if year>=1994

Create a dummy variable to identify the group exposed to the treatment. In this example, let's assume that countries with codes 5, 6, and 7 were treated (=1). Countries 1-4 were not treated (=0).

gen treated = 0
replace treated = 1 if country>4
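The two dummies above can also be built in one line each. The sketch below is an alternative formulation rather than the guide's own code. It also guards against a common pitfall: Stata treats a missing numeric value as larger than any number, so a condition such as year>=1994 is true when year is missing. The names time2 and treated2 are hypothetical, chosen so that the variables created above are not overwritten.

```stata
* One-line equivalents of the dummies above (time2/treated2 are illustrative names).
* The "if !missing()" qualifier leaves the dummy missing when the source is missing.
gen byte time2    = (year >= 1994) if !missing(year)
gen byte treated2 = (country > 4)  if !missing(country)

* Confirm that they agree with the step-by-step versions where data are present
assert time2 == time if !missing(year)
assert treated2 == treated if !missing(country)
```

If the two assert lines run silently, the one-line and step-by-step versions agree wherever year and country are observed.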

Create an interaction between time and treated. We will call this interaction ‘did’.

gen did = time*treated
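Written out, the model that uses these three variables is the standard two-group, two-period DID regression. The notation below (the β coefficients, the error term ε, and the subscripts i for country and t for year) is introduced here for illustration and does not appear in the Stata output:

```latex
y_{it} = \beta_0 + \beta_1\,\mathrm{time}_t + \beta_2\,\mathrm{treated}_i
       + \beta_3\,(\mathrm{time}_t \times \mathrm{treated}_i) + \varepsilon_{it}

\hat{\beta}_3 = \left(\bar{y}_{\mathrm{treated,\,after}} - \bar{y}_{\mathrm{treated,\,before}}\right)
              - \left(\bar{y}_{\mathrm{control,\,after}} - \bar{y}_{\mathrm{control,\,before}}\right)
```

The coefficient on the interaction (β3) is the DID estimate: the change in the treated group's mean outcome minus the change in the control group's mean outcome.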

Estimating the DID estimator

reg y time treated did

Stata will give us the following output table:

. reg y time treated did

      Source |       SS           df       MS      Number of obs   =        70
-------------+----------------------------------   F(3, 66)        =      1.98
       Model |  5.1898e+19         3  1.7299e+19   Prob > F        =    0.1249
    Residual |  5.7540e+20        66  8.7181e+18   R-squared       =    0.0827
-------------+----------------------------------   Adj R-squared   =    0.0410
       Total |  6.2729e+20        69  9.0912e+18   Root MSE        =    3.0e+09

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        time |   2.29e+09   9.53e+08     2.40   0.019     3.87e+08    4.19e+09
     treated |   1.78e+09   1.13e+09     1.58   0.120    -4.75e+08    4.03e+09
         did |  -2.52e+09   1.46e+09    -1.73   0.088    -5.43e+09    3.87e+08
       _cons |   3.58e+08   7.38e+08     0.49   0.629    -1.12e+09    1.83e+09
------------------------------------------------------------------------------

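The coefficient for ‘did’ is the difference-in-differences estimate, interpreted as the average treatment effect on the treated. The estimated effect is negative (-2.52e+09) and statistically significant at the 10% level (p = 0.088), but not at the 5% level.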
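To see where the ‘did’ coefficient comes from, the same number can be computed by hand from the four group-by-period means. This is a minimal sketch that assumes the time, treated and did variables created above are in memory; the scalar names (c_pre, c_post, t_pre, t_post, did_manual) are hypothetical.

```stata
* Mean of y in each of the four group-by-period cells
quietly summarize y if treated == 0 & time == 0
scalar c_pre = r(mean)
quietly summarize y if treated == 0 & time == 1
scalar c_post = r(mean)
quietly summarize y if treated == 1 & time == 0
scalar t_pre = r(mean)
quietly summarize y if treated == 1 & time == 1
scalar t_post = r(mean)

* Double difference: change in the treated group minus change in the control group
scalar did_manual = (t_post - t_pre) - (c_post - c_pre)
display %12.0g did_manual

* Compare with the regression coefficient on the interaction
quietly reg y time treated did
display %12.0g _b[did]
```

Because the regression contains only the two dummies and their interaction, the two displayed values should coincide (about -2.52e+09 in the output above).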

Testing the parallel trend assumption graphically

Use the following code to generate the graph. Note that we are using the user-written lgraph package, which must be installed once from SSC.

ssc install lgraph
preserve
collapse (mean) y, by(year treated)
lgraph y year, by(treated) xline(1993)
restore

Stata will generate a DID graph: a line plot of the mean of y by year for the treated and control groups, with a vertical line at 1993.

Given that we used hypothetical data for this example, the graph does not show a clear parallel trend in outcome for treatment and control groups before the policy intervention.
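If installing lgraph is not an option, a similar plot can be drawn with Stata's built-in twoway command. This is a sketch of one possible alternative, not the guide's original code; the legend labels, axis titles and the position of the vertical line (1993.5, between the last pre-treatment year and the first treatment year) are illustrative choices. The /// line continuations work in a do-file, not when typed line by line in the Command window.

```stata
* Plot group means over time using only built-in commands
preserve
collapse (mean) y, by(year treated)
twoway (connected y year if treated == 0)            ///
       (connected y year if treated == 1),           ///
       xline(1993.5, lpattern(dash))                 ///
       legend(order(1 "Control" 2 "Treated"))        ///
       ytitle("Mean of y") xtitle("Year")
restore
```

The preserve and restore pair keeps the original 70 observations in memory after the collapsed data have been plotted.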

2. Using the "hashtag" method

There is no need to generate the interaction by hand when using the hashtag method: the ## operator includes both variables and their interaction in the model. Estimate using the following command:

reg y time##treated

. reg y time##treated

      Source |       SS           df       MS      Number of obs   =        70
-------------+----------------------------------   F(3, 66)        =      1.98
       Model |  5.1898e+19         3  1.7299e+19   Prob > F        =    0.1249
    Residual |  5.7540e+20        66  8.7181e+18   R-squared       =    0.0827
-------------+----------------------------------   Adj R-squared   =    0.0410
       Total |  6.2729e+20        69  9.0912e+18   Root MSE        =    3.0e+09

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      1.time |   2.29e+09   9.53e+08     2.40   0.019     3.87e+08    4.19e+09
   1.treated |   1.78e+09   1.13e+09     1.58   0.120    -4.75e+08    4.03e+09
             |
time#treated |
        1 1  |  -2.52e+09   1.46e+09    -1.73   0.088    -5.43e+09    3.87e+08
             |
       _cons |   3.58e+08   7.38e+08     0.49   0.629    -1.12e+09    1.83e+09
------------------------------------------------------------------------------

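The coefficient for "time#treated" is the difference-in-differences estimate, interpreted as the average treatment effect on the treated. It is identical to the ‘did’ coefficient from the "basic" method: negative (-2.52e+09) and statistically significant at the 10% level (p = 0.088), but not at the 5% level.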
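The ## operator is Stata's factor-variable shorthand for "both main effects plus their interaction". The sketch below spells the same model out term by term and then refers to the interaction by its factor-variable name; it assumes the time and treated dummies from the "basic" method are in memory.

```stata
* time##treated expands to the two main effects and their interaction
reg y i.time i.treated i.time#i.treated

* The interaction coefficient can be referenced by its factor-variable name
display _b[1.time#1.treated]
lincom 1.time#1.treated
```

The lincom line reports the interaction coefficient with its standard error and confidence interval, which is convenient when only the DID estimate is of interest.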

3. Using the "diff" command

The diff command is user-written and is not part of official Stata. To install it from SSC, type

ssc install diff

Estimating using the diff command

diff y, t(treated) p(time)

Note: "treated" and "time" in parentheses are dummies for treatment and time; see the "basic" method

Stata will give us the following outputs:

. diff y, t(treated) p(time)

DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 70
            Before         After
   Control: 16             24          40
   Treated: 12             18          30
            28             42
--------------------------------------------------------
 Outcome var.   | y       | S. Err. |   |t|   |  P>|t|
----------------+---------+---------+---------+---------
Before          |         |         |         |
   Control      |  3.6e+08|         |         |
   Treated      |  2.1e+09|         |         |
   Diff (T-C)   |  1.8e+09| 1.1e+09 | 1.58    | 0.120
After           |         |         |         |
   Control      |  2.6e+09|         |         |
   Treated      |  1.9e+09|         |         |
   Diff (T-C)   | -7.4e+08| 9.2e+08 | 0.81    | 0.422
                |         |         |         |
Diff-in-Diff    | -2.5e+09| 1.5e+09 | 1.73    | 0.088*
--------------------------------------------------------
R-square:    0.08
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1

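The coefficient for ‘Diff-in-Diff’ is the difference-in-differences estimate, interpreted as the average treatment effect on the treated. The estimated effect is negative (-2.5e+09) and statistically significant at the 10% level (p = 0.088), but not at the 5% level. Note that the table reports the absolute value of the t statistic.

Type help diff for more details and options.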
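As an illustration of those options, the sketch below adds cluster-robust standard errors and covariates. It assumes the cluster(), cov() and report options behave as described in help diff for the installed version of the package; check the help file if a command returns an error. The covariates x2 and x3 are variables in the sample dataset.

```stata
* Standard errors clustered at the country level
diff y, t(treated) p(time) cluster(country)

* Add covariates x2 and x3; report displays their coefficients as well
diff y, t(treated) p(time) cov(x2 x3) report
```

Clustering changes only the standard errors, so the Diff-in-Diff estimate from the first command should match the one shown above.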

4. Using Stata 17 and later versions

Example 1: Using Hypothetical DSS Data

Getting sample data.

use "https://dss.princeton.edu/training/Panel101.dta", clear

Create a dummy variable to indicate the time when the treatment started. Let's assume that the treatment started in 1994. In this case, years before 1994 will have a value of 0, and years from 1994 onward a 1.

gen time = 0
replace time = 1 if year>=1994

Create a dummy variable to identify the group exposed to the treatment. In this example, let's assume that countries with codes 5, 6, and 7 were treated (=1). Countries 1-4 were not treated (=0).

gen treated = 0
replace treated = 1 if country>4

Create an interaction between time and treated. We will call this interaction ‘did’.

gen did = time*treated

To run the DID model, type:

didregress (y) (did), group(country) time(year)

In the above code, the first set of parentheses contains the dependent variable ('y' for our dataset), and the second set contains the treatment variable ('did', the interaction we created in the previous step), which equals 1 only for treated countries in the treatment period. The group() and time() options specify the group ID ('country' for our dataset) and time ('year' for this dataset) variables, respectively.

Stata will give us the following output table:

. didregress (y) (did), group(country) time(year)

Treatment and time information

Time variable: year
Control:       did = 0
Treatment:     did = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
     country |         4          3
-------------+---------------------
Time         |
     Minimum |      1990       1994
     Maximum |      1990       1994
-----------------------------------

Difference-in-differences regression                        Number of obs = 70
Data type: Repeated cross-sectional

                                (Std. err. adjusted for 7 clusters in country)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
         did |
   (1 vs 0)  |  -2.52e+09   1.15e+09    -2.20   0.070    -5.32e+09    2.85e+08
------------------------------------------------------------------------------
Note: ATET estimate adjusted for group effects and time effects.

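The table indicates that the average treatment effect on the treated is -2.52e+09, with a p-value of 0.070. The effect is therefore statistically significant at the 10% level but not at the conventional 5% level. The point estimate is the same as in the previous methods; the standard error differs because didregress adjusts for group and time effects and clusters the standard errors by country.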
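The note under the table says the ATET is adjusted for group effects and time effects. One way to see what that means is to fit the corresponding two-way fixed effects regression by hand. The sketch below is an illustration under that reading, not a replacement for didregress: the coefficient on did should match the ATET, while the standard error may differ slightly because the two commands can apply different small-sample adjustments.

```stata
* Two-way fixed effects version of the same model:
* i.country absorbs group effects, i.year absorbs time effects,
* and standard errors are clustered by country
regress y did i.country i.year, vce(cluster country)

* DID estimate from the hand-built model
display _b[did]
```

Seeing the model written this way also makes clear why no separate time and treated terms are needed: they are absorbed by the year and country dummies.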

Testing the parallel trend assumption

To test the parallel trend assumption graphically, type:

estat trendplots

Stata will plot the observed means of the outcome over time for the treatment and control groups, together with the fitted linear trends.

Given that we used hypothetical data for this example, the graph does not show a clear parallel trend in outcome for treatment and control groups before the policy intervention.

We can also test the parallel trend assumption using the following command:

estat ptrends

Stata will give us the following results:

. estat ptrends

Parallel-trends test (pretreatment time period)
H0: Linear trends are parallel

F(1, 6) = 0.19
Prob > F = 0.6810

As the p-value (0.6810) is greater than 0.05, we fail to reject the null hypothesis that the linear trends are parallel in the pretreatment period. This is consistent with the parallel trend assumption, but it does not prove that the assumption holds, particularly with only 7 clusters.

Notice that even though the graph did not show a clear parallel trend before the policy intervention, the statistical test does not reject parallel pretreatment trends.
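A related check available after didregress is estat granger, which tests whether there are treatment effects in anticipation of the treatment. The sketch below refits the model quietly so that the postestimation command has estimation results to work with; it is an illustration only, and with few pretreatment periods and 7 clusters the test should be read with the same caution as estat ptrends.

```stata
* Refit the DID model so that postestimation commands can be used
quietly didregress (y) (did), group(country) time(year)

* Granger-type test: H0 is no effect in anticipation of treatment
estat granger
```

As with estat ptrends, failing to reject the null hypothesis is consistent with the identifying assumptions but does not establish them.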

For more on didregress postestimation commands type:

help didregress_postestimation

To add covariates in the DID model, type:

didregress (y x2 x3) (did), group(country) time(year) aeq

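Note: we added 'x2' and 'x3' as covariates. The aeq option (short for aequations) displays the coefficients for the covariates and the time effects in addition to the ATET.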

Stata will give us the following results:

. didregress (y x2 x3) (did), group(country) time(year) aeq

Treatment and time information

Time variable: year
Control:       did = 0
Treatment:     did = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
     country |         4          3
-------------+---------------------
Time         |
     Minimum |      1990       1994
     Maximum |      1990       1994
-----------------------------------

Difference-in-differences regression                        Number of obs = 70
Data type: Repeated cross-sectional

                                (Std. err. adjusted for 7 clusters in country)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
         did |
   (1 vs 0)  |  -2.71e+09   7.75e+08    -3.50   0.013    -4.61e+09   -8.17e+08
-------------+----------------------------------------------------------------
Controls     |
          x2 |   1.44e+09   1.62e+09     0.89   0.408    -2.52e+09    5.39e+09
          x3 |   5.38e+08   3.42e+08     1.58   0.166    -2.98e+08    1.38e+09
             |
        year |
       1991  |   4.68e+08   1.45e+09     0.32   0.757    -3.07e+09    4.00e+09
       1992  |   6.51e+08   1.02e+09     0.64   0.546    -1.84e+09    3.14e+09
       1993  |   3.37e+09   1.86e+09     1.81   0.120    -1.19e+09    7.93e+09
       1994  |   4.72e+09   2.05e+09     2.30   0.061    -3.02e+08    9.74e+09
       1995  |   2.69e+09   2.08e+09     1.29   0.244    -2.40e+09    7.78e+09
       1996  |   3.42e+09   1.34e+09     2.56   0.043     1.51e+08    6.69e+09
       1997  |   4.80e+09   1.43e+09     3.36   0.015     1.30e+09    8.30e+09
       1998  |   1.83e+09   1.26e+09     1.45   0.197    -1.26e+09    4.93e+09
       1999  |   2.13e+09   2.21e+09     0.96   0.373    -3.28e+09    7.54e+09
             |
       _cons |  -4.68e+08   1.07e+09    -0.44   0.676    -3.08e+09    2.14e+09
------------------------------------------------------------------------------
Note: ATET estimate adjusted for covariates, group effects, and time effects.

The above table includes coefficients for covariates 'x2', 'x3' and for the time variable ('year').

Notice that the average treatment effect on the treated (-2.71e+09) is now statistically significant at the 5% level, as the p-value (0.013) is less than 0.05.
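With only 7 clusters, cluster-robust p-values can be unreliable, so it is worth checking whether the conclusion survives an alternative inference method. The sketch below uses the wildbootstrap() option of didregress to request wild cluster bootstrap inference; it assumes that option and its rseed() suboption are available in the installed Stata version (see help didregress). The seed value 12345 is arbitrary and is set only to make the results reproducible.

```stata
* Wild cluster bootstrap p-value and confidence interval for the ATET.
* rseed() fixes the random-number seed; 12345 is an arbitrary choice.
didregress (y x2 x3) (did), group(country) time(year) wildbootstrap(rseed(12345))
```

If the bootstrap p-value is well above the cluster-robust p-value reported in the table, the significance of the estimate should be treated as fragile.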

Example 2: Using Stata Data

Getting sample data.

use https://www.stata-press.com/data/r17/hospdd.dta, clear

In the hospital admission dataset, the variable 'procedure' records the policy change: it takes the value 1 (labeled 'New') if the new admission procedure was used by a hospital and 0 (labeled 'Old') otherwise.
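Before estimating the model, it can help to look at how the treatment variable is distributed over time. The following is a small optional sketch using built-in commands; the variable names are those used in this example.

```stata
* Describe the variables used in this example
describe satis procedure hospital month frequency

* Cross-tabulate month by treatment status to see when the new procedure starts
tabulate month procedure
```

The cross-tabulation shows in which months observations with procedure equal to 1 first appear, which is the start of the treatment period.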

To run the DID model, type:

didregress (satis) (procedure), group(hospital) time(month)

In the above code, the first set of parentheses contains the dependent variable, and the second set contains the DID/policy change variable. The group() and time() options specify the group ID and time variables, respectively.

Stata will give us the following output table:

. didregress (satis) (procedure), group(hospital) time(month)

Treatment and time information

Time variable: month
Control:       procedure = 0
Treatment:     procedure = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
    hospital |        28         18
-------------+---------------------
Time         |
     Minimum |         1          4
     Maximum |         1          4
-----------------------------------

Difference-in-differences regression                     Number of obs = 7,368
Data type: Repeated cross-sectional

                               (Std. err. adjusted for 46 clusters in hospital)
-------------------------------------------------------------------------------
              |               Robust
        satis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
ATET          |
    procedure |
(New vs Old)  |   .8479879   .0321121    26.41   0.000     .7833108     .912665
-------------------------------------------------------------------------------
Note: ATET estimate adjusted for group effects and time effects.

The table indicates that the average treatment effect on the treated is 0.8479879 and is statistically significant at the 1% level. That is, the change in hospital admission procedures increased satisfaction by about 0.85 units.

Testing the parallel trend assumption

To test the parallel trend assumption graphically, type:

estat trendplots

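Stata will plot the observed means of the outcome over time for the treatment and control groups, together with the fitted linear trends.

The graph indicates that the treatment and control groups followed parallel trends in satisfaction prior to the policy change.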

We can also test the parallel trend assumption using the following command:

estat ptrends

Stata will give us the following results:

. estat ptrends

Parallel-trends test (pretreatment time period)
H0: Linear trends are parallel

F(1, 45) = 0.55
Prob > F = 0.4615

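As the p-value (0.4615) is greater than 0.05, we fail to reject the null hypothesis that the linear trends are parallel in the pretreatment period. This is consistent with the parallel trend assumption, although it does not prove that the assumption holds.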

For more on didregress postestimation commands type:

help didregress_postestimation

To add covariates in the DID model, type:

didregress (satis frequency) (procedure), group(hospital) time(month) aeq

Note: we added 'frequency' as a covariate.

Stata will give us the following results:

. didregress (satis frequency) (procedure), group(hospital) time(month) aeq

Treatment and time information

Time variable: month
Control:       procedure = 0
Treatment:     procedure = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
    hospital |        28         18
-------------+---------------------
Time         |
     Minimum |         1          4
     Maximum |         1          4
-----------------------------------

Difference-in-differences regression                     Number of obs = 7,368
Data type: Repeated cross-sectional

                               (Std. err. adjusted for 46 clusters in hospital)
-------------------------------------------------------------------------------
              |               Robust
        satis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
ATET          |
    procedure |
(New vs Old)  |   .8479879   .0321143    26.41   0.000     .7833063    .9126694
--------------+----------------------------------------------------------------
Controls      |
    frequency |   .0537506    .018952     2.84   0.007     .0155794    .0919218
              |
        month |
    February  |  -.0096077    .018433    -0.52   0.605    -.0467336    .0275183
       March  |   .0219686   .0182522     1.20   0.235    -.0147933    .0587304
       April  |  -.0032839   .0221044    -0.15   0.883    -.0478043    .0412366
         May  |  -.0094027   .0232415    -0.40   0.688    -.0562135     .037408
        June  |  -.0038375   .0190647    -0.20   0.841    -.0422358    .0345607
        July  |  -.0111941   .0230045    -0.49   0.629    -.0575276    .0351393
              |
        _cons |   3.311728   .0498646    66.41   0.000     3.211296    3.412161
-------------------------------------------------------------------------------
Note: ATET estimate adjusted for covariates, group effects, and time effects.

The above table includes coefficients for the covariate 'frequency' and for the time variable ('month').

5. Useful Resources

Angrist, J. D., & Pischke, J. S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.

Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119(1), 249-275.

Card, D. (1990). The impact of the Mariel boatlift on the Miami labor market. ILR Review, 43(2), 245-257.

Card, D., & Krueger, A. B. (1994). Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. The American Economic Review, 84(4), 772-793.

Callaway, B., & Sant’Anna, P. H. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.

DSS Data Analysis Guides. Available at https://libguides.princeton.edu/c.php?g=1415215

Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2), 254-277.

Imbens, G. W., & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47(1), 5-86.

Mailman School of Public Health, Columbia University. Difference-in-Differences Estimation. Available at https://www.publichealth.columbia.edu/research/population-health-methods/difference-difference-estimation

Naqvi, A. (2020-2024). Difference-in-differences. Available at https://asjadnaqvi.github.io/DiD/

Roth, J., Sant'Anna, P. H., Bilinski, A., & Poe, J. (2022). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature. Available at https://www.jonathandroth.com/assets/files/DiD_Review_Paper.pdf

The World Bank. Difference-in-Differences. Available at https://dimewiki.worldbank.org/Difference-in-Differences#:~:text=The%20difference%2Din%2Ddifferences%20method,not%20(the%20comparison%20group).

Waldinger, F. (n.d.). Lecture 3: Differences-in-Differences. Available at https://silo.tips/download/lecture-3-differences-in-differences, accessed August 10, 2022.

Wooldridge, J. (2007). What’s new in econometrics? Lecture 10: Difference-in-differences estimation. NBER Summer Institute. Available at https://www.nber.org/sites/default/files/2021-03/slides_10_diffindiffs.pdf, accessed August 8, 2022.

Data Consultant

Muhammad Al Amin (He/Him/His)

Contact: Firestone Library, A-12-F.1, 609-258-6051

Data Consultant

Yufei Qin

Contact: Firestone Library, A.12F.2, 609-258-2519
