
DID in Stata: Difference-in-Differences Stata Tutorial

Basic difference-in-differences estimation using Stata


Using the "basic" method

  • Load the sample data:

use "https://dss.princeton.edu/training/Panel101.dta", clear

  • Create a dummy variable marking the period after the treatment started. Assume the treatment started in 1994: years before 1994 get a value of 0, and 1994 onward gets a value of 1.

gen time = (year>=1994) & !missing(year)
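Why the "& !missing(year)" part? In Stata a numeric missing value (.) compares as larger than every nonmissing number, so (year>=1994) on its own would code a missing year as 1. The Python sketch below (the guide itself uses Stata) mimics that rule by modelling Stata's "." as positive infinity. The names STATA_MISSING, naive_time, and guarded_time are hypothetical and exist only for this illustration.

```python
import math

# Assumption for illustration: model Stata's numeric missing value "."
# as +infinity, since Stata treats "." as larger than any nonmissing number.
STATA_MISSING = math.inf


def naive_time(year: float) -> int:
    """Mimics: gen time = (year>=1994), with no missing-value guard."""
    return int(year >= 1994)


def guarded_time(year: float) -> int:
    """Mimics: gen time = (year>=1994) & !missing(year)."""
    return int(year >= 1994 and year != STATA_MISSING)


# Compare the two codings for a pre year, the first treated year, and a missing year.
for y in (1993, 1994, STATA_MISSING):
    print(y, naive_time(y), guarded_time(y))
```

Only the guarded version codes the missing year as 0.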

  • Create a dummy variable to identify the group exposed to the treatment. In this example, let's assume that countries with codes 5, 6, and 7 were treated (=1). Countries 1-4 were not treated (=0).

gen treated = (country>4) & !missing(country)

  • Create the interaction between time and treated and call it ‘did’. It equals 1 only for treated countries in 1994 and later.

gen did = time*treated
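Together, the three dummies split the sample into a 2x2 layout of control/treated by baseline/follow-up. The Python sketch below rebuilds that layout. It assumes, as the guide does not state it, that Panel101 is a balanced panel of countries 1-7 observed every year from 1990 to 1999. Under that assumption, the cell sizes agree with the 16/24/12/18 counts that diff reports for this data.

```python
from collections import Counter

# Assumption: balanced panel, countries 1-7, years 1990-1999 (70 rows).
rows = [(c, y) for c in range(1, 8) for y in range(1990, 2000)]

cells = Counter()   # observations per (treated, time) cell
did_ones = 0        # observations with did == 1
for country, year in rows:
    time = int(year >= 1994)      # mirrors gen time (no missing years here)
    treated = int(country > 4)    # mirrors gen treated (no missing codes here)
    did = time * treated          # mirrors gen did = time*treated
    cells[(treated, time)] += 1
    did_ones += did

print(len(rows))                       # 70
print(cells[(0, 0)], cells[(0, 1)])    # control: 16 24
print(cells[(1, 0)], cells[(1, 1)])    # treated: 12 18
print(did_ones)                        # 18
```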

  • Estimate the DID model by OLS. The r option requests heteroskedasticity-robust standard errors.

reg y time treated did, r

. reg y time treated did, r

Linear regression                               Number of obs     =         70
                                                F(3, 66)          =       2.17
                                                Prob > F          =     0.0998
                                                R-squared         =     0.0827
                                                Root MSE          =    3.0e+09

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        time |   2.29e+09   9.00e+08     2.54   0.013     4.92e+08    4.09e+09
     treated |   1.78e+09   1.05e+09     1.70   0.094    -3.11e+08    3.86e+09
         did |  -2.52e+09   1.45e+09    -1.73   0.088    -5.42e+09    3.81e+08
       _cons |   3.58e+08   7.61e+08     0.47   0.640    -1.16e+09    1.88e+09
------------------------------------------------------------------------------

  • The coefficient on ‘did’ (-2.52e+09) is the difference-in-differences estimate. It is negative and statistically significant at the 10% level (p = 0.088) but not at the 5% level.
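Because the model contains a full set of dummies for the four groups, each group mean is a sum of coefficients. ‘did’ equals the change over time for the treated group minus the change for the control group. The Python arithmetic below, a sketch that is not part of the Stata workflow, plugs in the rounded coefficients from the output above. The variable names are illustrative.

```python
# Rounded coefficients copied from: reg y time treated did, r
b_cons, b_time, b_treated, b_did = 3.58e8, 2.29e9, 1.78e9, -2.52e9

# Fitted group means implied by y = b_cons + b_time*time + b_treated*treated + b_did*did
control_pre = b_cons
control_post = b_cons + b_time
treated_pre = b_cons + b_treated
treated_post = b_cons + b_time + b_treated + b_did

# Difference-in-differences: (change for treated) - (change for control)
did_from_means = (treated_post - treated_pre) - (control_post - control_pre)

print(f"{control_pre:.3e} {control_post:.3e} {treated_pre:.3e} {treated_post:.3e}")
print(f"{did_from_means:.3e}")
```

Rounded to two significant digits, these group means match the ones that diff reports for the same data, and the double difference gives back the ‘did’ coefficient.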

Using the "hashtag" method

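  • With the hashtag (factor-variable) notation, you do not need to generate the interaction yourself: time##treated enters time, treated, and their interaction time#treated. Estimate using the following command: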

reg y time##treated, r

. reg y time##treated, r

Linear regression                               Number of obs     =         70
                                                F(3, 66)          =       2.17
                                                Prob > F          =     0.0998
                                                R-squared         =     0.0827
                                                Root MSE          =    3.0e+09

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      1.time |   2.29e+09   9.00e+08     2.54   0.013     4.92e+08    4.09e+09
   1.treated |   1.78e+09   1.05e+09     1.70   0.094    -3.11e+08    3.86e+09
             |
time#treated |
        1 1  |  -2.52e+09   1.45e+09    -1.73   0.088    -5.42e+09    3.81e+08
             |
       _cons |   3.58e+08   7.61e+08     0.47   0.640    -1.16e+09    1.88e+09
------------------------------------------------------------------------------

  • The coefficient on ‘time#treated’ (row 1 1) is the difference-in-differences estimate. It is identical to ‘did’ in the basic method (-2.52e+09, p = 0.088), so the effect is again negative and significant at the 10% level only.
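Whichever way the interaction is coded, OLS on a constant, time, treated, and their product returns an interaction coefficient equal to the double difference of the four group means. The pure-Python sketch below checks this on a small made-up dataset (hypothetical numbers, not Panel101). It solves the normal equations exactly with fractions. It illustrates the algebra and does not reproduce Stata's regress or its robust standard errors.

```python
from fractions import Fraction

# Hypothetical toy data: (time, treated, y). Not taken from Panel101.
data = [
    (0, 0, 10), (0, 0, 12), (1, 0, 15), (1, 0, 17),
    (0, 1, 20), (0, 1, 22), (1, 1, 21), (1, 1, 25),
]

# Design row matching time##treated: constant, time, treated, time#treated.
X = [[Fraction(1), Fraction(t), Fraction(g), Fraction(t * g)] for t, g, _ in data]
Y = [Fraction(y) for _, _, y in data]

# Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
k = len(X[0])
A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
     + [sum(r[i] * yv for r, yv in zip(X, Y))]
     for i in range(k)]
for col in range(k):
    pivot_row = next(r for r in range(col, k) if A[r][col] != 0)
    A[col], A[pivot_row] = A[pivot_row], A[col]
    piv = A[col][col]
    A[col] = [v / piv for v in A[col]]
    for r in range(k):
        if r != col:
            factor = A[r][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
beta = [A[i][k] for i in range(k)]  # [constant, time, treated, interaction]


def cell_mean(t, g):
    """Mean of y for observations with time == t and treated == g."""
    vals = [Fraction(y) for tt, gg, y in data if tt == t and gg == g]
    return sum(vals) / len(vals)


double_diff = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))
print([float(b) for b in beta])  # [11.0, 5.0, 10.0, -3.0]
print(float(double_diff))        # -3.0
```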

Using the "diff" command

  • diff is a community-contributed (user-written) Stata command available from SSC. To install it, type:

ssc install diff

  • Estimate the model with diff:

diff y, t(treated) p(time)

Note: t() (short for treated()) takes the treatment-group dummy, treated, and p() (short for period()) takes the post-treatment period dummy, time. Both were created in the "basic" method.

. diff y, t(treated) p(time)

Number of observations in the DIFF-IN-DIFF: 70
            Baseline       Follow-up
   Control: 16             24             40
   Treated: 12             18             30
            28             42
----------------------------------------------------------
 Outcome var.   | y         | S. Err.   | t       | P>|t|
----------------+-----------+-----------+---------+-------
Baseline        |           |           |         |
   Control      | 3.6e+08   |           |         |
   Treated      | 2.1e+09   |           |         |
   Diff (T-C)   | 1.8e+09   | 1.1e+09   | 1.58    | 0.120
Follow-up       |           |           |         |
   Control      | 2.6e+09   |           |         |
   Treated      | 1.9e+09   |           |         |
   Diff (T-C)   | -7.4e+08  | 9.2e+08   | -0.81   | 0.422
                |           |           |         |
Diff-in-Diff    | -2.5e+09  | 1.5e+09   | -1.73   | 0.088*
----------------------------------------------------------
R-square:    0.08
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1

  • The ‘Diff-in-Diff’ row is the difference-in-differences estimate. Its value (-2.5e+09, p = 0.088) matches the regression results: the effect is negative and significant at the 10% level.
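The point estimates in the diff table can be traced back to the basic regression. The baseline Diff (T-C) is the coefficient on treated. The follow-up Diff (T-C) adds the ‘did’ coefficient to it. Diff-in-Diff is the follow-up gap minus the baseline gap. The Python arithmetic below is a sketch that uses the rounded coefficients from reg y time treated did, r. It covers only the point estimates, not the standard errors.

```python
# Rounded coefficients from: reg y time treated did, r
b_treated, b_did = 1.78e9, -2.52e9

baseline_gap = b_treated            # Baseline Diff (T-C)
followup_gap = b_treated + b_did    # Follow-up Diff (T-C)
diff_in_diff = followup_gap - baseline_gap

print(f"{baseline_gap:.1e}")   # 1.8e+09
print(f"{followup_gap:.1e}")   # -7.4e+08
print(f"{diff_in_diff:.1e}")   # -2.5e+09
```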

Type help diff for more details and options.


Data Consultant

Muhammad Al Amin
He/Him/His
Contact:
Firestone Library, A-12-F.1
609-258-6051

Data Consultant

Yufei Qin
Contact:
Firestone Library, A.12F.2
609-258-2519

Comments or Questions?

If you have questions or comments about this guide or method, please email data@Princeton.edu.