
DID in Stata: Difference-in-Differences Stata Tutorial

Basic difference-in-differences estimation using Stata

Using the "basic" method

  • Getting sample data.

use "https://dss.princeton.edu/training/Panel101.dta", clear

  • Create a dummy variable to indicate the time when the treatment started. Let's assume that the treatment started in 1994. In this case, years before 1994 will have a value of 0, and years from 1994 onward a 1.

gen time = (year>=1994) & !missing(year)

  • Create a dummy variable to identify the group exposed to the treatment. In this example, let's assume that countries with codes 5, 6, and 7 were treated (=1). Countries 1-4 were not treated (=0).

gen treated = (country>4) & !missing(country)

  • Create an interaction between time and treated. We will call this interaction ‘did’.

gen did = time*treated
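
Before estimating, it can help to confirm that the three dummies were coded as intended. The lines below are a minimal sketch, assuming time, treated, and did were created exactly as in the steps above; the cell counts from the last tabulation should match the Control/Treated by Baseline/Follow-up counts that the diff command reports later in this guide (16, 24, 12, 18).

```stata
* Sketch only: assumes time, treated, and did exist as generated above
tabulate year time          // years before 1994 should fall under 0, 1994 onward under 1
tabulate country treated    // countries 1-4 under 0, countries 5-7 under 1
tabulate treated time       // observations in each of the four DID cells

* did should equal 1 only for treated observations in the treatment period
assert did == 1 if time == 1 & treated == 1
assert did == 0 if time == 0 | treated == 0
```

If either assert fails, Stata stops with an error, which signals that one of the dummies was not built as described.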

  • Estimating the DID estimator

reg y time treated did, r

. reg y time treated did, r

Linear regression                               Number of obs     =         70
                                                F(3, 66)          =       2.17
                                                Prob > F          =     0.0998
                                                R-squared         =     0.0827
                                                Root MSE          =    3.0e+09

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        time |   2.29e+09   9.00e+08     2.54   0.013     4.92e+08    4.09e+09
     treated |   1.78e+09   1.05e+09     1.70   0.094    -3.11e+08    3.86e+09
         did |  -2.52e+09   1.45e+09    -1.73   0.088    -5.42e+09    3.81e+08
       _cons |   3.58e+08   7.61e+08     0.47   0.640    -1.16e+09    1.88e+09
------------------------------------------------------------------------------
  • The coefficient for ‘did’ is the difference-in-differences estimator. The estimated effect is negative (-2.52e+09) and significant at the 10% level (p = 0.088).
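
To see why the coefficient on ‘did’ is the difference-in-differences estimate, it may help to write out the model behind the regression. The following is a sketch: the symbols $\beta_0,\dots,\beta_3$ and the subscripts $i$ (country) and $t$ (year) are notation introduced here for illustration, and the numbers are the rounded coefficients from the output above, so the arithmetic is approximate.

```latex
% Model estimated by: reg y time treated did, r
y_{it} = \beta_0 + \beta_1\,\text{time}_t + \beta_2\,\text{treated}_i
       + \beta_3\,(\text{time}_t \times \text{treated}_i) + \varepsilon_{it}

% Implied group means (in units of 10^9, from the rounded coefficients)
\begin{aligned}
\text{Control, before:} \quad & \beta_0 \approx 0.36 \\
\text{Control, after:}  \quad & \beta_0 + \beta_1 \approx 0.36 + 2.29 = 2.65 \\
\text{Treated, before:} \quad & \beta_0 + \beta_2 \approx 0.36 + 1.78 = 2.14 \\
\text{Treated, after:}  \quad & \beta_0 + \beta_1 + \beta_2 + \beta_3
                                \approx 0.36 + 2.29 + 1.78 - 2.52 = 1.91
\end{aligned}

% Difference-in-differences
\beta_3 = (\text{Treated}_{\text{after}} - \text{Treated}_{\text{before}})
        - (\text{Control}_{\text{after}} - \text{Control}_{\text{before}})
        \approx (1.91 - 2.14) - (2.65 - 0.36) = -2.52
```

In words: the treated group's mean fell slightly while the control group's mean rose, and the gap between those two changes is the estimate reported for ‘did’.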

Using the "hashtag" method

  • There is no need to generate the interaction variable when using the hashtag method: the ## operator includes both main effects and their interaction. Estimate using the following command

reg y time##treated, r

. reg y time##treated, r

Linear regression                               Number of obs     =         70
                                                F(3, 66)          =       2.17
                                                Prob > F          =     0.0998
                                                R-squared         =     0.0827
                                                Root MSE          =    3.0e+09

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      1.time |   2.29e+09   9.00e+08     2.54   0.013     4.92e+08    4.09e+09
   1.treated |   1.78e+09   1.05e+09     1.70   0.094    -3.11e+08    3.86e+09
             |
time#treated |
        1 1  |  -2.52e+09   1.45e+09    -1.73   0.088    -5.42e+09    3.81e+08
             |
       _cons |   3.58e+08   7.61e+08     0.47   0.640    -1.16e+09    1.88e+09
------------------------------------------------------------------------------
  • The coefficient for ‘time#treated’ is the difference-in-differences estimator (‘did’ in the previous example). The effect is significant at the 10% level, with the treatment having a negative effect.
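
One convenience of the factor-variable notation is that postestimation commands understand the interaction. The lines below are a sketch, assuming time and treated exist as created in the "basic" method; the i. prefixes are written out explicitly here, although ## already treats both variables as categorical.

```stata
* Sketch only: same model as above, with the robust option spelled out
reg y i.time##i.treated, vce(robust)

* Predicted mean of y in each of the four time-by-treated cells
margins time#treated

* The interaction (DID) coefficient on its own, with its robust standard error
lincom 1.time#1.treated
```

The four means reported by margins are the group averages that the DID estimate is built from, and lincom should reproduce the ‘time#treated’ row of the regression table.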

Using the "diff" command

  • The diff command is user-written and is not part of official Stata. To install it, type

ssc install diff

  • Estimating using the diff command

diff y, t(treated) p(time)

Note: "treated" and "time" in parentheses are dummies for treatment and time; see the "basic" method

. diff y, t(treated) p(time)

Number of observations in the DIFF-IN-DIFF: 70
            Baseline   Follow-up
   Control: 16         24          40
   Treated: 12         18          30
            28         42
----------------------------------------------------------
 Outcome var.   |  y        | S. Err.  |  t      | P>|t|
----------------+-----------+----------+---------+--------
Baseline        |           |          |         |
   Control      |  3.6e+08  |          |         |
   Treated      |  2.1e+09  |          |         |
   Diff (T-C)   |  1.8e+09  | 1.1e+09  |  1.58   | 0.120
Follow-up       |           |          |         |
   Control      |  2.6e+09  |          |         |
   Treated      |  1.9e+09  |          |         |
   Diff (T-C)   | -7.4e+08  | 9.2e+08  | -0.81   | 0.422
                |           |          |         |
Diff-in-Diff    | -2.5e+09  | 1.5e+09  | -1.73   | 0.088*
----------------------------------------------------------
R-square:    0.08
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1

Note: the p-value in the Diff-in-Diff row (0.088) is the p-value for the treatment effect, or DID estimator

** Type help diff for more details/options
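
In the output above, the baseline difference has a standard error of 1.1e+09, while the robust regression in the "basic" method reports 1.05e+09 for ‘treated’, which suggests that diff uses conventional standard errors by default. The line below is a sketch of how robust standard errors might be requested; it assumes the installed version of diff supports a robust option, so check help diff before relying on it.

```stata
* Sketch only: the robust option is assumed to be available; confirm with: help diff
diff y, t(treated) p(time) robust
```

If the option is accepted, the Diff (T-C) rows should be closer to the robust regression results shown earlier.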

Useful Resources

Angrist, J. D., & Pischke, J. S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.
 
Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119(1), 249-275.
 
Card, D. (1990). The impact of the Mariel boatlift on the Miami labor market. ILR Review, 43(2), 245-257.
 
Card, D., & Krueger, A. B. (1994). Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. The American Economic Review, 84(4), 772.
 
Imbens, G. W., & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47(1), 5-86.
 
Roth, J., Sant'Anna, P. H., Bilinski, A., & Poe, J. (2022). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature. Available at https://www.jonathandroth.com/assets/files/DiD_Review_Paper.pdf 
 
Waldinger, F. (n.d.). Lecture 3: Differences-in-Differences. Available at: https://silo.tips/download/lecture-3-differences-in-differences, accessed August 10, 2022.
 
Wooldridge, J. (2007). What’s new in econometrics? Lecture 10: Difference-in-differences estimation. NBER Summer Institute. Available at: https://www.nber.org/sites/default/files/2021-03/slides_10_diffindiffs.pdf, accessed August 8, 2022.

Data Consultant

Muhammad Al Amin
He/Him/His
Contact:
Firestone Library, A-12-F.1
609-258-6051

Data Consultant

Yufei Qin
Contact:
Firestone Library, A.12F.2
609-258-2519

Comments or Questions?

If you have questions or comments about this guide or method, please email data@Princeton.edu.