
DID in R: Differences-in-Differences R tutorial

Concept heads up

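The difference-in-differences (DID) method compares not the levels of the outcome Y but the change in the outcome from the pre-treatment to the post-treatment period, in a treated group relative to an untreated group. It is a quasi-experimental approach.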
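
Written out, the comparison described above takes the following form. This is the standard two-group, two-period expression; the subscript labels are notation chosen here for illustration, not taken from the guide.

```latex
\[
\hat{\delta}_{\mathrm{DID}}
  = \left(\bar{Y}_{\mathrm{treated,\,post}} - \bar{Y}_{\mathrm{treated,\,pre}}\right)
  - \left(\bar{Y}_{\mathrm{control,\,post}} - \bar{Y}_{\mathrm{control,\,pre}}\right)
\]
```

Each bar denotes the mean outcome of one group in one period, so the estimator is the change in the treated group minus the change in the untreated group.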

R coding tutorial

Getting sample data.

library(foreign)  # provides read.dta() for reading Stata .dta files
mydata <- read.dta("https://dss.princeton.edu/training/Panel101.dta")

Create a dummy variable to indicate the period in which the treatment is in effect. Let's assume that the treatment started in 1994. In this case, years before 1994 will have a value of 0 and years from 1994 onward a value of 1. If you already have this variable, skip this step.

mydata$time = ifelse(mydata$year >= 1994, 1, 0)
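
A quick way to confirm that the cutoff behaves as intended is to tabulate the year against the new dummy. The sketch below uses a small made-up data frame (`toy`, with hypothetical years 1990 to 1999) rather than the Panel101 data, so it runs on its own.

```r
# Toy data: hypothetical years standing in for mydata$year
toy <- data.frame(year = 1990:1999)

# 1 from the assumed treatment start (1994) onward, 0 before
toy$time <- ifelse(toy$year >= 1994, 1, 0)

# Each year should appear under exactly one value of the dummy
check <- table(toy$year, toy$time)
print(check)
```

The same `table()` call applied to `mydata$year` and `mydata$time` should show every year before 1994 in the 0 column and every later year in the 1 column.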

Create a dummy variable to identify the group exposed to the treatment. In this example, let's assume that countries E, F, and G (codes 5, 6, and 7) were treated (=1). Countries A through D (codes 1-4) were not treated (=0). If you already have this variable, skip this step.

mydata$treated = ifelse(mydata$country == "E" | mydata$country == "F" | mydata$country == "G", 1, 0)
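
When the treated group contains several units, the chain of `==` comparisons can be written more compactly with `%in%`. This is an alternative sketch, not the guide's own code, and it uses a made-up data frame (`toy`) with country labels A to G assumed to mirror the sample data.

```r
# Toy data: hypothetical country labels standing in for mydata$country
toy <- data.frame(country = factor(c("A", "B", "C", "D", "E", "F", "G")))

# 1 for the countries assumed to be treated (E, F, G), 0 otherwise
toy$treated <- ifelse(toy$country %in% c("E", "F", "G"), 1, 0)

print(toy)
```

One difference to keep in mind: `%in%` returns FALSE for a missing country, so such a row would be coded 0, whereas the `==` version would leave it as NA.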

Create an interaction between time and treated. We will call this interaction ‘did’.

mydata$did = mydata$time * mydata$treated
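
Because both dummies are 0/1, their product equals 1 only for treated units in the post-treatment period. The sketch below checks this on a made-up two-country, two-year data frame (`toy`); the country labels and years are hypothetical.

```r
# Toy panel: one untreated country (A), one treated country (E),
# one year before and one year after the assumed 1994 start
toy <- expand.grid(country = c("A", "E"), year = c(1993, 1994))
toy$time    <- ifelse(toy$year >= 1994, 1, 0)
toy$treated <- ifelse(toy$country == "E", 1, 0)
toy$did     <- toy$time * toy$treated

# Sum of 'did' in each treated-by-time cell; only the (1, 1) cell is non-zero
cells <- xtabs(did ~ treated + time, data = toy)
print(cells)
```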

Estimating the DID estimator (method 1: generate the interaction)

didreg = lm(y ~ treated + time + did, data = mydata)
summary(didreg)
Call:
lm(formula = y ~ treated + time + did, data = mydata)

Residuals:
       Min         1Q     Median         3Q        Max 
-9.768e+09 -1.623e+09  1.167e+08  1.393e+09  6.807e+09 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  3.581e+08  7.382e+08   0.485   0.6292  
treated      1.776e+09  1.128e+09   1.575   0.1200  
time         2.289e+09  9.530e+08   2.402   0.0191 *
did         -2.520e+09  1.456e+09  -1.731   0.0882 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.953e+09 on 66 degrees of freedom
Multiple R-squared:  0.08273,	Adjusted R-squared:  0.04104 
F-statistic: 1.984 on 3 and 66 DF,  p-value: 0.1249

The coefficient for ‘did’ is the differences-in-differences estimator. The estimate is negative (-2.520e+09) and statistically significant at the 10% level (p = 0.0882), though not at the 5% level.
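
To see why the ‘did’ coefficient is the difference-in-differences, it can help to compute it by hand from the four group-by-period means. The sketch below does this on made-up data (`toy`, with invented outcome values), not on the Panel101 data; in a model with only these three regressors, the two numbers should agree up to floating-point error.

```r
# Hypothetical toy outcomes, two observations per group-period cell
toy <- data.frame(
  treated = c(0, 0, 0, 0, 1, 1, 1, 1),
  time    = c(0, 0, 1, 1, 0, 0, 1, 1),
  y       = c(9, 11, 11, 13, 19, 21, 26, 28)
)
toy$did <- toy$treated * toy$time

# Mean outcome in each of the four cells (rows: treated, columns: time)
m <- tapply(toy$y, list(toy$treated, toy$time), mean)

# (treated post - treated pre) - (control post - control pre)
did_by_hand <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

# Same quantity taken from the regression
fit <- lm(y ~ treated + time + did, data = toy)
did_by_lm <- unname(coef(fit)["did"])

print(c(by_hand = did_by_hand, by_lm = did_by_lm))
```

In this toy example the treated group rises by 7 and the untreated group by 2, so both approaches give a difference-in-differences of 5.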

Estimating the DID estimator (method 2: using the interaction operator *)

didreg1 = lm(y ~ treated*time, data = mydata)
summary(didreg1)
Call:
lm(formula = y ~ treated * time, data = mydata)

Residuals:
       Min         1Q     Median         3Q        Max 
-9.768e+09 -1.623e+09  1.167e+08  1.393e+09  6.807e+09 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.581e+08  7.382e+08   0.485   0.6292  
treated       1.776e+09  1.128e+09   1.575   0.1200  
time          2.289e+09  9.530e+08   2.402   0.0191 *
treated:time -2.520e+09  1.456e+09  -1.731   0.0882 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.953e+09 on 66 degrees of freedom
Multiple R-squared:  0.08273,	Adjusted R-squared:  0.04104 
F-statistic: 1.984 on 3 and 66 DF,  p-value: 0.1249

The coefficient for ‘treated:time’ (the interaction term that ‘treated*time’ adds to the model) is the differences-in-differences estimator (‘did’ in the previous example). As before, the estimate is negative and significant at the 10% level.
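
The two methods agree because R expands `treated*time` in a formula into both main effects plus their interaction. The sketch below illustrates this on made-up data (`toy`, with invented outcome values) by listing the expanded terms and comparing the coefficients of the two fits.

```r
# Hypothetical toy outcomes, two observations per group-period cell
toy <- data.frame(
  treated = c(0, 0, 0, 0, 1, 1, 1, 1),
  time    = c(0, 0, 1, 1, 0, 0, 1, 1),
  y       = c(9, 11, 11, 13, 19, 21, 26, 28)
)
toy$did <- toy$treated * toy$time

# Terms that the * operator expands to
labels <- attr(terms(y ~ treated * time), "term.labels")
print(labels)

# Method 1 (explicit interaction variable) and method 2 (* operator)
fit1 <- lm(y ~ treated + time + did, data = toy)
fit2 <- lm(y ~ treated * time, data = toy)

# Coefficients match in order and value; only the last name differs
same <- isTRUE(all.equal(unname(coef(fit1)), unname(coef(fit2))))
print(same)
```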

Comments or Questions?

If you have questions or comments about this guide or method, please email data@Princeton.edu.

Data Consultant

Yufei Qin
Contact:
Firestone Library, A.12F.2
609-258-2519

Data Consultant

Muhammad Al Amin
He/Him/His
Contact:
Firestone Library, A-12-F.1
609-258-6051