# DID in R: Differences-in-Differences R tutorial

## Concept heads up

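The difference-in-differences (DID) method compares not the levels of the outcome Y but the change in the outcome from the pre-treatment to the post-treatment period, for a treated group relative to an untreated group. It is a quasi-experimental approach.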
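As a rough illustration of the idea, the sketch below computes a difference-in-differences by hand from four group means. The numbers are hypothetical and chosen only to keep the arithmetic easy to follow; they do not come from the tutorial's data.

```r
# Hypothetical mean outcomes (illustrative values, not from the tutorial data)
control_pre  <- 10
control_post <- 14
treated_pre  <- 12
treated_post <- 19

# Change over time within each group
change_control <- control_post - control_pre
change_treated <- treated_post - treated_pre

# Difference-in-differences: the treated group's change
# net of the control group's change
did_by_hand <- change_treated - change_control
did_by_hand
```

Under these assumed values the control group changes by 4 and the treated group by 7, so the difference-in-differences is 3.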

## R coding tutorial

Getting sample data.

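```
library(foreign)  # functions for reading data files written by other statistical software
# The steps below assume the sample data has been read into a data frame called mydata.
```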

Create a dummy variable to indicate the time when the treatment started. Let's assume that treatment started in 1994. In this case, years before 1994 will have a value of 0, and 1994 and later a value of 1. If you already have this variable, skip this step.

```
mydata$time = ifelse(mydata$year >= 1994, 1, 0)
```

Create a dummy variable to identify the group exposed to the treatment. In this example, let's assume that the countries with codes 5, 6, and 7 were treated (=1) and that countries 1-4 were not treated (=0). The code below selects the treated countries by their labels, "E", "F", and "G". If you already have this variable, skip this step.

```
mydata$treated = ifelse(mydata$country == "E" | mydata$country == "F" | mydata$country == "G", 1, 0)
```

Create an interaction between time and treated. We will call this interaction ‘did’.

```
mydata$did = mydata$time * mydata$treated
```
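To check that the three dummies behave as intended, it can help to build them on a small toy panel first. The sketch below is only an illustration: the data frame `toy`, its seven countries labeled A-G, and the years 1992-1995 are assumptions made for this example, and `%in%` is used as a shorthand for the chained `==` comparisons.

```r
# Hypothetical toy panel: 7 countries (A-G) observed in 4 years
toy <- expand.grid(country = LETTERS[1:7], year = 1992:1995,
                   stringsAsFactors = FALSE)

# Same three steps as in the tutorial, applied to the toy data
toy$time    <- ifelse(toy$year >= 1994, 1, 0)
toy$treated <- ifelse(toy$country %in% c("E", "F", "G"), 1, 0)
toy$did     <- toy$time * toy$treated

# Sum of did in each treated-by-time cell; only the
# treated, post-treatment cell should be non-zero
xtabs(did ~ treated + time, data = toy)
```

In this toy layout the interaction equals 1 only for the 3 treated countries in the 2 post-treatment years, so the treated/post cell sums to 6 and every other cell is 0.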

Estimating the DID estimator (method 1: using the generated interaction variable)

```
didreg = lm(y ~ treated + time + did, data = mydata)
summary(didreg)
```
```
Call:
lm(formula = y ~ treated + time + did, data = mydata)

Residuals:
       Min         1Q     Median         3Q        Max
-9.768e+09 -1.623e+09  1.167e+08  1.393e+09  6.807e+09

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.581e+08  7.382e+08   0.485   0.6292
treated      1.776e+09  1.128e+09   1.575   0.1200
time         2.289e+09  9.530e+08   2.402   0.0191 *
did         -2.520e+09  1.456e+09  -1.731   0.0882 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.953e+09 on 66 degrees of freedom
Multiple R-squared:  0.08273,	Adjusted R-squared:  0.04104
F-statistic: 1.984 on 3 and 66 DF,  p-value: 0.1249
```

The coefficient for ‘did’ is the differences-in-differences estimator. The estimate is negative (-2.520e+09) and statistically significant at the 10% level (p = 0.0882), but not at the 5% level.
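Because the model contains only the two dummies and their interaction, the fitted value for each of the four group/period cells is that cell's mean of y. The sketch below re-enters the rounded coefficients from the regression output above (it does not refit the model) and rebuilds the four cell means to show that the difference of the two changes is the ‘did’ coefficient. The variable names are chosen for this illustration only.

```r
# Rounded coefficients copied from the regression output above
b0 <- 3.581e+08    # (Intercept)
b1 <- 1.776e+09    # treated
b2 <- 2.289e+09    # time
b3 <- -2.520e+09   # did

# Fitted cell means implied by the coefficients
control_pre  <- b0
control_post <- b0 + b2
treated_pre  <- b0 + b1
treated_post <- b0 + b1 + b2 + b3

# Change in the treated group minus change in the control group
did_check <- (treated_post - treated_pre) - (control_post - control_pre)
all.equal(did_check, b3)
```

The comparison uses `all.equal()` rather than `==` to allow for floating-point rounding in the additions.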

Estimating the DID estimator (method 2: using the interaction operator ‘*’ in the formula)

```
didreg1 = lm(y ~ treated*time, data = mydata)
summary(didreg1)
```
```
Call:
lm(formula = y ~ treated * time, data = mydata)

Residuals:
       Min         1Q     Median         3Q        Max
-9.768e+09 -1.623e+09  1.167e+08  1.393e+09  6.807e+09

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.581e+08  7.382e+08   0.485   0.6292
treated       1.776e+09  1.128e+09   1.575   0.1200
time          2.289e+09  9.530e+08   2.402   0.0191 *
treated:time -2.520e+09  1.456e+09  -1.731   0.0882 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.953e+09 on 66 degrees of freedom
Multiple R-squared:  0.08273,	Adjusted R-squared:  0.04104
F-statistic: 1.984 on 3 and 66 DF,  p-value: 0.1249
```

The coefficient for ‘treated:time’ (the interaction term that ‘treated*time’ adds to the model) is the differences-in-differences estimator (‘did’ in the previous example). As before, the estimate is negative and significant at the 10% level.
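The equivalence of the two methods can be checked on simulated data. The sketch below is only an illustration: the data frame `sim`, the years 1990-1999, the seed, and the assumed true effect of -2 are all hypothetical and unrelated to the tutorial's data set.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

# Hypothetical panel: 7 countries (A-G), 10 years
sim <- expand.grid(country = LETTERS[1:7], year = 1990:1999,
                   stringsAsFactors = FALSE)
sim$time    <- ifelse(sim$year >= 1994, 1, 0)
sim$treated <- ifelse(sim$country %in% c("E", "F", "G"), 1, 0)
sim$did     <- sim$time * sim$treated

# Simulated outcome with an assumed treatment effect of -2 plus noise
sim$y <- 1 + 0.5 * sim$treated + 1.5 * sim$time - 2 * sim$did +
  rnorm(nrow(sim))

m1 <- lm(y ~ treated + time + did, data = sim)  # method 1
m2 <- lm(y ~ treated * time, data = sim)        # method 2

# Extract the DID estimate from each fit and compare
est1 <- unname(coef(m1)["did"])
est2 <- unname(coef(m2)["treated:time"])
all.equal(est1, est2)
```

Both formulas describe the same model, so the two estimates agree up to numerical precision; only the label of the coefficient differs.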