# DID in R: A Difference-in-Differences Tutorial

## Concept heads up

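The difference-in-differences (DID) method does not compare the outcomes Y directly. Instead, it compares the change in Y from before to after the treatment in a treated group with the same change in an untreated (control) group. It is a quasi-experimental approach.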
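In terms of group means, the idea can be written as the formula below. The notation is ours: $\bar{Y}_{T,\text{post}}$ is the average outcome of the treated group after treatment, $\bar{Y}_{C,\text{pre}}$ the average of the control group before treatment, and so on. The second line is the standard regression form using the `treated` and `time` dummies built in the tutorial. Because that regression has one parameter per treated/time cell, the OLS estimate of $\beta_3$ equals the difference of cell means exactly.

$$
\hat{\delta}_{\text{DID}} = \left(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\right) - \left(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\right)
$$

$$
y = \beta_0 + \beta_1\,\text{treated} + \beta_2\,\text{time} + \beta_3\,(\text{treated} \times \text{time}) + \varepsilon, \qquad \hat{\beta}_3 = \hat{\delta}_{\text{DID}}
$$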

## R coding tutorial

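Getting sample data. The steps below assume a panel data frame `mydata` with a country identifier `country`, a `year` column, and an outcome `y`, read from a Stata `.dta` file with the `foreign` package.

```r
library(foreign)
# Placeholder path (hypothetical): point this at your own Stata file
mydata = read.dta("your_data.dta")
```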
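If you do not have a data file at hand, a synthetic panel with the same column names (`country`, `year`, `y`) lets you run every step of this tutorial. This is a sketch under assumptions: it invents 7 countries labelled A to G observed from 1990 to 1999 (70 rows, which matches the 66 residual degrees of freedom plus 4 coefficients in the regression output), and the outcome values are simulated, so they will not reproduce the estimates shown in this tutorial.

```r
# Hypothetical synthetic panel: 7 countries (A-G) x 10 years (1990-1999).
# The outcome is simulated; it is NOT the tutorial's original data.
set.seed(1)
mydata <- expand.grid(country = LETTERS[1:7], year = 1990:1999,
                      stringsAsFactors = FALSE)
is_treated <- mydata$country %in% c("E", "F", "G")
is_post <- mydata$year >= 1994
# Baseline + group gap + common time shift + a true effect of -2
# for treated countries from 1994 on, plus random noise.
mydata$y <- 10 + 1 * is_treated + 3 * is_post -
  2 * (is_treated & is_post) + rnorm(nrow(mydata))
head(mydata)
```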

Create a dummy variable to indicate when the treatment started. Let's assume that the treatment started in 1994. In this case, years before 1994 get a value of 0, and 1994 and later a value of 1. If your data already contain such a variable, skip this step.

```r
# 1 for 1994 and later (post-treatment), 0 before
mydata$time = ifelse(mydata$year >= 1994, 1, 0)
```
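An equivalent, slightly more compact way to build the dummy is to convert the logical comparison straight to an integer. The toy `year` vector below is only for illustration and stands in for `mydata$year`.

```r
# Toy years for illustration only
year <- 1990:1999
time_ifelse <- ifelse(year >= 1994, 1, 0)
# TRUE/FALSE is converted to 1/0
time_int <- as.integer(year >= 1994)
print(time_int)
# [1] 0 0 0 0 1 1 1 1 1 1
# Same 0/1 values (only the storage type differs: double vs integer)
stopifnot(all(time_ifelse == time_int))
```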

Create a dummy variable to identify the group exposed to the treatment. In this example, let's assume that countries E, F, and G (codes 5, 6, and 7) were treated (=1) and countries A to D (codes 1 to 4) were not (=0). If your data already contain such a variable, skip this step.

```r
# 1 for the treated countries E, F and G, 0 otherwise
mydata$treated = ifelse(mydata$country == "E" | mydata$country == "F" | mydata$country == "G", 1, 0)
```
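When the treated group is a list of labels, the `%in%` operator avoids chaining several `==` tests with `|`, and it works whether `country` is a character vector or a factor with those labels. The toy `country` vector below is only for illustration and stands in for `mydata$country`.

```r
# Toy country labels for illustration only
country <- c("A", "B", "C", "D", "E", "F", "G")
treated_or <- ifelse(country == "E" | country == "F" | country == "G", 1, 0)
# %in% tests membership in the set of treated countries
treated_in <- as.integer(country %in% c("E", "F", "G"))
print(treated_in)
# [1] 0 0 0 0 1 1 1
stopifnot(all(treated_or == treated_in))
```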

Create an interaction between `time` and `treated`, i.e. their product. We will call this interaction `did`. It equals 1 only for treated countries in the post-treatment period.

```r
mydata$did = mydata$time * mydata$treated
```
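To see why the product picks out exactly one group, it helps to list the four combinations of the two dummies. This small sketch builds them with `expand.grid()` rather than using the tutorial data.

```r
# The four treated/time combinations
cells <- expand.grid(treated = c(0, 1), time = c(0, 1))
cells$did <- cells$treated * cells$time
# did is 1 only in the row with treated = 1 and time = 1
print(cells)
```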

Estimating the DID effect (method 1: include the generated interaction `did`)

```r
didreg = lm(y ~ treated + time + did, data = mydata)
summary(didreg)
```
```
Call:
lm(formula = y ~ treated + time + did, data = mydata)

Residuals:
       Min         1Q     Median         3Q        Max
-9.768e+09 -1.623e+09  1.167e+08  1.393e+09  6.807e+09

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.581e+08  7.382e+08   0.485   0.6292
treated      1.776e+09  1.128e+09   1.575   0.1200
time         2.289e+09  9.530e+08   2.402   0.0191 *
did         -2.520e+09  1.456e+09  -1.731   0.0882 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.953e+09 on 66 degrees of freedom
Multiple R-squared:  0.08273,	Adjusted R-squared:  0.04104
F-statistic: 1.984 on 3 and 66 DF,  p-value: 0.1249
```

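The coefficient on `did` is the difference-in-differences estimate (about -2.52e+09). With a p-value of 0.088 it is significant at the 10% level but not at the 5% level, and its negative sign indicates that the treatment is estimated to lower the outcome.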
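A quick way to convince yourself that the `did` coefficient really is a difference of differences is to compute it by hand from the four cell means and compare. This sketch uses simulated, hypothetical data (7 countries A to G, years 1990 to 1999, a built-in effect of -3), not the tutorial's data; the equality holds exactly because the regression has one parameter per treated/time cell.

```r
# Hypothetical simulated panel, not the tutorial's data
set.seed(42)
d <- expand.grid(country = LETTERS[1:7], year = 1990:1999,
                 stringsAsFactors = FALSE)
d$time <- as.integer(d$year >= 1994)
d$treated <- as.integer(d$country %in% c("E", "F", "G"))
d$did <- d$time * d$treated
d$y <- 5 + 2 * d$treated + 1 * d$time - 3 * d$did + rnorm(nrow(d))

fit <- lm(y ~ treated + time + did, data = d)

# Mean outcome per cell: rows = treated (0, 1), columns = time (0, 1)
m <- tapply(d$y, list(d$treated, d$time), mean)
# (treated change over time) - (control change over time)
did_by_hand <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

print(c(regression = unname(coef(fit)["did"]), by_hand = did_by_hand))
stopifnot(isTRUE(all.equal(unname(coef(fit)["did"]), did_by_hand)))
```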

Estimating the DID effect (method 2: use the `*` operator in the formula, which adds both dummies and their interaction)

```r
didreg1 = lm(y ~ treated*time, data = mydata)
summary(didreg1)
```
```
Call:
lm(formula = y ~ treated * time, data = mydata)

Residuals:
       Min         1Q     Median         3Q        Max
-9.768e+09 -1.623e+09  1.167e+08  1.393e+09  6.807e+09

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.581e+08  7.382e+08   0.485   0.6292
treated       1.776e+09  1.128e+09   1.575   0.1200
time          2.289e+09  9.530e+08   2.402   0.0191 *
treated:time -2.520e+09  1.456e+09  -1.731   0.0882 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.953e+09 on 66 degrees of freedom
Multiple R-squared:  0.08273,	Adjusted R-squared:  0.04104
F-statistic: 1.984 on 3 and 66 DF,  p-value: 0.1249
```

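The coefficient on `treated:time`, the interaction term that `treated*time` adds to the model, is the difference-in-differences estimate and is identical to `did` in the previous model. As before, it is significant at the 10% level (p = 0.088), and the estimated treatment effect is negative.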
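The two methods are the same model written two ways: in an R formula, `treated*time` expands to `treated + time + treated:time`. The sketch below checks this on hypothetical simulated data (not the tutorial's data) by inspecting the expanded terms and comparing the two interaction coefficients.

```r
# Hypothetical simulated panel, not the tutorial's data
set.seed(7)
d <- expand.grid(country = LETTERS[1:7], year = 1990:1999,
                 stringsAsFactors = FALSE)
d$time <- as.integer(d$year >= 1994)
d$treated <- as.integer(d$country %in% c("E", "F", "G"))
d$did <- d$time * d$treated
d$y <- rnorm(nrow(d))

# Terms that the formula operator * generates
print(attr(terms(y ~ treated * time), "term.labels"))

m1 <- lm(y ~ treated + time + did, data = d)
m2 <- lm(y ~ treated * time, data = d)
# The did coefficient and the treated:time coefficient coincide
stopifnot(isTRUE(all.equal(unname(coef(m1)["did"]),
                           unname(coef(m2)["treated:time"]))))
```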