
Event Study with Stata: A Step-by-step Guide

This tutorial provides a step-by-step guide to conducting basic event study analysis using Stata.


Introduction

An event study is used to examine reactions of the market to events of interest. A simple event study involves the following steps:

  1. Data Preparation
  2. Cleaning the Data and Calculating the Event Window
  3. Estimating Normal Performance
  4. Calculating Abnormal and Cumulative Abnormal Returns
  5. Testing for Significance
  6. Testing Across All Events

This guide shows how to conduct an event study in Stata with the help of two examples. The first example uses CRSP data, which is restricted by contract to Princeton University; downloading it requires authenticating via Global Protect and then logging in with a Princeton University NetID and password. The second example uses hypothetical data, which anyone can download.

PS: If you are interested in conducting a difference-in-differences analysis, see the separate guide on that topic.

Event Study with CRSP Data

1. Data Preparation

You may have downloaded datasets for an event study, or created them by entering data into Excel sheets. Usually there are two files: one for stock returns and one for the event of interest. In this example, we start with two data sets, one called eventdates and the other called stockdata. In the eventdates data, we have a company id (company_id) and the date of the event (event_date) as variables. In the stock data, we have a matching company id (company_id), the stock return date (date), the stock return (ret), and the market return (market_return).

If each company's observations can be matched to a single event date, the study is much simpler. In some situations, however, you may wish to examine more than one event date per company. With multiple events per company, it is necessary to create a duplicate set of observations for each company/event-date combination, because each event requires a full set of stock observations.

If you already know that you have only one event for each company, you may skip the sub-sections 1.1 and 1.2 below, merge your own eventdate and stockdata data files, and go to section 2 for the step-by-step procedure to conduct the event study.

1.1 Downloading the Data

We recommend using a Stata do file to conduct the following event study analysis.

First, open a do file in Stata and set your working directory with the following command. The path below is ours; make sure you substitute your own working directory path.

cd "C:\Documents\EventStudy\EventStudy_CRSP"

There are two data sets we need to download: one called eventdates_crsp, which contains event information, and the other called stockdata_crsp. The events in this example are merger announcement dates for 2007, obtained from SDC Platinum. The stock return data for 2007 were obtained from the CRSP daily stock file.

Second, download the eventdates_crsp and stockdata_crsp datasets from the Princeton University repository by clicking the following link. Remember, the sample dataset for the CRSP example is only available to Princeton University users.

https://dss2.princeton.edu/data/244/

Third, if you did not download the datasets into your working directory, move them there.

1.2 Combining Event and Stock Data

First, we need to find out how many event dates there are for each company. Use the dataset of event dates and generate a variable that counts the number of event dates per company.

use eventdates_crsp, clear
sort company_id
by company_id: gen eventcount=_N

Cut the dataset down to just one observation for each company. Each company observation is associated with the count of event dates for that company. Save this as a new dataset - don't overwrite your dataset of event dates!

by company_id: keep if _n==1
sort company_id
keep company_id eventcount
save eventcount_crsp, replace

The next step is to merge the new 'eventcount_crsp' dataset with your dataset of stock data.

use stockdata_crsp, clear
sort company_id
merge m:1 company_id using eventcount_crsp
tab _merge
keep if _merge==3
drop _merge

Now use Stata's 'expand' command to create the duplicate observations. The 'eventcount' variable has been merged onto each stock observation and tells Stata how many copies of that observation are needed. This is where your dataset can get VERY BIG, as each observation is duplicated once for every event the company has.

expand eventcount              

You need to create a variable that indicates which 'set' of observations within the company each observation belongs to. Then sort the dataset to prepare for another merge.

drop eventcount
sort company_id date
by company_id date: gen set=_n
sort company_id set
save stockdata2_crsp, replace

Now go back to your original event dates dataset (not the 'eventcount' one!). You need to create a matching set variable to identify the different event dates within each company. The final step is to use the set variable to match each event date with a set of stock observations.

use eventdates_crsp, clear
sort company_id
by company_id: gen set=_n
sort company_id set
save eventdates2_crsp, replace
use stockdata2_crsp, clear
merge m:1 company_id set using eventdates2_crsp
tab _merge 
             

Here, you may have observations with event information but no stock information. You can examine which companies' stock information is missing:

list company_id if _merge==2
keep if _merge==3
drop _merge

Finally, create a new variable that groups company_id and set so that you have a unique identifier to use in the rest of your analysis.

egen group_id = group(company_id set)    

PS: During the rest of your analysis, use group_id wherever the instructions say company_id.

2. Cleaning the Data and Calculating the Event and Estimation Windows

It's likely that you have more observations for each company than you need. It's also possible that you do not have enough for some. Before you can continue, you must make sure that you will be conducting your analyses on the correct observations. To do this, you will need to create a variable, dif, that will count the number of days from the observation to the event date. This can be either calendar days or trading days.

For number of trading days:

sort company_id date
by company_id: gen datenum=_n
by company_id: gen target=datenum if date==event_date
egen td=min(target), by(company_id)
drop target
gen dif=datenum-td

For calendar days:

gen dif=date-event_date

As you can see, calculating the number of trading days is a little trickier than calendar days. For trading days, we first create a variable that counts the number of days within each company_id. Then we determine which observation occurs on the event date and spread that event date's day number across all of the observations within that company_id. Finally, we take the difference between the two, creating a variable, dif, that counts the number of days between each individual observation and the event day.

Next, we need to make sure that we have the minimum number of observations before and after the event date, as well as the minimum number of observations before the event window for the estimation window. Let's say we want 2 days before and after the event date (a total of 5 days in the event window) and 30 days for the estimation window. (You can of course change these numbers to suit your analysis.)

by company_id: gen event_window=1 if dif>=-2 & dif<=2
egen count_event_obs=count(event_window), by(company_id)
by company_id: gen estimation_window=1 if dif<-30 & dif>=-60
egen count_est_obs=count(estimation_window), by(company_id)
replace event_window=0 if event_window==.
replace estimation_window=0 if estimation_window==.

The procedure for determining the event and estimation windows is the same. First we create a variable that equals 1 if the observation is within the specified days. Second, we create another variable that counts how many observations within each company_id have a 1 assigned to them. Finally, we replace all the missing values with zeroes, creating a dummy variable. You can now determine which companies do not have a sufficient number of observations.

tab company_id if count_event_obs<5
tab company_id if count_est_obs<30

The "tab" commands will produce a list of company_ids that do not have enough observations within the event and estimation windows, as well as the total number of observations for those company_ids. Make sure the dataset has been saved under a different name before dropping any observations! Then, to eliminate these companies:

drop if count_event_obs < 5
drop if count_est_obs < 30

At this point you can also drop two variables you won't need any longer: count_event_obs and count_est_obs. To do this, type:

drop count_event_obs count_est_obs

3. Estimating Normal Performance

Now we are at the point where we can actually start the analysis. First we need a way to estimate normal performance. To do this, we will run a separate regression for each company using the data within the estimation window and save the alphas (the intercept) and betas (the coefficient on the market return). We will later use these saved regression equations to predict normal performance during the event window.

Note that ret, the dependent variable in our regression, is simply the CRSP variable for a given stock's return, while the independent variable market_return (CRSP's value-weighted index return, vretd) is the return of an index for whatever exchange the stock trades on. Use the equivalent variables for your dataset.
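Written out in standard market-model notation (not shown in the original code), the regression fit for each company i over its estimation window, and the predicted "normal" return it produces for the event window, are:

```latex
% Market model, estimated over the estimation window for company i
R_{i,t} = \alpha_i + \beta_i \, R_{m,t} + \varepsilon_{i,t}

% Predicted (normal) return in the event window, from the saved estimates
\hat{R}_{i,t} = \hat{\alpha}_i + \hat{\beta}_i \, R_{m,t}
```

Here R_{i,t} corresponds to ret and R_{m,t} to market_return in the code below.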

gen predicted_return=.
egen id=group(company_id)
/* for multiple event dates, use: egen id = group(group_id) */
summarize id
local N = r(max)   /* the highest value of id; our N=199 */
forvalues i=1(1)`N' {
       list id company_id if id==`i' & dif==0
       reg ret market_return if id==`i' & estimation_window==1
       predict p if id==`i'
       replace predicted_return = p if id==`i' & event_window==1
       drop p
}

Here, we created a variable "id" that numbers the companies from 1 to N, where N is the number of company-event combinations with complete data. The loop iterates over the companies, runs a regression over each company's estimation window, and then uses that regression to predict a 'normal' return in the event window.

4. Abnormal and Cumulative Abnormal Returns

We can now calculate the abnormal and cumulative abnormal returns for our data. The daily abnormal return is computed by subtracting the predicted normal return from the actual return for each day in the event window. The sum of the abnormal returns over the event window is the cumulative abnormal return.
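In the same notation, with the predicted normal return from step 3 written as a fitted value and a 5-day event window (dif from -2 to 2), the two quantities computed in the code below are:

```latex
% Daily abnormal return (event window only)
AR_{i,t} = R_{i,t} - \hat{R}_{i,t}

% Cumulative abnormal return over the event window
CAR_i = \sum_{t=-2}^{2} AR_{i,t}
```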

sort id date
gen abnormal_return=ret-predicted_return if event_window==1
by id: egen cumulative_abnormal_return = total(abnormal_return)

Here we simply calculate the abnormal return for each observation in the event window. Then we set the cumulative abnormal return equal to the sum of the abnormal returns for each company.

5. Testing for Significance

We are going to compute a test statistic, TEST, to check whether the average abnormal return for each stock is statistically different from zero.

TEST= ((ΣAR)/N) / (AR_SD/sqrt(N))

where AR is the abnormal return and AR_SD is the abnormal return standard deviation. If the absolute value of test is greater than 1.96, then the average abnormal return for that stock is significantly different from zero at the 5% level. The value of 1.96 comes from the standard normal distribution with a mean of 0 and a standard deviation of 1. 95% of the distribution is between ±1.96.
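The Stata line below computes an algebraically equivalent form of TEST: the 1/N from the average cancels against the sqrt(N) in the denominator, leaving the cumulative abnormal return divided by the standard deviation times sqrt(N):

```latex
TEST
= \frac{\left(\sum_t AR_t\right)/N}{AR\_SD/\sqrt{N}}
= \frac{\sum_t AR_t}{AR\_SD \cdot \sqrt{N}}
= \frac{1}{\sqrt{N}} \cdot \frac{CAR}{AR\_SD}
```

with N = 5 here, the number of days in the event window.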

sort id date
by id: egen ar_sd = sd(abnormal_return)
gen test = (1/sqrt(5))*(cumulative_abnormal_return/ar_sd) /* 5 = days in event window */
list company_id cumulative_abnormal_return test if dif==0

Note 1: the number of days in the event window comes from how you defined the window in section 2. Here the window runs from dif==-2 to dif==2, so it is 5 days; you can confirm this with the code tab dif if event_window==1.

Note 2: this test uses the sample standard deviation. A less conservative alternative is to use the population standard deviation. To derive this from the sample standard deviation produced by Stata, multiply ar_sd by the square root of (n-1)/n; in our example, by the square root of 4/5.

The following command stores the results of your event study in a comma-separated (.csv) file, which can be opened in Excel:

outsheet company_id event_date cumulative_abnormal_return test using stats.csv if dif==0, comma names

6. Testing Across All Events

Instead of, or in addition to, looking at the average abnormal return for each company, you probably want to calculate the cumulative abnormal return for all companies treated as a group. Here's the code for that:

reg cumulative_abnormal_return if dif==0, robust

The p-value on the constant from this regression gives you the significance of the cumulative abnormal return across all companies. This test is preferable to a t-test because it allows you to use robust standard errors.
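To see why a constant-only regression works as the group test (a standard result, not spelled out in the original): regressing CAR on nothing but an intercept estimates the cross-company mean of CAR, so the robust t-test on the constant is a test of whether that mean is zero:

```latex
CAR_i = \mu + \varepsilon_i , \qquad
\hat{\mu} = \frac{1}{J} \sum_{i=1}^{J} CAR_i , \qquad
H_0 : \mu = 0
```

where J is the number of companies (events) and the standard error of the estimated mean comes from the robust option.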

Event Study with (hypothetical) DSS Data

1. Data Preparation

You may have downloaded datasets for an event study, or created them by entering data into Excel sheets. Usually there are two files: one for stock returns and one for the event of interest. In this example, we start with two data sets, one called eventdates and the other called stockdata. In the eventdates data, we have a company id (company_id) and the date of the event (event_date) as variables. In the stock data, we have a matching company id (company_id), the stock return date (date), the stock return (ret), and the market return (market_return).

If each company's observations can be matched to a single event date, the study is much simpler. In some situations, however, you may wish to examine more than one event date per company. With multiple events per company, it is necessary to create a duplicate set of observations for each company/event-date combination, because each event requires a full set of stock observations.

If you already know that you have only one event for each company, you may skip the sub-sections 1.1 and 1.2 below, merge your own eventdate and stockdata data files, and go to section 2 for the step-by-step procedure to conduct the event study.

1.1 Downloading the Data

We recommend using a Stata do file to conduct the following event study analysis.

First, open a do file in Stata and set your working directory with the following command. The path below is ours; make sure you substitute your own working directory path.

cd "C:\Documents\EventStudy\EventStudy_DSS"

There are two data sets we need to download: one called eventdates, which contains event information, and the other called stockdata. The events, in this example, are hypothetical merger announcement dates for 2018. The stock return data for 2018 are also hypothetical.

Second, load the eventdates data directly from the following URL with the commands below, and save it in your working directory.

use https://dss.princeton.edu/training/eventdates_dss.dta, clear
save eventdates_dss, replace

Third, load the stockdata the same way and save it in your working directory.

use https://dss.princeton.edu/training/stockdata_dss.dta, clear
save stockdata_dss, replace

1.2 Combining Event and Stock Data

First, we need to find out how many event dates there are for each company. Use the dataset of event dates and generate a variable that counts the number of event dates per company.

use eventdates_dss, clear
sort company_id
by company_id: gen eventcount=_N

Cut the dataset down to just one observation for each company. Each company observation is associated with the count of event dates for that company. Save this as a new dataset - don't overwrite your dataset of event dates!

by company_id: keep if _n==1
sort company_id
keep company_id eventcount
save eventcount_dss, replace

The next step is to merge the new 'eventcount_dss' dataset with your dataset of stock data.

use stockdata_dss, clear
sort company_id
merge m:1 company_id using eventcount_dss
tab _merge
keep if _merge==3
drop _merge

Now use Stata's 'expand' command to create the duplicate observations. The 'eventcount' variable has been merged onto each stock observation and tells Stata how many copies of that observation are needed. This is where your dataset can get BIGGER, as each observation is duplicated once for every event the company has.

expand eventcount              

You need to create a variable that indicates which 'set' of observations within the company each observation belongs to. Then sort the dataset to prepare for another merge.

drop eventcount
sort company_id date
by company_id date: gen set=_n
sort company_id set
save stockdata2_dss, replace

Now go back to your original event dates dataset (not the 'eventcount' one!). You need to create a matching set variable to identify the different event dates within each company. The final step is to use the set variable to match each event date with a set of stock observations.

use eventdates_dss, clear
sort company_id
by company_id: gen set=_n
sort company_id set
save eventdates2_dss, replace
use stockdata2_dss, clear
merge m:1 company_id set using eventdates2_dss
tab _merge 
             

Here, you may have observations with event information but no stock information. You can examine which companies' stock information is missing:

list company_id if _merge==2
keep if _merge==3
drop _merge

Finally, create a new variable that groups company_id and set so that you have a unique identifier to use in the rest of your analysis.

egen group_id = group(company_id set)    

PS: During the rest of your analysis, use group_id wherever the instructions say company_id.

2. Cleaning the Data and Calculating the Event and Estimation Windows

It's likely that you have more observations for each company than you need. It's also possible that you do not have enough for some. Before you can continue, you must make sure that you will be conducting your analyses on the correct observations. To do this, you will need to create a variable, dif, that will count the number of days from the observation to the event date. This can be either calendar days or trading days.

For number of trading days:

sort company_id date
by company_id: gen datenum=_n
by company_id: gen target=datenum if date==event_date
egen td=min(target), by(company_id)
drop target
gen dif=datenum-td

For calendar days:

gen dif=date-event_date

As you can see, calculating the number of trading days is a little trickier than calendar days. For trading days, we first create a variable that counts the number of days within each company_id. Then we determine which observation occurs on the event date and spread that event date's day number across all of the observations within that company_id. Finally, we take the difference between the two, creating a variable, dif, that counts the number of days between each individual observation and the event day.

Next, we need to make sure that we have the minimum number of observations before and after the event date, as well as the minimum number of observations before the event window for the estimation window. Let's say we want 2 days before and after the event date (a total of 5 days in the event window) and 30 days for the estimation window. (You can of course change these numbers to suit your analysis.)

by company_id: gen event_window=1 if dif>=-2 & dif<=2
egen count_event_obs=count(event_window), by(company_id)
by company_id: gen estimation_window=1 if dif<-30 & dif>=-60
egen count_est_obs=count(estimation_window), by(company_id)
replace event_window=0 if event_window==.
replace estimation_window=0 if estimation_window==.

The procedure for determining the event and estimation windows is the same. First we create a variable that equals 1 if the observation is within the specified days. Second, we create another variable that counts how many observations within each company_id have a 1 assigned to them. Finally, we replace all the missing values with zeroes, creating a dummy variable.

3. Estimating Normal Performance

Now we are at the point where we can actually start the analysis. First we need a way to estimate normal performance. To do this, we will run a separate regression for each company using the data within the estimation window and save the alphas (the intercept) and betas (the coefficient on the market return). We will later use these saved regression equations to predict normal performance during the event window.

gen predicted_return=.
egen id=group(company_id)
/* for multiple event dates, use: egen id = group(group_id) */
summarize id
local N = r(max)   /* the highest value of id; our N=30 */
forvalues i=1(1)`N' {
       list id company_id if id==`i' & dif==0
       reg ret market_return if id==`i' & estimation_window==1
       predict p if id==`i'
       replace predicted_return = p if id==`i' & event_window==1
       drop p
}

Here, we created a variable "id" that numbers the companies from 1 to N, where N is the number of company-event combinations with complete data. The loop iterates over the companies, runs a regression over each company's estimation window, and then uses that regression to predict a 'normal' return in the event window.

4. Abnormal and Cumulative Abnormal Returns

We can now calculate the abnormal and cumulative abnormal returns for our data. The daily abnormal return is computed by subtracting the predicted normal return from the actual return for each day in the event window. The sum of the abnormal returns over the event window is the cumulative abnormal return.

sort id date
gen abnormal_return=ret-predicted_return if event_window==1
by id: egen cumulative_abnormal_return = total(abnormal_return)

Here we simply calculate the abnormal return for each observation in the event window. Then we set the cumulative abnormal return equal to the sum of the abnormal returns for each company.

5. Testing for Significance

We are going to compute a test statistic, TEST, to check whether the average abnormal return for each stock is statistically different from zero.

TEST= ((ΣAR)/N) / (AR_SD/sqrt(N))

where AR is the abnormal return and AR_SD is the abnormal return standard deviation. If the absolute value of test is greater than 1.96, then the average abnormal return for that stock is significantly different from zero at the 5% level. The value of 1.96 comes from the standard normal distribution with a mean of 0 and a standard deviation of 1. 95% of the distribution is between ±1.96.

sort id date
by id: egen ar_sd = sd(abnormal_return)
gen test = (1/sqrt(5))*(cumulative_abnormal_return/ar_sd) /* 5 = days in event window */
list company_id cumulative_abnormal_return test if dif==0

Note 1: the number of days in the event window comes from how you defined the window in section 2. Here the window runs from dif==-2 to dif==2, so it is 5 days; you can confirm this with the code tab dif if event_window==1.

Note 2: this test uses the sample standard deviation. A less conservative alternative is to use the population standard deviation. To derive this from the sample standard deviation produced by Stata, multiply ar_sd by the square root of (n-1)/n; in our example, by the square root of 4/5.

The following command stores the results of your event study in a comma-separated (.csv) file, which can be opened in Excel:

outsheet company_id event_date cumulative_abnormal_return test using stats.csv if dif==0, comma names

6. Testing Across All Events

Instead of, or in addition to, looking at the average abnormal return for each company, you probably want to calculate the cumulative abnormal return for all companies treated as a group. Here's the code for that:

reg cumulative_abnormal_return if dif==0, robust

The p-value on the constant from this regression gives you the significance of the cumulative abnormal return across all companies. This test is preferable to a t-test because it allows you to use robust standard errors. Here we found statistically insignificant results, which may be due to the very small sample size of our hypothetical data.

Useful Resources

Center for Research on Securities Prices (CRSP): https://www.crsp.org/products/research-products/crsp-us-stock-databases 

Dasgupta, S., Laplante, B., & Mamingi, N. (1998). Capital market responses to environmental performance in developing countries (Vol. 1909). World Bank Publications.

DSS Online Training Section https://dss.princeton.edu/training/

Princeton DSS Data Analysis Guides https://library.princeton.edu/dss/training
 
Ullah, S., Zaefarian, G., Ahmed, R., & Kimani, D. (2021). How to apply the event study methodology in STATA: An overview and a step-by-step guide for authors. Industrial Marketing Management, 99, A1-A12.
 

Wellner, L., Dreher, A., Fuchs, A., Parks, B., & Strange, A. (2022). Can aid buy foreign public support? Evidence from Chinese development finance.

Data Consultant

Muhammad Al Amin
He/Him/His
Contact:
Firestone Library, A-12-F.1
609-258-6051

Data Consultant

Yufei Qin
Contact:
Firestone Library, A.12F.2
6092582519

Comments or Questions?

If you have questions or comments about this guide or method, please email data@Princeton.edu.