An event study is used to examine the reactions of the market to events of interest. A simple event study involves the following steps:
This guide will show how to conduct an event study using Stata with the help of two examples. The first example uses CRSP data, which is restricted by contract to Princeton University and requires one to authenticate via Global Protect and then use their Princeton University NetID and password to download. The second example uses hypothetical data, and the data is downloadable for all.
PS: if you are interested in conducting the differences-in-differences analysis, please click here.
1. Data Preparation
You may have downloaded datasets for an event study, or created ones by entering data into excel sheets. Usually, people have two files, one for stock returns, and the other for your event of interest. In this example, we start with two data sets, one called eventdates and the other called stockdata. In the eventdates data, we have company id (company_id) and the date of event (event_date) as variables. In the stock data, we have matching company id (company_id), stock return date (date), stock return (ret), and market return value (market_return).
If a set of observations for each company can be matched to a single event date, the study will be much simpler. In some situations, one may wish to examine more than one event date for each company. In multiple observations per company, it is necessary to create a duplicate set of observations for each event date/company combination. You need a full set of stock observations to examine each event.
If you already know that you have only one event for each company, you may skip the sub-sections 1.1 and 1.2 below, merge your own eventdate and stockdata data files, and go to section 2 for the step-by-step procedure to conduct the event study.
1.1 Downloading the Data
We recommend using a Stata do file to conduct the following event study analysis.
First, open a do file in Stata, and set your working directory. Use the following command to set the working directory. We use our working directory. Please make sure you use your working directory path.
There are two data sets we need to download: one called eventdates_crsp, which contains event information, and the other called stockdata_crsp. The events, in this example, are merger announcement dates for 2007 obtained from SDC Platinum. The stock return data for 2007 was obtained from the CRSP daily stock.
Second, download the eventdates_crsp and stockdata_crsp datasets from the Princeton University repository by clicking the following link. Remember, the sample dataset for the CRSP example is only available to Princeton University users.
Third, if you did not download the datasets in your working directory, drag them into the working directory.
1.2 Combining Event and Stock Data
First, we need to find out how many event dates there are for each company. Use the dataset of event dates and generate a variable that counts the number of event dates per company.
Cut the dataset down to just one observation for each company. Each company observation is associated with the count of event dates for that company. Save this as a new dataset - don't overwrite your dataset of event dates!
The next step is to merge the new 'eventcount_crsp' dataset with your dataset of stock data.
Now use Stata's 'expand' command to create the duplicate observations. The 'eventcount' variable has been merged on to each stock observation, and tells Stata how many copies of that observation are needed. This is where your dataset can get VERY BIG, as we are duplicating the observations to however many counts of event we have per company.
You need to create a variable that indicates which 'set' of observations within the company each observation belongs to. Then sort the dataset to prepare for another merge.
Back in your original event dates dataset - not the 'eventcount' one! You need to create a matching set variable to identify the different event dates within each company. The final step is to use the set variable to match each event date with a set of stock observations.
Here, you may have observations where you have the events information but not stock information. You may examine which companies' stock information is missing.
Finally, create a new variable that groups company_id and set so that you have a unique identifier to use in the rest of your analysis.
PS: During the rest of your analysis, use group_id wherever the instructions say company_id.
2. Cleaning the Data and Calculating the Event and Estimation Windows
It's likely that you have more observations for each company than you need. It's also possible that you do not have enough for some. Before you can continue, you must make sure that you will be conducting your analyses on the correct observations. To do this, you will need to create a variable, dif, that will count the number of days from the observation to the event date. This can be either calendar days or trading days.
For number of trading days:
For calendar days:
As you can see, calculating the number of trading days is a little trickier than calendar days. For trading days, we first need to create a variable that counts the number of days within each company_id. Then we determine which observation occurs on the event date. We create a variable with the event date's day number on all of the observations within that company_id. Finally, we simply take the difference between the two, creating a variable, dif, that counts the number of days between each individual observation and the event day. Next, we need to make sure that we have the minimum number of observations before and after the event date, as well as the minimum number of observations before the event window for the estimation window. Let's say we want 2 days before and after the event date (a total of 5 days in the event window) and 30 days for the estimation window. (You can of course change these numbers to suit your analysis.)
The procedure for determining the event and estimation windows is the same. First we create a variable that equals 1 if the observation is within the specified days. Second, we create another variable that counts how many observations, within each company_id, has a 1 assigned to it. Finally, we replace all the missing values with zeroes, creating a dummy variable. You can now determine which companies do not have a sufficient number of observations.
The "tab" will produce a list of company_ids that do not have enough observations within the event and estimation windows, as well as the total number of observations for those company_ids. To eliminate these companies: (You should make sure the dataset has been saved under a different name before dropping any observations!)
At this point you can also drop some variables you won't need any longer: count_event_obs and count_est_obs. For doing this, type:
drop count_event_obs count_est_obs
3. Estimating Normal Performance
Now we are at the point where we can actually start an analysis. First we need a way to estimate Normal Performance. To do this, we will run a seperate regression for each company using the data within the estimation window and save the alphas (the intercept) and betas (the coefficient of the independent variable). We will later use these saved regression equations to predict normal performance during the event window.
Note that return, the dependent variable in our regression, is simply the CRSP variable for a given stock's return, while the independent variable vretd that we use to predict ret is the value-weighted return of an index for whatever exchange the stock trades on. Use the equivalent variables for your dataset.
Here, we created a variable "id" that numbers the companies from 1 to however many there are. The N is the number of company-event combinations that have complete data. This process iterates over the companies, runs a regression in the estimation window for each, and then uses that regression to predict a 'normal' return in the event window.
4. Abnormal and Cumulative Abnormal Returns
We can now calculate the abnormal and cumulative abnormal returns for our data. The daily abnormal return is computed by subtracting the predicted normal return from the actual return for each day in the event window. The sum of the abnormal returns over the event window is the cumulative abnormal return.
Here we simply calculate the abnormal return for each observation in the event window. Then we set the cumulative abnormal return equal to the sum of the abnormal returns for each company.
5. Testing for Significance
We are going to compute a test statistic, TEST, to check whether the average abnormal return for each stock is statistically different from zero.
TEST= ((ΣAR)/N) / (AR_SD/sqrt(N))
where AR is the abnormal return and AR_SD is the abnormal return standard deviation. If the absolute value of test is greater than 1.96, then the average abnormal return for that stock is significantly different from zero at the 5% level. The value of 1.96 comes from the standard normal distribution with a mean of 0 and a standard deviation of 1. 95% of the distribution is between ±1.96.
Note 1: find the number of days in event window from the working dataset by using the code tab event_date. In our case, it is 45069
Note 2: this test uses the sample standard deviation. A less conservative alternative is to use the population standard deviation. To derive this from the sample standard deviation produced by Stata, multiply ar_sd by the square root of n-1/n; in our example, by the square root of 4/5.
The following codes will store the results of your event study in an Excel spreadsheet file:
outsheet company_id event_date cumulative_abnormal_return test using stats.csv if dif==0, comma names
6. Testing Across All Companies
Instead of, or in addition to, looking at the average abnormal return for each company, you probably want to calculate the cumulative abnormal for all companies treated as a group. Here's the code for that:
The P-value on the constant from this regression will give you the significance of the cumulative abnormal return across all companies.
1. Data Preparation
You may have downloaded datasets for an event study, or created ones by entering data into excel sheets. Usually, people have two files, one for stock returns, and the other for your event of interest. In this example, we start with two data sets, one called eventdates and the other called stockdata. In the eventdates data, we have company id (company_id) and the date of event (event_date) as variables. In the stock data, we have matching company id (company_id), stock return date (date), stock return (ret), and market return value (market_return).
If a set of observations for each company can be matched to a single event date, the study will be much simpler. In some situations, one may wish to examine more than one event date for each company. In multiple observations per company, it is necessary to create a duplicate set of observations for each event date/company combination. You need a full set of stock observations to examine each event.
If you already know that you have only one event for each company, you may skip the sub-sections 1.1 and 1.2 below, merge your own eventdate and stockdata data files, and go to section 2 for the step-by-step procedure to conduct the event study.
1.1 Downloading the Data
We recommend using a Stata do file to conduct the following event study analysis.
First, open a do file in Stata, and set your working directory. Use the following command to set the working directory. We use our working directory. Please make sure you use your working directory path.
There are two data sets we need to download: one called eventdates, which contains event information, and the other called stockdata. The events, in this example, are hypothetical merger announcement dates for 2018. The stock return data for 2018 are also hypothetical.
Second, download the eventdates data by clicking on the following link, and save the data in your working directory.
Third, download the stockdata by clicking on the following link, and save the data in your working directory.
1.2 Combining Event and Stock Data
First, we need to find out how many event dates there are for each company. Use the dataset of event dates and generate a variable that counts the number of event dates per company.
Cut the dataset down to just one observation for each company. Each company observation is associated with the count of event dates for that company. Save this as a new dataset - don't overwrite your dataset of event dates!
The next step is to merge the new 'eventcount_dss' dataset with your dataset of stock data.
Now use Stata's 'expand' command to create the duplicate observations. The 'eventcount' variable has been merged on to each stock observation, and tells Stata how many copies of that observation are needed. This is where your dataset can get BIGGER, as we are duplicating the observations to however many counts of event we have per company.
You need to create a variable that indicates which 'set' of observations within the company each observation belongs to. Then sort the dataset to prepare for another merge.
Back in your original event dates dataset - not the 'eventcount' one! You need to create a matching set variable to identify the different event dates within each company. The final step is to use the set variable to match each event date with a set of stock observations.
Here, you may have observations where you have the events information but not stock information. You may examine which companies' stock information is missing.
Finally, create a new variable that groups company_id and set so that you have a unique identifier to use in the rest of your analysis.
PS: During the rest of your analysis, use group_id wherever the instructions say company_id.
2. Cleaning the Data and Calculating the Event and Estimation Windows
It's likely that you have more observations for each company than you need. It's also possible that you do not have enough for some. Before you can continue, you must make sure that you will be conducting your analyses on the correct observations. To do this, you will need to create a variable, dif, that will count the number of days from the observation to the event date. This can be either calendar days or trading days.
For number of trading days:
For calendar days:
As you can see, calculating the number of trading days is a little trickier than calendar days. For trading days, we first need to create a variable that counts the number of days within each company_id. Then we determine which observation occurs on the event date. We create a variable with the event date's day number on all of the observations within that company_id. Finally, we simply take the difference between the two, creating a variable, dif, that counts the number of days between each individual observation and the event day. Next, we need to make sure that we have the minimum number of observations before and after the event date, as well as the minimum number of observations before the event window for the estimation window. Let's say we want 2 days before and after the event date (a total of 5 days in the event window) and 30 days for the estimation window. (You can of course change these numbers to suit your analysis.)
The procedure for determining the event and estimation windows is the same. First we create a variable that equals 1 if the observation is within the specified days. Second, we create another variable that counts how many observations, within each company_id, has a 1 assigned to it. Finally, we replace all the missing values with zeroes, creating a dummy variable.
3. Estimating Normal Performance
Now we are at the point where we can actually start an analysis. First we need a way to estimate Normal Performance. To do this, we will run a seperate regression for each company using the data within the estimation window and save the alphas (the intercept) and betas (the coefficient of the independent variable). We will later use these saved regression equations to predict normal performance during the event window.
Here, we created a variable "id" that numbers the companies from 1 to however many there are. The N is the number of company-event combinations that have complete data. This process iterates over the companies, runs a regression in the estimation window for each, and then uses that regression to predict a 'normal' return in the event window.
4. Abnormal and Cumulative Abnormal Returns
We can now calculate the abnormal and cumulative abnormal returns for our data. The daily abnormal return is computed by subtracting the predicted normal return from the actual return for each day in the event window. The sum of the abnormal returns over the event window is the cumulative abnormal return.
Here we simply calculate the abnormal return for each observation in the event window. Then we set the cumulative abnormal return equal to the sum of the abnormal returns for each company.
5. Testing for Significance
We are going to compute a test statistic, TEST, to check whether the average abnormal return for each stock is statistically different from zero.
TEST= ((ΣAR)/N) / (AR_SD/sqrt(N))
where AR is the abnormal return and AR_SD is the abnormal return standard deviation. If the absolute value of test is greater than 1.96, then the average abnormal return for that stock is significantly different from zero at the 5% level. The value of 1.96 comes from the standard normal distribution with a mean of 0 and a standard deviation of 1. 95% of the distribution is between ±1.96.
Note 1: find the number of days in event window from the working dataset by using the code tab event_date. In our case, it is 5730
Note 2: this test uses the sample standard deviation. A less conservative alternative is to use the population standard deviation. To derive this from the sample standard deviation produced by Stata, multiply ar_sd by the square root of n-1/n; in our example, by the square root of 4/5.
The following codes will store the results of your event study in an Excel spreadsheet file:
outsheet company_id event_date cumulative_abnormal_return test using stats.csv if dif==0, comma names
6. Testing Across All Companies
Instead of, or in addition to, looking at the average abnormal return for each company, you probably want to calculate the cumulative abnormal for all companies treated as a group. Here's the code for that:
The P-value on the constant from this regression will give you the significance of the cumulative abnormal return across all companies.
Center for Research on Securities Prices (CRSP): https://www.crsp.org/products/research-products/crsp-us-stock-databases
Dasgupta, S., Laplante, B., & Mamingi, N. (1998). Capital market responses to environmental performance in developing countries (Vol. 1909). World Bank Publications.
DSS Data Analysis Guides. Available at: https://libguides.princeton.edu/c.php?g=1415215
Estudy Command for Event Study in Stata. The Data Hall. Available at https://thedatahall.com/estudy-command-for-event-study-in-stata/
The Basics of Event Study. The Data Hall. Available at https://thedatahall.com/the-basics-of-event-study/
Ullah, S., Zaefarian, G., Ahmed, R., & Kimani, D. (2021). How to apply the event study methodology in STATA: An overview and a step-by-step guide for authors. Industrial Marketing Management, 99, A1-A12.
If you have questions or comments about this guide or method, please email data@Princeton.edu.