Converting into date variables
Dates can be represented in many different ways, and we need to convert them into date variables so that R recognizes them as dates.
We first read the file into R.
mydata<-read.csv("https://dss.princeton.edu/training/mydata_date.csv")
Source: https://www.statmethods.net/input/dates.html
We can see that the four columns represent the same dates in different formats. The first two columns are character strings written in different ways, and the last two are integers.
Now we will show how to convert each of these four variables into a date variable.
Converting the string 'date1' into a date variable called 'new.date1':
mydata$new.date1 = as.Date(mydata$date1, "%d-%b-%y") # %d day, %b abbreviated month name (locale-dependent), %y two-digit year
Converting the string 'date2' into a date variable called 'new.date2':
mydata$new.date2 = as.Date(mydata$date2, "%m/%d/%Y") # %m month, %d day, %Y four-digit year
Converting the integer 'date3' into a date variable called 'new.date3':
mydata$new.date3 = as.Date(as.character(mydata$date3), "%Y%m%d")
# Note that in this case we have to convert the integer into a character string first;
# otherwise as.Date() would read the number as a count of days since an origin date
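To see what each format string does, here is a minimal sketch that parses literal values written in the same three styles. The example values are made up for illustration and are not taken from mydata_date.csv. Because %b matches month abbreviations in the current locale, the sketch first switches the time locale to "C" so that English abbreviations such as "Mar" are recognized.

```r
# Use the C locale so that %b matches English month abbreviations
invisible(Sys.setlocale("LC_TIME", "C"))

# Made-up example values, one per format
d1 <- as.Date("15-Mar-14", "%d-%b-%y")            # day-abbreviated month-two-digit year
d2 <- as.Date("03/15/2014", "%m/%d/%Y")           # month/day/four-digit year
d3 <- as.Date(as.character(20140315), "%Y%m%d")   # integer written as YYYYMMDD

print(c(d1, d2, d3))

# A number given to as.Date() directly is treated as days since the origin,
# which is why date3 has to go through as.character() first
print(as.Date(0, origin = "1970-01-01"))
```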
Converting the integer 'date4' into a date variable called 'new.date4':
mydata$len.date4 = nchar(mydata$date4) # Need to identify the length first
mydata$date4 = as.character(mydata$date4) # Need to convert to character
# Pad with zeros so that every value has the form YYYYMMDD (assumes a one-digit month):
# length 6 (YYYYMD) -> insert a 0 before the month and before the day
# length 7 (YYYYMDD) -> insert a 0 before the month only
mydata$date4b = ifelse(mydata$len.date4==6,
                       paste0(substr(mydata$date4,1,4), 0, substr(mydata$date4,5,5), 0, substr(mydata$date4,6,6)),
                       paste0(substr(mydata$date4,1,4), 0, substr(mydata$date4,5,5), substr(mydata$date4,6,7)))
mydata$new.date4 = as.Date(mydata$date4b, "%Y%m%d")
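The padding step can also be avoided by splitting the digits into year, month and day and building the date with ISOdate(). This is a minimal sketch under the same assumption as the code above (the month is always a single digit, so a seven-digit value is YYYYMDD); the helper name parse_date4 and the example values are hypothetical.

```r
# Hypothetical helper: parse integers of the form YYYYMD or YYYYMDD,
# assuming the month is always a single digit
parse_date4 <- function(x) {
  s <- as.character(x)
  year  <- as.integer(substr(s, 1, 4))
  month <- as.integer(substr(s, 5, 5))
  day   <- as.integer(substr(s, 6, nchar(s)))   # one or two digits
  as.Date(ISOdate(year, month, day))            # ISOdate() gives noon GMT; as.Date() keeps the day
}

print(parse_date4(c(201415, 2014120)))
```

If the data could contain two-digit months, values such as 2014110 would be ambiguous (1 October or 10 January), and no rule based on length alone can resolve them.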
Now we can check the conversion results: new.date1 to new.date4 are all date variables.
Note that it's easy to convert them back into strings:
mydata$string.date1 = as.character(mydata$new.date1)
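as.character() always produces the ISO form YYYY-MM-DD. When a different layout is needed, format() accepts the same % codes used for parsing. A minimal sketch with a made-up date:

```r
d <- as.Date("2014-03-15")           # made-up example date
print(class(d))                      # "Date"
s1 <- as.character(d)                # ISO format: "2014-03-15"
s2 <- format(d, "%d/%m/%Y")          # day/month/year: "15/03/2014"
s3 <- format(d, "%B %d, %Y")         # full month name; depends on the locale
print(c(s1, s2, s3))
```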
Extracting year, month and day using base functions
mydata$date <- mydata$new.date1
# Extracting year (from a variable in date format)
mydata$year = as.numeric(format(mydata$date, "%Y"))
# Extracting month (from a variable in date format)
mydata$month = as.numeric(format(mydata$date, "%m"))
# Extracting day (from a variable in date format)
mydata$day = as.numeric(format(mydata$date, "%d"))
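Another base-R option is to convert the dates to POSIXlt, which stores the components directly. Note the offsets: years are counted from 1900 and months start at 0. A minimal sketch with made-up dates:

```r
d <- as.Date(c("2014-03-15", "2015-12-01"))  # made-up example dates
lt <- as.POSIXlt(d)
year  <- lt$year + 1900   # years are counted from 1900
month <- lt$mon + 1       # months are 0-based (0 = January)
day   <- lt$mday          # day of the month
print(data.frame(d, year, month, day))
```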
Lags and forwards (leads)
# Getting the sample data (rows are assumed to be sorted in time order)
usa = read.csv("http://dss.princeton.edu/training/us.csv", header=TRUE)
# Lag 1 of 'gdppcgr', see variable 'l1.gdp' below
usa$l1.gdp <- c(NA, usa$gdppcgr[1:(nrow(usa)-1)])
# Forward 1 of 'gdppcgr', see variable 'f1.gdp' below
usa$f1.gdp <- c(usa$gdppcgr[2:nrow(usa)], NA)
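The two lines above hard-code a one-period shift. A minimal sketch of a more general helper (the name shift_vec is hypothetical, not from any package) gives the n-th lag for positive n and the n-th lead for negative n. Like the original code, it assumes the rows are already sorted in time order; the growth rates are made up.

```r
# Hypothetical helper: n > 0 gives the n-th lag, n < 0 the n-th lead
shift_vec <- function(x, n = 1) {
  len <- length(x)
  if (n == 0) return(x)
  k <- min(abs(n), len)              # cannot shift further than the vector length
  if (n > 0) {
    c(rep(NA, k), x[seq_len(len - k)])       # pad the start, drop the last k values
  } else {
    c(x[seq_len(len - k) + k], rep(NA, k))   # drop the first k values, pad the end
  }
}

gdp <- c(2.1, 1.5, -0.3, 2.8, 3.0)   # made-up growth rates
print(shift_vec(gdp, 1))    # lag 1
print(shift_vec(gdp, -1))   # forward (lead) 1
print(shift_vec(gdp, 2))    # lag 2
```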
Lag and forward variables in panel data
# Creating a dataset
set.seed(12345)
mydata = data.frame(country = rep(toupper(letters[1:3]), each=5), year = rep(2000:2004,3), var1 = rnorm(15))
# Functions to get the lag and the forward (lead) values
# (named lag1/lead1 so they do not mask lag(), which plm uses for panel lags below)
lag1 = function(x) c(NA, x[1:(length(x)-1)])
lead1 = function(x) c(x[2:length(x)], NA)
# Getting the lags and leads within each country (rows must be sorted by year within country)
mydata$lag.var1 = ave(mydata$var1, mydata$country, FUN=lag1)
mydata$lead.var1 = ave(mydata$var1, mydata$country, FUN=lead1)
# or using the plm package to get the lag:
library(plm)
pdata = pdata.frame(mydata, index = c("country", "year"))
pdata$lag.var1 = lag(pdata$var1)
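The ave() approach shifts values by row position, so it assumes each country's rows are sorted by year with no missing years. A minimal base-R sketch that instead looks up the row with the same country and year - 1 stays correct when a year is missing. The dropped row is an artificial gap added for illustration; a lead works the same way with year + 1.

```r
set.seed(12345)
panel <- data.frame(country = rep(c("A", "B", "C"), each = 5),
                    year = rep(2000:2004, 3),
                    var1 = rnorm(15))
panel <- panel[-3, ]   # drop country A, year 2002, to create a gap

# Look up the value for the same country in the previous year
key     <- paste(panel$country, panel$year)
lag_key <- paste(panel$country, panel$year - 1)
panel$lag.var1 <- panel$var1[match(lag_key, key)]   # NA when the previous year is absent

print(panel)
```

Because the lookup uses keys rather than positions, the rows do not even need to be sorted.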
Replacing missing values with previous non-missing
We often have to deal with datasets that have missing data. When values are missing on some dates, for whatever reason, one way of filling the NA values is to carry forward the most recent previous non-missing value.
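# Creating a dataset
set.seed(12345)
mydata = data.frame(country = rep(toupper(letters[1:3]), each=5), year = rep(2000:2004,3), var1 = rnorm(15))
# Setting var1 to missing for 2003 and 2004
mydata$var1 = ifelse(mydata$year<2003, mydata$var1, NA)
# Replacing missing values with previous non-missing
library(zoo)
mydata$var2 <- na.locf(mydata$var1)
# Note: na.locf() runs down the whole column and ignores 'country'; this works here
# because the data are sorted and each country's first value is non-missing
mydata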
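If a country's first value can be missing, filling down the whole column would carry the previous country's value into it. A minimal sketch applies the fill within each country using ave(). The locf helper is a hypothetical base-R equivalent of zoo::na.locf(x, na.rm = FALSE), and the leading NA in country B is added only to show the difference.

```r
# Hypothetical helper: last observation carried forward, base R
locf <- function(x) {
  idx <- ifelse(is.na(x), 0L, seq_along(x))  # position of each non-missing value
  idx <- cummax(idx)                          # most recent non-missing position so far
  idx[idx == 0L] <- NA                        # leading NAs have nothing to carry forward
  x[idx]
}

set.seed(12345)
panel <- data.frame(country = rep(c("A", "B", "C"), each = 5),
                    year = rep(2000:2004, 3),
                    var1 = rnorm(15))
panel$var1 <- ifelse(panel$year < 2003, panel$var1, NA)
panel$var1[panel$country == "B" & panel$year == 2000] <- NA  # leading NA in country B

# Fill forward within each country only
panel$var2 <- ave(panel$var1, panel$country, FUN = locf)
print(panel)
```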
Rolling sum in panel data
# Creating a dataset
set.seed(12345)
mydata = data.frame(country = rep(toupper(letters[1:3]), each=5), year = rep(2000:2004,3), var1 = rnorm(15))
# Sort data by country and year
mydata = mydata[order(mydata$country, mydata$year), ]
# Rolling sum over the current and previous three years (NA until four years are available)
library(zoo)
rolsum = function(x) rollapply(x, 4, sum, na.rm=TRUE, fill = NA, align = "right")
mydata$sum = ave(mydata$var1, mydata$country, FUN=rolsum)
mydata
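The same right-aligned four-year sum can be computed without zoo by convolving each country's series with a vector of ones. This is a minimal sketch; rolsum_base is a hypothetical name. Unlike na.rm = TRUE in the zoo version, a missing value anywhere in the window makes that sum NA, and each group must have at least k rows.

```r
# Hypothetical helper: sum of the current and previous k - 1 values (base R)
rolsum_base <- function(x, k = 4) {
  as.numeric(stats::filter(x, rep(1, k), method = "convolution", sides = 1))
}

set.seed(12345)
panel <- data.frame(country = rep(c("A", "B", "C"), each = 5),
                    year = rep(2000:2004, 3),
                    var1 = rnorm(15))
panel <- panel[order(panel$country, panel$year), ]
panel$sum <- ave(panel$var1, panel$country, FUN = rolsum_base)
print(panel)
```

Calling stats::filter explicitly avoids a clash with dplyr::filter if dplyr is loaded.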
We can use the 'eventstudies' package in R to conduct event study analysis. Note that the package cannot be installed directly with install.packages(). Instead, install the 'githubinstall' package first and then install eventstudies from GitHub.
install.packages('digest')
install.packages('githubinstall')
library(githubinstall)
githubinstall('eventstudies', force=TRUE)
library(eventstudies)
Next, we load the example datasets that come with the package.
data('StockPriceReturns')
data('SplitDates')
data('OtherReturns')
An overview of the data:
The StockPriceReturns dataset contains the stock returns of several companies, indexed by date. OtherReturns contains market benchmark returns (such as NiftyIndex) and other series (such as USDINR), and SplitDates lists the stock split events, giving the firm and the date on which each event happened.
Now we can run the event study.
es <- eventstudy(firm.returns = StockPriceReturns,
                 event.list = SplitDates,
                 event.window = 7,   # 7 days before and after each event
                 type = 'None',      # 'None': returns are used as they are, with no model adjustment
                 to.remap = TRUE,
                 remap = 'cumsum',   # cumulate the returns in event time
                 inference = TRUE,
                 inference.strategy = 'bootstrap')
plot(es)
We can see a drop in cumulative returns at the time of the event. About three days after the event, the stock price seems to start recovering.
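To make the plotted quantities concrete: the line is an average cumulative return across events in event time, and one common way to build a bootstrap band around it is to resample the events with replacement. The following base-R sketch illustrates that idea on simulated returns. The data, the 1000 replications and the 95% percentile band are assumptions for illustration, and this is not the eventstudies implementation.

```r
set.seed(1)
n_events <- 30
width <- 7
event_time <- -width:width

# Simulated daily returns around each event: rows = event time, columns = events
ret <- matrix(rnorm(length(event_time) * n_events),
              nrow = length(event_time),
              dimnames = list(event_time, paste0("event", seq_len(n_events))))

# Cumulative return in event time for each event, then the average across events
cum_ret <- apply(ret, 2, cumsum)
mean_cum <- rowMeans(cum_ret)

# Bootstrap: resample events (columns) with replacement and recompute the average
n_boot <- 1000
boot_means <- replicate(n_boot, rowMeans(cum_ret[, sample(n_events, replace = TRUE)]))
band <- apply(boot_means, 1, quantile, probs = c(0.025, 0.975))

result <- data.frame(event_time, lower = band[1, ], mean = mean_cum, upper = band[2, ])
print(result)
```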
Next, we use the market model, which adjusts each firm's returns for the movement of a market benchmark (here the Nifty index).
es1 <- eventstudy(firm.returns = StockPriceReturns,
                  event.list = SplitDates,
                  event.window = 7,
                  type = 'marketModel', # Here we use the market model
                  to.remap = TRUE,
                  remap = 'cumsum',
                  inference = TRUE,
                  inference.strategy = 'bootstrap',
                  model.args = list(
                    market.returns = OtherReturns[,'NiftyIndex'] # Here we specify which benchmark to use
                  ))
plot(es1)
With the market model, the stock split does not appear to influence the stock price very much.
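Under the market model, a firm's return is regressed on the benchmark return over an estimation period. The abnormal return is the actual return minus the fitted one, and cumulating abnormal returns over the event window gives the cumulative abnormal return (CAR). A minimal sketch on simulated data follows. The 250-day estimation window, the coefficients and the noise level are made up, and eventstudies' own choice of estimation period may differ.

```r
set.seed(42)
n_est <- 250    # estimation-window length in days (an assumption)
width <- 7      # event window: 7 days before and after the event
n_evt <- 2 * width + 1

# Simulated market and firm returns: firm = 0.05 + 1.2 * market + noise
market <- rnorm(n_est + n_evt, mean = 0.03, sd = 1)
firm   <- 0.05 + 1.2 * market + rnorm(n_est + n_evt, sd = 0.8)

est <- seq_len(n_est)              # estimation window
evt <- n_est + seq_len(n_evt)      # event window, event day in the middle

# Fit r_firm = alpha + beta * r_market on the estimation window
fit <- lm(firm ~ market, data = data.frame(firm = firm[est], market = market[est]))

# Abnormal return = actual return - return predicted by the market model
expected <- predict(fit, newdata = data.frame(market = market[evt]))
abnormal <- firm[evt] - expected
car <- cumsum(abnormal)            # cumulative abnormal return

print(round(coef(fit), 3))
print(data.frame(event_time = -width:width, abnormal = abnormal, car = car))
```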
We can also use the augmented market model, which adds further regressors (here the USD/INR exchange rate from OtherReturns) to allow for more flexibility.
es2 <-eventstudy(firm.returns = StockPriceReturns,
event.list = SplitDates,
event.window = 7,
type = 'marketModel',
to.remap = TRUE,
remap = 'cumsum',
inference = TRUE,
inference.strategy = 'bootstrap',
model.args = list(
market.returns = OtherReturns[,'NiftyIndex'],
Others = OtherReturns[,'USDINR'],
market.returns.purge=TRUE,
nlag.makex = 5, #We can manually set the lag time
nlag.lmAMM = 5
))
plot(es2)
This plot closely resembles the one from the market model. Both indicate that the stock prices 'recovered' from the split very quickly.
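The eventstudies package also provides lower-level functions for these steps: phys2eventtime() converts the returns from calendar time to event time, and remap.cumsum() converts them into cumulative returns.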
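# Convert the returns from calendar time to event time
es <- phys2eventtime(z=StockPriceReturns, events=SplitDates, width=10)
# Keep event days -10 to 10
es.w <- window(es$z.e, start=-10, end=10)
# Cumulate the returns in event time
es.cs <- remap.cumsum(es.w, is.pc=FALSE, base=0)
es.cs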
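The idea behind the conversion to event time can be shown with plain vectors. For each event, find the row of the event date and cut out the days from -width to +width, so that day 0 is the event day in every column. This is a minimal sketch, not phys2eventtime()'s implementation. The helper name to_event_time, the dates and the simulated returns are hypothetical, and events without enough surrounding data are simply set to NA.

```r
# Hypothetical helper: cut a window of +/- width days around each event date
to_event_time <- function(returns, dates, event_dates, width) {
  out <- sapply(event_dates, function(ed) {
    i <- match(ed, dates)                   # row of the event date
    rows <- i + (-width:width)
    if (is.na(i) || min(rows) < 1 || max(rows) > length(returns)) {
      return(rep(NA_real_, 2 * width + 1))  # not enough data around this event
    }
    returns[rows]
  })
  rownames(out) <- -width:width
  out
}

dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 60)
set.seed(7)
returns <- rnorm(60)                               # simulated daily returns
events <- as.Date(c("2020-01-20", "2020-02-10"))   # made-up event dates

ev <- to_event_time(returns, dates, events, width = 10)
cum_ev <- apply(ev, 2, cumsum)   # cumulative returns from day -10 onward
print(cum_ev)
```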
A Little Book of R For Time Series, release 0.2, Avril Coghlan, https://buildmedia.readthedocs.org/media/pdf/a-little-book-of-r-for-time-series/latest/a-little-book-of-r-for-time-series.pdf
Cross Validated, https://stats.stackexchange.com/
Event Studies, https://theeffectbook.net/ch-EventStudies.html
Event Study in R, https://www.youtube.com/watch?v=gTfW33-dInM
Event Study using R with 'estudy2' package, https://www.youtube.com/watch?v=sf6BF1q3yyI
Introduction to the eventstudies package in R, Ajay Shah, Vimal Balasubramaniam and Vikram Bahure, https://cran.microsoft.com/snapshot/2015-06-03/web/packages/eventstudies/vignettes/eventstudies.pdf
Quick-R, https://www.statmethods.net/
R bloggers, https://www.r-bloggers.com/
Stackoverflow, https://stackoverflow.com/
If you have questions or comments about this guide or method, please email data@Princeton.edu.