Research Guides: Cubic Interpolation in R: Hands on R tutorial

Introduction

Understanding trends and making predictions based on historical data is a big part of economic data analysis. Often we have yearly or quarterly data available, while the more detailed trends within those periods can be concealed. This guide provides a comprehensive approach to address this limitation by using cubic interpolation.

Cubic interpolation uses cubic polynomials to estimate unknown data points within the known data range. For example, by converting quarterly economic data into a daily format, analysts and researchers can uncover detailed patterns and improve the accuracy of their economic forecasts. This guide uses R, a powerful tool for data manipulation and statistical analysis, to perform these transformations and interpolations.
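To make the idea concrete before turning to the real data, here is a minimal sketch using base R's spline() on made-up numbers. The vectors x and y are hypothetical and are not taken from the IMF file.

```r
# Hypothetical values observed at four equally spaced points
x <- c(1, 2, 3, 4)
y <- c(10, 12, 11, 15)

# Evaluate a natural cubic spline on a finer grid inside the known range
fine <- seq(1, 4, by = 0.25)
fit <- spline(x, y, method = "natural", xout = fine)

# At the original points the curve reproduces the known values
known <- fit$y[fit$x %in% x]
print(known)

# Between the original points it returns estimated values
print(fit$y)
```

The grid spacing of 0.25 is an arbitrary choice for illustration; any grid that stays inside the range of x works the same way.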

Cubic Interpolation

Getting sample data.

mydata<-read.csv('https://dss.princeton.edu/training/cubic_interpolation.csv')

The data come from the IMF database: https://data.imf.org/regular.aspx?key=61545849

View(mydata)

Next, we need to transform the quarterly date variable 'date', which is currently stored as a factor/string.

Note that we first need to modify the string so that it can be converted to a date variable. Here, we change the capitalized 'Q' to a lowercase 'q'.

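library(zoo)
# First change the upper-case "Q" to lower case so it matches the format string below
mydata$date <- gsub("Q", "q", mydata$date)
# Parse the strings as year-quarter values
mydata$quarter <- as.yearqtr(mydata$date, format = "%Yq%q")
# Convert each quarter to the date of its first day (year-month-day format)
mydata$qvar <- as.Date(mydata$quarter)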

Now we can see that we successfully created the quarter variable.
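The conversion can also be checked on a few made-up strings. This is a minimal sketch that assumes the raw values look like "2000Q1"; the actual layout of the file may differ, and the vector name raw is hypothetical.

```r
library(zoo)

# Hypothetical quarter labels in the assumed "YYYYQn" layout
raw <- c("2000Q1", "2000Q2", "2000Q3", "2000Q4")

# Same two steps as in the tutorial: lower-case the "Q", then parse
q <- as.yearqtr(gsub("Q", "q", raw), format = "%Yq%q")
d <- as.Date(q)   # first day of each quarter

print(d)

# A failed parse shows up as NA, so this is a quick sanity check
stopifnot(!anyNA(d))
```

Running the same anyNA() check on mydata$qvar is a cheap way to confirm that every row was parsed.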

Then we can create a daily sequence for the quarterly range.

daily = seq(mydata$qvar[1], tail(mydata$qvar,1), by="day")
daily
[1] "2000-01-01" "2000-01-02" "2000-01-03" "2000-01-04" "2000-01-05" "2000-01-06"
[7] "2000-01-07" "2000-01-08" "2000-01-09" "2000-01-10" "2000-01-11" "2000-01-12"
......

Now we use the GDP variable as an example.

# Getting variable of interest
gdp = mydata[c("qvar","GDP")]
gdp$GDP <- as.numeric(as.character(gdp$GDP)) # as.character() guards against factor codes; check the result for unexpected NAs
View(gdp)
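The numeric check matters because a column read as text or as a factor can convert silently into the wrong numbers. The sketch below uses made-up strings (the vector raw and its values are hypothetical) to show two common failure modes.

```r
# Hypothetical raw column as it might arrive from a CSV file
raw <- c("1,250.5", "1300.2", "n/a", "1410")

# Direct conversion turns anything unparseable into NA (with a warning)
direct <- suppressWarnings(as.numeric(raw))

# Removing thousands separators first recovers the first value
cleaned <- suppressWarnings(as.numeric(gsub(",", "", raw)))

# A factor must go through as.character(); otherwise the level codes are returned
f <- factor(c("1300.2", "1410"))
codes  <- as.numeric(f)
values <- as.numeric(as.character(f))

print(direct)
print(cleaned)
print(codes)
print(values)
```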

For GDP, we only have quarterly observations. Now we can perform the cubic interpolation.

# Cubic interpolation using spline(); dates are passed explicitly as numeric day counts
gdp2 = data.frame(qvar=daily,
                  gdp2=spline(x=as.numeric(gdp$qvar), y=gdp$GDP, method="natural", xout=as.numeric(daily))$y)

mydata2 = merge(gdp, gdp2, by="qvar", all=TRUE)

We can see now that the interpolation filled the daily sequence with values.
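One way to verify the result is to confirm that the daily series agrees with the original values on the dates where both exist. The sketch below does this on a made-up quarterly series; q_dates, q_vals and the other names are hypothetical stand-ins for the tutorial's objects.

```r
# Hypothetical quarterly series (values are made up for illustration)
q_dates <- as.Date(c("2000-01-01", "2000-04-01", "2000-07-01", "2000-10-01"))
q_vals  <- c(100, 104, 103, 108)

# Daily grid from the first to the last quarter start
days <- seq(q_dates[1], tail(q_dates, 1), by = "day")

# Natural cubic spline evaluated on every day
interp <- spline(as.numeric(q_dates), q_vals, method = "natural",
                 xout = as.numeric(days))$y
daily_df <- data.frame(date = days, value = interp)

# Keep only the dates present in both tables and compare the two columns
check <- merge(data.frame(date = q_dates, original = q_vals), daily_df, by = "date")
max_gap <- max(abs(check$original - check$value))

print(nrow(daily_df))
print(max_gap)
```

A max_gap that is zero up to floating-point error indicates that the interpolated curve passes through the quarterly observations.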

Plotting the original and interpolated values shows how well they agree.

library(ggplot2)
ggplot() +
  geom_line(data = gdp, aes(x = qvar, y = GDP), color = "blue") +
  geom_line(data = gdp2, aes(x = qvar, y = gdp2), color = "red",alpha=0.8) +
  labs(title = "Comparison of Original and Interpolated GDP Data",
       x = "Date",
       y = "GDP")

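The interpolated series captures the main trend, and visually the two lines agree very well. This is expected: an interpolating spline passes through every original data point by construction, so the plot shows that the curve is smooth between quarters, not that it is accurate there.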

Limitations

Cubic interpolation is a powerful tool for estimating values between known data points using cubic polynomials. It produces a smooth, continuous curve and can be very useful for filling gaps in data. However, it also comes with risks and limitations that can lead to misleading results if it is not applied carefully.

Data Structure

Cubic interpolation works best with smoothly varying data. If the data exhibit sharp discontinuities or rapid changes, the fitted curve may overshoot or oscillate around the jump and produce values the data never took.
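The effect of a sharp jump can be seen in a small sketch with made-up step data. The series below is hypothetical and only ever takes the values 0 and 1, yet the interpolated curve leaves that range on both sides of the jump.

```r
# Hypothetical series with an abrupt level shift between the 3rd and 4th points
x <- 1:6
y <- c(0, 0, 0, 1, 1, 1)

# Evaluate the natural cubic spline on a fine grid
fine <- seq(1, 6, by = 0.05)
curve_y <- spline(x, y, method = "natural", xout = fine)$y

# How far the curve goes above the largest and below the smallest observed value
overshoot  <- max(curve_y) - max(y)
undershoot <- min(y) - min(curve_y)

print(overshoot)
print(undershoot)
```

For an economic series this could mean, for example, interpolated values dipping just before a sudden rise even though no such dip was observed.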

Data Range

Cubic interpolation should only be used for interpolation, that is, for estimating values within the range of the data points. Predictions outside that range (extrapolation) can be very unreliable, because no data point constrains the curve there.
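A rough illustration, assuming hypothetical data that follow a known curve (y = x^2) so that the error can be measured: the same fitted spline is evaluated once inside and once outside the observed range.

```r
# Hypothetical data generated from a known function
x <- 1:5
y <- x^2

# One point inside the observed range, one well outside it
inside  <- spline(x, y, method = "natural", xout = 2.5)$y
outside <- spline(x, y, method = "natural", xout = 8)$y

# Absolute error against the known true values
err_inside  <- abs(inside - 2.5^2)
err_outside <- abs(outside - 8^2)

print(err_inside)
print(err_outside)
```

With method = "natural", spline() continues the curve as a straight line beyond the last data point, which is why the error grows quickly once the true relationship keeps curving.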

Data Density

The accuracy of cubic interpolation depends on having enough data points. If the points are too sparse relative to how quickly the series changes, the curve may fit poorly between them.
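The role of density can be sketched by sampling a known function at different resolutions. Everything here is hypothetical: sin() stands in for the unobserved true process, and max_error() is a helper defined only for this illustration.

```r
# Hypothetical "true" process and a fine grid on which to measure the error
f <- function(t) sin(t)
grid <- seq(0, 2 * pi, length.out = 200)

# Largest absolute interpolation error when only n_points observations are available
max_error <- function(n_points) {
  knots <- seq(0, 2 * pi, length.out = n_points)
  est <- spline(knots, f(knots), method = "natural", xout = grid)$y
  max(abs(est - f(grid)))
}

err_sparse <- max_error(4)
err_dense  <- max_error(12)

print(err_sparse)
print(err_dense)
```

In practice the true process is unknown, so this kind of check is only possible on simulated data or by holding out some observed points.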

Outliers and Missing Data

Outliers and missing values can significantly distort the results of cubic interpolation, because the curve is forced through every observed point. Consider cleaning the data by removing outliers or imputing missing values before interpolating.
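As a minimal sketch of the outlier problem, the example below interpolates a made-up smooth series twice, once as is and once with a single hypothetical outlier, and reports where the two curves differ noticeably. The threshold of 0.5 is an arbitrary choice for illustration.

```r
# Hypothetical smooth series and a copy with one outlier
x <- 1:7
clean <- c(10, 11, 12, 13, 14, 15, 16)
dirty <- clean
dirty[4] <- 30

# Interpolate both versions on the same fine grid
fine <- seq(1, 7, by = 0.1)
fit_clean <- spline(x, clean, method = "natural", xout = fine)$y
fit_dirty <- spline(x, dirty, method = "natural", xout = fine)$y

# Grid positions where the outlier shifts the curve by more than 0.5
affected <- fine[abs(fit_dirty - fit_clean) > 0.5]
print(range(affected))
```

The affected range reaches well beyond the intervals next to the outlier, so a single bad observation can change interpolated values across much of the series.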