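A dataset can be stored in two different formats: wide and long.
In the wide format, each value in the first column (the identifier) appears only once, and each variable has its own column.
In the long format, values in the first column repeat, because each row holds a single identifier-variable combination.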
Example of wide format:
Student | Math | Literature | PE |
A | 99 | 45 | 56 |
B | 73 | 78 | 55 |
C | 12 | 96 | 57 |
Example of long format:
Student | Subject | Score |
A | Math | 99 |
A | Literature | 45 |
A | PE | 56 |
B | Math | 73 |
B | Literature | 78 |
B | PE | 55 |
C | Math | 12 |
C | Literature | 96 |
C | PE | 57 |
We can see that in the wide format no value is repeated in the first column, whereas in the long format each student appears once per subject.
Datasets downloaded from a website are not always in the format that a statistical analysis requires. We will therefore see how to convert between these two formats in R.
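As a small illustration of the two tables above, the following sketch rebuilds the wide student table by hand and converts it to the long layout with base R's reshape(). The object names wide and long are chosen here for illustration only, and the final sort is just there so the rows appear in the same order as the long table shown above.

```r
# Wide table from the example: one row per student, one column per subject
wide <- data.frame(
  Student    = c("A", "B", "C"),
  Math       = c(99, 73, 12),
  Literature = c(45, 78, 96),
  PE         = c(56, 55, 57)
)

# Stack the three subject columns into a single Score column
long <- reshape(
  data      = wide,
  direction = "long",
  idvar     = "Student",                       # identifies each student
  varying   = c("Math", "Literature", "PE"),   # columns to stack
  v.names   = "Score",                         # name of the stacked value column
  timevar   = "Subject",                       # new column recording the source column
  times     = c("Math", "Literature", "PE")    # labels written into Subject
)

# Sort by student so the layout matches the long table above
long <- long[order(long$Student), ]
rownames(long) <- NULL
print(long)
```

Each student now occupies three rows, one per subject, which is the repetition in the first column that characterizes the long format.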
First we load the data.
rw <- read.csv("https://dss.princeton.edu/training/widetolong.csv")
This data is in the wide format.
Now we reshape this dataset.
data1 <- reshape(data = rw,
                 idvar = "Country.Name",
                 # We need to specify here the columns to be reshaped
                 varying = 2:11,
                 sep = "",
                 timevar = "year",
                 times = c(2017, 2018, 2019, 2020, 2021),
                 new.row.names = 1:10000,
                 direction = "long")
We can see that the data is now in the long format.
Next we load a second dataset, which is already in the long format.
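To make the roles of varying, sep and times more concrete, here is a minimal sketch on a made-up data frame. The column names gdp2017, gdp2018, pop2017 and pop2018 and all values are hypothetical and are not taken from the downloaded file; they only assume that the wide columns follow a "name followed by year" pattern. With sep = "", reshape() splits each column name where the letters end and the digits begin, so the letter part becomes the variable name and the digits identify the year.

```r
# Hypothetical wide data: two variables measured in two years
toy <- data.frame(
  Country.Name = c("X", "Y"),
  gdp2017 = c(1, 2),
  gdp2018 = c(3, 4),
  pop2017 = c(10, 20),
  pop2018 = c(30, 40)
)

toy_long <- reshape(
  data      = toy,
  idvar     = "Country.Name",
  varying   = 2:5,            # positions of the columns to be reshaped
  sep       = "",             # split names like "gdp2017" into "gdp" and "2017"
  timevar   = "year",         # new column holding the year
  times     = c(2017, 2018),  # one entry per year, in column order
  direction = "long"
)

print(toy_long)
```

The result has one row per country and year, with one column for each variable name recovered from the original column names.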
rl <- read.csv("https://dss.princeton.edu/training/longtowide.csv")
#data source: World Bank (WDI) 2007-2021
rl.wide <- reshape(data = rl,
                   idvar = "year",
                   v.names = c("GDP"),
                   timevar = "country",
                   direction = "wide")
Now we can see that it has been transformed into a wide format.
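The same call can be tried on a small made-up table to see what the wide result looks like. This is only a sketch: the countries "X" and "Y" and the GDP values below are hypothetical, and only the column names year, country and GDP mirror the ones used in the call above.

```r
# Hypothetical long data: one row per country and year
toy_long <- data.frame(
  year    = c(2020, 2021, 2020, 2021),
  country = c("X", "X", "Y", "Y"),
  GDP     = c(1.5, 1.7, 2.5, 2.9)
)

toy_wide <- reshape(
  data      = toy_long,
  idvar     = "year",      # one row per year in the result
  v.names   = "GDP",       # the value column to spread out
  timevar   = "country",   # its values become the new column suffixes
  direction = "wide"
)

print(toy_wide)
```

Because idvar is "year", the result has one row per year and one GDP column per country, named GDP.X and GDP.Y. Using "country" as idvar and "year" as timevar would instead give one row per country and one column per year.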
If you have questions or comments about this guide or method, please email data@Princeton.edu.