Sometimes data are saved in several csv files that all share the same structure. We may then want to apply the same R code to each of the files.
The following code applies the same program to multiple *.csv files and appends them into one data set. All files must contain the same variables, with the same spelling and in the same order.
For this example, download the three *.csv files available at these links:
https://dss.princeton.edu/training/US2.csv
https://dss.princeton.edu/training/UK.csv
https://dss.princeton.edu/training/Mexico.csv
Make sure to save all files in the same folder.
Source of data is World Development Indicators from the World Bank.
# Set working directory to where the *.csv files are saved
# Go to Session -> Set Working Directory -> Choose Directory
# or type and run: setwd("write path here")
Next, list all *.csv files in the working directory. Make sure the folder contains only the files you need.
files <- list.files(path = ".", pattern = "\\.csv$") # pattern is a regular expression
files
[1] "Mexico.csv" "UK.csv" "US2.csv"
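The pattern argument of list.files() is a regular expression, not a shell wildcard, so an unescaped dot matches any character. The following is a minimal sketch, using a temporary folder and made-up file names (both hypothetical, not part of the example data), of how an escaped and anchored pattern keeps only names that end in .csv:

```r
# Hypothetical demo folder and file names, created only for this illustration
demo_dir <- file.path(tempdir(), "csv_pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("Mexico.csv", "UK.csv", "notes_csv.txt"))))

# Unescaped dot: "." matches any character, so "_csv" in notes_csv.txt also matches
loose <- list.files(path = demo_dir, pattern = ".csv")

# Escaped dot plus "$" anchor: only names that end in ".csv"
strict <- list.files(path = demo_dir, pattern = "\\.csv$")

loose
strict
```

With the stricter pattern, stray files in the folder are less likely to end up in the list by accident.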
Create a reference data frame with the structure of the final version of your data. Read one file first, then keep only its first row as a placeholder. If the final version will differ from the original individual files, create an empty data frame with the expected final structure instead.
mydata = read.csv(files[1], header = TRUE, stringsAsFactors = FALSE)
mydata = data.frame(mydata[1,]) # Keep first row as placeholder
mydata
Year Country Countrycode GDPpc Unempm Unempf Unempt Exports Imports
1 1997 Mexico MEX 5289.168 3.12 6.32 4.24 1.21765e+11 1.22323e+11
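If the final data will have a different structure, an empty reference data frame can be declared by hand. The sketch below is one possible way to do it; the column names follow the example files plus the trade variable created later in the loop, while the column types and the name mydata_empty are assumptions made for illustration.

```r
# Zero-row data frame with the expected final columns.
# Column types are assumptions based on the printed placeholder row.
mydata_empty <- data.frame(
  Year        = integer(),
  Country     = character(),
  Countrycode = character(),
  GDPpc       = numeric(),
  Unempm      = numeric(),
  Unempf      = numeric(),
  Unempt      = numeric(),
  Exports     = numeric(),
  Imports     = numeric(),
  trade       = numeric(),   # variable added during processing
  stringsAsFactors = FALSE
)

# Inspect the structure: 0 observations of 10 variables
str(mydata_empty)
```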
# Process each data file, then append all files into one. This loop uses smartbind() from the gtools package.
library(gtools)
for (record in files) {
  # Mandatory first line
  temp <- read.csv(record, header = TRUE, stringsAsFactors = FALSE)
  temp$trade = temp$Exports + temp$Imports # Your code starts here
  mydata = smartbind(mydata, temp) # Mandatory last line
}
If your code creates new versions of the data under different names, replace 'temp' in the mandatory last line with the name of the latest version. Finally, drop the first row, which was used only as a placeholder.
mydata <- mydata[-1, ]
View(mydata)
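As an alternative to the placeholder row and smartbind(), the same read, process and append steps can be sketched in base R with lapply() and do.call(rbind, ...). This is a different technique from the loop above, shown only for comparison. The toy files, folder and object names below are hypothetical and exist only so the example runs on its own; rbind() assumes every file has identical column names.

```r
# Hypothetical toy files written to a temporary folder
demo_dir <- file.path(tempdir(), "append_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Country = "A", Exports = c(1, 2), Imports = c(3, 4)),
          file.path(demo_dir, "A.csv"), row.names = FALSE)
write.csv(data.frame(Country = "B", Exports = c(5, 6), Imports = c(7, 8)),
          file.path(demo_dir, "B.csv"), row.names = FALSE)

demo_files <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read and process each file, collecting the results in a list
pieces <- lapply(demo_files, function(f) {
  temp <- read.csv(f, header = TRUE, stringsAsFactors = FALSE)
  temp$trade <- temp$Exports + temp$Imports
  temp
})

# Stack the list into one data frame; no placeholder row is involved
combined <- do.call(rbind, pieces)
combined
```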
Now let us look at another example of writing a loop. We want to run the same regression on different subgroups (here, countries).
First we download the sample dataset from:
http://dss.princeton.edu/training/loop_subgroup.csv
loop<-read.csv("http://dss.princeton.edu/training/loop_subgroup.csv")
Then we create a unique id per group, in this example per country.
library(plyr) # see the note below
loop$id <- id(loop[c("country")], drop = TRUE)
Note: there is another widely used package called 'dplyr'. If you load 'dplyr' into the R session and then load 'plyr', functions with the same name in both packages can mask each other and cause trouble. If that happens, restart R before running "library(plyr)".
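If loading 'plyr' is a concern, a group id can also be built in base R. The sketch below is an alternative, not the method used in this guide: it uses a small made-up data frame (toy, hypothetical) and numbers the countries in alphabetical order, which may not match the numbering that id() from 'plyr' produces.

```r
# Hypothetical toy data standing in for the 'loop' data frame
toy <- data.frame(country = c("Mexico", "UK", "Mexico", "US", "UK"),
                  stringsAsFactors = FALSE)

# factor() creates one level per distinct country (sorted alphabetically);
# as.integer() converts the levels into ids 1, 2, 3, ...
toy$id <- as.integer(factor(toy$country))
toy
```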
Then we run the same regression per group.
b <- lapply(1:max(unique(loop$id)), function(i) {
  reg <- with(subset(loop, loop$id==i), lm(unempt ~ acc))
  reg$coefficients
})
We now create a data frame that consists of the betas.
betas <- as.data.frame(do.call(rbind,b))
Now we have a regression table. Note that the variable "acc" is the estimate of voice and accountability of a country.
betas$id = rownames(betas)
head(betas)
  (Intercept)         acc id
1   15.086145   -5.639099  1
2  -18.909522  -35.490677  2
3 -108.496407 -102.750321  3
4   11.872425   -3.519227  4
5   11.902054   -7.906690  5
6   -7.265655   13.325477  6
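The per-group regression can also be sketched with split(), which avoids building a numeric id first and labels each row of the result with the group name. This is an alternative technique shown on made-up data; the data frame toy and its values are hypothetical and were chosen so that each group lies exactly on a line.

```r
# Hypothetical toy data: two groups with exact linear relationships
toy <- data.frame(
  country = rep(c("A", "B"), each = 4),
  acc     = c(0, 1, 2, 3, 0, 1, 2, 3),
  unempt  = c(1, 3, 5, 7, 10, 9, 8, 7)
)

# split() breaks the data frame into one piece per country
by_country <- split(toy, toy$country)

# Fit the same model in each piece and keep only the coefficients
coefs <- lapply(by_country, function(d) coef(lm(unempt ~ acc, data = d)))

# Stack the coefficient vectors; row names are the country names
toy_betas <- as.data.frame(do.call(rbind, coefs))
toy_betas
```

In this toy data, group A has intercept 1 and slope 2, and group B has intercept 10 and slope -1, so the output is easy to check by hand.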
Source of data:
World Development Indicators: https://databank.worldbank.org/source/world-development-indicators#
Worldwide Governance Indicators: https://databank.worldbank.org/reports.aspx?source=worldwide-governance-indicators#
DSS Online Training Section https://dss.princeton.edu/training/
Princeton DSS Libguides https://libguides.princeton.edu/dss
John Fox's site https://socialsciences.mcmaster.ca/jfox/
Quick-R https://www.statmethods.net/
UCLA Resources https://stats.oarc.ucla.edu/r/
If you have questions or comments about this guide or method, please email data@Princeton.edu.