2.1 Descriptive statistics: whole dataset
2.2 Descriptive statistics: replacing variable names with labels
2.3 Descriptive statistics: selected variables
2.4 Descriptive statistics: selected variables and by group
3.1 Linear Regression output: one model
3.2 Linear Regression output: multiple models
3.3 Logistic regression output: coefficient
As with anything in R, there are many ways of exporting output into nice tables (but mostly for LaTeX users). Some packages are: apsrtable, xtable, texreg, memisc, outreg… and counting.
In this guide, we will focus on stargazer. This package offers a nice, smart, and easy-to-use alternative for non-LaTeX users, in particular the ability to import editable tables into a Word document. This guide shows some of the options stargazer offers; the contents are based on the package documentation, available at the following links:
https://cran.r-project.org/web/packages/stargazer/stargazer.pdf
https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf
The default setting produces LaTeX code; the alternatives, both used in this guide, are type = "text" and type = "html".
We can use the stargazer package to report important summary statistics.
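As a minimal sketch of the output types (assuming the stargazer package is installed; the object names latex_lines, text_lines and html_lines are illustrative), the same call can be switched between formats with the type argument. stargazer also returns the generated lines invisibly as a character vector, which makes it easy to inspect what was produced:

```r
library(stargazer)

# The same summary table in the three formats.
# capture.output() hides the console printing so that only the returned lines are kept.
invisible(capture.output(latex_lines <- stargazer(mtcars, type = "latex")))
invisible(capture.output(text_lines  <- stargazer(mtcars, type = "text")))
invisible(capture.output(html_lines  <- stargazer(mtcars, type = "html")))

# Each result is a character vector with one element per output line.
head(text_lines, 3)
any(grepl("<table", html_lines, fixed = TRUE))           # html output contains a <table> tag
any(grepl("\\begin{table}", latex_lines, fixed = TRUE))  # LaTeX output contains a table environment
```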
Getting sample data.
mydata <- mtcars
We use the built-in dataset mtcars in R here.
#install.packages("stargazer") # run this once if the package is not yet installed
library(stargazer) # load the package at the start of every R session
Now we use the 'stargazer' command.
stargazer(mydata, type = "text", title="Descriptive statistics", digits=1, out="table1.txt")
Descriptive statistics
======================================
Statistic N Mean St. Dev. Min Max
--------------------------------------
mpg 32 20.1 6.0 10.4 33.9
cyl 32 6.2 1.8 4 8
disp 32 230.7 123.9 71.1 472.0
hp 32 146.7 68.6 52 335
drat 32 3.6 0.5 2.8 4.9
wt 32 3.2 1.0 1.5 5.4
qsec 32 17.8 1.8 14.5 22.9
vs 32 0.4 0.5 0 1
am 32 0.4 0.5 0 1
gear 32 3.7 0.7 3 5
carb 32 2.8 1.6 1 8
--------------------------------------
stargazer automatically recognizes the type of object and produces the appropriate output. For a data frame, it displays summary statistics. Therefore, to create a summary statistics table that includes all variables in a data frame, we can pass the data frame directly to stargazer.
The option out="table1.txt" specifies the name of the file that stargazer creates. The table will be saved in your current working directory under whatever name you give in the out option. You can open this file with any word processor. If you are not sure what your current working directory is, type getwd() in the Console.
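A short sketch of how one might confirm where the file ends up. It writes to a temporary folder rather than the working directory so that nothing is overwritten; the name out_file is an assumption made for this illustration.

```r
library(stargazer)

# Write the table to a temporary folder instead of the working directory
# (out_file is an illustrative name, not part of the guide).
out_file <- file.path(tempdir(), "table1.txt")
invisible(capture.output(
  stargazer(mtcars, type = "text", title = "Descriptive statistics",
            digits = 1, out = out_file)
))

getwd()                     # where a bare file name such as "table1.txt" would be saved
file.exists(out_file)       # TRUE once the table has been written
readLines(out_file, n = 3)  # first lines of the saved table
```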
We can also use the html format.
stargazer(mydata, type = "html", title="Descriptive statistics", digits=1, out="table1.htm")
Similarly, you can find the html file named 'table1.htm' in your current working directory, and you can open it in your preferred word processor to make any modifications.
Note that the html output will not show in the Console. You will need to find the htm document in your directory.
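One possible way to look at the rendered table without leaving R is to open the saved file in the default browser. This is only a sketch: html_file is an illustrative name, the file is written to a temporary folder, and the browser is opened only in an interactive session.

```r
library(stargazer)

html_file <- file.path(tempdir(), "table1.htm")  # illustrative location
invisible(capture.output(
  stargazer(mtcars, type = "html", title = "Descriptive statistics",
            digits = 1, out = html_file)
))

# Open the file in the default browser, but only when R is used interactively.
if (interactive()) {
  browseURL(html_file)
}
```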
We can also use the option flip=TRUE if we want the variables in columns.
stargazer(mydata, type = "text", title="Descriptive statistics", digits=1, out="table2.txt", flip=TRUE)
Descriptive statistics
==============================================================
Statistic mpg cyl disp hp drat wt qsec vs am gear carb
--------------------------------------------------------------
N 32 32 32 32 32 32 32 32 32 32 32
Mean 20.1 6.2 230.7 146.7 3.6 3.2 17.8 0.4 0.4 3.7 2.8
St. Dev. 6.0 1.8 123.9 68.6 0.5 1.0 1.8 0.5 0.5 0.7 1.6
Min 10.4 4 71.1 52 2.8 1.5 14.5 0 0 3 1
Max 33.9 8 472.0 335 4.9 5.4 22.9 1 1 5 8
--------------------------------------------------------------
Note that if you use the text format, make sure to give each table a new file name; otherwise it will cause errors. This does not apply to the html format: you can reuse the same file name, and each new table will simply overwrite the previous one.
We may know very well what each variable in our dataset means, but variable names like 'mpg' could be confusing to readers who are not familiar with the dataset. Therefore, we can replace the variable names with labels to aid understanding.
stargazer(mydata, type = "text", title="Descriptive statistics", digits=1, out="table3.txt",
covariate.labels=c("Miles/(US)gallon","No. of cylinders","Displacement (cu.in.)",
"Gross horsepower","Rear axle ratio","Weight (lb/1000)",
"1/4 mile time","V/S","Transmission (0=auto, 1=manual)",
"Number of forward gears","Number of carburetors"))
Descriptive statistics
============================================================
Statistic N Mean St. Dev. Min Max
------------------------------------------------------------
Miles/(US)gallon 32 20.1 6.0 10.4 33.9
No. of cylinders 32 6.2 1.8 4 8
Displacement (cu.in.) 32 230.7 123.9 71.1 472.0
Gross horsepower 32 146.7 68.6 52 335
Rear axle ratio 32 3.6 0.5 2.8 4.9
Weight (lb/1000) 32 3.2 1.0 1.5 5.4
1/4 mile time 32 17.8 1.8 14.5 22.9
V/S 32 0.4 0.5 0 1
Transmission (0=auto, 1=manual) 32 0.4 0.5 0 1
Number of forward gears 32 3.7 0.7 3 5
Number of carburetors 32 2.8 1.6 1 8
------------------------------------------------------------
Use the option covariate.labels to replace variable names with variable labels. The labels must be in the same order as the variables in the dataset. If you are not sure about the order, you can first create a table without labels and check the order there.
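Because the labels are matched by position, one way to reduce the risk of a mismatch is to keep them in a named vector and reorder them by the column names. The sketch below is one possible approach, not a stargazer feature; label_lookup and ordered_labels are names made up for this example.

```r
library(stargazer)

mydata <- mtcars
names(mydata)  # the column order that the labels have to follow

# Named lookup: variable name -> label (deliberately written in a different order).
label_lookup <- c(
  hp   = "Gross horsepower",
  mpg  = "Miles/(US)gallon",
  wt   = "Weight (lb/1000)",
  cyl  = "No. of cylinders",
  disp = "Displacement (cu.in.)",
  drat = "Rear axle ratio",
  qsec = "1/4 mile time",
  vs   = "V/S",
  am   = "Transmission (0=auto, 1=manual)",
  gear = "Number of forward gears",
  carb = "Number of carburetors"
)

# Reorder the labels so that they follow the columns of the data frame.
ordered_labels <- unname(label_lookup[names(mydata)])
stopifnot(!anyNA(ordered_labels))  # stops if a column has no label

stargazer(mydata, type = "text", title = "Descriptive statistics",
          digits = 1, covariate.labels = ordered_labels)
```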
Often we do not want to report the summary statistics of the whole dataset, but only the variables of interest. We can do that in the stargazer package as well.
stargazer(mydata[c("mpg","hp","drat")], type = "text",
title="Descriptive statistics/selected variables", digits=1, out="table4.txt")
Descriptive statistics/selected variables
=====================================
Statistic N Mean St. Dev. Min Max
-------------------------------------
mpg 32 20.1 6.0 10.4 33.9
hp 32 146.7 68.6 52 335
drat 32 3.6 0.5 2.8 4.9
-------------------------------------
We can also create the same output transposed and with labels instead of variable names:
stargazer(mydata[c("mpg","hp","drat")], type = "text",
title="Descriptive statistics/selected variables", digits=1, out="table5.txt", flip=TRUE,
covariate.labels=c("Miles/(US)gallon","Gross horsepower","Rear axle ratio"))
Descriptive statistics/selected variables
===========================================================
Statistic Miles/(US)gallon Gross horsepower Rear axle ratio
-----------------------------------------------------------
N 32 32 32
Mean 20.1 146.7 3.6
St. Dev. 6.0 68.6 0.5
Min 10.4 52 2.8
Max 33.9 335 4.9
-----------------------------------------------------------
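To see where these numbers come from, the same statistics can be recomputed in base R. This sketch does not use stargazer; selected and check are illustrative names. sapply() returns one column per variable and one row per statistic, which is the same layout as the transposed table.

```r
mydata <- mtcars
selected <- mydata[c("mpg", "hp", "drat")]

# One column per variable, one row per statistic.
check <- sapply(selected, function(x) {
  c(N = length(x), Mean = mean(x), `St. Dev.` = sd(x), Min = min(x), Max = max(x))
})
round(check, 1)
```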
We can also display the summary statistics by group.
First we want descriptive statistics for cars with automatic transmission:
stargazer(subset(mydata[c("mpg","hp","drat")], mydata$am==0),
title="Automatic transmission", type = "text", digits=1, out="table6.txt")
Automatic transmission
=====================================
Statistic N Mean St. Dev. Min Max
-------------------------------------
mpg 19 17.1 3.8 10.4 24.4
hp 19 160.3 53.9 62 245
drat 19 3.3 0.4 2.8 3.9
-------------------------------------
We use the subset() function to keep only the rows where the 'am' variable equals 0.
Then we want descriptive statistics for cars with manual transmission:
stargazer(subset(mydata[c("mpg","hp","drat")], mydata$am==1),
title="Manual transmission", type = "text", digits=1, out="table7.txt")
Manual transmission
=====================================
Statistic N Mean St. Dev. Min Max
-------------------------------------
mpg 13 24.4 6.2 15.0 33.9
hp 13 126.8 84.1 52 335
drat 13 4.0 0.4 3.5 4.9
-------------------------------------
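When there are more than two groups, repeating the call by hand becomes tedious. A minimal sketch of a loop over groups, assuming the titles are supplied in a named vector (by_am and titles are illustrative names):

```r
library(stargazer)

mydata <- mtcars
vars <- c("mpg", "hp", "drat")

# split() returns a list with one data frame per value of am ("0" and "1").
by_am <- split(mydata[vars], mydata$am)
titles <- c("0" = "Automatic transmission", "1" = "Manual transmission")

# One summary table per group.
for (g in names(by_am)) {
  stargazer(by_am[[g]], type = "text", title = titles[[g]], digits = 1)
}
```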
Sometimes we want to report them all in one table. We can do so by creating another data frame.
library(dplyr)
library(tidyr)
newdata<-mydata %>%
  select(am,mpg,hp,drat) %>%
  group_by(am) %>%
  mutate(id = 1:n()) %>%
  ungroup() %>%
  gather(temp, val, mpg, hp,drat) %>%
  unite(temp1, am, temp, sep = '_') %>%
  spread(temp1, val) %>%
  select(-id) %>%
  as.data.frame()%>%
  stargazer(type = 'text')
===========================================
Statistic N Mean St. Dev. Min Max
-------------------------------------------
0_drat 19 3.286 0.392 2.760 3.920
0_hp 19 160.263 53.908 62 245
0_mpg 19 17.147 3.834 10.400 24.400
1_drat 13 4.050 0.364 3.540 4.930
1_hp 13 126.846 84.062 52 335
1_mpg 13 24.392 6.167 15.000 33.900
-------------------------------------------
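For readers who prefer not to load dplyr and tidyr, a similar wide data frame can be built in base R. This is an alternative sketch, not the method used above: each group is padded with NA rows to the size of the largest group, and the columns are prefixed with the group value. The names groups, padded and wide are illustrative, and the rows appear in column order rather than alphabetically.

```r
library(stargazer)

mydata <- mtcars
vars <- c("mpg", "hp", "drat")
groups <- split(mydata[vars], mydata$am)  # list with elements "0" and "1"
n_max <- max(sapply(groups, nrow))        # size of the largest group

# Pad every group to n_max rows (the extra rows are NA) and prefix the column names.
padded <- lapply(names(groups), function(g) {
  df <- groups[[g]][seq_len(n_max), , drop = FALSE]
  names(df) <- paste(g, vars, sep = "_")
  rownames(df) <- NULL
  df
})
wide <- do.call(cbind, padded)

# The NA padding is not counted in N, so each group keeps its own sample size.
stargazer(wide, type = "text")
```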
Another common use of the stargazer package is to create output tables for regression results.
We first look at how to report the result of one regression model.
m1 <- lm(mpg ~ hp, data=mydata)
summary(m1)
Call:
lm(formula = mpg ~ hp, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-5.7121 -2.1122 -0.8854 1.5819 8.2360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.09886 1.63392 18.421 < 2e-16 ***
hp -0.06823 0.01012 -6.742 1.79e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
Now we report this.
stargazer(m1, type="text", out="models.txt")
===============================================
Dependent variable:
---------------------------
mpg
-----------------------------------------------
hp -0.068***
(0.010)
Constant 30.099***
(1.634)
-----------------------------------------------
Observations 32
R2 0.602
Adjusted R2 0.589
Residual Std. Error 3.863 (df = 30)
F Statistic 45.460*** (df = 1; 30)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
We can see that the table shows each coefficient with its standard error (in parentheses underneath the coefficient). The level of significance is indicated by the number of asterisks (*). The table also shows the number of observations, the R-squared, and the adjusted R-squared.
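The figures in the table can be traced back to the fitted model. A small base R sketch (the object s is an illustrative name) that pulls out the same quantities:

```r
mydata <- mtcars
m1 <- lm(mpg ~ hp, data = mydata)
s <- summary(m1)

round(coef(s), 3)          # estimates, standard errors, t values and p-values
round(s$r.squared, 3)      # R2
round(s$adj.r.squared, 3)  # Adjusted R2
round(s$sigma, 3)          # residual standard error
nobs(m1)                   # number of observations
```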
Similarly, you can save the output in html format as well.
stargazer(m1, type="html", out="models.htm")
Note again that the html output will not show in the Console. You will need to find the htm document in your directory.
Sometimes we are interested in more than one regression model. It's possible to report their results together in stargazer.
m1 <- lm(mpg ~ hp, data=mydata)
m2 <- lm(mpg ~ hp + drat, data=mydata)
m3 <- lm(mpg ~ hp + drat + factor(gear), data=mydata)
# One label per covariate, in order; the intercept keeps its default label "Constant".
stargazer(m1, m2, m3, type="text",
covariate.labels=c("Gross horsepower","Rear axle ratio","Four forward gears",
"Five forward gears"), out="models1.txt")
========================================================================================
Dependent variable:
--------------------------------------------------------------------
mpg
(1) (2) (3)
----------------------------------------------------------------------------------------
Gross horsepower -0.068*** -0.052*** -0.064***
(0.010) (0.009) (0.011)
Rear axle ratio 4.698*** 3.510*
(1.192) (1.851)
Four forward gears -0.276
(2.135)
Five forward gears 3.761*
(2.161)
Constant 30.099*** 10.790** 16.306**
(1.634) (5.078) (6.429)
----------------------------------------------------------------------------------------
Observations 32 32 32
R2 0.602 0.741 0.782
Adjusted R2 0.589 0.723 0.749
Residual Std. Error 3.863 (df = 30) 3.170 (df = 29) 3.017 (df = 27)
F Statistic 45.460*** (df = 1; 30) 41.522*** (df = 2; 29) 24.179*** (df = 4; 27)
========================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
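Since covariate.labels are matched to the rows by position, and the intercept is listed last by default, it can help to list the coefficient names before writing the labels. The sketch below only approximates the row order that stargazer uses; all_terms is an illustrative name.

```r
mydata <- mtcars
m1 <- lm(mpg ~ hp, data = mydata)
m2 <- lm(mpg ~ hp + drat, data = mydata)
m3 <- lm(mpg ~ hp + drat + factor(gear), data = mydata)

# Collect the coefficient names across models in order of first appearance,
# then move the intercept to the end.
all_terms <- unique(unlist(lapply(list(m1, m2, m3), function(m) names(coef(m)))))
all_terms <- c(setdiff(all_terms, "(Intercept)"), "(Intercept)")
all_terms
```

Any label supplied beyond the last covariate would be applied to the intercept row, so the number of labels should not exceed the number of covariates unless the intercept is meant to be relabeled.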
We can also report the result of logistic regressions.
mydata$fast <- as.numeric((mydata$mpg > 20.1)) #creating a new dummy variable
m4 <- glm(fast ~ hp + drat + am, family=binomial(link="logit"), data=mydata)
stargazer(m4, type="text", out="models2.txt")
=============================================
Dependent variable:
---------------------------
fast
---------------------------------------------
hp -0.397
(1.358)
drat 4.248
(21.106)
am 11.743
(359.486)
Constant 29.882
(85.238)
---------------------------------------------
Observations 32
Log Likelihood -1.953
Akaike Inf. Crit. 11.906
=============================================
Note: *p<0.1; **p<0.05; ***p<0.01
It's also possible to report these results alongside the linear regression models.
Because the models do not all share the same dependent variable, we can label each dependent variable with dep.var.labels.
m1 <- lm(mpg ~ hp, data=mydata)
m2 <- lm(mpg ~ hp + drat, data=mydata)
m3 <- lm(mpg ~ hp + drat + factor(gear), data=mydata)
m4 <- glm(fast ~ hp + drat + am, family=binomial(link="logit"), data=mydata)
stargazer(m1, m2, m3, m4, type="text",
dep.var.labels=c("Miles/(US) gallon","Fast car (=1)"),
covariate.labels=c("Gross horsepower","Rear axle ratio","Four forward gears",
"Five forward gears","Type of transmission (manual=1)"), out="models3.txt")
==================================================================================================================
Dependent variable:
----------------------------------------------------------------------------------
Miles/(US) gallon Fast car (=1)
OLS logistic
(1) (2) (3) (4)
------------------------------------------------------------------------------------------------------------------
Gross horsepower -0.068*** -0.052*** -0.064*** -0.397
(0.010) (0.009) (0.011) (1.358)
Rear axle ratio 4.698*** 3.510* 4.248
(1.192) (1.851) (21.106)
Four forward gears -0.276
(2.135)
Five forward gears 3.761*
(2.161)
Type of transmission (manual=1) 11.743
(359.486)
Constant 30.099*** 10.790** 16.306** 29.882
(1.634) (5.078) (6.429) (85.238)
------------------------------------------------------------------------------------------------------------------
Observations 32 32 32 32
R2 0.602 0.741 0.782
Adjusted R2 0.589 0.723 0.749
Log Likelihood -1.953
Akaike Inf. Crit. 11.906
Residual Std. Error 3.863 (df = 30) 3.170 (df = 29) 3.017 (df = 27)
F Statistic 45.460*** (df = 1; 30) 41.522*** (df = 2; 29) 24.179*** (df = 4; 27)
==================================================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
For logistic regressions, often we not only want to look at the coefficients, but also the odds ratios for better interpretation.
For interpreting you can check: https://libguides.princeton.edu/R-logit
Note that when you apply a function to the coefficients or other statistics, stargazer automatically recalculates the t values using the transformed coefficients. As a result, the reported significance levels are based on the new values rather than on the original model.
One way to avoid this problem is described here: https://cimentadaj.github.io/blog/2016-08-22-producing-stargazer-tables-with-odds-ratios-and-standard-errors-in-r/producing-stargazer-tables-with-odds-ratios-and-standard-errors-in-r/
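The effect can be seen by comparing two calls. This is a hedged sketch rather than a recommended recipe: it relies on stargazer's apply.coef, t.auto and p.auto arguments, and in the second call the standard errors printed are still those of the log-odds coefficients, so they should be read with care.

```r
library(stargazer)

mydata <- mtcars
m5 <- glm(vs ~ hp + mpg, family = binomial(link = "logit"), data = mydata)

# Odds ratios; stargazer recomputes the t statistics from the transformed coefficients.
stargazer(m5, apply.coef = exp, type = "text")

# Odds ratios, keeping the t statistics and p-values of the original model.
stargazer(m5, apply.coef = exp, t.auto = FALSE, p.auto = FALSE, type = "text")
```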
m5 <- glm(vs ~ hp + mpg, family=binomial(link="logit"), data=mydata)
# Wrapper around stargazer() that can report odds ratios instead of log-odds
stargazer2 <- function(model, odd.ratio = FALSE, ...) {
  # Accept either a single model or a list of models
  if(!("list" %in% class(model))) model <- list(model)
  if (odd.ratio) {
    coefOR2 <- lapply(model, function(x) exp(coef(x)))  # odds ratios
    seOR2 <- lapply(model, function(x) exp(coef(x)) * summary(x)$coef[, 2])  # rescaled standard errors
    p2 <- lapply(model, function(x) summary(x)$coefficients[, 4])  # p-values from the original model
    stargazer(model, coef = coefOR2, se = seOR2, p = p2, ...)
  } else {
    stargazer(model, ...)
  }
}
stargazer2(m5, odd.ratio = TRUE, type = "text")
=============================================
Dependent variable:
---------------------------
vs
---------------------------------------------
hp 0.930**
(0.032)
mpg 0.967
(0.175)
Constant 13,783.050
(96,945.480)
---------------------------------------------
Observations 32
Log Likelihood -8.401
Akaike Inf. Crit. 22.803
=============================================
Note: *p<0.1; **p<0.05; ***p<0.01
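As a cross-check, the odds ratios and rescaled standard errors in this table can be reproduced in base R without stargazer. The sketch below mirrors the calculation inside stargazer2(); est and or_table are illustrative names.

```r
mydata <- mtcars
m5 <- glm(vs ~ hp + mpg, family = binomial(link = "logit"), data = mydata)

est <- coef(summary(m5))  # log-odds estimates, standard errors, z values, p-values
or_table <- cbind(
  OR      = exp(est[, "Estimate"]),
  SE_OR   = exp(est[, "Estimate"]) * est[, "Std. Error"],  # same rescaling as in stargazer2()
  p_value = est[, "Pr(>|z|)"]
)
round(or_table, 3)
```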
If you have questions or comments about this guide or method, please email data@Princeton.edu.