Using Stargazer to make publication-quality tables: A Hands on R tutorial

Using stargazer to report regression outputs, summary statistics

1. Stargazer Overview

As with anything in R, there are many ways to export output into nice tables (though mostly for LaTeX users). Some packages are apsrtable, xtable, texreg, memisc, and outreg, and the list keeps growing.

In this guide, we will focus on stargazer. This package offers a smart, easy-to-use alternative for non-LaTeX users, in particular the ability to import editable tables into a Word document. This guide shows some of the options stargazer offers. The contents are based on the package documentation, available at the following links:

https://cran.r-project.org/web/packages/stargazer/stargazer.pdf

https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf

By default, stargazer produces LaTeX code. The alternatives are:

  • Output as text, which allows a quick view of results
  • Output as html, which produces editable tables for Word documents
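
As a minimal sketch of the three output types, the same call can be made with a different value of type each time. The object names tex_lines, txt_lines, and html_lines are illustrative only, and capture.output() is used here just to collect what stargazer prints so that the formats can be compared.

```r
library(stargazer)

# The same summary table in each of the three formats.
# capture.output() collects the lines that stargazer prints to the console.
tex_lines  <- capture.output(stargazer(mtcars, type = "latex"))
txt_lines  <- capture.output(stargazer(mtcars, type = "text"))
html_lines <- capture.output(stargazer(mtcars, type = "html"))

# Look at the first few lines of each format
head(tex_lines)
head(txt_lines)
head(html_lines)
```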

2. Descriptive statistics

We can use the stargazer package to report important summary statistics.

 

2.1 Descriptive statistics: whole dataset

Getting sample data.

mydata <- mtcars

We use the built-in dataset mtcars in R here.

#install.packages("stargazer") # run this once if the package is not installed yet
library(stargazer) # load the package in every new R session

Now we call the stargazer() function.

stargazer(mydata, type = "text", title="Descriptive statistics", digits=1, out="table1.txt")

Descriptive statistics
======================================
Statistic N  Mean  St. Dev. Min   Max 
--------------------------------------
mpg       32 20.1    6.0    10.4 33.9 
cyl       32  6.2    1.8     4     8  
disp      32 230.7  123.9   71.1 472.0
hp        32 146.7   68.6    52   335 
drat      32  3.6    0.5    2.8   4.9 
wt        32  3.2    1.0    1.5   5.4 
qsec      32 17.8    1.8    14.5 22.9 
vs        32  0.4    0.5     0     1  
am        32  0.4    0.5     0     1  
gear      32  3.7    0.7     3     5  
carb      32  2.8    1.6     1     8  
--------------------------------------

stargazer will automatically recognize the type of object, and will produce the appropriate output. In the case of data frames, it will display summary statistics. Therefore, if we want to create a summary statistics table including all variables in a data frame, we can directly put the data frame name in.
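
The default behaviour for data frames can be adjusted. The sketch below assumes the summary.stat and summary arguments described in the package manual: the first limits the table to chosen statistics, and the second switches from summary statistics to printing the data frame itself.

```r
library(stargazer)
mydata <- mtcars

# Summary statistics, restricted to a chosen set of statistics
stargazer(mydata, type = "text", summary.stat = c("n", "mean", "median", "sd"))

# summary = FALSE prints the contents of the data frame instead of summary statistics
stargazer(head(mydata, 5), type = "text", summary = FALSE)
```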

The out="table1.txt" option specifies the name of the file that stargazer creates. The table will be saved in your current working directory under whatever name you give in the out option. You can open this file with any word processor. If you are not sure what your current working directory is, type getwd() in the Console.
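
The short sketch below shows how to check where a file will be written. Writing to tempdir() and the name out_file are choices made for this illustration only; any folder path can be used in the out option.

```r
library(stargazer)

# Relative file names in `out` are resolved against the working directory
getwd()

# Build a full path instead (here in R's temporary folder, purely as an example)
out_file <- file.path(tempdir(), "table1.txt")
stargazer(mtcars, type = "text", title = "Descriptive statistics",
          digits = 1, out = out_file)

# Confirm that the file was written
file.exists(out_file)
```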

We can also use the html format.

stargazer(mydata, type = "html", title="Descriptive statistics", digits=1, out="table1.htm")

Similarly, you can find the html file named 'table1.htm' in your current working directory, and you can open it in your preferred word processor to make any modifications.

Note that the Console shows only the raw HTML code, not the formatted table. To see the formatted table, open the htm document in your directory.
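
With type = "html", stargazer prints HTML markup to the Console while also writing the file. The sketch below is one way to keep the Console clean and open the saved table directly. The path html_file and the use of tempdir() are assumptions for this example, and browseURL() opens the file in the default browser only in an interactive session.

```r
library(stargazer)

html_file <- file.path(tempdir(), "table1.htm")

# invisible(capture.output(...)) keeps the raw HTML markup out of the Console
invisible(capture.output(
  stargazer(mtcars, type = "html", title = "Descriptive statistics",
            digits = 1, out = html_file)
))

# Open the saved table in the default browser when working interactively
if (interactive()) browseURL(html_file)
```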

We can also use the flip=TRUE option if we want the variables in columns.

stargazer(mydata, type = "text", title="Descriptive statistics", digits=1, out="table2.txt", flip=TRUE)
Descriptive statistics
==============================================================
Statistic mpg  cyl disp   hp   drat wt  qsec vs  am  gear carb
--------------------------------------------------------------
N          32  32   32    32    32  32   32  32  32   32   32 
Mean      20.1 6.2 230.7 146.7 3.6  3.2 17.8 0.4 0.4 3.7  2.8 
St. Dev.  6.0  1.8 123.9 68.6  0.5  1.0 1.8  0.5 0.5 0.7  1.6 
Min       10.4  4  71.1   52   2.8  1.5 14.5  0   0   3    1  
Max       33.9  8  472.0  335  4.9  5.4 22.9  1   1   5    8  
--------------------------------------------------------------

Note that if you use the text format, make sure to give each table a new file name; otherwise it will cause errors. This does not apply to the html format: you can reuse the same file name, and each new table will simply overwrite the previous one.

2.2 Descriptive statistics: replacing variable names with labels

We may know very well what each variable in our dataset means, but variable names like 'mpg' could be confusing to readers who are not familiar with the dataset. Therefore, we can replace the variable names with labels to make the table easier to understand.

stargazer(mydata, type = "text", title="Descriptive statistics", digits=1, out="table3.txt",
          covariate.labels=c("Miles/(US)gallon","No. of cylinders","Displacement (cu.in.)",
                             "Gross horsepower","Rear axle ratio","Weight (lb/1000)",
                             "1/4 mile time","V/S","Transmission (0=auto, 1=manual)",
                             "Number of forward gears","Number of carburetors"))
Descriptive statistics
============================================================
Statistic                       N  Mean  St. Dev. Min   Max 
------------------------------------------------------------
Miles/(US)gallon                32 20.1    6.0    10.4 33.9 
No. of cylinders                32  6.2    1.8     4     8  
Displacement (cu.in.)           32 230.7  123.9   71.1 472.0
Gross horsepower                32 146.7   68.6    52   335 
Rear axle ratio                 32  3.6    0.5    2.8   4.9 
Weight (lb/1000)                32  3.2    1.0    1.5   5.4 
1/4 mile time                   32 17.8    1.8    14.5 22.9 
V/S                             32  0.4    0.5     0     1  
Transmission (0=auto, 1=manual) 32  0.4    0.5     0     1  
Number of forward gears         32  3.7    0.7     3     5  
Number of carburetors           32  2.8    1.6     1     8  
------------------------------------------------------------

Use the option covariate.labels to replace variable names with variable labels. The labels must be in the same order as the variables in the dataset. If you are not sure about the order, you can create a table without labels first and check the order there.
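
Another way to check the order is to list the column names and pair them with the intended labels before calling stargazer. This is a small base R sketch; the object names var_labels and label_check are illustrative.

```r
mydata <- mtcars

# Column order of the data frame, which the labels must follow
names(mydata)

# Labels in the same order as the columns
var_labels <- c("Miles/(US)gallon", "No. of cylinders", "Displacement (cu.in.)",
                "Gross horsepower", "Rear axle ratio", "Weight (lb/1000)",
                "1/4 mile time", "V/S", "Transmission (0=auto, 1=manual)",
                "Number of forward gears", "Number of carburetors")

# Stop early if the number of labels does not match the number of variables
stopifnot(length(var_labels) == ncol(mydata))

# Side-by-side view to confirm that each label sits next to the right variable
label_check <- data.frame(variable = names(mydata), label = var_labels)
label_check
```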

2.3 Descriptive statistics: selected variables

Often we do not want to report the summary statistics of the whole dataset, but only the variables of interest. We can do that in the stargazer package as well. 

stargazer(mydata[c("mpg","hp","drat")], type = "text",
          title="Descriptive statistics/selected variables", digits=1, out="table4.txt")

Descriptive statistics/selected variables
=====================================
Statistic N  Mean  St. Dev. Min  Max 
-------------------------------------
mpg       32 20.1    6.0    10.4 33.9
hp        32 146.7   68.6    52  335 
drat      32  3.6    0.5    2.8  4.9 
-------------------------------------

We can also create the same output transposed and with labels instead of variable names:

stargazer(mydata[c("mpg","hp","drat")], type = "text",
          title="Descriptive statistics/selected variables", digits=1, out="table5.txt", flip=TRUE,
          covariate.labels=c("Miles/(US)gallon","Gross horsepower","Rear axle ratio"))
Descriptive statistics/selected variables
===========================================================
Statistic Miles/(US)gallon Gross horsepower Rear axle ratio
-----------------------------------------------------------
N                32               32              32       
Mean            20.1            146.7             3.6      
St. Dev.        6.0              68.6             0.5      
Min             10.4              52              2.8      
Max             33.9             335              4.9      
-----------------------------------------------------------

2.4 Descriptive statistics: selected variables and by group

We can also display the summary statistics by group.

First we want descriptive statistics for cars with automatic transmission:

stargazer(subset(mydata[c("mpg","hp","drat")], mydata$am==0),
          title="Automatic transmission", type = "text", digits=1, out="table6.txt")
Automatic transmission
=====================================
Statistic N  Mean  St. Dev. Min  Max 
-------------------------------------
mpg       19 17.1    3.8    10.4 24.4
hp        19 160.3   53.9    62  245 
drat      19  3.3    0.4    2.8  3.9 
-------------------------------------

We use the subset() function to keep only the rows where the 'am' variable equals 0.

Then we want descriptive statistics for cars with manual transmission:

stargazer(subset(mydata[c("mpg","hp","drat")], mydata$am==1),
          title="Manual transmission", type = "text", digits=1, out="table7.txt")
Manual transmission
=====================================
Statistic N  Mean  St. Dev. Min  Max 
-------------------------------------
mpg       13 24.4    6.2    15.0 33.9
hp        13 126.8   84.1    52  335 
drat      13  4.0    0.4    3.5  4.9 
-------------------------------------
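
The two calls above follow the same pattern, so they can also be generated in a loop. The following is a sketch using base R's split(); the names vars, groups, and titles are illustrative and not part of stargazer.

```r
library(stargazer)
mydata <- mtcars

vars <- c("mpg", "hp", "drat")

# One data frame per value of am: element "0" (automatic) and element "1" (manual)
groups <- split(mydata[vars], mydata$am)
titles <- c("0" = "Automatic transmission", "1" = "Manual transmission")

# Print one summary table per group
for (g in names(groups)) {
  stargazer(groups[[g]], type = "text", title = titles[[g]], digits = 1)
}
```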

Sometimes we want to report all groups in one table. We can do so by creating another data frame.

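library(dplyr)
library(tidyr)
# Reshape so that each group/variable pair (e.g. 0_mpg, 1_mpg) becomes its own column
newdata <- mydata %>%
  select(am, mpg, hp, drat) %>%
  group_by(am) %>%
  mutate(id = 1:n()) %>%                  # row counter within each transmission group
  ungroup() %>%
  gather(temp, val, mpg, hp, drat) %>%    # long format: one row per car and variable
  unite(temp1, am, temp, sep = '_') %>%   # build names such as "0_mpg"
  spread(temp1, val) %>%                  # wide format: one column per group and variable
  select(-id) %>%
  as.data.frame()
stargazer(newdata, type = 'text')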

===========================================
Statistic N   Mean   St. Dev.  Min    Max  
-------------------------------------------
0_drat    19  3.286   0.392   2.760  3.920 
0_hp      19 160.263  53.908    62    245  
0_mpg     19 17.147   3.834   10.400 24.400
1_drat    13  4.050   0.364   3.540  4.930 
1_hp      13 126.846  84.062    52    335  
1_mpg     13 24.392   6.167   15.000 33.900
-------------------------------------------

 

3. Regression models

Another common use of the stargazer package is to create output tables for regression results.

3.1 Linear Regression output: one model

We first look at how to report the result of one regression model.

m1 <- lm(mpg ~ hp, data=mydata)
summary(m1)
Call:
lm(formula = mpg ~ hp, data = mydata)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7121 -2.1122 -0.8854  1.5819  8.2360 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared:  0.6024,    Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

Now we report this.

stargazer(m1, type="text",  out="models.txt")
===============================================
                        Dependent variable:    
                    ---------------------------
                                mpg            
-----------------------------------------------
hp                           -0.068***         
                              (0.010)          
                                               
Constant                     30.099***         
                              (1.634)          
                                               
-----------------------------------------------
Observations                    32             
R2                             0.602           
Adjusted R2                    0.589           
Residual Std. Error       3.863 (df = 30)      
F Statistic           45.460*** (df = 1; 30)   
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

The table shows each coefficient with its standard error in parentheses underneath. The level of significance is indicated by the number of asterisks. The table also reports the number of observations, R-squared, and adjusted R-squared.
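
The layout of a regression table can be adjusted through further arguments. The sketch below assumes the ci, ci.level, single.row, and star.cutoffs arguments listed in the package manual. It reports confidence intervals instead of standard errors, and then replaces the default thresholds shown in the table note (0.1, 0.05, 0.01) with stricter ones.

```r
library(stargazer)
m1 <- lm(mpg ~ hp, data = mtcars)

# 95% confidence intervals instead of standard errors, on the same row as the coefficient
stargazer(m1, type = "text", ci = TRUE, ci.level = 0.95, single.row = TRUE)

# Stricter thresholds for the significance stars: 0.05, 0.01, 0.001
stargazer(m1, type = "text", star.cutoffs = c(0.05, 0.01, 0.001))
```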

Similarly, you can save the output in html format as well.

stargazer(m1, type="html",  out="models.htm")

Note again that the Console shows only the raw HTML code. To see the formatted table, open the htm document in your directory.

3.2 Linear Regression output: multiple models

Sometimes we are interested in more than one regression model. It's possible to report their results together in stargazer.

m1 <- lm(mpg ~ hp, data=mydata)
m2 <- lm(mpg ~ hp + drat, data=mydata)
m3 <- lm(mpg ~ hp + drat + factor(gear), data=mydata)

stargazer(m1, m2, m3, type="text",
          covariate.labels=c("Gross horsepower","Rear axle ratio","Four forward gears",
                             "Five forward gears"), out="models1.txt")
========================================================================================
                                            Dependent variable:
                    --------------------------------------------------------------------
                                                    mpg
                             (1)                    (2)                    (3)
----------------------------------------------------------------------------------------
Gross horsepower          -0.068***              -0.052***              -0.064***
                           (0.010)                (0.009)                (0.011)

Rear axle ratio                                   4.698***                3.510*
                                                  (1.192)                (1.851)

Four forward gears                                                        -0.276
                                                                         (2.135)

Five forward gears                                                        3.761*
                                                                         (2.161)

Constant                  30.099***               10.790**               16.306**
                           (1.634)                (5.078)                (6.429)

----------------------------------------------------------------------------------------
Observations                  32                     32                     32
R2                          0.602                  0.741                  0.782
Adjusted R2                 0.589                  0.723                  0.749
Residual Std. Error    3.863 (df = 30)        3.170 (df = 29)        3.017 (df = 27)
F Statistic         45.460*** (df = 1; 30) 41.522*** (df = 2; 29) 24.179*** (df = 4; 27)
========================================================================================
Note:                                                        *p<0.1; **p<0.05; ***p<0.01

The labels in covariate.labels are applied to the rows from top to bottom, so we supply exactly one label per covariate. These models have no transmission variable, so a fifth label would be attached to the Constant row by mistake.
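
Because the labels are matched by position, it helps to list the coefficient names of the largest model before writing covariate.labels. A small base R sketch follows; note that lm() lists the intercept first, whereas the stargazer tables in this guide show it last as "Constant".

```r
mydata <- mtcars
m3 <- lm(mpg ~ hp + drat + factor(gear), data = mydata)

# Coefficient names in model order
coef_names <- names(coef(m3))
coef_names

# Number of labels needed: one per covariate, leaving out the intercept
length(setdiff(coef_names, "(Intercept)"))
```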

3.3 Logistic regression output: coefficient

We can also report the result of logistic regressions.

mydata$fast <- as.numeric(mydata$mpg > 20.1) # create a dummy variable: 1 if mpg is above 20.1, else 0
m4 <- glm(fast ~ hp + drat + am, family=binomial(link="logit"), data=mydata)
stargazer(m4, type="text",  out="models2.txt")
=============================================
                      Dependent variable:    
                  ---------------------------
                             fast            
---------------------------------------------
hp                          -0.397           
                            (1.358)          
                                             
drat                         4.248           
                           (21.106)          
                                             
am                          11.743           
                           (359.486)         
                                             
Constant                    29.882           
                           (85.238)          
                                             
---------------------------------------------
Observations                  32             
Log Likelihood              -1.953           
Akaike Inf. Crit.           11.906           
=============================================
Note:             *p<0.1; **p<0.05; ***p<0.01

It's also possible to report these results alongside the linear regression models.

Because the models do not all share the same dependent variable, we can label each dependent variable with dep.var.labels.

m1 <- lm(mpg ~ hp, data=mydata)
m2 <- lm(mpg ~ hp + drat, data=mydata)
m3 <- lm(mpg ~ hp + drat + factor(gear), data=mydata)
m4 <- glm(fast ~ hp + drat + am, family=binomial(link="logit"), data=mydata)
stargazer(m1, m2, m3, m4, type="text",
          dep.var.labels=c("Miles/(US) gallon","Fast car (=1)"),
          covariate.labels=c("Gross horsepower","Rear axle ratio","Four forward gears",
                             "Five forward gears","Type of transmission (manual=1)"), out="models3.txt")
==================================================================================================================
                                                               Dependent variable:                                
                                ----------------------------------------------------------------------------------
                                                         Miles/(US) gallon                               Fast car (=1)     
                                                                OLS                                         logistic       
                                         (1)                    (2)                    (3)                (4)     
------------------------------------------------------------------------------------------------------------------
Gross horsepower                      -0.068***              -0.052***              -0.064***           -0.397    
                                       (0.010)                (0.009)                (0.011)            (1.358)   
                                                                                                                  
Rear axle ratio                                               4.698***                3.510*             4.248    
                                                              (1.192)                (1.851)           (21.106)   
                                                                                                                  
Four forward gears                                                                    -0.276                      
                                                                                     (2.135)                      
                                                                                                                  
Five forward gears                                                                    3.761*                      
                                                                                     (2.161)                      
                                                                                                                  
Type of transmission (manual=1)                                                                         11.743    
                                                                                                       (359.486)  
                                                                                                                  
Constant                              30.099***               10.790**               16.306**           29.882    
                                       (1.634)                (5.078)                (6.429)           (85.238)   
                                                                                                                  
------------------------------------------------------------------------------------------------------------------
Observations                              32                     32                     32                32      
R2                                      0.602                  0.741                  0.782                       
Adjusted R2                             0.589                  0.723                  0.749                       
Log Likelihood                                                                                          -1.953    
Akaike Inf. Crit.                                                                                       11.906    
Residual Std. Error                3.863 (df = 30)        3.170 (df = 29)        3.017 (df = 27)                  
F Statistic                     45.460*** (df = 1; 30) 41.522*** (df = 2; 29) 24.179*** (df = 4; 27)              
==================================================================================================================
Note:                                                                                  *p<0.1; **p<0.05; ***p<0.01

 

3.4 Logistic Regression output: odds-ratio

For logistic regressions, we often want to look not only at the coefficients but also at the odds ratios, which are easier to interpret.

For guidance on interpretation, see: https://libguides.princeton.edu/R-logit

Note that when you apply any function to the coefficients or other statistics, stargazer automatically reevaluates t values using the updated coefficients. Therefore, the significance level will depend on the new value.
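
The sketch below illustrates the issue, assuming the apply.coef, t.auto, and p.auto arguments described in the package manual. In the first call the stars are recomputed from the exponentiated coefficients. In the second call the automatic recalculation is switched off so that the model's original test statistics are kept. In both calls the standard errors are still on the log-odds scale, so this is an illustration rather than a finished table.

```r
library(stargazer)
m5 <- glm(vs ~ hp + mpg, family = binomial(link = "logit"), data = mtcars)

# Exponentiate inside stargazer: t values and stars are re-evaluated from exp(coef)
stargazer(m5, type = "text", apply.coef = exp)

# Same transformation, but without re-evaluating t values and p-values
stargazer(m5, type = "text", apply.coef = exp, t.auto = FALSE, p.auto = FALSE)
```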

One way to avoid this problem is the wrapper function described here: https://cimentadaj.github.io/blog/2016-08-22-producing-stargazer-tables-with-odds-ratios-and-standard-errors-in-r/producing-stargazer-tables-with-odds-ratios-and-standard-errors-in-r/

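m5 <- glm(vs ~ hp + mpg, family=binomial(link="logit"), data=mydata)

# Wrapper around stargazer() that reports odds ratios while keeping the original p-values
stargazer2 <- function(model, odd.ratio = FALSE, ...) {
  # Accept either a single model or a list of models
  if(!("list" %in% class(model))) model <- list(model)

  if (odd.ratio) {
    # Odds ratios: exponentiated coefficients
    coefOR2 <- lapply(model, function(x) exp(coef(x)))
    # Standard errors of the odds ratios: odds ratio times the coefficient's standard error
    seOR2 <- lapply(model, function(x) exp(coef(x)) * summary(x)$coefficients[, 2])
    # p-values taken from the original model, so the stars are not recomputed
    p2 <- lapply(model, function(x) summary(x)$coefficients[, 4])
    stargazer(model, coef = coefOR2, se = seOR2, p = p2, ...)
  } else {
    stargazer(model, ...)
  }
}
stargazer2(m5, odd.ratio = TRUE, type = "text")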

=============================================
                      Dependent variable:    
                  ---------------------------
                              vs             
---------------------------------------------
hp                          0.930**          
                            (0.032)          
                                             
mpg                          0.967           
                            (0.175)          
                                             
Constant                  13,783.050         
                         (96,945.480)        
                                             
---------------------------------------------
Observations                  32             
Log Likelihood              -8.401           
Akaike Inf. Crit.           22.803           
=============================================
Note:             *p<0.1; **p<0.05; ***p<0.01
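
The numbers in the table above can be reproduced by hand, which makes the transformation explicit. The following base R sketch builds a small table, here called or_table (an illustrative name), with the log-odds coefficients, the odds ratios, the standard errors on both scales, and the original p-values.

```r
mydata <- mtcars
m5 <- glm(vs ~ hp + mpg, family = binomial(link = "logit"), data = mydata)

# Coefficient matrix from the model summary
est <- coef(summary(m5))

or_table <- data.frame(
  log_odds   = est[, "Estimate"],
  odds_ratio = exp(est[, "Estimate"]),                       # exponentiated coefficient
  se_log     = est[, "Std. Error"],
  se_or      = exp(est[, "Estimate"]) * est[, "Std. Error"], # odds ratio times standard error
  p_value    = est[, "Pr(>|z|)"]                             # unchanged from the original model
)
round(or_table, 3)
```

The odds_ratio and se_or columns correspond to the coefficients and the values in parentheses in the table, while p_value is what determines the stars.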

 

Comments or Questions?

If you have questions or comments about this guide or method, please email data@Princeton.edu.

Data Consultant

Yufei Qin
Contact:
Firestone Library, A.12F.2
6092582519

Data Consultant

Muhammad Al Amin
He/Him/His
Contact:
Firestone Library, A-12-F.1
609-258-6051