
Exploring Data using Stata: Descriptive Statistics

This tutorial provides instructions on exploring the basic features of data and conducting preliminary analysis using Stata.

Descriptive Statistics

1. Introduction

This guide discusses techniques to explore data using Stata. To explore data, we usually need to know the format of the variables, summary statistics, frequencies, cross-tabulations, etc. We will provide the Stata commands for each of these. We will use a built-in Stata dataset throughout this guide, which we can load by typing the following command in the Stata command window:

sysuse nlsw88, clear

Note: To practice the commands using your own data, you have to open it from your working directory. You can do this by pointing and clicking. For instance, to open a Stata dataset, which is stored as a .dta file, click on

File → Open → your .dta file
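The same step can also be done from the command window with the use command. The lines below are a minimal sketch: the file name my_nlsw88.dta is hypothetical and is created here only so that the example runs; replace it with the name (or full path) of your own .dta file.

```stata
* Load the built-in example data and save a copy under a
* hypothetical file name in the current working directory
sysuse nlsw88, clear
save "my_nlsw88.dta", replace

* Check which folder Stata is currently working in
pwd

* Open the saved file by name (a full path in quotes also works)
use "my_nlsw88.dta", clear
```

The clear option discards any data already in memory, just as it does with sysuse.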

2. Descriptive Statistics (describe)

Describing a dataset is vital to understanding the nature of your data. The describe command provides a basic description of the dataset (number of observations, number of variables, variable names and labels) and shows how each variable is stored and displayed (its storage type and "display format"). Note that describe reports the structure of the data rather than statistics such as the mean; those are covered in the sections that follow.

We will describe the nlsw88 dataset, which ships with Stata.

First, get the dataset by typing:

sysuse nlsw88, clear

Note: for your data, open it from your working directory by clicking File → Open → your .dta file

In the command window, type:

describe

Stata will give us the following description table.

Contains data from C:\Program Files\Stata18\ado\base/n/nlsw88.dta
 Observations:         2,246                  NLSW, 1988 extract
    Variables:            17                  1 May 2022 22:52
                                              (_dta has notes)
-----------------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-----------------------------------------------------------------------------------------------------------
idcode          int     %8.0g                 NLS ID
age             byte    %8.0g                 Age in current year
race            byte    %8.0g      racelbl    Race
married         byte    %8.0g      marlbl     Married
never_married   byte    %16.0g     nev_mar    Never married
grade           byte    %8.0g                 Current grade completed
collgrad        byte    %16.0g     gradlbl    College graduate
south           byte    %9.0g      southlbl   Lives in the south
smsa            byte    %9.0g      smsalbl    Lives in SMSA
c_city          byte    %16.0g     ccitylbl   Lives in a central city
industry        byte    %23.0g     indlbl     Industry
occupation      byte    %22.0g     occlbl     Occupation
union           byte    %8.0g      unionlbl   Union worker
wage            float   %9.0g                 Hourly wage
hours           byte    %8.0g                 Usual hours worked
ttl_exp         float   %9.0g                 Total work experience (years)
tenure          float   %9.0g                 Job tenure (years)
-----------------------------------------------------------------------------------------------------------
Sorted by: idcode

Interpretation

  • The above table provides a summary of the entire dataset. For instance, Variables: 17 indicates that there are 17 variables in the dataset.
  • Storage type tells us the data type of a variable. Variables in Stata can have different storage types such as byte, int, long, float, double, and str. byte and int both store whole numbers (numbers without a decimal or fractional component): byte covers a small range (-127 to 100) and is typically used for categorical or ordinal variables with few distinct values, while int covers a wider range (-32,767 to 32,740). float indicates that the variable is stored as a floating-point number, which can hold decimals with about 7 digits of precision. double also stores floating-point numbers, but with about 16 digits of precision. str indicates that the variable is stored as a string (text).
  • Display format refers to how values are shown when we view or print the data in Stata; it does not change the stored values. For instance, %9.0g for the wage variable means that wage values are displayed with a total width of 9 characters. The "g" indicates the general format, a flexible format that displays values in a compact and easy-to-read manner.
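As a small illustration of the points above, describe can be limited to selected variables, and the format command can change how a variable is displayed. This is only a sketch; the choice of %9.2f (two fixed decimals) is an example, not something the dataset requires.

```stata
sysuse nlsw88, clear

* Describe only selected variables
describe wage hours

* Show wage with two fixed decimals instead of the general format
format wage %9.2f
list wage in 1/5

* Restore the original display format
format wage %9.0g
```

Changing the display format affects only how values are printed; the stored values and the storage type (float) stay the same.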

3. Summary Statistics (summarize)

We will use the summarize command in Stata to get the basic summary statistics of the variables. summarize can be abbreviated to su. It reports basic statistics for your data and helps us understand the essential characteristics of the variables.

We will check the summary statistics for the nlsw88 dataset that ships with Stata.

First, get the dataset by typing the following command in the command window:

sysuse nlsw88, clear

Note: for your data, open it from your working directory by clicking File → Open → your .dta file

In the command window, type:

su

Stata will give us the following summary table.

. su
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      idcode |      2,246    2612.654    1480.864          1       5159
         age |      2,246    39.15316    3.060002         34         46
        race |      2,246    1.282725    .4754413          1          3
     married |      2,246    .6420303    .4795099          0          1
never_marr~d |      2,246    .1041852    .3055687          0          1
-------------+---------------------------------------------------------
       grade |      2,244    13.09893    2.521246          0         18
    collgrad |      2,246    .2368655    .4252538          0          1
       south |      2,246    .4194123    .4935728          0          1
        smsa |      2,246    .7039181    .4566292          0          1
      c_city |      2,246    .2916296    .4546139          0          1
-------------+---------------------------------------------------------
    industry |      2,232    8.189516    3.010875          1         12
  occupation |      2,237    4.642825    3.408897          1         13
       union |      1,878    .2454739    .4304825          0          1
        wage |      2,246    7.766949    5.755523   1.004952   40.74659
       hours |      2,242    37.21811    10.50914          1         80
-------------+---------------------------------------------------------
     ttl_exp |      2,246    12.53498    4.610208   .1153846   28.88461
      tenure |      2,231     5.97785    5.510331          0   25.91667

 

Interpretation

  • The above table reports the number of observations, the mean, the standard deviation, the minimum value, and the maximum value of each variable in the dataset. An Obs value below 2,246 (for example, 1,878 for union) means that the variable has missing values.
  • Let's interpret the summary statistics of the second variable, age. The mean age of the individuals in the dataset is 39.2 and the standard deviation is 3.06. The minimum age is 34 and the maximum age is 46.
  • married looks like a dummy variable, as its minimum value is 0 and its maximum value is 1. If 1 is defined as married in the codebook, we can say that 64% of the people are married, as the mean of the variable is 0.6420303.
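The numbers in the table can also be retrieved after the command has run, because summarize leaves its results in r(). The following sketch shows how to list them, turn the mean of the dummy variable into a percentage, and count missing values directly.

```stata
sysuse nlsw88, clear

* summarize leaves its results in r(); list them with return list
summarize married
return list

* Share married, expressed as a percentage
display "Percent married: " %4.1f 100*r(mean)

* Obs below 2,246 signals missing values; count them directly
count if missing(union)
```

The last command should report 368, which is the 2,246 observations in the dataset minus the 1,878 non-missing values of union shown in the table.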

 

The summarize, detail command gives a more comprehensive overview of the statistical properties and distribution of each variable in the dataset. To get the detailed summary statistics for every variable, type:

su, detail

Stata will give us the following detailed summary statistics.

Note: we report the detailed summary statistics for only the first three variables in the dataset below.

. su, detail
                           NLS ID
-------------------------------------------------------------
      Percentiles      Smallest
 1%           45              1
 5%          269              2
10%          521              3       Obs               2,246
25%         1366              4       Sum of wgt.       2,246
50%         2614                      Mean           2612.654
                        Largest       Std. dev.      1480.864
75%         3903           5154
90%         4651           5156       Variance        2192957
95%         4925           5157       Skewness      -.0232308
99%         5104           5159       Kurtosis       1.828053
                     Age in current year
-------------------------------------------------------------
      Percentiles      Smallest
 1%           34             34
 5%           35             34
10%           35             34       Obs               2,246
25%           36             34       Sum of wgt.       2,246
50%           39                      Mean           39.15316
                        Largest       Std. dev.      3.060002
75%           42             45
90%           44             45       Variance       9.363614
95%           44             46       Skewness       .2003234
99%           45             46       Kurtosis       1.932389
                            Race
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              1
 5%            1              1
10%            1              1       Obs               2,246
25%            1              1       Sum of wgt.       2,246
50%            1                      Mean           1.282725
                        Largest       Std. dev.      .4754413
75%            2              3
90%            2              3       Variance       .2260444
95%            2              3       Skewness       1.284394
99%            3              3       Kurtosis       3.409155
 

Interpretation

  • Let's interpret the detailed summary statistics for the age variable from the above table. The Percentiles column displays several percentiles (1%, 5%, 10%, 25%, 50%, 75%, 90%, 95%, and 99%), which are points in the distribution that divide the sorted data into the specified percentages. The 50% percentile is the median, the middle value of age when the data are sorted, which is 39. Mean, Std. dev., and Variance give the mean, standard deviation, and variance of the variable, which are 39.2, 3.1, and 9.4 respectively for age. The Smallest and Largest columns list the four lowest and four highest values. Skewness is .20, which suggests that the data have a slight positive (right) skew.
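The detailed statistics are also stored in r(), so they can be combined into further quantities. The sketch below computes the interquartile range of age from the stored percentiles; quietly is used only to suppress the printed table.

```stata
sysuse nlsw88, clear

* quietly suppresses the printed table; results are still stored in r()
quietly summarize age, detail

* Median and interquartile range (75th minus 25th percentile)
display "Median age: " r(p50)
display "IQR of age: " r(p75)-r(p25)

* Mean minus median: positive here, consistent with a slight right skew
display "Mean - median: " r(mean)-r(p50)
```

With the percentiles shown in the table (36 and 42), the interquartile range of age is 6 years.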

 

We can also get the summary statistics for a particular variable under a condition. For instance, to get the summary statistics of the variable 'age' for married people only, type:

su age if married == 1

Stata will give us the following table.


. su age if married == 1
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         age |      1,442     39.1165    3.066058         34         45
 

The above results indicate that the average age of the 1,442 married people in the dataset is 39.12, the minimum age of a married person is 34, and the maximum is 45.
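Conditions can be combined, and they need some care when the variable in the condition has missing values. The sketch below assumes nothing beyond the nlsw88 data; the cut-off of 40 hours is an arbitrary example.

```stata
sysuse nlsw88, clear

* Combine conditions with & (and) and | (or)
summarize age if married == 1 & south == 1

* Caution: Stata treats missing values as larger than any number,
* so this count includes the observations with missing hours
count if hours > 40

* Excluding missing values explicitly
count if hours > 40 & !missing(hours)
```

The two counts differ by 4, the number of observations with missing hours (2,246 minus 2,242 in the summary table).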


If we want the summary statistics for a set of variables, we type su followed by the variable names. For instance, to get the summary statistics for the variables 'age', 'race', 'occupation', 'union', and 'wage', type:

su age race occupation union wage

Stata will give us the following table:

. su age race occupation union wage
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         age |      2,246    39.15316    3.060002         34         46
        race |      2,246    1.282725    .4754413          1          3
  occupation |      2,237    4.642825    3.408897          1         13
       union |      1,878    .2454739    .4304825          0          1
        wage |      2,246    7.766949    5.755523   1.004952   40.74659

 

 

If we want the summary statistics for a range of variables, such as 'idcode' to 'married', type: 

su idcode-married

Stata will give us the following table

. su idcode-married
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      idcode |      2,246    2612.654    1480.864          1       5159
         age |      2,246    39.15316    3.060002         34         46
        race |      2,246    1.282725    .4754413          1          3
     married |      2,246    .6420303    .4795099          0          1
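Besides listing names and using a range with a hyphen, a variable list can use the * wildcard. The sketch below is one illustration; the pattern c* is chosen only because it matches two variables in this dataset.

```stata
sysuse nlsw88, clear

* c* matches every variable whose name starts with "c"
summarize c*

* ds lists the variable names a pattern expands to
ds c*
```

In nlsw88 the pattern c* expands to collgrad and c_city.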

 

 

Grouped summary statistics: We can get the summary statistics separately for each group defined by a variable. For instance, if we want the summary statistics of the 'grade' and 'wage' variables for each category of the 'occupation' variable, we type:

by occupation , sort: summarize grade wage

Stata will give us the following table: (note: we reported the partial table below)

. by occupation , sort: summarize grade wage 
-----------------------------------------------------------------------------------------------------------
-> occupation = Professional/Technical
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       grade |        317    14.77603    2.132552          8         18
        wage |        317    10.72362    6.351074   1.032247   40.19808
-----------------------------------------------------------------------------------------------------------
-> occupation = Managers/Admin
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       grade |        263     13.4943    2.283146          7         18
        wage |        264    10.89978    7.521588   2.415459   40.19808
-----------------------------------------------------------------------------------------------------------
-> occupation = Sales
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       grade |        726    12.78237    1.777101          0         18
        wage |        726    7.154489    5.042757   1.571983   40.74659
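The by ..., sort: prefix has a shorter equivalent, bysort. The sketch below also shows how to look at a single group; it assumes that code 1 of occupation is Professional/Technical, as the first group in the output above suggests (label list occlbl shows the full mapping).

```stata
sysuse nlsw88, clear

* bysort sorts by the group variable, then runs the command per group
bysort occupation: summarize grade wage

* One group only, using the numeric code behind the value label
* (assumption: 1 = Professional/Technical; check with label list)
label list occlbl
summarize grade wage if occupation == 1
```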

4. Codebook of the Variables (codebook)

The codebook command in Stata is a valuable tool to get detailed information about the variables in a dataset. It provides information on variable names, value labels, data types, summary statistics, and other relevant details. Follow the steps below to apply the codebook command.

First, get the dataset by typing the following command in the command window:

sysuse nlsw88, clear

To generate a codebook for a specific variable "wage," type:

codebook wage

Stata will give us the following outputs

. codebook wage
--------------------------------------------------------------------------------
wage                                                  Hourly wage
--------------------------------------------------------------------------------
                  Type: Numeric (float)
                 Range: [1.0049518,40.74659]          Units: 1.000e-07
         Unique values: 967                       Missing .: 0/2,246
                  Mean: 7.76695
             Std. dev.: 5.75552
           Percentiles:     10%       25%       50%       75%       90%
                        3.22061   4.25926   6.27227   9.59742   12.7778

Notice that the above output gives us the basic summary statistics (e.g., mean, std. dev., range) of the variable "wage" along with the number of unique values, the number of missing values, and selected percentiles.
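For a labeled categorical variable, codebook shows a tabulation of the values with their labels instead of a mean and percentiles. The following sketch uses race, whose value label is named racelbl in the describe output.

```stata
sysuse nlsw88, clear

* For a labeled categorical variable, codebook tabulates
* each numeric value together with its label
codebook race

* List the numeric codes and labels stored in the value label
label list racelbl
```

This is a quick way to find out which number stands for which category before writing a condition such as if race == 1.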

 

Conditional codebook: To generate a codebook for the variable "wage" if the person is from "south", type:

codebook wage if south == 1

Stata will give us the following outputs

. codebook wage if south == 1
--------------------------------------------------------------------------------
wage                                                         Hourly wage
--------------------------------------------------------------------------------
                  Type: Numeric (float)
                 Range: [1.1513683,40.74659]          Units: 1.000e-07
         Unique values: 535                       Missing .: 0/942
                  Mean: 6.88748
             Std. dev.: 5.28311
           Percentiles:     10%       25%       50%       75%       90%
                        2.89855   3.94525    5.5475   8.05153   11.4171

 

Compact codebook: the compact option gives a one-line summary for each variable in the dataset. To get it, type:

codebook, compact

Stata will give us the following outputs:

. codebook, compact
Variable       Obs Unique      Mean       Min       Max  Label
--------------------------------------------------------------------------------
idcode        2246   2246  2612.654         1      5159  NLS ID
age           2246     13  39.15316        34        46  Age in current year
race          2246      3  1.282725         1         3  Race
married       2246      2  .6420303         0         1  Married
never_marr~d  2246      2  .1041852         0         1  Never married
grade         2244     16  13.09893         0        18  Current grade completed
collgrad      2246      2  .2368655         0         1  College graduate
south         2246      2  .4194123         0         1  Lives in the south
smsa          2246      2  .7039181         0         1  Lives in SMSA
c_city        2246      2  .2916296         0         1  Lives in a central city
industry      2232     12  8.189516         1        12  Industry
occupation    2237     13  4.642825         1        13  Occupation
union         1878      2  .2454739         0         1  Union worker
wage          2246    967  7.766949  1.004952  40.74659  Hourly wage
hours         2242     62  37.21811         1        80  Usual hours worked
ttl_exp       2246   1546  12.53498  .1153846  28.88461  Total work experience (years)
tenure        2231    259   5.97785         0  25.91667  Job tenure (years)
--------------------------------------------------------------------------------

5. Frequency Tables (tab, tab1, crosstab)

We use the tab command to produce frequency tables in Stata. We will show different ways to create frequency tables in Stata.

First, get the dataset by typing the following command in the command window:

sysuse nlsw88, clear

To create a frequency table for a single categorical variable "married" type:

tab married

Stata will give us the following frequency table.

. tab married
    Married |      Freq.     Percent        Cum.
------------+-----------------------------------
     Single |        804       35.80       35.80
    Married |      1,442       64.20      100.00
------------+-----------------------------------
      Total |      2,246      100.00

Interpretation

  • The above table indicates that 804 people in the dataset are single, which is 35.80% of all observations. Similarly, 1,442 people in the dataset are married, which is 64.20% of all observations.

To get the numerical values of the above categorical variable rather than value labels, type:

tab married, nolabel

Stata will give us the following table:

. tab married, nolabel
    Married |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        804       35.80       35.80
          1 |      1,442       64.20      100.00
------------+-----------------------------------
      Total |      2,246      100.00

In the above table, 0 indicates single and 1 indicates married.
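By default tab leaves out observations with missing values, so the total can be smaller than the number of observations in the dataset. The sketch below uses union, which has missing values, to show the missing option.

```stata
sysuse nlsw88, clear

* By default tab drops missing values: the total is 1,878, not 2,246
tab union

* The missing option adds a row for the missing values
tab union, missing
```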

To create a frequency table with a simple bar chart of the frequencies (plot option), sorted in descending order of frequency (sort option), for the categorical variable "occupation", type:

tab occupation, plot sort

Stata will give us the following output table:

. tab occupation , plot sort
            Occupation |      Freq.
-----------------------+------------+------------------------------------------
                 Sales |        726 |******************************************
Professional/Technical |        317 |******************
              Laborers |        286 |*****************
        Managers/Admin |        264 |***************
            Operatives |        246 |**************
                 Other |        187 |***********
    Clerical/Unskilled |        102 |******
             Craftsmen |         53 |***
             Transport |         28 |**
               Service |         16 |*
         Farm laborers |          9 |*
     Household workers |          2 |
               Farmers |          1 |
-----------------------+------------+------------------------------------------
                 Total |      2,237 

 

tab1: If we want to get frequencies for more than one variable (e.g., "married", "south", and "race") at the same time we will have to run the following command:

tab1 married south race

Stata will give us the following frequency outputs.

. tab1 married south race
-> tabulation of married  
    Married |      Freq.     Percent        Cum.
------------+-----------------------------------
     Single |        804       35.80       35.80
    Married |      1,442       64.20      100.00
------------+-----------------------------------
      Total |      2,246      100.00
-> tabulation of south  
   Lives in |
  the south |      Freq.     Percent        Cum.
------------+-----------------------------------
  Not south |      1,304       58.06       58.06
      South |        942       41.94      100.00
------------+-----------------------------------
      Total |      2,246      100.00
-> tabulation of race  
       Race |      Freq.     Percent        Cum.
------------+-----------------------------------
      White |      1,637       72.89       72.89
      Black |        583       25.96       98.84
      Other |         26        1.16      100.00
------------+-----------------------------------
      Total |      2,246      100.00

The above output provides a separate frequency table for each of the variables listed in the command.

 

Crosstab: Cross-tabulation is useful if we want to get the joint distribution of two variables in a dataset. To get the cross-tabulation of the categorical variables "race" and "collgrad", type:

tab collgrad race

Stata will give us the following table.

. tab collgrad race
                 |               Race
College graduate |     White      Black      Other |     Total
-----------------+---------------------------------+----------
Not college grad |     1,217        480         17 |     1,714 
    College grad |       420        103          9 |       532 
-----------------+---------------------------------+----------
           Total |     1,637        583         26 |     2,246 

The above table indicates that out of the 532 college graduates, 420 are White, 103 are Black, and 9 are from other races.

If we want row percentages (the racial composition of each education group) instead of counts, type:

tab collgrad race, row nofreq

Stata will give us the following table.

. tab collgrad race, row nofreq
                 |               Race
College graduate |     White      Black      Other |     Total
-----------------+---------------------------------+----------
Not college grad |     71.00      28.00       0.99 |    100.00 
    College grad |     78.95      19.36       1.69 |    100.00 
-----------------+---------------------------------+----------
           Total |     72.89      25.96       1.16 |    100.00 

 

From the above table we can say that 78.95% of all college graduates are White, 19.36% are Black, and 1.69% are from other races.
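A row percentage is simply the cell count divided by the row total. As a check, the sketch below reproduces the 78.95% figure by hand. The local macro names are hypothetical, and the codes collgrad == 1 (college graduate) and race == 1 (White) are inferred from the tables in this guide. Run the lines together, since local macros do not carry over between separately executed do-file selections.

```stata
sysuse nlsw88, clear

* Numerator: White college graduates
count if collgrad == 1 & race == 1
local white_grads = r(N)

* Denominator: all college graduates (the row total)
count if collgrad == 1
local all_grads = r(N)

* Row percentage = 100 * cell count / row total
display %5.2f 100*`white_grads'/`all_grads'
```

With 420 White graduates out of 532 graduates, the result is 78.95.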

If we want row percentage in addition to the counts in the above table, type:

tab collgrad race, row

Stata will give us the following table.

. tab collgrad race, row
+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+
                 |               Race
College graduate |     White      Black      Other |     Total
-----------------+---------------------------------+----------
Not college grad |     1,217        480         17 |     1,714 
                 |     71.00      28.00       0.99 |    100.00 
-----------------+---------------------------------+----------
    College grad |       420        103          9 |       532 
                 |     78.95      19.36       1.69 |    100.00 
-----------------+---------------------------------+----------
           Total |     1,637        583         26 |     2,246 
                 |     72.89      25.96       1.16 |    100.00 

Interpretation

  • The above table displays both counts and row percentages for each combination of the two categorical variables. For instance, the table shows that 1,217 of the people who are not college graduates are White, which is 71% of all non-graduates.

Notice that the above table reports percentages for rows. If we want column percentages instead, use the column option (it can be abbreviated to col, as in the output below) and type:

tab collgrad race, column

Stata will give us the following table:

. tab collgrad race, col
+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+
                 |               Race
College graduate |     White      Black      Other |     Total
-----------------+---------------------------------+----------
Not college grad |     1,217        480         17 |     1,714 
                 |     74.34      82.33      65.38 |     76.31 
-----------------+---------------------------------+----------
    College grad |       420        103          9 |       532 
                 |     25.66      17.67      34.62 |     23.69 
-----------------+---------------------------------+----------
           Total |     1,637        583         26 |     2,246 
                 |    100.00     100.00     100.00 |    100.00 

Interpretation

  • The above table indicates that 74.34% of White people are not college graduates, while 25.66% are. Similarly, 82.33% of Black people are not college graduates, while 17.67% are.

If we want to get both column and row percentage in the same table, type:

tab collgrad race, col row

Stata will give us the following table:

. tab collgrad race, col row
+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
+-------------------+
                 |               Race
College graduate |     White      Black      Other |     Total
-----------------+---------------------------------+----------
Not college grad |     1,217        480         17 |     1,714 
                 |     71.00      28.00       0.99 |    100.00 
                 |     74.34      82.33      65.38 |     76.31 
-----------------+---------------------------------+----------
    College grad |       420        103          9 |       532 
                 |     78.95      19.36       1.69 |    100.00 
                 |     25.66      17.67      34.62 |     23.69 
-----------------+---------------------------------+----------
           Total |     1,637        583         26 |     2,246 
                 |     72.89      25.96       1.16 |    100.00 
                 |    100.00     100.00     100.00 |    100.00 

Interpretation

  • In the above table, the first entry in each cell is the frequency, the second is the row percentage, and the third is the column percentage.
  • For example, 1,217 indicates that there are 1,217 White individuals who are not college graduates in the dataset. 71.00 indicates that 71% of all non-graduates are White. 74.34 indicates that 74.34% of all White people in the dataset are not college graduates.
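A third kind of percentage, not shown in the tables above, is the cell percentage: each cell as a share of all observations in the table. The sketch below requests it with the cell option and checks one cell by hand.

```stata
sysuse nlsw88, clear

* cell reports each cell as a percentage of all observations in the table
tab collgrad race, cell nofreq

* Check one cell by hand: White non-graduates as a share of everyone
count if collgrad == 0 & race == 1
display %5.2f 100*r(N)/_N
```

Neither collgrad nor race has missing values here, so dividing by _N (the number of observations in the dataset) gives the same denominator as the table total of 2,246.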

6. Customized Tables (tabstat)

tabstat is another command that provides summary statistics. Let's see how we can use this command to explore a dataset using our nlsw88 data.

First, get the data by typing:

sysuse nlsw88, clear

To use tabstat, type the command name (tabstat) followed by the variable names and, after a comma, the statistics() option (abbreviated to s() below) listing the statistics we want. For instance, if we want the summary statistics for a list of variables - "age," "married," "collgrad," "south," "c_city," "union," and "wage" - type:

tabstat age married collgrad south c_city union wage, s(mean semean median sd var skew k count sum range min max)

Stata will give us the following table:

   Stats |       age   married  collgrad     south    c_city     union      wage
---------+----------------------------------------------------------------------
    Mean |  39.15316  .6420303  .2368655  .4194123  .2916296  .2454739  7.766949
se(mean) |  .0645679   .010118  .0089731  .0104147  .0095926  .0099336  .1214451
     p50 |        39         1         0         0         0         0   6.27227
      SD |  3.060002  .4795099  .4252538  .4935728  .4546139  .4304825  5.755523
Variance |  9.363614  .2299298  .1808408  .2436141  .2066738  .1853151  33.12604
Skewness |  .2003234 -.5925296  1.237816  .3266212  .9168961   1.18283  3.096199
Kurtosis |  1.932389  1.351091   2.53219  1.106681  1.840698  2.399088  15.85446
       N |      2246      2246      2246      2246      2246      1878      2246
     Sum |     87938      1442       532       942       655       461  17444.57
   Range |        12         1         1         1         1         1  39.74164
     Min |        34         0         0         0         0         0  1.004952
     Max |        46         1         1         1         1         1  40.74659
--------------------------------------------------------------------------------

Interpretation

  • The above table displays the basic summary statistics (mean, median, standard deviation, variance, skewness, etc.) for the variables we listed in the command.
  • For instance, the Mean of 39.15 for "age" indicates that the average age of the individuals in the dataset is about 39.2 years.
  • The se(mean) row gives the standard error of the mean, an estimate of how much the sample mean would vary from sample to sample around the true population mean. The se(mean) value of 0.0645679 for "age" is small relative to the mean of 39.15, which indicates that the mean age is estimated precisely.
  • The p50 value of 39 for "age" indicates that the median age of the individuals in the dataset is 39. SD = 3.06 is the standard deviation of the age variable. Range = 12 is the difference between the maximum and minimum age in the dataset.
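Two of the rows in the table can be derived from the others. The sketch below recomputes the standard error of the mean and the range of age from the results that summarize stores in r().

```stata
sysuse nlsw88, clear

quietly summarize age

* Standard error of the mean = standard deviation / sqrt(sample size)
display "se(mean) for age: " r(sd)/sqrt(r(N))

* Range = maximum - minimum
display "Range of age: " r(max)-r(min)
```

The first value matches the se(mean) entry for age in the tabstat table (about .0646), and the second equals 12.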

If you are interested in getting the above statistics by "race" just add the option by(race) after the comma. For instance,

tabstat age married collgrad south c_city union wage, by(race) s(mean semean median sd var skew k count sum range min max)

Stata will give us the following table:

  race |       age   married  collgrad     south    c_city     union      wage
-------+----------------------------------------------------------------------
 White |  39.27245  .7025046  .2565669  .3457544  .2211362  .2232077  8.082999
       |  .0760678  .0113025  .0107977  .0117588  .0102605  .0113245  .1471846
       |        39         1         0         0         0         0  6.545891
       |  3.077691   .457296  .4368717  .4757589  .4151389  .4165504  5.955069
       |  9.472181  .2091196  .1908569  .2263466  .1723403  .1735143  35.46285
       |  .1514898 -.8859315  1.114778  .6486171  1.343883  1.329465   3.00474
       |  1.919862  1.784875   2.24273  1.420704  2.806021  2.767478  14.74577
       |      1637      1637      1637      1637      1637      1353      1637
       |     64289      1150       420       566       362       302  13231.87
       |        12         1         1         1         1         1  39.19313
       |        34         0         0         0         0         0  1.004952
       |        46         1         1         1         1         1  40.19808
-------+----------------------------------------------------------------------
 Black |  38.81132  .4699828  .1766724  .6397942   .490566  .3013972  6.844558
       |  .1234292  .0206883  .0158092  .0198991   .020722  .0205211  .2102342
       |        38         0         0         1         0         0  5.434783
       |  2.980246  .4995268  .3817187  .4804722  .5003403  .4593235  5.076187
       |  8.881865   .249527  .1457092  .2308536  .2503404   .210978  25.76767
       |  .3449691  .1202856  1.695517 -.5824029  .0377426  .8656266  3.516731
       |  2.029956  1.014469  3.874778  1.339193  1.001425  1.749309  21.15914
       |       583       583       583       583       583       501       583
       |     22627       274       103       373       286       151  3990.377
       |        11         1         1         1         1         1  39.59522
       |        34         0         0         0         0         0  1.151368
       |        45         1         1         1         1         1  40.74659
-------+----------------------------------------------------------------------
 Other |  39.30769  .6923077  .3461538  .1153846  .2692308  .3333333  8.550781
       |  .6367447  .0923077  .0951486  .0638971   .088712  .0982946  1.021653
       |        39         1         0         0         0         0  7.560383
       |  3.246774  .4706787  .4851645  .3258126  .4523443  .4815434   5.20943
       |  10.54154  .2215385  .2353846  .1061538  .2046154  .2318841  27.13816
       |  .0047392 -.8333333  .6467617  2.407717  1.040532  .7071068  1.428553
       |  1.622899  1.694444  1.418301  6.797101  2.082707       1.5  5.799663
       |        26        26        26        26        26        24        26
       |      1022        18         9         3         7         8  222.3203
       |        10         1         1         1         1         1  23.99913
       |        34         0         0         0         0         0   1.80602
       |        44         1         1         1         1         1  25.80515
-------+----------------------------------------------------------------------
 Total |  39.15316  .6420303  .2368655  .4194123  .2916296  .2454739  7.766949
       |  .0645679   .010118  .0089731  .0104147  .0095926  .0099336  .1214451
       |        39         1         0         0         0         0   6.27227
       |  3.060002  .4795099  .4252538  .4935728  .4546139  .4304825  5.755523
       |  9.363614  .2299298  .1808408  .2436141  .2066738  .1853151  33.12604
       |  .2003234 -.5925296  1.237816  .3266212  .9168961   1.18283  3.096199
       |  1.932389  1.351091   2.53219  1.106681  1.840698  2.399088  15.85446
       |      2246      2246      2246      2246      2246      1878      2246
       |     87938      1442       532       942       655       461  17444.57
       |        12         1         1         1         1         1  39.74164
       |        34         0         0         0         0         0  1.004952
       |        46         1         1         1         1         1  40.74659
------------------------------------------------------------------------------
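
Because the rows of a by() table like this one carry no labels, it can be hard to tell which statistic is which. As a sketch of one way around this, the longstub option of tabstat widens the left stub so that the statistic names are printed beside each row. A shorter list of variables and statistics is used here only to keep the output compact:

```stata
sysuse nlsw88, clear
* longstub prints the name of each statistic beside its row
tabstat age wage, by(race) s(mean semean median sd) longstub
```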

For more help on tabstat, type help tabstat in the Stata command window.

7. Customized Tables (table)

table is another useful command that helps us build tables in various dimensions and from various perspectives.

One-way table

First, get the data by typing:

sysuse nlsw88, clear

To create a simple frequency table for a single categorical variable "race", type:

table race

Stata will give us the following table:

. table race
--------------------
        |  Frequency
--------+-----------
Race    |           
  White |      1,637
  Black |        583
  Other |         26
  Total |      2,246
--------------------
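
A frequency table is often easier to read with percentages alongside the counts. The sketch below assumes Stata 17 or later, where table accepts repeated statistic() options:

```stata
sysuse nlsw88, clear
* report the count and the percentage of observations in each race category
table race, statistic(frequency) statistic(percent)
```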

Two-way table

To create a two-way contingency table examining the relationship between two categorical variables "collgrad" and "race", type:

table collgrad race

Stata will give us the following table:

. table collgrad race
---------------------------------------------------
                   |               Race            
                   |  White   Black   Other   Total
-------------------+-------------------------------
College graduate   |                               
  Not college grad |  1,217     480      17   1,714
  College grad     |    420     103       9     532
  Total            |  1,637     583      26   2,246
---------------------------------------------------
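
When groups differ in size, percentages are usually more informative than raw counts. As a hedged sketch (again assuming Stata 17 or later), the across() suboption of statistic(percent) requests percentages computed across the levels of "collgrad", so that each race column sums to 100. The older tabulate command with the column option reports the same column percentages and can serve as a cross-check:

```stata
sysuse nlsw88, clear
* percentage of graduates and non-graduates within each race
table collgrad race, statistic(percent, across(collgrad))
* cross-check: column percentages from tabulate
tabulate collgrad race, column
```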

 

To create a table of means for numeric variables "wage" and "hours" across the levels of categorical variable "collgrad", type:

table collgrad, statistic(mean wage hours)

Stata will give us the following table:

------------------------------------------------------
                   |  Hourly wage   Usual hours worked
-------------------+----------------------------------
College graduate   |                                  
  Not college grad |     6.910561             36.71888
  College grad     |     10.52606             38.82674
  Total            |     7.766949             37.21811
------------------------------------------------------
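
The means above are shown with many decimal places. A minimal sketch, assuming Stata 17 or later: the nformat() option applies a numeric display format, here two decimal places, to the reported statistics.

```stata
sysuse nlsw88, clear
* same table of means, shown with two decimal places
table collgrad, statistic(mean wage hours) nformat(%9.2f)
```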

Multi-way table

To create a three-dimensional table for three categorical variables "collgrad", "race", and "union", type:

table collgrad race union

Stata will give us the following table:

----------------------------------------------
                   |        Union worker      
                   |  Nonunion   Union   Total
-------------------+--------------------------
College graduate   |                          
  Not college grad |                          
    Race           |                          
      White        |       792     194     986
      Black        |       298     114     412
      Other        |        11       5      16
      Total        |     1,101     313   1,414
  College grad     |                          
    Race           |                          
      White        |       259     108     367
      Black        |        52      37      89
      Other        |         5       3       8
      Total        |       316     148     464
  Total            |                          
    Race           |                          
      White        |     1,051     302   1,353
      Black        |       350     151     501
      Other        |        16       8      24
      Total        |     1,417     461   1,878
----------------------------------------------
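
The grand total in this table (1,878) is smaller than the 2,246 observations in the dataset because rows with a missing value of "union" are left out. A quick way to confirm this, sketched below, is to count the missing values directly and to tabulate "union" with the missing option:

```stata
sysuse nlsw88, clear
* number of observations with missing union status
count if missing(union)
* one-way tabulation of union, with missing values shown as their own row
tabulate union, missing
```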

 

To create a table containing mean values for several numerical variables ("age", "hours", "wage", and "tenure") across the levels of a categorical variable ("race"), type:

table race, statistic(mean age hours wage tenure)

Stata will give us the following table:

--------------------------------------------------------------------------------------
        |  Age in current year   Usual hours worked   Hourly wage   Job tenure (years)
--------+-----------------------------------------------------------------------------
Race    |                                                                             
  White |             39.27245             36.90398      8.082999             5.808236
  Black |             38.81132             38.12048      6.844558             6.501586
  Other |             39.30769             36.80769      8.550781             4.948718
  Total |             39.15316             37.21811      7.766949              5.97785
--------------------------------------------------------------------------------------
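
More than one statistic can be requested in the same table. The following sketch (assuming Stata 17 or later) reports the mean, median, and standard deviation of "wage" for each race by repeating the statistic() option:

```stata
sysuse nlsw88, clear
* one statistic() option per statistic; all are computed for wage
table race, statistic(mean wage) statistic(median wage) statistic(sd wage)
```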

 

To create a table containing mean values for various numerical variables ("age", "hours", and "wage") with respect to multiple categorical variables ("race", "collgrad"), type:

table race collgrad, statistic(mean age hours wage)

Stata will give us the following table:

---------------------------------------------------------------------
                        |               College graduate             
                        |  Not college grad   College grad      Total
------------------------+--------------------------------------------
Race                    |                                            
  White                 |                                            
    Age in current year |          39.27034       39.27857   39.27245
    Usual hours worked  |           36.3347       38.55609   36.90398
    Hourly wage         |          7.318251       10.29895   8.082999
  Black                 |                                            
    Age in current year |          38.90417       38.37864   38.81132
    Usual hours worked  |          37.77406       39.72816   38.12048
    Hourly wage         |          5.875918       11.35861   6.844558
  Other                 |                                            
    Age in current year |          39.05882       39.77778   39.30769
    Usual hours worked  |          34.52941       41.11111   36.80769
    Hourly wage         |          6.938195       11.59678   8.550781
  Total                 |                                            
    Age in current year |          39.16569       39.11278   39.15316
    Usual hours worked  |          36.71888       38.82674   37.21811
    Hourly wage         |          6.910561       10.52606   7.766949
---------------------------------------------------------------------
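
Any single cell of this table can be verified with summarize and an if condition. As a sketch, the commands below recompute the mean wage of Black college graduates. They assume that "race" is coded 2 for Black and "collgrad" is coded 1 for college graduates, which the label list output lets you confirm before running the second command:

```stata
sysuse nlsw88, clear
* show the numeric codes behind all value labels
label list
* mean wage for Black college graduates
summarize wage if race == 2 & collgrad == 1
```

The reported mean should match the corresponding cell of the table (11.35861).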

Useful Resources

DSS Data Analysis Guides. Available at https://libguides.princeton.edu/c.php?g=1415215
 
Introduction to econometrics / James H. Stock, Mark W. Watson. 4th ed., Boston: Pearson Addison Wesley, 2019.
 
Introductory econometrics: A modern approach / Jeffrey M. Wooldridge. 6th ed., Cengage Learning, 2015.

Data Consultant

Muhammad Al Amin
He/Him/His
Contact:
Firestone Library, A-12-F.1
609-258-6051

Data Consultant

Yufei Qin
Contact:
Firestone Library, A.12F.2
609-258-2519

Comments or Questions?

If you have questions or comments about this guide or method, please email data@Princeton.edu.