Panel Data Analysis Using Stata: Fixed Effects and Random Effects

Estimation of basic fixed effects and random effects models using Stata

Fixed Effects and Random Effects

Panel Data

Concept

Panel data is a dataset in which the behavior of each individual or entity (e.g., country, state, industry) is observed at multiple points in time.

   Example:

country          year          y           x1          x2          x3
A                2018          13          6          0.5          26
A                2019          17          4          0.3          15
A                2020          12          7          0.9          18
B                2018          16          5          0.4          16
B                2019          11          3          0.5          19
B                2020          14          4          0.7          21
C                2018          11          8          0.8          14
C                2019          18          2          0.6          17
C                2020          10          5          0.2          21
  • Using panel data lets us control for variables that we cannot observe or measure (e.g., individuals' innate characteristics, cultural factors, differences in business practice across companies), as long as they are constant over time. These variables are also called omitted variables.

  • Using panel data accounts for variables that change over time, but not across entities (e.g., national policies, federal regulations, international agreements).

Declaring Panel Data

- When we work with panel data in Stata, we need to declare that we have a panel dataset.

Use the following dataset:

 use https://dss.princeton.edu/training/Panel101_new.dta

For declaration, type

 xtset country year

. xtset country year
Panel variable: country (strongly balanced)
 Time variable: year, 2011 to 2020
         Delta: 1 unit

 

The note “(strongly balanced)” means that every country has data for every year. If, for example, a country is missing data for one or more years, then the panel is unbalanced. Ideally you would want a balanced dataset, but this is not always the case. You can still estimate the models in this guide with an unbalanced panel.
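To check the balance of a panel directly, a minimal sketch along the following lines may help. It assumes the dataset used in this guide; the variable name n_years is hypothetical and introduced only for illustration.

```stata
* Load the guide's dataset and declare the panel structure
use https://dss.princeton.edu/training/Panel101_new.dta, clear
xtset country year

* Show the participation pattern (which years are observed for which countries)
xtdescribe

* Count observations per country; in a balanced panel every count is identical
bysort country: generate n_years = _N
tabulate n_years
```

If tabulate reports more than one distinct value of n_years, some countries are observed in fewer years than others and the panel is unbalanced.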

NOTE: If you get the following error after using xtset:

string variables not allowed in varlist;
country is a string variable

 

In that case, you need to convert ‘country’ to a numeric variable. Type:

encode country, gen(country1)

Then use ‘country1’ instead of ‘country’ in the xtset command.
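As an illustration of the string-variable case, the sketch below types in the small example table from the "Panel Data" section (where country is stored as text), converts it, and declares the panel. It is a self-contained illustration under those assumptions and does not use the guide's downloadable dataset.

```stata
* Enter the example table by hand; country is a string variable here
clear
input str1 country year y x1 x2 x3
"A" 2018 13 6 0.5 26
"A" 2019 17 4 0.3 15
"A" 2020 12 7 0.9 18
"B" 2018 16 5 0.4 16
"B" 2019 11 3 0.5 19
"B" 2020 14 4 0.7 21
"C" 2018 11 8 0.8 14
"C" 2019 18 2 0.6 17
"C" 2020 10 5 0.2 21
end

* xtset needs a numeric panel variable, so encode the string first
encode country, gen(country1)

* Declare the panel using the encoded variable
xtset country1 year
```

encode assigns the integers 1, 2, 3 in alphabetical order and attaches the original names as value labels, so output still displays A, B, and C.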

This guide discusses two basic techniques that we commonly use to analyze panel data: (i) fixed effects method, and (ii) random effects method.

Fixed Effects

Key Assumptions

  • The omitted variable is correlated with included regressor(s).
  • Unobserved characteristics of individuals (e.g., innate ability) or entities that are correlated with regressor(s) do not vary over time.

Concept

The fixed effects (FE) method uses panel data to control for (omitted) variables that differ across individuals or entities (e.g., states, countries), but are constant over time.

When using FE, we assume that characteristics of an individual may impact or bias the predictor or outcome variables, and we need to control for this. This is the rationale behind the assumption of the correlation between an entity’s error term and predictor variables. FE removes the effect of those time-invariant characteristics, and therefore we can assess the net effect of the predictors on the outcome variable.

In fixed effects models, the slope coefficient of the population regression line is the same for all individuals or entities, but the intercept of the population regression line varies across individuals/entities (Stock and Watson, 2019).
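In equation form, a model with common slopes and entity-specific intercepts can be sketched as follows for two regressors. The notation is an assumption chosen to match the labels in Stata's output (u_i for the entity effect, e_it for the idiosyncratic error); it is not taken from the original guide.

```latex
y_{it} = \alpha + \beta_1 x_{1,it} + \beta_2 x_{2,it} + u_i + e_{it},
\qquad i = 1, \dots, N, \quad t = 1, \dots, T
```

Here \beta_1 and \beta_2 are the same for every entity, while \alpha + u_i is the intercept of entity i. Under the fixed effects assumptions, u_i is constant over time and may be correlated with the regressors.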

Estimation

This guide discusses two different ways to estimate fixed effects models: (i) within estimator, (ii) dummy variable estimator.

(i) Within Estimator

This is the more commonly used estimator for fixed effects models. This estimator is called the "within estimator", as it uses time variation within each cross-section. 
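The idea of using only within-entity variation can be written out as a short derivation. This is a sketch for two regressors, assuming a model of the form y_it = \alpha + \beta_1 x_{1,it} + \beta_2 x_{2,it} + u_i + e_it, where u_i is the time-constant entity effect; the notation is an assumption, not the guide's own.

```latex
\begin{aligned}
\bar{y}_i &= \alpha + \beta_1 \bar{x}_{1,i} + \beta_2 \bar{x}_{2,i} + u_i + \bar{e}_i,
  \qquad \bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it} \\
y_{it} - \bar{y}_i &= \beta_1 \left( x_{1,it} - \bar{x}_{1,i} \right)
  + \beta_2 \left( x_{2,it} - \bar{x}_{2,i} \right)
  + \left( e_{it} - \bar{e}_i \right)
\end{aligned}
```

Subtracting the entity mean removes both \alpha and u_i, so the slopes are estimated from deviations of each entity's observations around its own average.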

- Use the following dataset (ignore this step if you have already opened the dataset in the previous section)

use https://dss.princeton.edu/training/Panel101_new.dta

- Declare the dataset as a panel using xtset (ignore this step if you have already declared the dataset as a panel)

- Use the following command to estimate your fixed effects model

  xtreg y x1 x2, fe

Note: the fe option indicates that we are estimating a fixed effects model.

. xtreg y x1 x2, fe
Fixed-effects (within) regression               Number of obs     =         70
Group variable: country                         Number of groups  =          7
R-squared:                                      Obs per group:
     Within  = 0.0903                                         min =         10
     Between = 0.0546                                         avg =       10.0
     Overall = 0.0000                                         max =         10
                                                F(2,61)           =       3.03
corr(u_i, Xb) = -0.8561                         Prob > F          =     0.0557
------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   2.23e+09   1.13e+09     1.97   0.053    -2.86e+07    4.50e+09
          x2 |   2.05e+09   2.00e+09     1.02   0.310    -1.95e+09    6.06e+09
       _cons |   1.23e+08   7.99e+08     0.15   0.878    -1.48e+09    1.72e+09
-------------+----------------------------------------------------------------
     sigma_u |  3.070e+09
     sigma_e |  2.794e+09
         rho |  .54680874   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(6, 61) = 3.14                       Prob > F = 0.0095

The coefficient of x1 indicates how much y changes, on average within a country over time, when x1 increases by one unit, holding all other variables constant.

The p-value for x1 (0.053, in the P>|t| column) indicates whether x1 significantly affects the dependent variable (y). As the p-value is < 0.10, the coefficient for x1 is significant at the 10% level, although not at the 5% level.

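The output reports two F tests. The first (Prob > F = 0.0557, upper right) tests whether the coefficients on x1 and x2 are jointly zero; as the p-value is < 0.10, the model is statistically significant at the 10% level. The second (Prob > F = 0.0095, last line) tests whether all country effects u_i are zero; as the p-value is < 0.01, the fixed effects are jointly significant at the 1% level.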
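To see what the within estimator does under the hood, the following sketch demeans the variables by country and runs OLS on the deviations. It assumes the guide's dataset; the names mean_*, dm_*, and b_manual are hypothetical and introduced only for illustration.

```stata
* Load the guide's dataset and declare the panel structure
use https://dss.princeton.edu/training/Panel101_new.dta, clear
xtset country year

* Subtract each country's own mean from y, x1, and x2
foreach v of varlist y x1 x2 {
    egen double mean_`v' = mean(`v'), by(country)
    generate double dm_`v' = `v' - mean_`v'
}

* OLS on the demeaned data; no constant, since demeaned variables average zero
regress dm_y dm_x1 dm_x2, noconstant
scalar b_manual = _b[dm_x1]

* Built-in within estimator for comparison
xtreg y x1 x2, fe
display "manual: " b_manual "   xtreg, fe: " _b[x1]
```

The slope estimates should agree. The standard errors from the manual regression will be somewhat too small, because regress does not account for the degrees of freedom used up by the country means; xtreg with the fe option makes that adjustment.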

(ii) Dummy Variable Regression

When there are a small number of fixed effects to be estimated, it is convenient to run a dummy variable regression for the FE model.

- Use the following dataset (ignore this step if you have already opened the dataset for the previous section)

use https://dss.princeton.edu/training/Panel101_new.dta

- Declare the dataset as a panel using xtset (ignore this step if you have already declared the dataset as a panel)

- Use the following command to estimate your fixed effects model

reg y x1 x2 i.country

. reg y x1 x2 i.country

      Source |       SS           df       MS      Number of obs   =        70
-------------+----------------------------------   F(8, 61)        =      2.42
       Model |  1.5096e+20         8  1.8870e+19   Prob > F        =    0.0245
    Residual |  4.7634e+20        61  7.8088e+18   R-squared       =    0.2406
-------------+----------------------------------   Adj R-squared   =    0.1411
       Total |  6.2729e+20        69  9.0912e+18   Root MSE        =    2.8e+09
------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   2.23e+09   1.13e+09     1.97   0.053    -2.86e+07    4.50e+09
          x2 |   2.05e+09   2.00e+09     1.02   0.310    -1.95e+09    6.06e+09
             |
     country |
          B  |  -6.77e+09   4.88e+09    -1.39   0.171    -1.65e+10    2.99e+09
          C  |  -1.44e+09   1.96e+09    -0.74   0.464    -5.36e+09    2.47e+09
          D  |  -2.93e+09   5.24e+09    -0.56   0.578    -1.34e+10    7.55e+09
          E  |  -6.54e+09   5.10e+09    -1.28   0.204    -1.67e+10    3.65e+09
          F  |   6.14e+08   1.38e+09     0.44   0.659    -2.15e+09    3.38e+09
          G  |  -3.32e+08   2.12e+09    -0.16   0.876    -4.56e+09    3.90e+09
             |
       _cons |   2.61e+09   1.94e+09     1.34   0.184    -1.27e+09    6.49e+09
------------------------------------------------------------------------------

 

Notice that the estimated coefficients for x1 and x2 are the same for both the "Within Estimator" method and the "Dummy Variable Regression" method. 
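One way to check this equivalence side by side is sketched below. It assumes the guide's dataset; the stored-estimate names fe_within and fe_lsdv are hypothetical.

```stata
* Load the guide's dataset and declare the panel structure
use https://dss.princeton.edu/training/Panel101_new.dta, clear
xtset country year

* Estimate the same model both ways and store the results
quietly xtreg y x1 x2, fe
estimates store fe_within

quietly regress y x1 x2 i.country
estimates store fe_lsdv

* Compare slopes and standard errors for x1 and x2 in one table
estimates table fe_within fe_lsdv, keep(x1 x2) b(%12.4g) se(%12.4g)

* Joint test that all country dummies are zero
estimates restore fe_lsdv
testparm i.country
```

The testparm result should correspond to the "F test that all u_i=0" line at the bottom of the xtreg output, since both test whether the country-specific intercepts are jointly zero.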

Notes:

- The fixed effects estimator is biased when a lagged outcome is included as a regressor, especially when the number of time periods is small. Therefore, we do not use a lagged dependent variable as a regressor in these models.

Random Effects

Key Assumption

  • The omitted variable is uncorrelated with included regressor(s).

Concept

If individual effects are strictly uncorrelated with the regressors, it may be appropriate to model the individual specific constant terms as randomly distributed across cross-sectional units. This view would be appropriate if we believe that sampled cross-sectional units were drawn from a large population. 

If you have reason to believe that differences across entities have some influence on your dependent variable, but are uncorrelated with the regressors, then you should use random effects. In a random effects model, you need to specify those individual characteristics that may or may not influence the predictor variables. The problem with this is that data on some variables (e.g., individual characteristics such as innate ability) may not be available, hence leading to omitted variable bias in the model.

An advantage of using the random effects method is that you can include time-invariant variables (e.g., geographical contiguity, distance between states) in your model. In the fixed effects model, these variables are absorbed by the intercept.
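The contrast can be written out in a short sketch. The variable z_i is a hypothetical time-invariant regressor (for example, a distance measure), and the notation (u_i for the entity effect, e_it for the idiosyncratic error) is an assumption chosen to match Stata's output labels.

```latex
\begin{aligned}
\text{RE:}\quad & y_{it} = \alpha + \beta_1 x_{1,it} + \gamma z_i + u_i + e_{it},
  \qquad \operatorname{Cov}(u_i, x_{1,it}) = \operatorname{Cov}(u_i, z_i) = 0 \\
\text{FE (within):}\quad & y_{it} - \bar{y}_i = \beta_1 \left( x_{1,it} - \bar{x}_{1,i} \right)
  + \gamma \underbrace{\left( z_i - \bar{z}_i \right)}_{=\,0}
  + \left( e_{it} - \bar{e}_i \right)
\end{aligned}
```

Because z_i equals its own entity mean, it drops out of the within transformation and \gamma cannot be estimated by fixed effects. Random effects keeps z_i in the model, at the cost of assuming that u_i is uncorrelated with the regressors.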

Estimation

- Use the following dataset

use https://dss.princeton.edu/training/Panel101_new.dta

- Declare the dataset as a panel using xtset  

- Use the following command to estimate your random effects model

xtreg y x1 x2, re

Note: the re option indicates that we are estimating a random effects model.

. xtreg y x1 x2, re
Random-effects GLS regression                   Number of obs     =         70
Group variable: country                         Number of groups  =          7
R-squared:                                      Obs per group:
     Within  = 0.0803                                         min =         10
     Between = 0.2333                                         avg =       10.0
     Overall = 0.0055                                         max =         10
                                                Wald chi2(2)      =       2.24
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.3261
------------------------------------------------------------------------------
           y | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   1.46e+09   9.78e+08     1.50   0.134    -4.53e+08    3.38e+09
          x2 |   2.44e+08   4.20e+08     0.58   0.562    -5.80e+08    1.07e+09
       _cons |   8.64e+08   8.48e+08     1.02   0.308    -7.98e+08    2.53e+09
-------------+----------------------------------------------------------------
     sigma_u |  1.070e+09
     sigma_e |  2.794e+09
         rho |  .12789303   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The coefficient of x1 indicates how much y changes, on average, when x1 increases by one unit, holding all other variables constant. Unlike the fixed effects estimate, it draws on variation both within countries over time and between countries.

The p-value for x1 (0.134, in the P>|z| column) indicates whether x1 significantly affects the dependent variable (y). As the p-value is > 0.10, the coefficient for x1 is not significant at the 10% level.
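The rho line in the output, labeled "fraction of variance due to u_i", can be reproduced by hand from sigma_u and sigma_e. The worked arithmetic below uses the rounded values printed in the two xtreg outputs in this guide, so the last digits differ slightly from Stata's.

```latex
\begin{aligned}
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} \\
\rho_{\text{RE}} &= \frac{(1.070 \times 10^{9})^2}{(1.070 \times 10^{9})^2 + (2.794 \times 10^{9})^2}
  = \frac{1.1449}{1.1449 + 7.8064} \approx 0.128 \\
\rho_{\text{FE}} &= \frac{(3.070 \times 10^{9})^2}{(3.070 \times 10^{9})^2 + (2.794 \times 10^{9})^2}
  = \frac{9.4249}{9.4249 + 7.8064} \approx 0.547
\end{aligned}
```

In the random effects output, roughly 13% of the unexplained variance is attributed to differences across countries, compared with roughly 55% in the fixed effects output.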

Fixed Effects or Random Effects?

Hausman Test

- Use the Hausman test to decide whether to use a fixed effects or random effects model. 

- Procedure:

  (i) Run a fixed effects model and save the estimates

  (ii) Run a random effects model and save the estimates

  (iii) Perform the Hausman test

- Use the following Stata commands

xtreg y x1 x2, fe
estimates store fixed
xtreg y x1 x2, re
estimates store random
hausman fixed random

...

. hausman fixed random
                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed        random       Difference       Std. err.
-------------+----------------------------------------------------------------
          x1 |    2.23e+09     1.46e+09        7.69e+08        5.68e+08
          x2 |    2.05e+09     2.44e+08        1.81e+09        1.96e+09
------------------------------------------------------------------------------
                          b = Consistent under H0 and Ha; obtained from xtreg.
           B = Inconsistent under Ha, efficient under H0; obtained from xtreg.
Test of H0: Difference in coefficients not systematic
    chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
            =   5.99
Prob > chi2 = 0.0500

 

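- Decision rule: if the p-value (Prob > chi2) is < 0.05, reject the null hypothesis that the difference in coefficients is not systematic and use a fixed effects model; otherwise, use a random effects model. In this case, Prob > chi2 = 0.0500 is not below 0.05, so we should use a random effects model. Because the p-value sits right at the threshold, treat this choice with caution.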
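The decision rule can also be applied programmatically using the results that hausman leaves behind in r(). The sketch below assumes the guide's dataset and the 0.05 threshold stated above; the scalar names h_chi2, h_df, and h_p are hypothetical.

```stata
* Load the guide's dataset and declare the panel structure
use https://dss.princeton.edu/training/Panel101_new.dta, clear
xtset country year

* Estimate and store both models
quietly xtreg y x1 x2, fe
estimates store fixed
quietly xtreg y x1 x2, re
estimates store random

* Run the test and keep the unrounded results
hausman fixed random
scalar h_chi2 = r(chi2)
scalar h_df   = r(df)
scalar h_p    = r(p)
display "chi2(" h_df ") = " %6.2f h_chi2 ",  p = " %8.6f h_p

* Apply the 5% decision rule
if h_p < 0.05 {
    display "Reject H0: prefer the fixed effects model"
}
else {
    display "Fail to reject H0: the random effects model is acceptable"
}
```

Printing the p-value with more decimal places is useful here, because the value shown in the table is rounded to 0.0500 and the unrounded value may fall on either side of the threshold.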

References

DSS Online Training Section https://dss.princeton.edu/training/

Princeton DSS Libguides https://libguides.princeton.edu/dss

UCLA Resources https://stats.oarc.ucla.edu/stata/

Stata Resources https://www.stata.com/features/overview/linear-fixed-and-random-effects-models/

Angrist, J. D., & Pischke, J. S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.

Baltagi, B. (2021). Econometric analysis of panel data (6th ed.). Springer.

Bartels, B. (2008). "Beyond fixed versus random effects": a framework for improving substantive and statistical analysis of panel, time-series cross-sectional, and multilevel data. The Society for Political Methodology, 9, 1-43. Available at: https://home.gwu.edu/~bartels/cluster.pdf

Baum, C. F. (2006). An introduction to modern econometrics using Stata. Stata Press.  

Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

Greene, W. H. (2018). Econometric analysis (8th ed.). Pearson.

Hamilton, L. C. (2012). Statistics with Stata: version 12. Cengage Learning.

Hoechle, D. (2007). Robust standard errors for panel regressions with cross-sectional dependence. The Stata Journal, 7(3), 281-312. Available at: https://journals.sagepub.com/doi/pdf/10.1177/1536867X0700700301

Kohler, U., & Kreuter, F. (2012). Data analysis using Stata (3rd ed.). Stata Press.

Stock, J. H., & Watson, M. W. (2019). Introduction to econometrics (4th ed.). Pearson.

Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. MIT Press.

Wooldridge, J. M. (2020). Introductory econometrics: a modern approach (7th ed.). Cengage Learning.

Data Consultant

Muhammad Al Amin
He/Him/His
Contact:
Firestone Library, A-12-F.1
609-258-6051

Data Consultant

Yufei Qin
Contact:
Firestone Library, A.12F.2
609-258-2519

Comments or Questions?

If you have questions or comments about this guide or method, please email data@Princeton.edu.