
Factor Analysis in Stata: Getting Started with Factor Analysis

This tutorial provides a step-by-step guide to conducting basic factor analysis using Stata

Getting Started with Factor Analysis

1. What is Factor Analysis?

When a dataset contains a large number of variables, many of these variables may substantively overlap with one another. In this situation, we may need to reduce the number of variables in our dataset, and factor analysis is a useful tool for doing so. For example, in psychology research, we can reduce long personality test responses to a small number of personality traits by conducting a factor analysis.

Key objectives of factor analysis are:

(i) Getting a small set of variables (preferably uncorrelated) from a large set of variables (most of which are correlated with each other).

(ii) Creating indexes with variables that conceptually measure similar things.

There are two types of factor analysis. 

  1. Exploratory Factor Analysis: We use exploratory factor analysis when we do not have a predefined idea of the structure or how many dimensions there are in a set of variables.
  2. Confirmatory Factor Analysis: We use confirmatory factor analysis when we want to test a specific hypothesis about the structure or the number of dimensions underlying a set of variables. For instance, we use confirmatory factor analysis if we think our data have two dimensions and we want to verify that (a minimal Stata sketch follows this list).
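
Confirmatory factor analysis is not covered in this tutorial. In Stata it is typically fit with the sem command rather than factor; a minimal sketch, assuming a single latent dimension (here called Populism) measured by the five indicator variables used later in this guide, would be:

    * One-factor confirmatory model: the latent variable Populism is
    * measured by five observed indicators
    sem (Populism -> manichean indivisble generalwill peoplecentrism antielitism)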

Note: Factor analysis and principal component analysis (PCA) are often confused and used interchangeably because they are similar in many ways. Keep in mind, however, that there is a fundamental difference between them: PCA constructs linear combinations of the observed variables, whereas factor analysis is a measurement model of a latent variable.
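
As a brief illustration of the distinction (using the populism variables from the dataset loaded in Section 2), the two techniques are run with different commands in Stata:

    * PCA: summarizes the total variance of the observed variables
    pca manichean indivisble generalwill peoplecentrism antielitism
    * Factor analysis: models only the variance the variables share
    factor manichean indivisble generalwill peoplecentrism antielitism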

In this tutorial, we will show how to conduct different kinds of exploratory factor analysis using data from Meijers and Zaslove (2021). 

2. A Simple Factor Analysis

Let's first see how to conduct a very basic factor analysis.

Load the following dataset: 

  use https://dss.princeton.edu/training/factor.dta

The dataset (constructed by Meijers and Zaslove, 2021) contains information for 250 political parties in 28 European countries. The authors measure populism in political parties using expert surveys. As populism is a multi-dimensional concept, the authors measure it with the help of five variables: manichean, indivisible, generalwill, peoplecentrism, and antielitism. By using simple factor analysis, we will identify the number of latent factor(s) among these five variables. To do that, use the following Stata command:

 factor manichean indivisble generalwill peoplecentrism antielitism

Stata provides us with the following outputs:

Factor analysis/correlation                      Number of obs    =        236
    Method: principal factors                    Retained factors =          3
    Rotation: (unrotated)                        Number of params =         10
    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.96566      3.63925            0.9500       0.9500
        Factor2  |      0.32641      0.30602            0.0782       1.0282
        Factor3  |      0.02039      0.08281            0.0049       1.0331
        Factor4  |     -0.06241      0.01332           -0.0150       1.0181
        Factor5  |     -0.07573            .           -0.0181       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(10) = 1370.74 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
    -----------------------------------------------------------
        Variable |  Factor1   Factor2   Factor3 |   Uniqueness 
    -------------+------------------------------+--------------
       manichean |   0.8610    0.0035    0.1112 |      0.2464  
      indivisble |   0.8691    0.3273   -0.0166 |      0.1373  
     generalwill |   0.9222    0.2402   -0.0334 |      0.0907  
    peoplecent~m |   0.9005   -0.2553   -0.0784 |      0.1177  
     antielitism |   0.8987   -0.3105    0.0223 |      0.0955  
    -----------------------------------------------------------

Interpretation:

  1. From the output, we see that the simple factor analysis command retains only three factors (Factor1, Factor2, and Factor3). It drops the other two factors (Factor4 and Factor5) because their eigenvalues are negative: under the mineigen(0) criterion, only factors with positive eigenvalues are retained (see the optional checks after this list).
  2. If the uniqueness is high, the corresponding variable is not well explained by the factors. Values above 0.6 are usually considered high. Since all of our uniqueness values (see the second table) are well below 0.6, we can say that the five variables are sufficiently explained by the three retained factors.
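
Two optional checks (a sketch based on the table above): the Proportion column is each factor's eigenvalue divided by the sum of all five eigenvalues, which is why the cumulative proportion can exceed 1 when some eigenvalues are negative; and the postestimation command screeplot draws a scree plot of the eigenvalues as a visual aid for deciding how many factors to keep.

    * Proportion for Factor1 = eigenvalue / sum of all five eigenvalues
    display 3.96566/(3.96566 + 0.32641 + 0.02039 - 0.06241 - 0.07573)
    * Scree plot of the eigenvalues from the factor analysis just run
    screeplot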

3. Principal-component Factors Analysis

In the previous example, we showed the principal-factor solution, where the communalities (defined as 1 - uniqueness) were estimated using squared multiple correlation coefficients. However, if we assume that there are no unique factors, we should use the principal-component factors option, pcf (keep in mind that principal-component factor analysis and principal component analysis are not the same thing!). To do that, let's load the following dataset:

   use https://dss.princeton.edu/training/factor.dta, clear

- Use the following Stata command:

   factor manichean indivisble generalwill peoplecentrism antielitism, pcf

Note: after the factor command, we list the variables from which we want to extract the latent factor(s).

Stata provides us with the following outputs:

Factor analysis/correlation                      Number of obs    =        236
    Method: principal-component factors          Retained factors =          1
    Rotation: (unrotated)                        Number of params =          5
    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      4.12933      3.65862            0.8259       0.8259
        Factor2  |      0.47071      0.22482            0.0941       0.9200
        Factor3  |      0.24589      0.16335            0.0492       0.9692
        Factor4  |      0.08254      0.01102            0.0165       0.9857
        Factor5  |      0.07152            .            0.0143       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(10) = 1370.74 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
    ---------------------------------------
        Variable |  Factor1 |   Uniqueness 
    -------------+----------+--------------
       manichean |   0.9008 |      0.1886  
      indivisble |   0.8876 |      0.2122  
     generalwill |   0.9322 |      0.1310  
    peoplecent~m |   0.9130 |      0.1664  
     antielitism |   0.9096 |      0.1726  
    ---------------------------------------

Interpretation:

  1. For principal-component factors, the Kaiser criterion suggests retaining factors with eigenvalues greater than or equal to 1. In the first table, only Factor1 meets this criterion, so we retain Factor1 only.
  2. Proportion in the first table shows the share of variance explained by each factor. Here, Factor1 explains (0.8259*100 =) 82.59% of the total variation.
  3. Factor loadings are the weights and correlations between each variable and the factor. The higher the loading, the more relevant the variable is in defining the factor's dimensionality. A negative value indicates an inverse relationship with the factor.
  4. Uniqueness is the variance that is 'unique' to a variable and not shared with the other variables. The smaller the uniqueness, the more relevant the variable is in the factor model. For example, only 18.86% of the variance in "manichean" is not shared with other variables (i.e., 81.14% of the variance in "manichean" is shared with other variables), so manichean is highly relevant in the factor model. The other variables are similarly relevant, indicating that all of them contribute strongly to defining Factor1 (a quick arithmetic check follows this list).
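
As a quick arithmetic check (using the loadings in the table above): with a single retained factor, a variable's communality is just its squared loading, so its uniqueness should equal 1 minus the squared loading. For manichean:

    * Uniqueness of manichean = 1 - (its loading on Factor1)^2
    display 1 - 0.9008^2

This returns approximately 0.1886, matching the Uniqueness column.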

- After running factor, we can use the rotate command to get a clearer pattern in our factor model. To do this, type: rotate

Stata provides us with the following outputs:

 Factor analysis/correlation                      Number of obs    =        236
    Method: principal-component factors          Retained factors =          1
    Rotation: orthogonal varimax (Kaiser off)    Number of params =          5

    --------------------------------------------------------------------------
         Factor  |     Variance   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      4.12933            .            0.8259       0.8259
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(10) = 1370.74 Prob>chi2 = 0.0000
Rotated factor loadings (pattern matrix) and unique variances
    ---------------------------------------
        Variable |  Factor1 |   Uniqueness 
    -------------+----------+--------------
       manichean |   0.9008 |      0.1886  
      indivisble |   0.8876 |      0.2122  
     generalwill |   0.9322 |      0.1310  
    peoplecent~m |   0.9130 |      0.1664  
     antielitism |   0.9096 |      0.1726  
    ---------------------------------------
Factor rotation matrix
    -----------------------
                 | Factor1 
    -------------+---------
         Factor1 |  1.0000 
    -----------------------

 

Notice from the first table that Factor1 explains 82.59% of the total observed variance in our factor model. Because only one factor was retained, the rotation leaves the loadings unchanged (the factor rotation matrix is simply 1).
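
Note: rotate applies an orthogonal varimax rotation by default, which is what the output header above reports. Rotation only matters when two or more factors are retained; as an optional aside (not part of the original walkthrough), an oblique rotation, which allows the factors to be correlated, could be requested instead:

    * Oblique (promax) rotation; lets retained factors correlate
    rotate, promax
    * Compare the rotated and unrotated loadings side by side
    estat rotatecompare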

- We now know that Factor1 is the latent factor in our model, capturing the information from the variables used to measure populism. We will now create a new variable containing the factor scores in our dataset. To do that, type:

  predict factor1

Stata provides us with the following outputs:

. predict factor1
(option regression assumed; regression scoring)
Scoring coefficients (method = regression; based on varimax rotated factors)
    ------------------------
        Variable |  Factor1 
    -------------+----------
       manichean |  0.21814 
      indivisble |  0.21495 
     generalwill |  0.22576 
    peoplecent~m |  0.22111 
     antielitism |  0.22028 
    ------------------------

 

Notes:

  1. Notice in the Variables window in Stata that a new variable, factor1, has been generated, which we can then use, for example, in a regression of our choice.
  2. The numbers in the Factor1 column of the above table are the regression (scoring) coefficients used to estimate the individual scores (per case/row) for the factor1 variable in the dataset.
  3. If we had obtained two or more factors from our analysis, we would type one new variable name for each factor (see the sketch after this list).
  4. If we had obtained two or more factors from our analysis, we could also create indexes out of each cluster of variables.
  5. We can name the new variable as we wish. For example, instead of factor1, we can name it populism_factor. In that case, type the following command:

    predict populism_factor  
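
A brief, hypothetical sketch of how the scores are used (outcomevar is a placeholder, not a variable in this dataset; only one factor is retained in this tutorial, so the two-factor line is purely illustrative):

    * Inspect the factor-score variable created by predict
    summarize factor1
    * If two factors had been retained, list one new variable name per factor:
    *   predict factor1 factor2
    * A factor score can be used like any other variable, e.g. in a regression:
    *   regress outcomevar factor1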

4. Iterated Principal-factor Analysis

Iterated principal-factor analysis is similar to the simple (principal-factor) analysis shown in Section 2. The key difference is that the communality estimates are re-estimated iteratively, which yields better estimates.

- To conduct iterated factor analysis, let's load the following dataset:

 use https://dss.princeton.edu/training/factor.dta, clear

- Use the following Stata command:

   factor manichean indivisble generalwill peoplecentrism antielitism, ipf mineigen(1)

Notes:

  1. After the factor command, we list the variables from which we want to derive the latent factor(s).
  2. After ipf, mineigen(1) specifies that we want only the factors with eigenvalues of at least 1, because the Kaiser criterion suggests retaining factors with eigenvalues greater than or equal to 1.

Stata provides us with the following outputs:

Factor analysis/correlation                      Number of obs    =        236
    Method: iterated principal factors           Retained factors =          1
    Rotation: (unrotated)                        Number of params =          5
    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.91391      3.66278            1.0000       1.0000
        Factor2  |      0.25113      0.23507            0.0642       1.0642
        Factor3  |      0.01606      0.14168            0.0041       1.0683
        Factor4  |     -0.12562      0.01596           -0.0321       1.0362
        Factor5  |     -0.14158            .           -0.0362       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(10) = 1370.74 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
    ---------------------------------------
        Variable |  Factor1 |   Uniqueness
    -------------+----------+--------------
       manichean |   0.8712 |      0.2411  
      indivisble |   0.8515 |      0.2750  
     generalwill |   0.9227 |      0.1486  
    peoplecent~m |   0.8913 |      0.2057  
     antielitism |   0.8856 |      0.2157  
    ---------------------------------------

Interpretation:

  1. For iterated principal factors, the Kaiser criterion suggests retaining factors with eigenvalues greater than or equal to 1. In the first table, only Factor1 meets this criterion, so we retain Factor1 only.
  2. Proportion in the first table shows the share of variance explained by each factor. Here, Factor1 explains 100% of the total variation (see the quick check after this list).
  3. Factor loadings are the weights and correlations between each variable and the factor. The higher the loading, the more relevant the variable is in defining the factor's dimensionality. A negative value indicates an inverse relationship with the factor.
  4. Uniqueness is the variance that is 'unique' to a variable and not shared with the other variables. The smaller the uniqueness, the more relevant the variable is in the factor model. For example, only 24.11% of the variance in "manichean" is not shared with other variables (i.e., 75.89% is shared), so manichean is highly relevant in the factor model. The other items are similarly relevant, indicating that all of them contribute strongly to defining Factor1.
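
The 100% figure can look surprising given that Factor2 and Factor3 also have positive eigenvalues. As a quick check using the eigenvalues reported above, the Proportion column divides each eigenvalue by the sum of all five eigenvalues, and here the negative eigenvalues of Factor4 and Factor5 almost exactly cancel the positive ones of Factor2 and Factor3:

    * Proportion for Factor1 = eigenvalue / sum of all five eigenvalues
    display 3.91391/(3.91391 + 0.25113 + 0.01606 - 0.12562 - 0.14158)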

- After running factor, we can use the rotate command to get a clearer pattern in our factor model. To do this, type: rotate

Stata provides us with the following outputs:

Factor analysis/correlation                      Number of obs    =        236
    Method: iterated principal factors           Retained factors =          1
    Rotation: orthogonal varimax (Kaiser off)    Number of params =          5
    --------------------------------------------------------------------------
         Factor  |     Variance   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.91391            .            1.0000       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(10) = 1370.74 Prob>chi2 = 0.0000
Rotated factor loadings (pattern matrix) and unique variances
    ---------------------------------------
        Variable |  Factor1 |   Uniqueness
    -------------+----------+--------------
       manichean |   0.8712 |      0.2411  
      indivisble |   0.8515 |      0.2750  
     generalwill |   0.9227 |      0.1486  
    peoplecent~m |   0.8913 |      0.2057  
     antielitism |   0.8856 |      0.2157  
    ---------------------------------------
Factor rotation matrix
    -----------------------
                 | Factor1
    -------------+---------
         Factor1 |  1.0000
    -----------------------

The outputs are similar to those for the principal-component factors analysis. However, in this case, Factor1 explains 100% of the total observed variance in our factor model.

- We now know that Factor1 is the latent factor in our model, capturing the information from the variables used to measure populism. We will now create a new variable containing the factor scores in our dataset. To do that, type:

  predict factor1

Stata provides us with the following outputs:

.  predict factor1
(regression scoring assumed)
Scoring coefficients (method = regression; based on varimax rotated factors)
    ------------------------
        Variable |  Factor1
    -------------+----------
       manichean |  0.18634
      indivisble |  0.05959
     generalwill |  0.40780
    peoplecent~m |  0.20162
     antielitism |  0.20724
    ------------------------

Notes:

  1. Notice in the Variables window in Stata that a new variable, factor1, has been generated, which we can then use, for example, in a regression of our choice.
  2. The numbers in the Factor1 column of the above table are the regression (scoring) coefficients used to estimate the individual scores (per case/row) for the factor1 variable in the dataset.
  3. If we had obtained two or more factors from our analysis, we would type one new variable name for each factor (as sketched at the end of Section 3).
  4. If we had obtained two or more factors from our analysis, we could also create indexes out of each cluster of variables.
  5. We can name the new variable as we wish. For example, instead of factor1, we can name it populism_factor. In that case, type the following command:

    predict populism_factor   

5. References

DSS Data Analysis Guides https://library.princeton.edu/dss/training

Meijers, M. J., & Zaslove, A. (2021). Measuring populism in political parties: Appraisal of a new approach. Comparative Political Studies, 54(2), 372-407.

Princeton DSS Libguides https://libguides.princeton.edu/dss

Stata Manual for Factor Analysis https://www.stata.com/manuals13/mvfactor.pdf

UCLA Resources https://stats.oarc.ucla.edu/stata/output/factor-analysis/

Watkins, M. W. (2021). A Step-by-step Guide to Exploratory Factor Analysis with Stata. Routledge.

Wu, H. S. (2018).  Introduction to Factor Analysis. CFDR Workshop Series. Available at https://www.bgsu.edu/content/dam/BGSU/college-of-arts-and-sciences/center-for-family-and-demographic-research/documents/Workshops/2018-Factor-Analysis.pdf

Comments or Questions?

If you have questions or comments about this guide or method, please email data@Princeton.edu.