Loops in Stata: Conducting Repetitive Tasks

Using loops to handle repetitive tasks in Stata

1. Why Use Loops?

To process, manipulate, and analyze data in Stata, we often need to repeat the same operation many times. Examples include recoding a set of variables in the same manner, creating or renaming a series of variables, or repeatedly recording the values of several variables.

- Loops let us write a block of code once and have Stata run it for every item, instead of typing it multiple times.

- Using loops will keep your do-file concise.

This guide discusses the two most common loop commands available in Stata: foreach and forvalues.

2. foreach - Loop over Items

We use the foreach command to loop over variables or other items.

Example 1

- Load the following dataset

 use https://dss.princeton.edu/training/loop-foreach.dta

- Suppose we want to convert some of the variables in the dataset into log form.

- Without a loop, we would have to convert each variable individually with the following commands:

gen pcgdp_log = ln(1+pcgdp)

gen gdpgr_log = ln(1+gdpgr)

gen eduexp_log = ln(1+eduexp)

gen govtexp_log = ln(1+govtexp)

gen netoda_log = ln(1+netoda)

- Typing each of the above commands separately is tedious. Instead, we can write a loop with the foreach command as follows:

foreach var of varlist pcgdp gdpgr eduexp govtexp netoda {
    gen `var'_log = ln(1+`var')
}

Note:

- The open brace { appears on the same line as foreach

- Each Stata command (here, gen...) appears on a new line

- The close brace } appears on a line by itself

- Notice that new variables have been created in your dataset. 
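
The same pattern can be tried on a small made-up dataset before running it on real data. The sketch below is illustrative only: the variables x1, x2, and x3 and their values are invented, and the list is kept in a local macro (called vars here, an arbitrary name) so that it can be reused with foreach ... of local.

```stata
* Toy data: three invented variables (not part of loop-foreach.dta)
clear
input x1 x2 x3
0 1 2
9 99 999
end

* Store the list once, then loop over it
local vars x1 x2 x3
foreach var of local vars {
    * `var' expands to x1, then x2, then x3
    generate `var'_log = ln(1 + `var')
}

list
```

Keep in mind that ln(1 + x) is missing whenever x is -1 or smaller, so it is worth checking the new variables for unexpected missing values.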

Example 2

- Suppose we want to create a number of interaction variables by multiplying the polity variable with a number of selected variables (pcgdp, eduexp, govtexp).

- Without a loop, we would have to create each interaction separately by typing the following commands:

gen pcgdp_interaction = pcgdp*polity

gen eduexp_interaction = eduexp*polity

gen govtexp_interaction = govtexp*polity

- However, we can generate the above interaction variables more easily with the foreach command. Type:

foreach var in pcgdp eduexp govtexp {
    gen `var'_interaction = `var'*polity
}

- Notice that three interaction variables (pcgdp_interaction, eduexp_interaction, govtexp_interaction) have been created in the dataset.

Example 3

- Suppose you want to rename all the variables in your dataset by adding _NZ to the end of each variable name.

- Use the following foreach command:

foreach var of varlist * {
    rename `var' `var'_NZ
}

- Notice that each variable now has _NZ at the end of its name.
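
When looping over every variable, it helps to remember that foreach ... in takes its list literally, while foreach ... of varlist asks Stata to expand wildcards and keywords such as _all into actual variable names before the loop starts. The sketch below shows the suffix renaming on a toy dataset; the variables id, income, and age are invented for illustration.

```stata
* Toy data with invented variable names
clear
input id income age
1 50000 34
2 42000 51
end

* "of varlist _all" is expanded to: id income age
foreach var of varlist _all {
    rename `var' `var'_NZ
}

* The variables are now id_NZ, income_NZ, and age_NZ
describe
```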

3. forvalues - Loop over Consecutive Values

We use the forvalues command to loop over a range of consecutive or evenly spaced numbers.

Example 1

- Load the following dataset

 use https://dss.princeton.edu/training/loop-forvalue.dta

- Suppose we want to report yearly summary statistics for some of the variables (polity, pcgdp, netoda) for a selected range of years (e.g., 2000-2005).

- We can write a loop with the forvalues command like this:

forvalues t = 2000/2005 {
    display `t'
    summarize polity pcgdp netoda if year == `t'
}

Notes:

- The open brace { appears on the same line as forvalues

- Each Stata command (here, display and summarize) appears on a new line

- The close brace } appears on a line by itself

- The above code produces the following Stata output, showing yearly summary statistics for the designated variables.

2000
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      polity |          5         6.8     1.30384          5          8
       pcgdp |          5     6209.06    2852.131   2403.932   9253.968
      netoda |          5    1.43e+08    1.29e+08  -2.14e+07   2.67e+08
2001
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      polity |          5         6.8     1.30384          5          8
       pcgdp |          5    6223.012    2782.159   2463.709   9088.054
      netoda |          5    2.38e+08    1.94e+08   3.74e+07   5.34e+08
2002
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      polity |          5         6.8     1.30384          5          8
       pcgdp |          5     6261.52    2767.848   2493.008   8960.554
      netoda |          5    2.34e+08    2.24e+08   1.88e+07   5.98e+08
2003
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      polity |          5         6.8     1.30384          5          8
       pcgdp |          5    6341.312    2754.055    2476.07   8967.029
      netoda |          5    3.05e+08    4.09e+08   1.51e+07   1.02e+09
2004
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      polity |          5         6.8     1.30384          5          8
       pcgdp |          5    6603.666    2867.279   2512.777   9346.039
      netoda |          5    2.25e+08    2.31e+08   2.71e+07   6.26e+08
2005
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      polity |          5         6.8     1.30384          5          8
       pcgdp |          5    6726.322    2931.975   2462.347   9535.419
      netoda |          5    2.72e+08    2.64e+08   4.52e+07   7.28e+08
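
To experiment with the same loop without downloading the dataset, a minimal sketch on generated data might look like the following. The variable score and its values are invented for illustration; only the forvalues pattern mirrors the example above.

```stata
* Toy data: two invented observations per year for 2000-2002
clear
set obs 6
generate year  = 2000 + mod(_n - 1, 3)
generate score = _n * 10

* `t' takes the values 2000, 2001, 2002 in turn
forvalues t = 2000/2002 {
    display "Year: `t'"
    summarize score if year == `t'
}
```

The range 2000/2002 increases in steps of 1. After the loop ends, the results stored by summarize (such as r(mean)) refer to the last year processed.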

Example 2

- We can use the forvalues command to generate a dummy variable for each decade. In our loop-forvalue dataset, we have fifteen years of data (2000-2014) spread across two decades.

- Let's generate a dummy variable for each decade and name the variables decade2000 and decade2010.

- Use the following loop:

forvalues decade = 2000(10)2014 {
    generate decade`decade' = (int(year / 10) * 10 == `decade')
}

- Notice that the dataset now has two dummy variables named decade2000 and decade2010.    
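
A quick way to check dummies like these is to confirm that every observation falls in exactly one decade. The sketch below does so on generated data, assuming one observation per year from 2000 to 2014; the data are invented for illustration.

```stata
* Toy data: one observation per year, 2000 through 2014
clear
set obs 15
generate year = 1999 + _n

* 2000(10)2014 starts at 2000, steps by 10, and stops before passing 2014,
* so the loop runs for 2000 and 2010 only
forvalues decade = 2000(10)2014 {
    generate decade`decade' = (int(year / 10) * 10 == `decade')
}

* Each year should belong to exactly one decade
assert decade2000 + decade2010 == 1
tabulate decade2000 decade2010
```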

Data Consultant

Muhammad Al Amin
He/Him/His
Contact:
Firestone Library, A-12-F.1
609-258-6051

Data Consultant

Yufei Qin
Contact:
Firestone Library, A.12F.2
609-258-2519

Comments or Questions?

If you have questions or comments about this guide or method, please email data@Princeton.edu.