Skip to Main Content

Reshape in Stata: Restructuring Long/Wide Data using Stata

This guide discusses basic techniques to restructure data from long format to wide format and vice versa using Stata.

Restructuring Long/Wide Data using Stata

1. Wide and Long Data

Data are usually available either in "long" format or in "wide" format. This tutorial  discusses how to convert long format data to wide format, and wide format data to long format with the help of reshape command available in Stata.

Wide format data:    

Wide-form data are organized by the group identifier (e.g., individual, state, country), and all observations on a particular identifier are usually stored in a single row.

Example     

          country  gdp2020  gdp2021  trade2020  trade2021
           A        80       85       21         23
           B        35       40       10         11                            
           C        75       78       23         27 
 

Notice that in the above wide form data, observations for each identifier (i.e., country) are presented in a single row.

Long format data:

Long-form data are organized by the within-group identifier (e.g., year, month, date) , storing the observations for each identifier in multiple rows. 

Example                                                           
country  year   gdp  trade             
  A      2020    80     21                
  A      2021    85     23               
  B      2020    35     10
  B
      2021    40     11                
  C      2020    75     23               
  C      2021    78     27

Notice that in the above long form data, observations for each identifier (i.e., country) are presented according to the within-group identifier (i.e., year) and stored in multiple rows.

Remember, for panel data analysis, we need data in long format.

2. Reshape Wide to Long

Let's convert wide format data to long format.

Example 1

- Load the following dataset

use https://dss.princeton.edu/training/widelong-1.dta

Part of the loaded wide format dataset looks like this

- To convert the above wide form data to long form, type:

reshape long gdp trade, i(country) j(year)

   Notes: 

  • After reshape long,type the name of the variable(s) - here we have two variables in the dataset (e.g., gdp and trade)

  • i indicates the group identifier (e.g., individual, state, country, or any other entity).

  • j indicates the within-group identifier (e.g., year, month, date). 

- Stata will give you the following message to show how it has converted the data from wide to long format

. reshape long gdp trade, i(country) j(year)
(note: j = 2017 2018 2019 2020 2021)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        3   ->      15
Number of variables                  11   ->       4
j variable (5 values)                     ->   year
xij variables:
            gdp2017 gdp2018 ... gdp2021   ->   gdp
      trade2017 trade2018 ... trade2021   ->   trade
-----------------------------------------------------------------------------

- The converted long form dataset looks like this

Example 2

- Load the following foreign aid dataset

use https://dss.princeton.edu/training/widelong-2.dta

The loaded wide format dataset looks like this

Notice that the first row of the dataset does not contain any substantive variable name (they are named as A, B, C ...).

First, let's assign a name for the "id". We'll do this by renaming column A as recipient. Type:

rename A recipient

Second,  let's assign a variable name by typing aid before each year:

rename B aid2004
rename C aid2005
rename D aid2006
rename E aid2007
rename F aid2008
rename G aid2009
rename H aid2010
rename I aid2011
rename J aid2012
rename K aid2013
rename L aid2014
rename M aid2015
rename N aid2016
rename O aid2017

After renaming the variables, the dataset looks like this:

Now the dataset is ready for reshaping. Type the following codes:

reshape long aid, i(recipient) j(year)

- Stata will give you the following message to show how it has converted the data from wide to long format

. reshape long aid, i(recipient) j(year)
(note: j = 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                      177   ->    2478
Number of variables                  15   ->       3
j variable (14 values)                    ->   year
xij variables:
            aid2004 aid2005 ... aid2017   ->   aid
-----------------------------------------------------------------------------

- The converted long form dataset looks like this

- If you type summarize command to get the summary statistics, you will find blank entries for aid.

. summarize
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   recipient |          0
        year |      2,478      2010.5    4.031943       2004       2017
         aid |          0

- To solve the issue, we need to destring the aid variable. Type:

destring aid, replace ignore("..")

Notice that the resulting summary statistics now contain information about the aid variable

. summarize
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   recipient |          0
        year |      2,478      2010.5    4.031943       2004       2017
         aid |      2,304    3.94e+14    1.79e+15          1   9.98e+15

3. Reshape Long to Wide

Let's convert long format data to wide format.

Example 1

- Load the following dataset

use https://dss.princeton.edu/training/longwide-1.dta

The loaded long form dataset looks like this

- To convert the above long form data to wide form, type:

reshape wide gdp, i(country) j(year)

  Notes: 

  • After reshape wide, type the name of the variable(s) - here we have one variable in the dataset (e.g., gdp).
  • i indicates the group identifier (e.g., individual, state, country, or any other entity).
  • j indicates the within-group identifier (e.g., year, month, date). 

- Stata will give you the following message to show how it has converted the data from long to wide format

. reshape wide gdp, i(country) j(year)
(note: j = 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021)
Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     3900   ->     260
Number of variables                   3   ->      16
j variable (15 values)             year   ->   (dropped)
xij variables:
                                    gdp   ->   gdp2007 gdp2008 ... gdp2021
-----------------------------------------------------------------------------

The converted dataset looks like this

Example 2

- Load the following dataset

use https://dss.princeton.edu/training/longwide-2.dta

The dataset looks like this

- To convert the dataset to wide format, we first need to change the date variable. Type the following Stata codes:

tostring month year, replace
gen date = year + "_0" + month
replace date = year + "_" + month
drop year month
order id date

- The data will look like this

- To convert the above long form data to wide form, type:

reshape wide return interest, i(id) j(date) str

        Notes:

  • After reshape wide, type the name of the variable(s) - here we have two variables in the dataset (e.g., return and interest)
  • i indicates the group identifier (e.g., individual, state, country, or any other entity).
  • j indicates the within-group identifier (e.g., year, month, date). 

- Stata will give you the following message to show how it has converted the data from long to wide format

. reshape wide return interest, i(id) j(date) str
(note: j = 1998_11 1998_12 1999_1 1999_10 1999_11 1999_12 1999_2 1999_3 1999_4 1999_5 1999_6 1999_7 1999_8 1999_9
>  2000_1 2000_10 2000_11 2000_12 2000_2 2000_3 2000_4 2000_5 2000_6 2000_7 2000_8 2000_9 2001_1 2001_10 2001_11
> 2001_12 2001_2 2001_3 2001_4 2001_5 2001_6 2001_7 2001_8 2001_9 2002_1 2002_10 2002_11 2002_12 2002_2 2002_3 20
> 02_4 2002_5 2002_6 2002_7 2002_8 2002_9 2003_1 2003_10 2003_11 2003_12 2003_2 2003_3 2003_4 2003_5 2003_6 2003_
> 7 2003_8 2003_9 2004_1 2004_10 2004_11 2004_12 2004_2 2004_3 2004_4 2004_5 2004_6 2004_7 2004_8 2004_9 2005_1 2
> 005_10 2005_11 2005_12 2005_2 2005_3 2005_4 2005_5 2005_6 2005_7 2005_8 2005_9 2007_1 2007_10 2007_11 2007_2 20
> 07_3 2007_4 2007_5 2007_6 2007_7 2007_8 2007_9)
Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                      802   ->      25
Number of variables                   4   ->     195
j variable (97 values)             date   ->   (dropped)
xij variables:
                                 return   ->   return1998_11 return1998_12 ... return2007_9
                               interest   ->   interest1998_11 interest1998_12 ... interest2007_9
-----------------------------------------------------------------------------

- The converted long form dataset looks like this

Notes:

- Type help reshape in Stata for more help. 

- If you have a very large dataset, and you want to restructure it more efficiently, follow the instructions provided here.

Data Consultant

Profile Photo
Muhammad Al Amin
He/Him/His
Contact:
Firestone Library, A-12-F.1
609-258-6051

Data Consultant

Profile Photo
Yufei Qin
Contact:
Firestone Library, A.12F.2
6092582519

Comments or Questions?

If you have questions or comments about this guide or method, please email data@Princeton.edu.