Research Data Management at Princeton

An overview of best practices for managing research data

Data Sharing Vs. Open Data

Many funders and journals have data sharing requirements, and others call for open data. Managing your research data throughout the project helps ensure that either goal can be met.

If you are working with a specific funder or journal, check their exact requirements.

Data Sharing:

Data Sharing encompasses a spectrum that runs from making data available upon specific request to depositing data in an open and publicly accessible repository. It is important to know specifically what is required by a funder, journal, or institution. For example, the Department of Energy's Statement on Digital Data Management defines Data Sharing as "...making data available to people other than those who have generated them. Examples of data sharing range from bilateral communications with colleagues, to providing free, unrestricted access to the public through, for example, a web-based platform."

Open Data:

In general, Open Data is data that is deposited in an open, publicly accessible repository. Specifically, the Open Knowledge Foundation summarizes Open Data as "A piece of data or content is open if anyone is free to use, reuse, and redistribute it - subject only to the requirement to attribute and/or share-alike". The full definition includes eleven detailed points that address issues such as access, reuse, redistribution, licensing, technological restrictions, and more.

The Panton Principles are a set of recommendations for making research data open. To support the position on open data, the Panton Principles declare:

Science is based on building on, reusing and openly criticising the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open.

By open data in science we mean that it is freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published science should be explicitly placed in the public domain.

Preservation

Preservation takes place at the completion of a project. Ideally, the preservation strategy reflects long-term thinking and differs from how the data were stored during the project. Things to consider when developing a plan for preservation include:

  • What do you need to keep?
  • What are the funder or journal requirements?
  • How long do the data need to be preserved?
  • Who is responsible for the data at the end of the project?
  • Does the funder or journal specify a repository?
  • Is there sufficient documentation that anyone can use your data without your assistance, including software needed and file structures?
  • Are file formats open and sustainable?
  • If you are not depositing into a repository, what is the shelf-life of the hardware, and when will the data need to be migrated?
  • Per U.S. Office of Management and Budget Circular A-110, data must be retained for at least 3 years after the project ends (more than 6 years is better).
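
One way to document the file structure and to confirm later that a migration copied every file intact is to record a manifest of the files with their checksums. The following is a minimal sketch, not a prescribed method: the function names `sha256_of` and `build_manifest` and the manifest fields are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> list[dict]:
    """List every file under root with its relative path, size, and checksum.

    The field names ("path", "bytes", "sha256") are illustrative assumptions.
    """
    base = Path(root)
    manifest = []
    for path in sorted(base.rglob("*")):
        if path.is_file():
            manifest.append({
                "path": path.relative_to(base).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return manifest
```

Saving the manifest alongside the data, and rebuilding it after each migration, lets you compare the two lists and spot missing or altered files.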

Future File Usability

Thinking about future file usability will help ensure data are usable and can be shared in the future. Things to consider include:

  • Is the file format open or closed?
  • Is a specific software package required to use the data?
  • Do multiple files comprise the data file structure?
  • Will you be able to open the file 10 years from now?

Try to select a consistent file format that can be read well into the future: one that is open, has documented standards, and is unencrypted and uncompressed.
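
A quick review of file extensions can show which files may need attention before deposit. The sketch below is illustrative only: the extension lists and the function name `review_formats` are assumptions, not an authoritative registry of open formats, and an extension alone does not prove that a file is readable.

```python
from pathlib import Path

# Assumed, illustrative lists; adjust them to your discipline and repository.
OPEN_FORMATS = {".csv", ".tsv", ".txt", ".json", ".xml"}
COMPRESSED_OR_ENCRYPTED = {".zip", ".gz", ".7z", ".gpg"}


def review_formats(filenames):
    """Group filenames into 'open', 'compressed_or_encrypted', and 'check'."""
    report = {"open": [], "compressed_or_encrypted": [], "check": []}
    for name in filenames:
        suffix = Path(name).suffix.lower()
        if suffix in OPEN_FORMATS:
            report["open"].append(name)
        elif suffix in COMPRESSED_OR_ENCRYPTED:
            report["compressed_or_encrypted"].append(name)
        else:
            # Unknown or software-specific format: review it by hand.
            report["check"].append(name)
    return report
```

Files that land in the "check" group are candidates for conversion to an open format, or for extra documentation about the software needed to read them.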

Data Repositories

Data repositories can help provide long-term preservation. They provide persistent unique identifiers and information to aid data citation, and using one can help increase the discoverability of your data. There are three main types of data repositories:

  • Disciplinary Repositories: to check for a repository in your discipline, try searching re3data.org (the Registry of Research Data Repositories)
  • General Repositories: Figshare, Zenodo
  • Institutional Repositories

Princeton University Data Repositories

DataSpace at Princeton - A digital repository meant for both archiving and publicly disseminating digital data that result from research, academic, or administrative work performed by members of the Princeton University community. The DataSpace Help Documentation provides more in-depth information about the repository structure, the submission process, and supported file formats. Members of the Princeton University community interested in submitting content should contact: Mark Ratliff, Digital Repository Architect, phone: (609) 258-0228

DSS (Data and Statistical Services) from the Princeton University Library – Provides access to internal and external data sets in the social sciences and related fields. Access and use are restricted to current members of Princeton University. For information about depositing data into the DSS, please contact Bobray Bordelon, Bordelon@princeton.edu. https://dss.princeton.edu/

PUMAdb (Princeton University MicroArray database) – Stores raw and normalized data from microarray experiments, as well as their corresponding image files. In addition, PUMAdb provides interfaces for data retrieval, analysis, and visualization. http://puma.princeton.edu/index.shtml

*Note: Some departments already have an existing DataSpace Community or data archive. Check with your department's Computing Support to see if there are any local options available.