Long-term data archiving

Why retain data?

At the end of a research project, you may be tempted to clear out everything you've generated in order to start fresh with a new investigation. Or you may know immediately that you want to hang on to the data you've collected for reuse in a new project, especially if the data were expensive or difficult to generate. Your funding agency may also stipulate a specific retention period for any data resulting from your research, even if you can't share the data with other researchers. A few questions to consider when selecting an archival strategy:

  • How long am I required to retain the data?
  • Will anyone else need to access the data in its archival state?
  • How often will I need to return to the data?

Generally, we recommend researchers follow the 3-2-1 rule of thumb when developing local archival strategies. We suggest you consider retaining:

  • 3 copies of important files
  • on 2 different storage media
  • at least 1 of which is in the cloud or off-site.
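
As a rough illustration, the 3-2-1 rule of thumb can be written as a small check over an inventory of copies. This is a minimal sketch, not an RFS tool: the `Copy` record, its fields, and the example locations are all hypothetical, and you decide what counts as a distinct medium or as off-site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One retained copy of a dataset (hypothetical record for illustration)."""
    location: str   # free-text label for where the copy lives
    medium: str     # storage medium, e.g. "internal disk" or "cloud object storage"
    offsite: bool   # True if the copy is in the cloud or at another site


def satisfies_3_2_1(copies):
    """Return True if the copies meet the 3-2-1 rule of thumb."""
    distinct_media = {c.medium for c in copies}
    return (
        len(copies) >= 3                     # 3 copies of important files
        and len(distinct_media) >= 2         # on 2 different storage media
        and any(c.offsite for c in copies)   # at least 1 in the cloud or off-site
    )


# Hypothetical inventory for one project.
plan = [
    Copy("lab workstation", "internal disk", offsite=False),
    Copy("external drive in office", "external disk", offsite=False),
    Copy("cloud bucket", "cloud object storage", offsite=True),
]
print(satisfies_3_2_1(plan))
```

Keeping the inventory as data makes it easy to re-run the check whenever a copy is retired or moved.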

The RFS can help you develop a plan for data retention and redundancy. Please contact us to get started! 

What do I need to retain?

The answer to this question will vary among research projects, but given the extraordinary amounts of data modern research can generate, having a strategy at the outset of your investigation for curating the relevant research outputs can be extremely helpful.

When determining what to retain, consider:

  • Retaining all data that are necessary to support your research findings
  • De-duplicating data that may not be unique; this applies especially to data that have already been formally published in a data repository, for which local secondary copies are generally not necessary
  • Regenerating output files as necessary – sometimes it's only essential to keep raw data and analysis scripts
  • Discarding tuning or testing results generated during initial development of computational or analysis workflows
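
One way to find candidates for de-duplication is to group files by a checksum of their contents. The sketch below assumes your data sit in an ordinary directory tree and uses SHA-256 from the Python standard library; the function names are illustrative. It only reports byte-identical files and deletes nothing, so deciding which copy to keep remains up to you.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each digest to the files under `root` sharing it, keeping only groups of 2+."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_duplicates` on a project directory returns a dictionary whose values are lists of identical files, which you can review before removing any redundant copies.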

Where can I archive my data?

Formally publishing your data in a data repository can be a good strategy for long-term archiving of data that can be broadly shared with the world, and even sometimes for data that can only be shared under certain conditions. The NC State Libraries can help you find an appropriate repository for your data.

For data that cannot be broadly shared in a data repository but must be retained in order to comply with funder expectations, there are a few available options. We recommend exploring NC State's OIT Research Storage, which can facilitate retention of data for up to 10 years after the close of the grant.

If you need to archive more data than can be accommodated at no charge on OIT Research Storage, public cloud storage (e.g., Amazon Glacier) may be a cost-effective option, especially where the data are expected to remain at rest.

The RFS can help you establish a workflow for data archiving and retention. Please contact us to get started!

Available services or resources

NC State Research Storage

Learn more about data repositories