Metadata and dataset documentation

Metadata and documentation are two of the most important tools to help ensure your data are discoverable, understandable and reusable. Proper documentation for your data can enhance the transparency and reproducibility of your research.

What is metadata?

Metadata is structured information that describes a resource. For a dataset, it may include characteristics such as the title, creator, and description, which help other researchers understand the data and judge whether the data may be of use to them.

What is a metadata standard?

Metadata standards often emerge when a user community needs a common understanding of how best to describe its resources. They often take the form of formal documents or schemas that are maintained and governed by well-known or professional organizations in that community's field. Metadata standards may ultimately be ratified by national or international bodies like NISO (National Information Standards Organization) or ISO (International Organization for Standardization).

Funding agencies increasingly encourage researchers to cite specific metadata standards when developing plans for managing and sharing their data. Some common examples of metadata standards include Dublin Core, DataCite, Data Documentation Initiative (DDI), and Darwin Core. The Digital Curation Centre provides a helpful guide to discipline-specific metadata schemas.
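
To make the idea of a metadata standard more concrete, the following is a minimal sketch of a record that uses a few elements from the Dublin Core element set (title, creator, description, date, format). The dataset values, the build_dc_record helper, and the "metadata" wrapper element are assumptions made up for illustration; a real repository will usually define its own required fields and wrapper schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields: dict[str, str]) -> str:
    """Serialize a flat dict of Dublin Core elements to an XML string.

    Hypothetical helper for illustration; not part of any standard tool.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Invented example values; replace with details of your own dataset.
record = build_dc_record({
    "title": "Example stream temperature readings",
    "creator": "Doe, Jane",
    "description": "Hourly water temperature readings from a hypothetical field site.",
    "date": "2024",
    "format": "text/csv",
})
print(record)
```

Because the element names come from a shared standard, a repository or harvester that understands Dublin Core can read fields such as dc:title and dc:creator without any project-specific explanation.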

The Libraries can help you identify a metadata standard appropriate to your work and think through what kinds of metadata your project will require. Contact us to get started with a consultation.

Data documentation

Descriptive metadata, such as the metadata attached to a repository record, is not a substitute for adequate dataset documentation such as README files, codebooks, data dictionaries, or methodology reports. Dataset documentation serves as a more comprehensive record of the methodology, coding decisions, measurement tools, and analytic processes, and makes it possible for others to correctly interpret and replicate your work.

README files

Including a README file with your data is a good way to help yourself and other users understand your data and how to use them. To craft a useful README file, consider including

  • a description of your overall project in general terms (e.g., an abstract for your project)
  • an outline of the project's overall structure
  • an explanation of how you chose to name your files
  • a simplified data inventory giving the general scope of what is included in the dataset

Cornell Data Services provides an excellent guide to creating README-style documentation, including a template that can be downloaded and adapted for your own data. 
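
As a rough illustration of the four README elements listed above, the sketch below assembles them into a plain-text file. The draft_readme function, the section headings, and all project details are hypothetical and are not taken from the Cornell template; treat this as one possible starting layout rather than a required format.

```python
def draft_readme(title: str, abstract: str, structure: list[str],
                 naming_convention: str, inventory: dict[str, str]) -> str:
    """Return plain-text README content covering the four suggested elements."""
    lines = [title, "=" * len(title), "",
             "PROJECT SUMMARY", abstract, "",
             "PROJECT STRUCTURE"]
    lines += [f"  {entry}" for entry in structure]
    lines += ["", "FILE NAMING", naming_convention, "",
              "DATA INVENTORY"]
    lines += [f"  {name}: {note}" for name, note in inventory.items()]
    return "\n".join(lines) + "\n"


# Invented example values; replace with details of your own project.
readme_text = draft_readme(
    title="Example stream temperature study",
    abstract="Hourly water temperature readings from a hypothetical field site.",
    structure=["data/    raw and cleaned CSV files",
               "scripts/ analysis code"],
    naming_convention="Files are named site_YYYYMMDD.csv (site code, then date).",
    inventory={"data/raw": "one CSV file per site per day",
               "data/clean": "merged and quality-checked tables"},
)
print(readme_text)
```

Saving the returned text as a README file alongside the data keeps the documentation and the dataset together when they are shared or deposited.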

Codebooks

Tabular data, that is, data that exist in tables containing rows of observations and columns of variables, are often best described using a codebook. A codebook describes each variable, including the type of data it contains (e.g., numeric or text), the acceptable range of values, and any codes used for missing values. The Codebook Cookbook, developed by a researcher at McGill University's medical school, provides in-depth guidance on generating useful codebooks.
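
A first draft of a codebook can be generated from the data themselves and then completed by hand. The sketch below is a minimal example under stated assumptions: the draft_codebook function, the sample table, and the missing-value codes ("", "NA", "-99") are all hypothetical. It reports the observed range of each numeric variable, which is not the same as the acceptable range; the acceptable range and the variable descriptions still need to be supplied by the researcher.

```python
import csv
import io

# Assumed missing-value codes for this example; adjust for your dataset.
MISSING_CODES = frozenset({"", "NA", "-99"})


def draft_codebook(csv_text: str, missing_codes=MISSING_CODES) -> list[dict]:
    """Summarize each column: inferred type, observed range, missing count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        present = [v for v in values if v not in missing_codes]
        try:
            # A column counts as numeric only if every non-missing value parses.
            numbers = [float(v) for v in present]
            entry = {"variable": name, "type": "numeric",
                     "min": min(numbers), "max": max(numbers)}
        except ValueError:
            # Non-numeric values (or an entirely missing column) fall back to text.
            entry = {"variable": name, "type": "text", "min": None, "max": None}
        entry["n_missing"] = len(values) - len(present)
        entry["description"] = ""  # to be written by the researcher
        codebook.append(entry)
    return codebook


# Invented sample table for illustration only.
SAMPLE = "site,temp_c,notes\nA,12.5,clear\nB,-99,\nC,14.0,cloudy\n"
codebook = draft_codebook(SAMPLE)
for entry in codebook:
    print(entry)
```

The empty description field is deliberate: type and range can be inferred automatically, but the meaning of a variable, its units, and how it was measured can only come from the people who collected the data.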

Contact us to learn more about documenting your data.

Available services or resources

NC State University Libraries