Describe data (metadata)
Why should I describe my data?
Describing your research data is good practice: it supports interpretation, verification, re-analysis and sharing of the data. Good data documentation is a small upfront investment of time that can save you, and other researchers using your data, significant amounts of time in the long run. It also prepares your data for long-term re-use, preservation and ongoing citation.
What is data description/documentation?
Data description, or documentation, captures information that is not contained within the data yet is vital to its accurate interpretation. Descriptors are also called metadata, which literally means 'data about data'. Appropriate descriptors vary across disciplines.
Some examples include:
- how, when and where the data was created or collected
- information on any relevant standards and equipment used
- details or descriptions of the structure or organisation of the data, e.g. available files, formats, naming conventions, explanations of codes or abbreviations, etc.
- content of each parameter or field, the allowable range of values, typical accuracy and/or resolution, levels of confidence, and the units used
- glossaries, vocabularies, data dictionaries, codebooks, lab notes
- what transformations, processing or gap-filling have been applied to the data
- any instructions or explanatory notes for potential users that may assist with interpretation of the data
- methods of analysis
- software code
- technical requirements associated with access or re-use, e.g. README files, requirements for hardware, software, platforms, etc.
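Several of the items above (the content of each field, its allowable range of values and its units) are typically recorded in a data dictionary. The following is a minimal sketch of what a machine-readable data dictionary might look like, assuming an illustrative weather dataset. The names `FieldSpec`, `DATA_DICTIONARY` and `check_record`, and the fields and ranges shown, are hypothetical and not drawn from any standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One data dictionary entry describing a single field (hypothetical schema)."""
    name: str
    description: str
    unit: str
    minimum: Optional[float] = None  # lowest allowable value, if any
    maximum: Optional[float] = None  # highest allowable value, if any


# Hypothetical data dictionary for an illustrative weather dataset.
DATA_DICTIONARY = {
    "air_temp": FieldSpec("air_temp", "Air temperature at 2 m", "degC", -60.0, 60.0),
    "rel_humidity": FieldSpec("rel_humidity", "Relative humidity", "%", 0.0, 100.0),
}


def check_record(record: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name, value in record.items():
        spec = DATA_DICTIONARY.get(name)
        if spec is None:
            # A field with no description cannot be interpreted reliably.
            problems.append(f"{name}: not described in the data dictionary")
            continue
        if spec.minimum is not None and value < spec.minimum:
            problems.append(f"{name}: {value} {spec.unit} is below {spec.minimum}")
        if spec.maximum is not None and value > spec.maximum:
            problems.append(f"{name}: {value} {spec.unit} is above {spec.maximum}")
    return problems
```

Keeping the descriptions in one structure like this means the same definitions can be used both as documentation for other researchers and as a basic check on incoming data.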
When should I describe my data?
Data documentation should begin as early as possible in the research project. Even before the first data has been collected, structures and naming conventions for organising the data can be established so they're in place once the data arrives.
Data documentation should be reviewed throughout the research project, particularly when new datasets are collected or derived, or when methodologies, instruments, equipment or conventions change.
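As an illustration of setting up a naming convention before any data arrives, here is a minimal sketch. The convention itself (`<project>_<site>_<YYYYMMDD>_v<NN>.<ext>`) and the function names `build_filename` and `parse_filename` are assumptions made for this example; your project or discipline may call for a different pattern.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<site>_<YYYYMMDD>_v<NN>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<site>[a-z0-9]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_filename(project: str, site: str, collected: date, version: int, ext: str) -> str:
    """Compose a file name that follows the convention."""
    return f"{project}_{site}_{collected:%Y%m%d}_v{version:02d}.{ext}"


def parse_filename(filename: str) -> dict:
    """Split a conforming file name into its parts, or raise ValueError."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    return match.groupdict()


name = build_filename("soilsurvey", "site03", date(2024, 5, 17), 1, "csv")
parts = parse_filename(name)
```

Writing the convention down as a pattern makes it possible to check file names automatically whenever new datasets are added.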
Examples of metadata standards
A metadata record is often encoded to a standard or schema (structure) that defines the kinds of information allowed in the record. Dublin Core is one such standard; it defines a number of elements, such as title, creator, subject and description, that can be used to describe just about any resource. While Dublin Core is broadly applicable, many other metadata standards target particular research disciplines, for example the Minimum Information standards used in bioinformatics. When considering how best to document your data, it is advisable to check whether a metadata standard exists for your specific field of research.
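To make the Dublin Core elements mentioned above concrete, the following sketch builds a small record using Python's standard `xml.etree.ElementTree` module and the Dublin Core element set namespace. The `metadata` wrapper element, the function name `dublin_core_record` and all field values are assumptions for illustration only; a repository will usually prescribe its own wrapper and encoding.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Serialise a mapping of Dublin Core element names to values as XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# All values below are made up for illustration.
record = dublin_core_record({
    "title": "Example soil moisture readings",
    "creator": "A. Researcher",
    "subject": "soil moisture",
    "description": "Illustrative record; all values are made up.",
})
```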
Metadata can generally be classified as collection level or item level. A collection-level description typically contains high-level information about the coverage of the items within the collection. An item-level description, by contrast, contains specific information about the conditions under which the item (say, a sensor reading) was acquired or collected. Collection-level metadata facilitates data discovery, while item-level metadata is invaluable in enabling reuse of the data.
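The distinction between the two levels can be sketched as a pair of simple data structures. This is only an illustration: the class names, fields and the `discover` helper are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ItemMetadata:
    """Item-level description: the conditions under which one item was acquired."""
    item_id: str
    acquired_at: str   # timestamp as text, e.g. ISO 8601
    instrument: str
    conditions: str


@dataclass
class CollectionMetadata:
    """Collection-level description: high-level coverage used for discovery."""
    title: str
    coverage: str
    keywords: List[str]
    items: List[ItemMetadata] = field(default_factory=list)


def discover(collections: List[CollectionMetadata], keyword: str) -> List[CollectionMetadata]:
    """Find collections by keyword, using collection-level metadata only."""
    wanted = keyword.lower()
    return [c for c in collections if wanted in (k.lower() for k in c.keywords)]


# Made-up example data.
reading = ItemMetadata("r-0001", "2024-05-17T09:30:00Z", "probe A", "dry soil, 18 degC")
survey = CollectionMetadata(
    "Example soil survey", "Site 03, May 2024", ["soil", "moisture"], [reading]
)
matches = discover([survey], "Moisture")
```

In this sketch a search touches only the collection-level fields, while anyone who goes on to reuse a reading relies on the item-level details attached to it.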
Where can I find more information about data documentation and metadata?
Please take a look at the following resources:
Attribution: The content on this page is based in part on the University of Newcastle Libguide on Data Management Planning. It is used with permission.