
Data Management Guide: Create a Data Management Plan

What is a Data Management Plan?

A Data Management Plan (DMP) is a short document, typically no more than two pages, required by many funding agencies that describes what you will do with your data during and after your research project. According to the 2013 White House Office of Science and Technology Policy memo, data is defined as "the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens."
The Data Management Planning guide below covers the sections commonly required in granting agencies' DMPs, although the exact format may vary by funding announcement. Each section poses questions to help you identify your data needs and supplies sample language showing how each can be addressed.

Additional Resources

WMU Intellectual Property Policy describes the rights and obligations of researchers at WMU.

DMPTool is a website providing guidance on writing DMPs that meet funder requirements.

Example DMS Plans is a directory created by the NIH DMSP Guidance Working Group that compiles published and template DMPs from various disciplines and funding agencies.

Request a review of your draft DMP by submitting a webform or contacting your data librarian directly.

Data Management Planning Guide

Each DMP section below lists questions to consider, followed by sample language you can adapt.

State data types and file formats

Consider: size and accessibility

What are the types of data produced and their file formats?
How many files do you anticipate generating and how large are they?
Is access to data dependent on particular software, hardware, or versions?

The proposed research will collect 5 MRI images from each of 30 participants, for a total of 150 DICOM image files totaling approximately 600 MB. Self-reported demographic and health surveys will also be collected on paper and compiled into an Excel spreadsheet.
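If pilot data already exist on disk, the count-and-size questions can be answered directly with a short script. Below is a minimal sketch in Python that tallies file counts and total size by extension; the data/ directory is a hypothetical stand-in for your project folder.

```python
from collections import Counter, defaultdict
from pathlib import Path

def summarize_files(root: str) -> None:
    """Print a per-extension file count and total size for a directory tree."""
    counts = Counter()
    sizes = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(no extension)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    for ext in sorted(counts):
        print(f"{ext}: {counts[ext]} files, {sizes[ext] / 1e6:.1f} MB")

summarize_files("data/")  # hypothetical project data directory
```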

Determine documentation format and standards

Consider: standard terminology and file-naming practices

What are the scientific standards and structured metadata used by your discipline [metadata schema]?
Will you be coding data [codebook]? Is it important to record relationships and variable names [data dictionary] or file relationships [README]?
How will data be organized within a directory?

A data dictionary will be used to define the variables, document their units of measurement, and explain coding decisions. A README file will also be used to explain the project's purpose and state the connections between project files.
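One way to seed a data dictionary is to generate the variable list from the data itself and fill in descriptions by hand. A minimal sketch, assuming a hypothetical survey_responses.csv and the pandas library:

```python
import pandas as pd

# Hypothetical input file; descriptions must be completed by the research team.
df = pd.read_csv("survey_responses.csv")

data_dictionary = pd.DataFrame({
    "variable": df.columns,
    "dtype": [str(t) for t in df.dtypes],
    "n_missing": df.isna().sum().to_list(),
    "description": [""] * len(df.columns),  # filled in by hand
})
data_dictionary.to_csv("data_dictionary.csv", index=False)
```

The resulting data_dictionary.csv can then be deposited alongside the data and referenced from the README.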

Define roles and responsibilities

Consider: who does what and with what frequency

Where is data backed up in the short term?
What additional information is needed to make data meaningful?
What mechanisms are in place for reducing errors, ensuring consistency, and reporting or handling outdated files?
Who is responsible and on what schedule?

The Principal Investigator (PI) is responsible for the deposit, maintenance, and management of the data. Graduate student lab members will work with the PI to provide data documentation. Active data storage will occur on the departmental server. To prevent data loss in the event of system failure, the lab manager will be responsible for weekly scheduled backups to an external hard drive.
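A backup routine like the one described can be scripted and scheduled. The sketch below is one possible approach in Python; the server and drive paths are hypothetical placeholders, and the script would be run weekly via cron or Windows Task Scheduler.

```python
import shutil
from datetime import date
from pathlib import Path

# Hypothetical paths standing in for the departmental server share
# and the external backup drive named in the plan.
SOURCE = Path("/mnt/dept_server/project_data")
DEST = Path("/mnt/external_drive/backups")

def weekly_backup() -> Path:
    """Copy the active data directory into a new dated backup folder."""
    target = DEST / f"project_data_{date.today().isoformat()}"
    shutil.copytree(SOURCE, target)
    return target

if __name__ == "__main__":
    print(f"Backup written to {weekly_backup()}")
```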

Establish dissemination and sharing policies

Consider: obligation and audience

Are you under any obligation to share your data (funder or journal)?
Where do you plan to deposit your data and will others in your field be able to find it?
Are your data sensitive? If so, can they be de-identified or do they require restricted access?
How will data be linked to documentation, articles, code, scholarship in other places?
NOTE: Data sharing needs to be written into the consent form if you are conducting human subjects research

Any datasets and accompanying documentation generated under this project will be deposited into Zenodo (https://zenodo.org) for long-term preservation. Zenodo is an open access repository that specializes in preserving research software and issues DOIs, which will be included in each resulting publication. Code used for data processing and analysis will be made publicly available through GitHub (https://github.com), a web-based code hosting platform. Any reports, presentations, manuscripts, and other documents that record research outputs generated under this project will be deposited into ScholarWorks (https://scholarworks.wmich.edu/), Western Michigan University's institutional repository.
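Zenodo deposits can also be scripted through its public REST API rather than the web interface. The sketch below is a minimal outline; the access token and file name are placeholders, and the endpoints should be checked against the current documentation at https://developers.zenodo.org.

```python
import requests

TOKEN = "YOUR_ZENODO_ACCESS_TOKEN"  # placeholder personal access token
BASE = "https://zenodo.org/api"

# Create an empty deposition and read back its file-bucket URL.
r = requests.post(f"{BASE}/deposit/depositions",
                  params={"access_token": TOKEN}, json={})
r.raise_for_status()
bucket = r.json()["links"]["bucket"]

# Upload a (hypothetical) data file into the deposition's bucket.
with open("dataset.csv", "rb") as fh:
    requests.put(f"{bucket}/dataset.csv", data=fh,
                 params={"access_token": TOKEN}).raise_for_status()
```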

Plan for preservation and archiving

Consider: accessibility in the long term

How stable is your long-term storage choice and what guarantees does it provide?
What transformations need to occur to maintain access to the data (e.g., de-identification, conversion to open file formats)?
How long will you retain the data after the study closes and does destruction need to occur?

After study completion, proprietary files will be converted to open formats to maximize data reuse. Excel files will be converted to .csv, and field notes will be scanned to PDF files. Data resulting from this research will be shared via the generalist repository Dryad, which provides metadata, persistent identifiers (DOIs), and long-term access. Data will be made available as soon as possible, or at the time of the associated publication, under the CC0 license. Dryad datasets are backed up to Merritt, the University of California's CoreTrustSeal-certified digital repository, for long-term storage and accessibility. Procedures in place to ensure dataset preservation include storage of data files in multiple geographic locations, regular audits for fixity and authenticity, and succession plans in the event of repository closure. Consistent with WMU policy, data will be retained for three years after study closure.
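The Excel-to-CSV conversion can be scripted so that every sheet of every workbook is exported consistently. A minimal sketch, assuming a hypothetical study_data folder of .xlsx files and the pandas library (reading .xlsx also requires the openpyxl package):

```python
from pathlib import Path

import pandas as pd

# Convert every workbook in the (hypothetical) study folder to CSV so
# the data remain usable without proprietary software.
for xlsx in Path("study_data").glob("*.xlsx"):
    sheets = pd.read_excel(xlsx, sheet_name=None)  # one DataFrame per sheet
    for name, frame in sheets.items():
        frame.to_csv(xlsx.with_name(f"{xlsx.stem}_{name}.csv"), index=False)
```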

Bonus: Funding

Consider: additional costs for inclusion in grant proposal

Will you be purchasing secondary data?
Do you need funding for a dedicated lab manager, active storage, data curation, or to cover data deposit fees?
What software or hardware do you need to collect or analyze your data?