Managing and sharing your research data

With the quantity of digital data produced by research projects increasing exponentially, data management has become a real challenge for research organizations.
Managing data effectively is vital to ensure that the information can be retrieved, secured, used and shared.

The basics of data management

Research data: what is it?

According to the OECD, research data are factual records (numerical scores, textual records, images and sounds) that are used as primary sources for scientific research and are commonly accepted in the scientific community as necessary to validate research findings. Documents (laboratory notebooks, preliminary analyses, draft scientific documents, personal correspondence, etc.) and physical objects (bacterial strains, lab animals, etc.) are therefore not considered research data.

 

The data lifecycle

The data lifecycle has six broad stages: creation or collection, processing, analysis, preservation, access and reuse. Each stage of the cycle involves data management measures. These measures are shown in the data lifecycle diagram.

 

Image adapted from the data lifecycle published by the UK Data Service: https://ukdataservice.ac.uk/learning-hub/research-data-management/

The FAIR principles: Findable, Accessible, Interoperable, Reusable

The FAIR Principles are guidelines whose primary aim is to improve the reuse of research data. They were published in Scientific Data in 2016. Each letter of the FAIR acronym is associated with best practices that should be followed to make data reusable, whether or not the data are shared. Data can therefore be "FAIR" without being freely accessible.

Find here an explanation of each item of the FAIR Principles

FAIR data principles by SangyaPundir, CC BY 4.0 license
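
As a rough illustration of what the four letters can mean for a single dataset, the sketch below checks a metadata record for one element per letter. The class `DatasetRecord`, its field names, the example values and the checks themselves are assumptions chosen for illustration; they are not an official FAIR assessment. The example also reflects the point made above: a dataset with restricted access can still be described in a FAIR way.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    identifier: str = ""         # persistent identifier such as a DOI (Findable)
    access_conditions: str = ""  # "open", "restricted" or "closed" (Accessible)
    file_format: str = ""        # open, documented format (Interoperable)
    license: str = ""            # terms of reuse (Reusable)


def fair_gaps(record: DatasetRecord) -> list[str]:
    """Return the FAIR letters for which the record gives no information."""
    checks = {
        "F": bool(record.identifier),
        "A": bool(record.access_conditions),
        "I": bool(record.file_format),
        "R": bool(record.license),
    }
    return [letter for letter, present in checks.items() if not present]


# Example values are made up for the purpose of the sketch.
restricted = DatasetRecord(
    title="Example cohort measurements",
    identifier="doi:10.1234/example",
    access_conditions="restricted",
    file_format="csv",
    license="data use agreement",
)
undocumented = DatasetRecord(title="Untitled spreadsheet")

print(fair_gaps(restricted))    # no gaps although access is restricted
print(fair_gaps(undocumented))  # every letter is missing
```

A real assessment would look at many more elements (rich metadata, standard vocabularies, provenance, and so on); the sketch only shows that "FAIR" is about how data are described and made reusable, not about whether they are open.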

Expectations at institutional, national and international levels

The international research community is rallying to ensure the preservation, sharing and reuse of scientific research outputs. This global movement involving researchers, policy makers and funders aims to improve the quality, integrity and reproducibility of research.

The Institut Pasteur's policy on the management and sharing of research data and software code

This Policy sets out the Institut Pasteur's guidelines on the management and sharing of research data and software code. It aims to facilitate the sharing and reuse of data and software code according to the FAIR (Findable, Accessible, Interoperable, Reusable) principles.

It summarizes the best practices to be implemented throughout the research process and refers to fact sheets that give scientists the operational resources they need to implement these best practices.

This Policy was developed as part of a collaborative, transversal project led by the CeRIS and the Data Management Core Facility.

Contact: rdm-policy@pasteur.fr

 

A national policy

Research data is at the heart of two national policies: the National Plan for Open Science and the Data, Algorithms and Source Codes Policy. Their ambition: to ensure that the data produced by French public research is progressively structured in line with FAIR principles, preserved and, where possible, made open.

A requirement of funding bodies

Over the past few years, research funders (European Commission, ANR, NIH, Wellcome Trust and many others) have been implementing new requirements to ensure that the data produced as part of the projects they fund are well preserved, reusable, and when possible, open.

A closer look at the European Commission's requirements:

Image: European Commission requirements (CeRIS, Institut Pasteur)

For more information on funders' requirements, consult the CeRIS fact sheet.

 

Drawing up a data management plan

What is a data management plan?

The data management plan (DMP) is a concise document that helps to organize and anticipate all stages of the data lifecycle. It can be drawn up for data that is intended to be shared, as well as for data that will remain in restricted or closed access. It is a living document that needs to be regularly updated.

It covers the following aspects:

Image: Drawing up a data management plan (CeRIS, Institut Pasteur)

A DMP can be set up for a research project or for a research entity.

For a research project, the DMP makes it possible to:

  • Plan / anticipate: think at the beginning of the project about the data to be generated or collected, the legal and security issues to be taken into account, and the steps to be taken at each stage of the project.
     
  • Harmonize practices: agree among partners on how to organize, describe, store and share data; define who is responsible for what, which tools are used, etc.
     
  • Build up a summary document of project data: the final version of the DMP, at the end of the project, serves as a reference document describing the project data, where it is stored, how it can be accessed, and so on. This will facilitate the reuse of project data.
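
The three project-level points above can be pictured as a small structured record that is revised as the project advances. The sketch below is only an illustration: the class name `ProjectDMP`, its fields and the example values are assumptions derived from the points listed above, not a CeRIS or funder template.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProjectDMP:
    """Hypothetical skeleton of a project DMP; all field names are illustrative."""
    project: str
    datasets: list[str] = field(default_factory=list)
    legal_and_security_issues: list[str] = field(default_factory=list)
    responsibilities: dict[str, str] = field(default_factory=dict)  # task -> role
    storage_location: str = ""
    access_conditions: str = ""
    revisions: list[tuple[date, str]] = field(default_factory=list)

    def revise(self, when: date, note: str) -> None:
        """Log a revision, since a DMP is a living document."""
        self.revisions.append((when, note))

    def open_questions(self) -> list[str]:
        """Return the names of the sections that are still empty."""
        sections = {
            "datasets": self.datasets,
            "legal_and_security_issues": self.legal_and_security_issues,
            "responsibilities": self.responsibilities,
            "storage_location": self.storage_location,
            "access_conditions": self.access_conditions,
        }
        return [name for name, value in sections.items() if not value]


# Example values are made up for the purpose of the sketch.
dmp = ProjectDMP(project="Example project")
dmp.datasets.append("sequencing reads")
dmp.responsibilities["metadata description"] = "data steward"
dmp.revise(date(2024, 1, 15), "initial version")

print(dmp.open_questions())
# ['legal_and_security_issues', 'storage_location', 'access_conditions']
```

Listing the empty sections at each revision is one simple way to keep track of what still has to be decided before the final version of the DMP is written.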

For a research entity, the DMP makes it possible to:

  • Exchange ideas as a team on how each member organizes and manages their data. Benefit from the experience of others on certain subjects.
     
  • Harmonize practices: agree on how the different types of data produced or collected within the entity should be managed, and define rules and best practices, such as security rules for managing certain types of sensitive data, recommendations on preferred formats for storing and sharing data, and rules to follow when someone leaves the entity.
     
  • Build up a reference document on best practices to be followed within the entity, and thus guide newcomers on how to manage and share their data.

How to draw up a DMP in practice?

The CeRIS supports Institut Pasteur researchers in drafting their DMPs by providing them with documents and offering to review and comment on their DMPs.

Sharing your research data

To share your research data with a wide audience, the most effective solution is to deposit it in a data repository.

What is a data repository?

A repository is an online service for the collection, description, preservation, discovery and dissemination of scientific data. There are over 1,700 data repositories in the life sciences (according to re3data), and they can be categorized into two broad types:

  • disciplinary or thematic repositories (imaging, chemistry, neuroscience, proteomics, etc.);
  • general or multi-disciplinary repositories, open to all types of data.

How to choose a data repository?

First recommendation: find out whether a suitable disciplinary repository exists.

To identify a disciplinary repository that might be suitable for one's data type or research theme and analyze its characteristics, two directories are recommended:

Registry of Research Data Repositories (re3data) - Bibliothèque du CeRIS - Institut Pasteur

re3data is a comprehensive directory indexing over 3,000 data repositories in all disciplines. Each repository is described by precise, high-quality metadata.

FAIRsharing - Bibliothèque du CeRIS - Institut Pasteur

The Databases section of FAIRsharing provides additional information on each repository, such as the funders/publishers that recommend it, the standards used, etc.

To help you analyze the characteristics of a repository and check that it meets your needs and the FAIR principles, the CeRIS provides Institut Pasteur researchers with an analysis grid listing the questions to ask yourself before making your choice.

Download the data repository analysis grid proposed by the CeRIS

If there is no suitable disciplinary repository: choose a general repository

In this case, the CeRIS recommends that Pasteurian researchers deposit their data in the Institut Pasteur space on the national repository Recherche Data Gouv.
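
The two recommendations above amount to a simple decision rule, sketched here. The function name `choose_repository`, the example catalogue and the repository names in it are placeholders; in practice the lookup would be done by searching re3data or FAIRsharing and reviewing the candidate repository, not by consulting a hard-coded dictionary.

```python
# Fallback recommended in the text when no disciplinary repository fits.
GENERAL_REPOSITORY = "Recherche Data Gouv (Institut Pasteur space)"


def choose_repository(data_type: str, disciplinary: dict[str, str]) -> str:
    """Prefer a suitable disciplinary repository; otherwise use the general one."""
    return disciplinary.get(data_type.strip().lower(), GENERAL_REPOSITORY)


# Placeholder catalogue: keys and repository names are invented for the sketch.
example_catalogue = {
    "proteomics": "hypothetical-proteomics-repository",
    "imaging": "hypothetical-imaging-repository",
}

print(choose_repository("Proteomics", example_catalogue))
print(choose_repository("field survey", example_catalogue))
```

The first call returns the placeholder disciplinary repository, and the second falls back to the general repository because no matching entry exists.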

How are sharing practices evolving at the Institut Pasteur?

The "research data" section of the French Open Science Monitor aims to measure changes in data-sharing practices in France.

According to the Institut Pasteur's Open Science Monitor, among Pasteurian publications published in 2021 that mention data production, 35% mention data sharing. In comparison, this proportion is 22% at the national level (all fields).

Learn more

Questions & Answers

Where can I learn about research data management online?

The ELIXIR Research Data Management Kit (RDMkit) is an online guide containing good data management practices applicable to life sciences research projects. Developed and managed by people who work every day with life science data, the RDMkit has guidelines, information, and pointers to help you with problems throughout the data's life cycle.

What is the point of drawing up a data management plan if it is not compulsory?

Drawing up a data management plan before beginning your project is a way of asking yourself the right questions and adopting best data management practices. Well-managed data are data that are easy to retrieve and reuse, described precisely by metadata, secure and durably preserved. If the journal in which you are publishing an article asks you to deposit the accompanying data in a repository, you can rest assured that your metadata are already prepared and all you have to do is transfer them to the relevant fields. You can also easily make your data accessible and visible by publishing them in a data paper.

Is there a search engine that I can use to search for data in different repositories?

There are several data search engines:

  • DataMed provides access to various types of data in the biomedical field. It currently covers 76 repositories and offers a powerful advanced search.

  • Omics Discovery Index allows you to search for datasets in the fields of genomics, proteomics, transcriptomics and metabolomics. It also offers advanced search functions (by organism, by disease, etc.).

  • Elsevier DataSearch covers more diverse scientific fields. It can be used to access datasets from a more limited number of repositories but also some supplementary data.

  • Google Dataset Search is the least efficient. It offers a basic search and very few features.

 


 

Open Science newsletter

Every two weeks, the Open Science newsletter provides information and sheds light on developments, challenges and new practices in three key areas of Open Science: scientific publishing in the age of Open Access, data and software management and sharing, and research evaluation and planning.

 
