Orchestrating data flows throughout their whole life cycle

Konogan Bourhy; Laurent Bouri; Christophe Bruley; Olivier Collin; Frédéric De Lamotte; Thomas Denecker; Marie-Dominique Devignes; Jean-François Dufayard; Alban Gaignard; Nadia Goué; Gildas Le Corguille; Paulette Lieby; Hervé Ménager; Imane Messak; Hamid Ouahioune; Claire Rioualen; Thomas Rosnet; Baptiste Rousseau; Julien Seiler; Jacques van Helden

Conference poster, year: 2024

Konogan Bourhy

Role: Author

Plateforme bioinformatique GenOuest [Rennes]

Laurent Bouri

Role: Author

Institut Français de Bioinformatique

Christophe Bruley

Role: Author

Laboratoire Biosciences et bioingénierie pour la santé

Olivier Collin

Role: Author
PersonId : 1483
IdHAL : olivier-collin
ORCID : 0000-0002-8959-8402
IdRef : 147370140

Plateforme bioinformatique GenOuest [Rennes]

Frédéric De Lamotte

Role: Author

Institut Français de Bioinformatique

Développement Adaptatif du Riz [AGAP]

Thomas Denecker

Role: Author

Institut Français de Bioinformatique

Marie-Dominique Devignes

Role: Author

Université de Lorraine

Jean-François Dufayard

Role: Author

Institut Français de Bioinformatique

Développement Adaptatif du Riz [AGAP]

Alban Gaignard

Role: Author

Institut du Thorax [Nantes]

Nadia Goué

Role: Author

Plateforme Auvergne Bioinformatique

Gildas Le Corguille

Role: Author

Institut Français de Bioinformatique

ABiMS - Informatique et bioinformatique = Analysis and Bioinformatics for Marine Science

Paulette Lieby

Role: Author

Institut Français de Bioinformatique

Hervé Ménager

Role: Author
PersonId : 184612
IdHAL : herve-menager
ORCID : 0000-0002-7552-1009
IdRef : 120930595

Institut Français de Bioinformatique

Hub Bioinformatique et Biostatistique - Bioinformatics and Biostatistics HUB

Imane Messak

Role: Author

Institut Français de Bioinformatique

Hamid Ouahioune

Role: Author

Institut Français de Bioinformatique

Claire Rioualen

Role: Author

Institut Français de Bioinformatique

Thomas Rosnet

Role: Author

Institut Français de Bioinformatique

Baptiste Rousseau

Role: Author

Institut Français de Bioinformatique

Julien Seiler

Role: Author
PersonId : 1393906

Institut de Génétique et de Biologie Moléculaire et Cellulaire

Institut Français de Bioinformatique

Jacques van Helden

Role: Author

Institut Français de Bioinformatique

Abstract

Most life science domains rely on the massive production of data via different technologies (sequencing, imaging, proteomics, metabolomics, phenomics, …). In this context, data management becomes critical to realize the full value of the data. This has translated into regulatory obligations for research projects, with incentives for researchers to adopt best practices in data management. The FAIR principles, which aim to make data Findable, Accessible, Interoperable, and Reusable, are a cornerstone of these best practices. They are widely promoted at the political level, but putting them into practice requires appropriate software tools. We present here a general schema describing the successive steps of seamless data management, from the conception of a research project to the publication of results and the deposition of data in relevant repositories. Data management plans (DMPs), often perceived as a tedious administrative requirement, could become much-appreciated support for researchers if they were adapted to their concrete needs (e.g. identifying datasets for reuse, reprocessing, etc.). This would require user-friendly tools to handle DMPs. Such tools should not only serve as landmarks at the onset, mid-term, and end of a project, but also offer interactive interfaces enabling continuous follow-up throughout the project, and be designed modularly to cope with the diversity of data combined in a project. DMPs should also promote the adoption of the international standards and specifications established by communities of experts for each data type, which have become mandatory for depositing data in international repositories. Validating the compliance and quality of the metadata throughout the project's life will spare researchers the painful effort of recollecting mandatory information at the moment of data submission, months or years after the data were generated.
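Validating metadata continuously, rather than at submission time, amounts to checking each record against a checklist as soon as it is entered and reporting every gap at once. The minimal Python sketch below illustrates the idea; the checklist fields and allowed values are hypothetical placeholders, not those of any real community standard or repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical checklist: field name -> allowed values (None means any non-empty value).
# Real checklists are defined by community standards and target repositories;
# these entries are placeholders for illustration only.
CHECKLIST: dict[str, set[str] | None] = {
    "title": None,
    "organism": None,
    "data_type": {"sequencing", "imaging", "proteomics", "metabolomics", "phenomics"},
    "license": None,
}


@dataclass
class ValidationReport:
    missing: list[str] = field(default_factory=list)       # required fields left empty
    invalid: dict[str, str] = field(default_factory=dict)  # values outside the allowed set

    @property
    def ok(self) -> bool:
        return not self.missing and not self.invalid


def validate(record: dict[str, str], checklist: dict[str, set[str] | None] = CHECKLIST) -> ValidationReport:
    """Check one metadata record against the checklist and report every problem found."""
    report = ValidationReport()
    for name, allowed in checklist.items():
        value = record.get(name, "").strip()
        if not value:
            report.missing.append(name)
        elif allowed is not None and value not in allowed:
            report.invalid[name] = value
    return report


# Example: a record entered early in a project, checked immediately rather than at submission
report = validate({"title": "RNA-seq of leaf tissue", "data_type": "genomics", "license": "CC-BY-4.0"})
print(report.missing)  # ['organism']
print(report.invalid)  # {'data_type': 'genomics'}
```

Running such a check each time a record is created or edited surfaces missing information while it can still be recovered easily, instead of months later.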
Metadata standardization should rely on domain-specific ontologies such as EDAM, to describe data types, formats, and operations, as well as on lightweight general-purpose ontologies (e.g. Schema.org/Bioschemas, DC-Terms, DCAT) to increase findability and reuse. Data management software environments should also handle provenance metadata, such as the location of the primary and secondary data generated at each step of a project, in order to optimize storage usage, avoid data loss, and secure data without excessive duplication. They should also include user-friendly interfaces that simplify the submission of data and metadata to international repositories and automate their indexing in national and international catalogs. Beyond describing these conceptual requirements, we present the software tools and standards required to accompany each event in the life of the data and metadata (production, curation, storage, archiving, …). Some of these tools have been developed by the Institut Français de Bioinformatique or by the European bioinformatics infrastructure ELIXIR. Others are under development or remain to be developed to achieve a comprehensive software environment supporting the orchestration of data flows throughout their life cycle.
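To illustrate how a general-purpose vocabulary and a domain ontology can be combined, the sketch below builds a minimal Schema.org `Dataset` description in JSON-LD whose file format is given as an EDAM format URI (`format_1930`, the EDAM term for FASTQ). The dataset name, URLs, and the selection of properties are illustrative assumptions; the Bioschemas Dataset profile defines which properties are actually required or recommended.

```python
import json

EDAM = "http://edamontology.org/"


def dataset_record(name: str, description: str, url: str, edam_format: str, license_url: str) -> dict:
    """Build a minimal Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
        "distribution": [
            {
                "@type": "DataDownload",
                "contentUrl": url,
                # File format identified by an EDAM format URI instead of free text
                "encodingFormat": EDAM + edam_format,
            }
        ],
    }


# Hypothetical example values, for illustration only
record = dataset_record(
    name="Example RNA-seq reads",
    description="Hypothetical dataset used to illustrate Schema.org and EDAM annotation.",
    url="https://example.org/data/sample1.fastq",
    edam_format="format_1930",  # EDAM "FASTQ"
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(record, indent=2))
```

Using an ontology URI rather than a free-text label such as "fastq" lets a catalog match this record with other resources annotated with the same EDAM term.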

Keywords

Data management, FAIR, Open Science, NNCR, Madbot, FAIR-Checker, EDAM, DMP

Fields

Life Sciences [q-bio]

Main file

poster_IFB_orchestration flux de données_JOBIM_2024.pdf (1.59 MB)

Origin: Files produced by the author(s)
License: Attribution - NonCommercial - ShareAlike (CC BY-NC-SA)

Contributor: Julien Seiler

https://hal.science/hal-04618277

Submitted: Thursday 20 June 2024, 09:58:56

Last modified: Monday 24 June 2024, 09:41:51

Dates and versions

hal-04618277 , version 1 (20-06-2024)

Identifiers

HAL Id : hal-04618277 , version 1

Cite

Konogan Bourhy, Laurent Bouri, Christophe Bruley, Olivier Collin, Frédéric De Lamotte, et al. Orchestrating data flows throughout their whole life cycle: Key stages and partners for success. JOBIM 2024 - Journées ouvertes en biologie, informatique, et mathématiques, Jun 2024, Toulouse, France. ⟨hal-04618277⟩

Collections

INSERM CIRAD PASTEUR CEA UNIV-RENNES1 UGA PRES_CLERMONT CNRS INRIA INSA-RENNES IGBMC IRISA CENTRALESUPELEC UNIV-LORRAINE UR1-MATH-STIC UNIV-PARIS-SACLAY UR1-UFR-ISTIC UNIV-MONTPELLIER SITE-ALSACE UNIV-RENNES SORBONNE-UNIVERSITE IRIG CEA-GRE SU-SCIENCES FR2424 INSTITUT-AGRO-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER ANR UR1-MATH-NUM AGAP SBR FRANCE-GENOMIQUE AUBI BIOINFO_BIOSTAT_HUB ABIMS
