Semantic Management of Data from Biodiversity and Ecosystem Studies: Toward an Integrated Workflow from Collection to Publication. Application to Plankton Data from Lake Geneva
Conference paper, 2021

Abstract

Biodiversity is a key player in ecosystem characteristics and dynamics. Acting as a driver, it also results from ecosystem functioning. Understanding this complex interplay between biological and physical components is one of the main current challenges in the context of land use changes and climate warming. The acquisition of knowledge on biodiversity requires multidisciplinary approaches and mobilises numerous research teams. Data are collected or computed in large quantities but are most often poorly standardised and therefore heterogeneous. In this context, the development of semantic interoperability is a major challenge for the sharing and reuse of these data. This objective is pursued within the framework of the AnaEE (Analysis and Experimentation on Ecosystems) Research Infrastructure, which is dedicated to experimentation on ecosystems and biodiversity. A distributed Information System (IS) is being developed, based on the semantic interoperability of its components, which use common vocabularies (the AnaeeThes thesaurus and an OBOE-based ontology extended for disciplinary needs) to model the studied system. This modelling covers the measured variables, including biodiversity, as well as the different components of the experimental or observational context, from sensor to plot and network. Driven by the ontology, the approach relies on the atomic decomposition of each component into observed entities, their characteristics and qualifiers, and their units or naming standards. The modelling of the system allows the semantic annotation of relational databases or flat files for the production of URI-based graph databases. A first pipeline automates the annotation process and the production of the semantic data. A second pipeline is devoted to the exploitation of these semantic data by generating i) metadata records formatted according to the geospatial extension of the Data Catalog Vocabulary standard and the ISO 19139 standard, and ii) Network Common Data Form data files.
The implementation of this integrated semantic management of data is presented here for phyto- and zooplankton data collected from water columns in Lake Geneva over a 30-year period, as well as for environmental data on water temperature and nutrients. The work carried out contributes to the development and use of semantic vocabularies within the biodiversity and ecology research community, leading to semantically enriched metadata records and interoperable data sets. The genericity of the tools makes them usable in different contexts of data production and management, and with different ontologies for semantic modelling.
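
As a rough illustration of the atomic decomposition described in the abstract, the sketch below turns one flat-file row into N-Triples statements linking an observation to its entity, characteristic, unit and value. It is not the AnaEE pipeline: the namespaces, the class and property names (loosely modelled on an observation/measurement pattern), the column names and the row values are all hypothetical placeholders invented for this example.

```python
from typing import Dict, List

# Hypothetical namespaces: placeholders, not the actual AnaEE or OBOE URIs.
ONT = "http://example.org/ontology#"
DATA = "http://example.org/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def uri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def annotate_row(row: Dict[str, str]) -> List[str]:
    """Decompose one flat-file row into observation/measurement triples.

    The column names (sample_id, entity, variable, unit, value) are
    assumptions made for this sketch.
    """
    obs = f"{DATA}observation/{row['sample_id']}"
    meas = f"{obs}/measurement/{row['variable']}"
    triples = [
        (uri(obs), uri(RDF_TYPE), uri(ONT + "Observation")),
        (uri(obs), uri(ONT + "ofEntity"), uri(ONT + row["entity"])),
        (uri(obs), uri(ONT + "hasMeasurement"), uri(meas)),
        (uri(meas), uri(RDF_TYPE), uri(ONT + "Measurement")),
        (uri(meas), uri(ONT + "ofCharacteristic"), uri(ONT + row["variable"])),
        (uri(meas), uri(ONT + "usesStandard"), uri(ONT + row["unit"])),
        # The measured value is emitted as a typed literal.
        (uri(meas), uri(ONT + "hasValue"),
         f'"{row["value"]}"^^{uri(XSD_DECIMAL)}'),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Invented example row, standing in for one line of a flat file.
row = {
    "sample_id": "S001",
    "entity": "WaterColumn",
    "variable": "Temperature",
    "unit": "DegreeCelsius",
    "value": "12.4",
}
ntriples = annotate_row(row)
print("\n".join(ntriples))
```

Each row yields a small, self-describing graph fragment; in a setting like the one the abstract describes, the mapping from columns to ontology terms would presumably come from the annotation model rather than being hard-coded as it is here.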
Main file
paper11-s4biodiv.pdf (2.57 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03579553, version 1 (22-06-2022)

Identifiers

  • HAL Id: hal-03579553, version 1

Cite

Christian Pichot, Damien Maurice, Ghislaine Monet, Rachid Yahiaoui, Philippe Clastre, et al. Semantic Management of Data from Biodiversity and Ecosystem Studies: Toward an Integrated Workflow from Collection to Publication. Application to Plankton Data from Lake Geneva. Joint Ontology Workshops 2021 Episode VII: The Bolzano Summer of Knowledge, JOWO, Sep 2021, Bolzano, Italy. http://ceur-ws.org/Vol-2969/paper11-s4biodiv.pdf. ⟨hal-03579553⟩