Semantic Management of Data from Biodiversity and Ecosystem Studies: Toward an Integrated Workflow from Collection to Publication. Application to Plankton Data from Lake Geneva
Abstract
Biodiversity is a key player in ecosystem characteristics and dynamics. Acting as a driver, it
also results from ecosystem functioning. Understanding this complex interplay between
biological and physical components is one of the main current challenges in the context of land
use changes and climate warming. The acquisition of knowledge on biodiversity requires
multidisciplinary approaches and mobilises numerous research teams. Data are collected or
computed in large quantities but are most often poorly standardised and therefore heterogeneous.
In this context the development of semantic interoperability is a major challenge for the sharing
and reuse of these data. This objective is implemented within the framework of the AnaEE
(Analysis and Experimentation on Ecosystems) Research Infrastructure dedicated to
experimentation on ecosystems and biodiversity. A distributed Information System (IS) is
developed, based on the semantic interoperability of its components using common
vocabularies (AnaeeThes thesaurus and OBOE-based ontology extended for disciplinary
needs) for modelling the studied system. This modelling covers the measured variables
including biodiversity, as well as the different components of the experimental or observational
context, from sensor to plot and network. Driven by the ontology, the approach relies on the
atomic decomposition of each of the components into observed entities, their characteristics
and qualifiers, their units or naming standards. The modelling of the system allows the semantic
annotation of relational databases or flat files for the production of URI-based graph databases.
A first pipeline automates the annotation process and the production of the semantic data. A
second pipeline is devoted to the exploitation of these semantic data by generating i) metadata
records formatted according to the geospatial extension for the Data Catalog Vocabulary
standard and the ISO 19139 standard, and ii) Network Common Data Form data files. The
implementation of this integrated semantic management of data is presented here for phyto- and zooplankton data collected from water columns in Lake Geneva over a 30-year period,
as well as for environmental data about water temperature and nutrients. The work carried out
contributes to the development and use of semantic vocabularies within the biodiversity and
ecology research community, leading to semantically enriched metadata records and
interoperable data sets. The genericity of the tools makes them usable in different contexts of
data production and management, and with the different ontologies involved in semantic modelling.
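
The atomic decomposition described above can be illustrated with a minimal sketch. It is not the AnaEE pipeline itself: the namespaces, the column names, the annotation table and the property names (loosely modelled on the OBOE notions of observation, measurement, entity, characteristic and standard) are all assumptions made for illustration. The sketch only shows how one flat-file row, once each column is annotated with an observed entity, a characteristic and a unit, can be turned into URI-based triples.

```python
from typing import NamedTuple

# Hypothetical namespaces, used for illustration only.
EX = "http://example.org/lake/"
VOC = "http://example.org/vocab#"


class Annotation(NamedTuple):
    """Atomic decomposition of one column: entity, characteristic, unit."""
    entity: str
    characteristic: str
    standard: str


# Assumed annotation table: one entry per flat-file column.
ANNOTATIONS = {
    "water_temp": Annotation("WaterColumn", "Temperature", "DegreeCelsius"),
    "phyto_count": Annotation("Phytoplankton", "Abundance", "CellsPerMillilitre"),
}


def annotate_row(row_id, row):
    """Turn one flat-file row into (subject, predicate, object) triples."""
    triples = []
    for column, ann in ANNOTATIONS.items():
        if column not in row:
            continue  # skip columns absent from this row
        obs = f"{EX}observation/{row_id}/{column}"
        meas = f"{EX}measurement/{row_id}/{column}"
        triples += [
            (obs, VOC + "ofEntity", VOC + ann.entity),
            (obs, VOC + "hasMeasurement", meas),
            (meas, VOC + "ofCharacteristic", VOC + ann.characteristic),
            (meas, VOC + "usesStandard", VOC + ann.standard),
            (meas, VOC + "hasValue", row[column]),
        ]
    return triples


# Example row with made-up values.
triples = annotate_row("r1", {"water_temp": 12.4, "phyto_count": 3500})
for triple in triples:
    print(triple)
```

In this sketch each annotated column yields five triples, so the same annotation table could be reused across rows and files; a real implementation would typically serialise the result with an RDF library rather than plain tuples.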
Origin: Files produced by the author(s)