Psylve - A Text-to-Ontology Information Extraction Framework for the Occurrence Distribution of Plant Pathogen Vectors - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Master Thesis Year : 2022

Psylve - A Text-to-Ontology Information Extraction Framework for the Occurrence Distribution of Plant Pathogen Vectors

Elisa Lubrini
  • Function : Author
  • PersonId : 1162279

Abstract

Diseases caused by insect-borne plant pathogens have a large negative effect on the world’s agricultural industry. An effective way to anticipate disease outbreaks is to infer risk maps of vector introduction and spread from known occurrence data. However, compiling this type of data manually is time-consuming and laborious, especially given the recent spike in publicly available data. To address this issue, this work describes attempts to facilitate researchers’ workflows by automating the extraction of vector-related information from the literature. To carry out this automation, we developed PsylVe, a solution initially targeted at psyllid vectors that encompasses document collection, Natural Language Processing (NLP) and Knowledge Representation (KR) techniques. PsylVe includes a working NLP pipeline and a fully documented methodology. The NLP pipeline is an adaptation of an existing pipeline, Omnicrobe, which targets microbial biodiversity, a domain that bears many similarities to epidemic events. We conducted a quantitative evaluation (precision, recall, and F1-score) and a qualitative evaluation (six qualitative criteria for text mining pipeline evaluations) of the results obtained with PsylVe, and compared them to a manually compiled dataset of observations on Cacopsylla pruni, a vector responsible for the spread of a pathogenic bacterium in fruit tree orchards in Europe. From the outset, we designed the PsylVe framework to be transferable to other plant disease vectors, as well as to human and animal diseases. We have also designed an application for extracting text from PDF documents and an original formal ontology that enables the representation of data and knowledge on vector-borne diseases. Various projects in the MaIAGE department of INRAE have already started integrating the PsylVe framework into their workflows, and concrete plans have been made to develop it further in order to expand its usage to new biological domains.
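
As an illustration of the quantitative metrics named in the abstract, the following minimal sketch computes precision, recall, and F1-score by comparing a set of extracted records against a manually compiled reference set. It is not the evaluation code from the thesis: the function name `precision_recall_f1`, the exact-match criterion, and the (vector, location) record shape are assumptions made for the example, and the records themselves are placeholders rather than real observations or results.

```python
def precision_recall_f1(predicted, gold):
    """Score extracted records against a reference set using exact matching.

    Hypothetical helper for illustration; the thesis may use a different
    matching criterion (for example, partial or normalized matches).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Placeholder records (vector, location); not data from the thesis.
gold = {
    ("Cacopsylla pruni", "location_A"),
    ("Cacopsylla pruni", "location_B"),
    ("Cacopsylla pruni", "location_C"),
    ("Cacopsylla pruni", "location_D"),
}
predicted = {
    ("Cacopsylla pruni", "location_A"),
    ("Cacopsylla pruni", "location_B"),
    ("Cacopsylla pruni", "location_X"),  # extracted but absent from the reference
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With these placeholder sets, two of the three extracted records appear in the reference set of four, so precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.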
Main file
2022_Lubrini_Elisa_Rapport M2 - PsylVe.pdf (3.01 MB)
Origin : Files produced by the author(s)

Dates and versions

hal-03771980 , version 1 (07-09-2022)

Licence

Attribution - NonCommercial - NoDerivatives

Identifiers

  • HAL Id : hal-03771980 , version 1

Cite

Elisa Lubrini. Psylve - A Text-to-Ontology Information Extraction Framework for the Occurrence Distribution of Plant Pathogen Vectors. Biodiversity and Ecology. 2022. ⟨hal-03771980⟩