Towards combined semantic and lexical scores based on a new representation of textual data to extract experimental data from scientific publications - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Journal article, International Journal of Intelligent Information and Database Systems, Year: 2022

Abstract

This article presents a process, guided by an ontological and terminological resource, for the targeted extraction of scientific experimental data. Our method relies on the scientific publication representation (SciPuRe), which describes the extracted data through ontological, lexical and structural features (the structural features use the segments of the scientific documents). Relevance scores based on these features are computed to rank the results and filter out the numerous false positives. Linear and sequential combinations of these scores are presented and evaluated. Experiments were carried out on a corpus of 50 English-language scientific papers in the food packaging field. They revealed that article segments are an effective criterion for filtering out a majority of the quantitative entity false positives using lexical scores. Moreover, the best symbolic entity extraction results were obtained with a sequential combination of semantic and lexical scores. These results enable the ranking of entities by relevance and the filtering of false positive results.
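The abstract names two ways of combining relevance scores, linear and sequential, without giving formulas. The sketch below is only an illustration of what such combinations might look like, not the authors' implementation: the `Candidate` class, the weight `alpha`, the threshold `semantic_min`, the choice to filter on the semantic score before ranking on the lexical one, and all score values are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical extracted entity with two relevance scores in [0, 1]."""
    text: str
    semantic: float
    lexical: float


def linear_rank(candidates, alpha=0.5):
    """Rank by a weighted sum: alpha * semantic + (1 - alpha) * lexical.

    The weight alpha is an assumption; the abstract does not state one.
    """
    def combined(c):
        return alpha * c.semantic + (1 - alpha) * c.lexical

    return sorted(candidates, key=combined, reverse=True)


def sequential_rank(candidates, semantic_min=0.5):
    """Filter on the semantic score first, then rank survivors by lexical score.

    The order of the two steps and the threshold are assumptions.
    """
    kept = [c for c in candidates if c.semantic >= semantic_min]
    return sorted(kept, key=lambda c: c.lexical, reverse=True)


# Toy values invented for illustration; they are not results from the paper.
candidates = [
    Candidate("polyethylene", semantic=0.9, lexical=0.4),
    Candidate("figure", semantic=0.2, lexical=0.9),
    Candidate("oxygen permeability", semantic=0.7, lexical=0.8),
]

linear_order = [c.text for c in linear_rank(candidates)]
sequential_order = [c.text for c in sequential_rank(candidates)]
```

In this toy setup the linear combination keeps every candidate and only reorders them, whereas the sequential combination discards candidates below the first threshold before ranking, which is one way a sequential scheme could remove false positives outright.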
Main file
IJIIDS-64444_final.pdf (837.1 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03616243, version 1 (15-11-2023)

Cite

Martin Lentschat, Patrice Buche, Juliette Dibie-Barthelemy, Mathieu Roche. Towards combined semantic and lexical scores based on a new representation of textual data to extract experimental data from scientific publications. International Journal of Intelligent Information and Database Systems, 2022, 15 (1), pp.78. ⟨10.1504/IJIIDS.2022.120146⟩. ⟨hal-03616243⟩